Senior Platform & Reliability Engineer (SRE)
$200k - $250k
Vizcom Technologies, Inc.
Agency Notice: We are not currently working with recruiting agencies for this role. Please do not contact Vizcom employees regarding this position. Any resumes submitted without a prior agreement will be considered unsolicited.

About Vizcom
Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes React/TypeScript frontend, Node/Koa + PostGraphile API services, PostgreSQL, Redis, BullMQ queues, and Kubernetes-based production infrastructure. We're hiring a senior owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale.

Role Mission
Own service reliability end-to-end: prevent incidents, reduce blast radius when failures happen, and lead fast, high-quality recovery when production degrades. This is a hands-on technical leadership role with authority to set reliability standards and enforce production guardrails.

Compensation
$200,000 - $250,000 base salary + meaningful equity

What You'll Own
- Reliability bar: Set and enforce SLIs/SLOs/error budgets for critical user flows.
- Production architecture resilience: Drive failure isolation across API, workers, queues, and dependencies so one subsystem cannot take down core access.
- Kubernetes runtime reliability: Define probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety.
- Queue + job safety (BullMQ/Redis): Own poison pill containment and workload isolation.
- Incident command quality: Lead Sev1/Sev2 response end-to-end (containment, communications, technical direction, RCA, corrective action execution).
- Reliability operating system: Own observability quality (signals over noise), on-call effectiveness, runbooks, and postmortem discipline.
- Release safety authority: Gate risky deploys and enforce reliability guardrails when production health is at risk.
- Calm, structured incident commander under pressure.
- Thinks in failure modes and blast radius by default.
- Pragmatic: can stabilize quickly, then implement durable fixes.
- High ownership and strong written communication.
- Establish baseline reliability metrics and identify top platform risks.
- Tighten incident response mechanics (roles, comms cadence, runbooks, status updates).
- Deliver high-impact hardening fixes across probes/startup paths/queue safety.
- Publish a prioritized 6-12 month reliability roadmap with clear ownership and milestones.
Vacancy posted 2 days ago

