Senior Platform & Reliability Engineer (SRE)

$200k - $250k

Vizcom Technologies, Inc.

Agency Notice: We are not currently working with recruiting agencies for this role. Please do not contact Vizcom employees regarding this position. Any resumes submitted without a prior agreement will be considered unsolicited.

About Vizcom

Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes a React/TypeScript frontend, Node/Koa + PostGraphile API services, PostgreSQL, Redis, BullMQ queues, and Kubernetes-based production infrastructure.

We're hiring a senior owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale.

Role Mission

Own service reliability end-to-end: prevent incidents, reduce blast radius when failures happen, and lead fast, high-quality recovery when production degrades.

This is a hands-on technical leadership role with authority to set reliability standards and enforce production guardrails.

Compensation

$200,000 - $250,000 base salary + meaningful equity

What You'll Own
  • Reliability bar: Set and enforce SLIs/SLOs/error budgets for critical user flows.
  • Production architecture resilience: Drive failure isolation across API, workers, queues, and dependencies so one subsystem cannot take down core access.
  • Kubernetes runtime reliability: Define probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety.
  • Queue + job safety (BullMQ/Redis): Own poison pill containment and workload isolation.
  • Incident command quality: Lead Sev1/Sev2 response end-to-end (containment, communications, technical direction, RCA, corrective action execution).
  • Reliability operating system: Own observability quality (signals over noise), on-call effectiveness, runbooks, and postmortem discipline.
  • Release safety authority: Gate risky deploys and enforce reliability guardrails when production health is at risk.

Traits We're Looking For
  • Calm, structured incident commander under pressure.
  • Thinks in failure modes and blast radius by default.
  • Pragmatic: can stabilize quickly, then implement durable fixes.
  • High ownership and strong written communication.

First 90 Days
  • Establish baseline reliability metrics and identify top platform risks.
  • Tighten incident response mechanics (roles, comms cadence, runbooks, status updates).
  • Deliver high-impact hardening fixes across probes/startup paths/queue safety.
  • Publish a prioritized 6-12 month reliability roadmap with clear ownership and milestones.

If possible, please describe one incident you personally led and send it to the contact address (view email address on click.appcast.io):

1) what failed,
2) how you contained it,
3) what permanent fixes you shipped and measured.

Vacancy posted 2 days ago