Site Reliability Engineer

Gamma

We're building the creative layer for modern communication. Every month, over a billion people make presentations, but the tools they use to make them haven't evolved in decades. We're changing that, using AI to disrupt a massive market. Millions of people rely on Gamma to create, teach, and persuade, creating more than 1 million gammas every day. We see Gamma as the next great workplace tool, combining viral B2C love with a massive B2B opportunity. We believe AI can be a true creative partner: one that understands context, clarity, and taste.

We’ve reached a $2.1B valuation, crossed $100M in annual recurring revenue, and have been profitable since 2023. We're an imaginative, passionate team who takes our work seriously, but not ourselves. Our culture is warm, a little quirky, and fueled by curiosity.

About the role

Gamma's infrastructure needs to be rock-solid for millions of daily users while enabling our engineering teams to ship fast. You'll own the operational health of our full backend platform, building automation and tooling that improves reliability and partnering with engineering to design systems that are observable, resilient, and easy to operate. Your work directly impacts every Gamma user's experience.

This is a high-impact role where you'll balance reliability with velocity, knowing when to move fast and when to prioritize stability. You'll lead incident response, drive systemic improvements, and help shape how Gamma scales to serve its next 100 million users.

Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.

What you'll do

Own reliability, availability, and performance of Gamma's production systems across primarily AWS infrastructure
Build observability infrastructure with metrics, logging, tracing, and alerting that provide deep visibility into system health
Design automation to reduce toil, improve deployment safety, and accelerate incident resolution
Lead incident response, conduct blameless post-mortems, and drive systemic improvements to prevent recurring issues
Partner with engineering teams on architecture reviews, SLOs/SLIs, and reliability best practices
Manage and optimize our infrastructure including compute, networking, databases, and managed services

What you'll bring

5+ years in Site Reliability Engineering, DevOps, or systems engineering roles with deep AWS expertise
Strong programming skills (Python, Go, or TypeScript/Node.js) for building tools and automation
Experience with infrastructure-as-code (Terraform, CloudFormation) and comprehensive observability solutions
Track record improving system reliability through automation, monitoring, and architectural improvements
Solid understanding of networking, distributed systems, containerization (Docker, Kubernetes), and database performance
Strong incident management and debugging skills for complex production issues
(Nice to have) Experience scaling SaaS applications to millions of users
(Nice to have) Background with real-time collaborative systems, Kafka, chaos engineering, or service mesh technologies
(Nice to have) AWS certifications or experience with security/compliance requirements (SOC 2, ISO 27001)

Compensation range

Final offer amounts are determined by multiple factors, including but not limited to experience and expertise in the requirements listed above.

If you're interested in this role but you don't meet every requirement, we encourage you to apply anyway! We're always excited about meeting great people.

We're building on a full TypeScript stack centered around some of the most modern and popular technologies. We use our own custom, open-source AI prompting framework, AIJSX. We have built many custom tools in-house, and we also use newer ones like the Vercel AI SDK.

Our tiny team operates at massive scale:

1M+
70M users around the world
6M+ AI images generated daily
1 trillion LLM tokens processed per month

Life at Gamma

You get energy from small teams doing big things.
You love when design, code, and storytelling overlap.
You default to action, even when the answer isn’t clear yet.
You value details, but know when to ship and move on.
You bring both the spreadsheets and the sparkle, equal parts workhorse and unicorn.
You believe AI should amplify creativity, not replace it.
You know kindness and intensity are not opposites.
You like working with people who care deeply: about their craft, their teammates, and the users on the other side of the screen.

Who we are

Gamma is full of imaginative, passionate people who take their work seriously but not themselves. The culture is warm, a little quirky, and fueled by curiosity. It’s the kind of place where you’ll debate a pixel on Monday, laugh over someone’s keyboard setup on Tuesday, and ship something remarkable by Friday. We care about craft, move with intention, and don’t mind getting a little scrappy. It’s fast, creative, and occasionally chaotic, but that’s what makes it interesting.

Here’s a bit about what it’s like to work here, from people on the inside:

“quirky, inspiring, fun, a little wild in the best way”
“You can have an idea and just run with it.”
“Everyone’s talented and humble; the mix keeps you sharp.”
“We ship cool stuff, learn a ton, and laugh a lot doing it.”

Meet the team

We're a team of dreamers and doers building in beautiful San Francisco. We're kabaddi enthusiasts, pickleballers, dog herders, woodworkers, keyboard nerds, potters, and more, and we can't wait to meet you!

Vacancy posted 6 hours ago
