Staff + Sr. Software Engineer, AI Reliability
$325k
Menlo Ventures
About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.

Reliability here is an emergent phenomenon that transcends any single team's boundaries, so someone has to zoom out and look at the whole picture. That's us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.

Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it.

Responsibilities

- Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.
- Design and implement monitoring and observability systems across the token path.
- Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.
- Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.
- Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic's safety commitments.

You may be a good fit if you:

- Have strong distributed systems, infrastructure, or reliability backgrounds -- we're looking for reliability-minded software engineers and SREs.
- Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet.
- Think holistically about how systems compose and where the seams are.
- Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.
- Care about users and feel ownership over outcomes, even for systems you don't own.
- Have excellent communication and collaboration skills -- you'll be partnering across the entire company.
- Bring diverse experience -- the team's strength comes from people who've built product stacks, scaled databases, run massive distributed systems, and everything in between.

Strong candidates may also:

- Have been an SRE, Production Engineer, or in similar reliability-focused roles on large scale systems.
- Have experience operating large-scale model serving or training infrastructure (>1000 GPUs).
- Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).
- Understand ML-specific networking optimizations like RDMA and InfiniBand.
- Have expertise in AI-specific observability tools and frameworks.
- Have experience with chaos engineering and systematic resilience testing.
- Have contributed to open-source infrastructure or ML tooling.

Annual Salary: $325,000 – $485,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Your safety matters to us: To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a communication, don't click any links -- visit directly for confirmed position openings.


