Staff + Sr. Software Engineer, AI Reliability
$325k
Menlo Ventures
About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.

Reliability here is an emergent phenomenon that transcends any single team's boundaries, so someone has to zoom out and look at the whole picture. That's us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.

Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it.

Responsibilities

- Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.
- Design and implement monitoring and observability systems across the token path.
- Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.
- Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.
- Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic's safety commitments.

You may be a good fit if you:

- Have strong distributed systems, infrastructure, or reliability backgrounds -- we're looking for reliability-minded software engineers and SREs.
- Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet.
- Think holistically about how systems compose and where the seams are.
- Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.
- Care about users and feel ownership over outcomes, even for systems you don't own.
- Have excellent communication and collaboration skills -- you'll be partnering across the entire company.
- Bring diverse experience -- the team's strength comes from people who've built product stacks, scaled databases, run massive distributed systems, and everything in between.

Strong candidates may also:

- Have been an SRE, Production Engineer, or in similar reliability-focused roles on large-scale systems.
- Have experience operating large-scale model serving or training infrastructure (>1000 GPUs).
- Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).
- Understand ML-specific networking optimizations like RDMA and InfiniBand.
- Have expertise in AI-specific observability tools and frameworks.
- Have experience with chaos engineering and systematic resilience testing.
- Have contributed to open-source infrastructure or ML tooling.

Annual Salary: $325,000 – $485,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Your safety matters to us: To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a communication, don't click any links -- visit directly for confirmed position openings.


