Staff + Sr. Software Engineer, AI Reliability

$325k

Menlo Ventures

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.

Reliability here is an emergent phenomenon that transcends any single team's boundaries, so someone has to zoom out and look at the whole picture. That's us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.

Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it.

Responsibilities

  • Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.
  • Design and implement monitoring and observability systems across the token path.
  • Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.
  • Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.
  • Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic's safety commitments.

You may be a good fit if you:

  • Have strong distributed systems, infrastructure, or reliability backgrounds -- we're looking for reliability-minded software engineers and SREs.
  • Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet.
  • Think holistically about how systems compose and where the seams are.
  • Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.
  • Care about users and feel ownership over outcomes, even for systems you don't own.
  • Have excellent communication and collaboration skills -- you'll be partnering across the entire company.
  • Bring diverse experience -- the team's strength comes from people who've built product stacks, scaled databases, run massive distributed systems, and everything in between.

Strong candidates may also:

  • Have been an SRE, Production Engineer, or in similar reliability-focused roles on large scale systems.
  • Have experience operating large-scale model serving or training infrastructure (>1000 GPUs).
  • Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).
  • Understand ML-specific networking optimizations like RDMA and InfiniBand.
  • Have expertise in AI-specific observability tools and frameworks.
  • Have experience with chaos engineering and systematic resilience testing.
  • Have contributed to open-source infrastructure or ML tooling.

Annual Salary: $325,000 – $485,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Your safety matters to us: To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a communication, don't click any links; visit directly for confirmed position openings.

Vacancy posted 14 hours ago
