Staff+ Software Engineer, Inference Runtime

$405k

Menlo Ventures

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role

Anthropic's Inference organization serves Claude to millions of users and enterprise customers with the speed, reliability, and efficiency that frontier AI demands. We build across GPUs, TPUs, and Trainium, and the complexity of our development environment grows with every platform we add.

We’re looking for a Staff Engineer to be a technical lead for Inference Runtime: the team that owns the shared, accelerator‑agnostic core of our inference serving stack, whose performance, correctness, and abstractions every accelerator builds on.

This is a senior IC role with broad technical ownership. You’ll set technical direction for the runtime’s architecture, its release and validation systems, and the workflows engineers use to develop on top of it. You will partner across Inference to make hard calls on boundaries, prioritization, and tradeoffs across heterogeneous accelerator platforms. You’ll pair with the team’s Engineering Manager, who owns hiring and people development, while you own the technical roadmap and drive the work, representing the team in cross‑org efforts spanning serving, scaling, and accelerator teams.

This role is for someone who has been the technical anchor of a platform with many internal consumers, who thinks in systems and feedback loops, and who gets real satisfaction from building abstractions that hold up as the system scales another order of magnitude.

Key Responsibilities

Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack
Own and evolve the accelerator‑agnostic runtime itself – its interfaces, internal boundaries, and build structure – including hands‑on work in a performance‑sensitive Rust and Python codebase
Keep the platform’s expansion cost low by ensuring new models and deployment targets pay only for their own specialization, and edge cases stitch back into the core easily
Drive efficient accelerator usage – utilization, scheduling, memory management – across GPU, TPU, and Trainium
Build the runtime’s validation surface around partitioned builds, change‑scoped testing, and canary/shadow/rollback as first‑class mechanisms
Act as a technical counterpart to Anthropic’s central Infrastructure org on the compilers, build systems, and toolchains the runtime depends on, contributing Inference’s performance and correctness requirements, and making the call on build vs. adopt
Mentor engineers on the team through design review, code review, and direct collaboration, raising the technical bar without owning headcount

Minimum Qualifications

Deep background in systems engineering or ML infrastructure, with the ability to go hands‑on with performance profiling, latency and throughput optimization, and systems debugging at scale
Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron) and genuine appetite to keep the runtime agnostic across all of them
Significant software engineering experience, with a strong background in high‑performance, large‑scale distributed systems serving millions of users
A track record of defining and using engineering metrics to drive improvement: you’ve set SLOs on platform surfaces, and driven escape rates, release times, latency, or throughput in a measurable direction
Experience driving technical alignment across organizational boundaries, advocating for your team’s needs while contributing to shared infrastructure
Strong written and verbal communication, and the ability to influence technical direction without formal authority

Preferred Qualifications

8+ years of software engineering experience, with significant time as the technical lead or anchor on a platform, inference runtime, or ML infrastructure team
Experience with ML compiler toolchains (XLA, Triton, NeuronX) or accelerator driver/firmware management at scale
Background operating production as a validation surface at scale: shadow traffic, canary populations, automated baseline comparison, fast rollback
Experience with deterministic or simulation‑based testing for hardware‑dependent systems
Experience with CI/CD systems at scale, particularly for workloads involving accelerator hardware
Familiarity with Kubernetes‑based development and job scheduling environments
Prior tech lead experience on a developer productivity or platform engineering team at a fast‑growing AI/ML company

Annual Salary

$405,000—$485,000 USD

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position
Location‑based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

Vacancy posted 3 days ago

