Staff+ Software Engineer, Inference Runtime
$405k
Menlo Ventures
About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role

Anthropic's Inference organization serves Claude to millions of users and enterprise customers with the speed, reliability, and efficiency that frontier AI demands. We build across GPUs, TPUs, and Trainium, and the complexity of our development environment grows with every platform we add. We’re looking for a Staff Engineer to be a technical lead for Inference Runtime: the team that owns the shared, accelerator‑agnostic core of our inference serving stack, whose performance, correctness, and abstractions every accelerator builds on.

This is a senior IC role with broad technical ownership. You’ll set technical direction for the runtime’s architecture, its release and validation systems, and the workflows engineers use to develop on top of it. You will partner across Inference to make hard calls on boundaries, prioritization, and tradeoffs across heterogeneous accelerator platforms. You’ll pair with the team’s Engineering Manager, who owns hiring and people development, while you own the technical roadmap and drive the work, representing the team in cross‑org efforts spanning serving, scaling, and accelerator teams.

This role is for someone who has been the technical anchor of a platform with many internal consumers, who thinks in systems and feedback loops, and who gets real satisfaction from building abstractions that hold up as the system scales another order of magnitude.
Key Responsibilities

- Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack
- Own and evolve the accelerator‑agnostic runtime itself – its interfaces, internal boundaries, and build structure – including hands‑on work in a performance‑sensitive Rust and Python codebase
- Keep the platform’s expansion cost low by ensuring new models and deployment targets pay only for their own specialization, and edge cases stitch back into the core easily
- Drive efficient accelerator usage – utilization, scheduling, memory management – across GPU, TPU, and Trainium
- Build the runtime’s validation surface around partitioned builds, change‑scoped testing, and canary/shadow/rollback as first‑class mechanisms
- Act as a technical counterpart to Anthropic’s central Infrastructure org on the compilers, build systems, and toolchains the runtime depends on, contributing Inference’s performance and correctness requirements, and making the call on build vs. adopt
- Mentor engineers on the team through design review, code review, and direct collaboration, raising the technical bar without owning headcount

Minimum Qualifications

- Deep background in systems engineering or ML infrastructure, with the ability to go hands‑on with performance profiling, latency and throughput optimization, and systems debugging at scale
- Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron) and genuine appetite to keep the runtime agnostic across all of them
- Have significant software engineering experience, with a strong background in high‑performance, large‑scale distributed systems serving millions of users
- A track record of defining and using engineering metrics to drive improvement: you’ve set SLOs on platform surfaces, and driven escape rates, release times, latency, or throughput in a measurable direction
- Experience driving technical alignment across organizational boundaries, advocating for your team’s needs while contributing to shared infrastructure
- Strong written and verbal communication, and the ability to influence technical direction without formal authority

Preferred Qualifications

- 8+ years of software engineering experience, with significant time as the technical lead or anchor on a platform, inference runtime, or ML infrastructure team
- Experience with ML compiler toolchains (XLA, Triton, NeuronX) or accelerator driver/firmware management at scale
- Background operating production as a validation surface at scale: shadow traffic, canary populations, automated baseline comparison, fast rollback
- Experience with deterministic or simulation‑based testing for hardware‑dependent systems
- Experience with CI/CD systems at scale, particularly for workloads involving accelerator hardware
- Familiarity with Kubernetes‑based development and job scheduling environments
- Prior tech lead experience on a developer productivity or platform engineering team at a fast‑growing AI/ML company

Annual Salary
$405,000—$485,000 USD
Logistics

- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
- Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position
- Location‑based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

