Staff AI Runtime Engineer

$180k - $225k

FlexAI

About FlexAI

Build and Deploy AI the right way, anywhere.

The FlexAI Compute Infrastructure Platform provides an "end-to-end AI compute layer" for running and managing workloads across any cloud, any GPU, and any deployment model (public, hybrid, or on-prem). It brings together "1-click simplicity" for users with "enterprise-grade orchestration, security, and automation" under the hood.

Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel and Zoox, FlexAI is not just building a product - we're shaping the future of AI. Our teams are strategically distributed across Silicon Valley and Bengaluru, united by a shared mission: to deliver more compute with less complexity.

If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!

Role Overview

At FlexAI, we're building a high-performance, cloud-agnostic AI compute platform designed for next-generation training and inference workloads. As a Staff AI Runtime Engineer, you'll play a pivotal role in the design, development, and optimization of the core runtime infrastructure that powers distributed training and deployment of large AI models (LLMs and beyond).

This is a hands-on leadership role - perfect for a systems-minded software engineer who thrives at the intersection of AI workloads, runtimes, and performance-critical infrastructure. You'll own critical components of our PyTorch-based stack, lead technical direction, and collaborate across engineering, research, and product to push the boundaries of elastic, fault-tolerant, high-performance model execution.

What You'll Do

Lead Runtime Design & Development:

Own the core runtime architecture supporting AI training and inference at scale.
Design resilient and elastic runtime features (e.g. dynamic node scaling, job recovery) within our custom PyTorch stack.
Optimize distributed training reliability, orchestration, and job-level fault tolerance.

Drive Performance at Scale:

Profile and enhance low-level system performance across training and inference pipelines.
Improve packaging, deployment, and integration of customer models in production environments.
Ensure consistent throughput, latency, and reliability metrics across multi-node, multi-GPU setups.

Build Internal Tooling & Frameworks:

Design and maintain libraries and services that support the model lifecycle: training, checkpointing, fault recovery, packaging, and deployment.
Implement observability hooks, diagnostics, and resilience mechanisms for deep learning workloads.
Champion best practices in CI/CD, testing, and software quality across the AI Runtime stack.

Collaborate & Mentor:

Work cross-functionally with Research, Infrastructure, and Product teams to align runtime development with customer and platform needs.
Guide technical discussions, mentor junior engineers, and help scale the AI Runtime team's capabilities.

What You'll Need to Be Successful

8+ years of experience in systems/software engineering, with deep exposure to AI runtimes, distributed systems, or compiler/runtime interaction.
Experience delivering PaaS offerings.
Proven experience optimizing and scaling deep learning runtimes (e.g. PyTorch, TensorFlow, JAX) for large-scale training and/or inference.
Strong programming skills in Python and C++ (Go or Rust is a plus).
Familiarity with distributed training frameworks, low-level performance tuning, and resource orchestration.
Experience working with multi-GPU, multi-node, or cloud-native AI workloads.
Solid understanding of containerized workloads, job scheduling, and failure recovery in production environments.

Nice to Have

Contributions to PyTorch internals or open-source DL infrastructure projects.
Familiarity with LLM training pipelines, checkpointing, or elastic training orchestration.
Experience with Kubernetes, Ray, TorchElastic, or custom AI job orchestrators.
Background in systems research, compilers, or runtime architecture for HPC or ML.
Previous startup experience.

This position is in-person and located at our Santa Clara, CA office.

What We Offer

A competitive salary and benefits package
Work on cutting-edge AI infrastructure
Build products used by developers and enterprises
High ownership, fast execution, real impact
Collaborative, high-caliber team

The pay range for this role is:

180,000 - 225,000 USD per year (US)

Vacancy posted 5 days ago
