Engineering Manager, Inference ML Runtime

Dormont Manufacturing Company

Cerebras Systems builds the world’s largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras’ current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world. As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state-of-the-art AI models on Cerebras hardware.

You will operate at the intersection of machine learning, distributed systems, and high-performance runtime engineering, translating cutting-edge research into production-ready infrastructure to serve a variety of text-only and multimodal models. This role combines technical leadership, people management, and execution ownership, with direct impact on Cerebras’ core inference platform.

What You’ll Do

Technical Leadership
Own the architecture and evolution of the ML inference runtime and serving systems.
Guide the design of: high-throughput, low-latency inference pipelines; multimodal model execution (text, image, audio, video); scalable serving infrastructure for concurrent workloads.
Partner with cloud, compiler, core runtime, hardware, and ML teams to optimize end-to-end performance.

Team Leadership
Build, manage, and grow a team of ML systems and infrastructure engineers.
Provide technical direction, mentorship, and career development.
Foster a culture of ownership, velocity, and engineering excellence.
Recruit top talent in ML systems, distributed systems, and runtime engineering.

Execution & Delivery
Drive execution of complex, cross-functional initiatives across: ML engineering; compiler/runtime teams; cloud and infrastructure teams.
Own delivery of features such as: advanced inference capabilities (structured outputs, sampling strategies); heterogeneous model types, including text and multimodal; performance optimization (latency, throughput, memory efficiency); observability and reliability across the inference stack.
Ensure high-quality releases through strong testing, validation, and operational rigor.

Platform & Performance Ownership
Scale Cerebras’ inference platform to handle large volumes of concurrent requests at very high speed.
Drive improvements in: latency; throughput; compute efficiency.
Identify and prioritize technical debt and system bottlenecks.
Maintain Cerebras’ industry-leading inference speed advantage.

Cross-Functional Collaboration
Partner with: ML researchers (model enablement); compiler teams (model execution optimization); cloud/platform teams (deployment and scaling).
Act as a bridge between research, infrastructure, and production systems.

What You Bring

Required
8+ years of experience in: large-scale software engineering; ML systems or distributed systems.
2+ years of engineering management experience.
Strong programming skills in: Python (production systems); C++ (performance-critical systems).
Experience building and scaling large-scale inference systems (LLMs or multimodal).
Experience working with cloud infrastructure and following best practices for building scalable microservices and applications.

Preferred
Experience with: LLM serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang); PyTorch and deep learning frameworks; distributed systems and high-performance computing.
Familiarity with: ML runtime systems; model execution pipelines; performance optimization for AI workloads.

Why This Role Matters

This team is central to Cerebras’ mission of delivering the fastest AI inference in the world. Your work will directly enable real-time AI applications and unlock new capabilities across enterprise and frontier AI use cases.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Enjoy a simple, non-corporate work culture that respects individual beliefs.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

Vacancy posted 8 hours ago