Engineering Manager, Inference ML Runtime
Dormont Manufacturing Company
Cerebras Systems builds the world’s largest AI chip, 56 times larger than GPUs. Our novel wafer‑scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry‑leading training and inference speeds and empowers machine learning users to effortlessly run large‑scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras’ current customers include top model labs, global enterprises, and cutting‑edge AI‑native startups. OpenAI recently announced a multi‑year partnership with Cerebras, to deploy 750 megawatts of scale, transforming key workloads with ultra high‑speed inference.

Thanks to the groundbreaking wafer‑scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU‑based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real‑time iteration and increasing intelligence via additional agentic computation.

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world. As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state‑of‑the‑art AI models on Cerebras hardware.

You will operate at the intersection of machine learning, distributed systems, and high‑performance runtime engineering, translating cutting‑edge research into production‑ready infrastructure to serve a variety of text‑only and multimodal models. This role combines technical leadership, people management, and execution ownership, with direct impact on Cerebras’ core inference platform.

What You’ll Do

Technical Leadership
Own the architecture and evolution of the ML inference runtime and serving systems.
Guide the design of: high‑throughput, low‑latency inference pipelines; multimodal model execution (text, image, audio, video); scalable serving infrastructure for concurrent workloads.
Partner with cloud, compiler, core runtime, hardware, and ML teams to optimize end‑to‑end performance.

Team Leadership
Build, manage, and grow a team of ML systems and infrastructure engineers.
Provide technical direction, mentorship, and career development.
Foster a culture of ownership, velocity, and engineering excellence.
Recruit top talent in ML systems, distributed systems, and runtime engineering.

Execution & Delivery
Drive execution of complex, cross‑functional initiatives across: ML engineering; compiler/runtime teams; cloud and infrastructure teams.
Own delivery of features such as: advanced inference capabilities (structured outputs, sampling strategies); heterogeneous model types, including text and multimodal; performance optimization (latency, throughput, memory efficiency); observability and reliability across the inference stack.
Ensure high‑quality releases through strong testing, validation, and operational rigor.

Platform & Performance Ownership
Scale Cerebras’ inference platform to handle large volumes of concurrent requests at very high speed.
Drive improvements in: latency; throughput; compute efficiency.
Identify and prioritize technical debt and system bottlenecks.
Maintain Cerebras’ industry‑leading inference speed advantage.

Cross‑Functional Collaboration
Partner with: ML researchers (model enablement); compiler teams (model execution optimization); cloud/platform teams (deployment and scaling).
Act as a bridge between research, infrastructure, and production systems.

What You Bring

Required
8+ years of experience in: large‑scale software engineering; ML systems or distributed systems.
2+ years of engineering management experience.
Strong programming skills in: Python (production systems); C++ (performance‑critical systems).
Experience building and scaling large‑scale inference systems (LLMs or multimodal).
Experience working with cloud infrastructure and following best practices for building scalable microservices and applications.

Preferred
Experience with: LLM serving frameworks (e.g., vLLM, TensorRT‑LLM, SGLang); PyTorch and deep learning frameworks; distributed systems and high‑performance computing.
Familiarity with: ML runtime systems; model execution pipelines; performance optimization for AI workloads.

Why This Role Matters

This team is central to Cerebras’ mission of delivering the fastest AI inference in the world. Your work will directly enable real‑time AI applications and unlock new capabilities across enterprise and frontier AI use cases.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting‑edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Work in a simple, non‑corporate culture that respects individual beliefs.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.


