
Engineering Manager, Inference ML Runtime

Dormont Manufacturing Company

Cerebras Systems builds the world’s largest AI chip, 56 times larger than GPUs. Our novel wafer‑scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry‑leading training and inference speeds and empowers machine learning users to effortlessly run large‑scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. Cerebras’ current customers include top model labs, global enterprises, and cutting‑edge AI‑native startups.

OpenAI recently announced a multi‑year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high‑speed inference.

Thanks to the groundbreaking wafer‑scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU‑based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real‑time iteration and increasing intelligence via additional agentic computation.

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world. As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state‑of‑the‑art AI models on Cerebras hardware. You will operate at the intersection of machine learning, distributed systems, and high‑performance runtime engineering, translating cutting‑edge research into production‑ready infrastructure to serve a variety of text‑only and multimodal models. This role combines technical leadership, people management, and execution ownership, with direct impact on Cerebras’ core inference platform.

What You’ll Do

Technical Leadership
  • Own the architecture and evolution of the ML inference runtime and serving systems.
  • Guide the design of high‑throughput, low‑latency inference pipelines; multimodal model execution (text, image, audio, video); and scalable serving infrastructure for concurrent workloads.
  • Partner with cloud, compiler, core runtime, hardware, and ML teams to optimize end‑to‑end performance.

Team Leadership
  • Build, manage, and grow a team of ML systems and infrastructure engineers.
  • Provide technical direction, mentorship, and career development.
  • Foster a culture of ownership, velocity, and engineering excellence.
  • Recruit top talent in ML systems, distributed systems, and runtime engineering.

Execution & Delivery
  • Drive execution of complex, cross‑functional initiatives across ML engineering, compiler/runtime teams, and cloud and infrastructure teams.
  • Own delivery of features such as advanced inference capabilities (structured outputs, sampling strategies); heterogeneous model types, including text and multimodal; performance optimization (latency, throughput, memory efficiency); and observability and reliability across the inference stack.
  • Ensure high‑quality releases through strong testing, validation, and operational rigor.

Platform & Performance Ownership
  • Scale Cerebras’ inference platform to handle large volumes of concurrent requests at very high speed.
  • Drive improvements in latency, throughput, and compute efficiency.
  • Identify and prioritize technical debt and system bottlenecks.
  • Maintain Cerebras’ industry‑leading inference speed advantage.

Cross‑Functional Collaboration
  • Partner with ML researchers (model enablement), compiler teams (model execution optimization), and cloud/platform teams (deployment and scaling).
  • Act as a bridge between research, infrastructure, and production systems.

What You Bring

Required
  • 8+ years of experience in: large‑scale software engineering; ML systems or distributed systems.
  • 2+ years of engineering management experience.
  • Strong programming skills in Python (production systems) and C++ (performance‑critical systems).
  • Experience building and scaling large‑scale inference systems (LLMs or multimodal).
  • Experience working with cloud infrastructure and following best practices for building scalable microservices and applications.

Preferred
  • Experience with LLM serving frameworks (e.g., vLLM, TensorRT‑LLM, SGLang); PyTorch and deep learning frameworks; distributed systems and high‑performance computing.
  • Familiarity with ML runtime systems; model execution pipelines; performance optimization for AI workloads.

Why This Role Matters

This team is central to Cerebras’ mission of delivering the fastest AI inference in the world. Your work will directly enable real‑time AI applications and unlock new capabilities across enterprise and frontier AI use cases.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open source their cutting‑edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Our simple, non‑corporate work culture that respects individual beliefs.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

Vacancy posted 8 hours ago
Similar jobs that could be interesting for you
Based on the Engineering Manager, Inference ML Runtime in Sunnyvale, CA vacancy
  • $224k - $356.5k

     ...serving performance across various inference frameworks. Hyperscalers, cloud providers...  ..., and scaling. As Technical Lead Manager, you will lead the engineering team within NVIDIA’s Dynamo...  ...performance-critical infrastructure, ML tooling, or distributed systems. ~... 
    Suggested
    Local area
    Worldwide

    NVIDIA

    Santa Clara, CA
    5 days ago
  • $224k - $356.5k

     ...serving performance across various inference frameworks. Hyperscalers,...  ...with expertise in systems engineering, inference infrastructure, and...  ...pushed forward. As Technical Lead Manager, you will lead the...  ...performance‑critical infrastructure, ML tooling, or distributed... 
    Suggested
    Local area
    Worldwide

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $224k - $356.5k

     ...Position Summary As Technical Lead Manager, you will lead the engineering team within NVIDIA’s Dynamo organization...  ..., diffusion, and computer vision inference. Key Responsibilities Develop and execute...  ...‑critical infrastructure, ML tooling, or distributed systems. 3+... 
    Suggested
    Local area

    NVIDIA

    Santa Clara, CA
    8 hours ago
  • $278.1k - $347.6k

     ...Principal Machine Learning Engineer, Mobile AI Inference Optimization Location...  ...decisions across the full mobile ML stack, and mentor a team of...  ...and select inference runtimes (e.g., CoreML, ONNX Runtime...  ...platform engineers, product managers, and runtime teams to align... 
    Suggested
    Work at office
    Worldwide
    Relocation package

    Unity Technologies

    Mountain View, CA
    3 days ago
  •  ...looking for a Senior Staff AI Infra Engineer who is passionate about...  ...benchmarks, with a special focus on AI/ML workloads and GPU-accelerated...  ...accelerate LLM training and inference on AMD GPUs, improving kernel,...  ...across GPU, network, and runtime layers. • Drive technical excellence... 
    Suggested

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
  •  ...ROLE We are seeking a Principal GenAI Inference Optimization Engineer to join our Models and Applications...  ...across multiple layers—from kernels and runtimes to frameworks and serving systems—and...  ...architectures. Experience with ML frameworks (PyTorch, JAX, or TensorFlow... 

    Advanced Micro Devices, Inc.

    San Jose, CA
    8 hours ago
  •  ...deliver industry-leading training and inference speeds and empowers machine...  ...to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs....  ...for a deeply technical, hands-on engineering leader for our on-field Kernel Reliability... 

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    5 days ago
  • $224k - $356.5k

     ...NVIDIA Corporation is seeking a Technical Lead Manager to lead the engineering team in developing the AIPerf platform, a benchmark tool for LLM and computer vision inference workloads. The ideal candidate will have extensive experience in software engineering and leadership... 

    NVIDIA

    Santa Clara, CA
    8 hours ago
  • $198.3k - $342.8k

     ...Engineering Manager, Runtime Analysis Tools The Runtime Tools team is looking for developers with a passion for memory and resource optimization to enhance, adapt, and innovate in creating tools for the next generation of software and hardware on Apple's platforms.... 
    Worldwide
    Relocation

    Apple

    Cupertino, CA
    2 days ago
  •  ...A leading technology company in California is seeking an Engineering Manager to lead the development of cutting-edge LLM/VLM technologies....  ...leadership role, you will manage a team responsible for optimizing runtime and frameworks, while collaborating with cross-functional... 

    NVIDIA

    Santa Clara, CA
    8 hours ago
  • $198.3k - $342.8k

     ...Swift Compiler Engineering Manager, Languages & Runtimes The Swift Compiler Team at Apple is a unique opportunity to evolve the Swift programming language and related developer tools that shape the experience of writing Swift code. We are looking for a software engineering... 
    Relocation

    Apple

    Cupertino, CA
    3 days ago
  •  ...A leading technology company in Cupertino is seeking an Engineering Manager for Runtime Analysis Tools to innovate and improve memory analysis tools. The ideal candidate will have at least 5 years of experience in macOS or iOS development, strong expertise in C/C++, and... 

    Apple

    Cupertino, CA
    8 hours ago
  • $255.7k - $346k

     ...flexibility and trust our employees to manage their schedules responsibly. This may...  ...and Europe. We are looking for an Engineering Manager to lead ML teams within SDS Core. This is a large...  ...stack from training code to onboard inference. Experience managing through architecture... 
    Full time
    For contractors
    For subcontractor
    Casual work
    Work at office
    Remote work
    Day shift

    Decisive Point

    Sunnyvale, CA
    7 hours ago
  • $255.7k - $346k

     ...ADAS in the United States and Europe. We are looking for an Engineering Manager to lead ML teams within SDS Core. This is a large organization...  ...comfortable across the full stack from training code to onboard inference. Experience managing through architecture transitions (... 
    Full time

    Applied Intuition

    Sunnyvale, CA
    8 hours ago
  • $235.52k - $323.04k

     ...everyone around it. Role Summary: The Principal Engineer, ML (VLA Automated Driving) is the technical anchor for Vision...  ...-aware optimization Familiarity with TensorRT, ONNX Runtime, or similar inference frameworks Experience deploying models on embedded or... 
    Permanent employment
    Temporary work

    Cariad, Inc.

    Mountain View, CA
    12 days ago
  • $141.8k - $258.6k

     ...is seeking a Technical Program Manager to help shape the future of scalable...  ...‑functional coordination across engineering teams to deliver robust, high‑performance ML systems that operate across...  ...performance distributed computing, or inference optimization technologies.... 
    Relocation

    Apple Inc.

    Cupertino, CA
    1 day ago
  • $255.85k - $361.2k

     ...We are seeking a Principal Engineer to define and architect the...  ...across diverse hardware while managing state, locality, and...  ...Computation Graphs Define a runtime model for executing AI workloads...  ...a PhD. Experience with AI/ML systems, inference infrastructure, or large‑... 
    Local area
    Shift work

    Intel Corporation

    Santa Clara, CA
    4 days ago
  • $198.3k - $342.8k

     ...Machine Learning Engineering Manager, Proactive - On-Device Modeling Santa Clara, California, United...  ...on their devices. As an Applied ML team, we're pushing the boundaries of Apple...  ...transformers, attention mechanisms, and inference optimization Strong software engineering... 
    Work experience placement
    Relocation

    Apple

    Santa Clara, CA
    8 hours ago
  • $228.1k - $393.8k

     ...Senior Machine Learning Engineering Manager – Ads Predictions At Apple, we focus deeply on our...  ...will lead the strategy and development of inference models that predict and optimize user...  ..., Agile environment and are a hands-on ML leader who can drive execution while building... 
    Relocation

    Apple

    Cupertino, CA
    3 days ago
  • $272k - $431.25k

     ...NVIDIA Gruppe is looking for a Principal Software Engineer to advance open-source AI inference. This hands-on role emphasizes running high-performance inference...  .... Key responsibilities include optimizing inference runtimes, improving efficiency, and mentoring other engineers.... 

    NVIDIA Gruppe

    Santa Clara, CA
    8 hours ago
  •  ...Observability team at Netflix makes AI, ML, and Agentic systems...  ...Partner with ML researchers, engineers, and platform teams to embed...  ...model training, online inference, and agent orchestration. Drive...  ...engineering experience and 3+ years of management experience. Experience... 
    Hourly pay
    Full time
    Immediate start
    Flexible hours

    Netflix

    Los Gatos, CA
    1 day ago
  • $146.3k - $289.9k

     ...personalization. We are looking for an Engineering Manager to lead and grow a team of engineers...  ...workstreams — including LLM orchestration, inference services, data pipelines, and...  ...foundation in distributed systems, AI/ML infrastructure, or large-scale service... 
    Temporary work
    Local area
    Immediate start
    Worldwide
    Flexible hours

    Adobe

    San Jose, CA
    2 days ago
  • $197k - $291k

     ...A leading technology company seeks a Software Engineering Manager II for YouTube to lead engineering teams in optimizing ML infrastructure and building recommendation systems. The ideal candidate has extensive software development experience, strong technical leadership... 
    Full time

    Jobleads-US

    Mountain View, CA
    1 day ago
  • $251k - $310k

     ...foundational models, large-scale 3rd party data, and partner teams in Research, Oracles, and Simulation. Manage a team (~10) with diverse skills including engineering, modeling, ML infrastructure, of senior and junior SWEs, foster an inclusive culture for sustained team growth,... 
    Temporary work
    Immediate start

    Neura Market

    Mountain View, CA
    8 hours ago
  •  ...safer and more connected. We are looking for a technically deep Engineering Manager to lead the AI team at Coram. This team is small, highly...  ...multimodal systems Track record of shipping production‑grade ML systems at scale Ability to balance technical debt, product velocity... 
    Shift work

    Coram AI

    Sunnyvale, CA
    5 days ago
  • $207k - $304k

     ...l ML (Tapestry) Software Engineering Mountain View, CA (HQ) About Tapestry...  ...development and deployment of frontier ML techniques spanning Multimodal ML to solve...  ...experience in a technical leadership or people management role, with a focus on guiding ML-centric... 
    Full time
    Currently hiring
    Flexible hours

    X: The Moonshot Factory

    Mountain View, CA
    1 day ago
  • $251k - $310k

     ...Waymo is looking for a Technical Lead Manager (TLM) for Scene Understanding in Mountain View, California. This high-impact role involves...  ...in Python and C++, and a strong background in managing engineering teams. This role offers a competitive salary between $251,000... 

    Neura Market

    Mountain View, CA
    8 hours ago
  • $251k - $310k

     ...you will act as a true "player-coach" for a team of roughly 6-10 engineers. This is a high impact role, where your models will directly...  ...this hybrid role you will report to a Sr Staff Technical Lead Manager. You will: Own a specific domain within scene understanding, such... 
    Full time
    Temporary work
    Remote work

    Neura Market

    Mountain View, CA
    8 hours ago
  • $250k - $300k

     ...online. Our unified ecommerce management solutions empower brands to...  ...The Role We're looking for an Engineering Leader with a Data Science /...  ...team. Design and build scalable ML infrastructure to support...  ...and pipelines for training and inference. Fluency in backend and ML‑oriented... 
    Temporary work

    Commerceiq

    Mountain View, CA
    9 hours ago
  •  ...Decisive Point is looking for an Engineering Manager for the ML Platform team in Sunnyvale, California. This role involves leading a team to build and optimize the infrastructure for Physical AI, managing GPU clusters, and ensuring the delivery of high-performance ML... 

    Decisive Point

    Sunnyvale, CA
    7 hours ago
