TPM - Inference Capacity

Cerebras Systems

Technical Program Manager

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation. Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

As demand for AI continues to accelerate, intelligent capacity management becomes one of the company's most strategic challenges. Every customer commitment, model launch, and infrastructure investment depends on making the right capacity decisions at the right time.

We're looking for an experienced Technical Program Manager to lead capacity planning and fleet strategy for our Inference Service organization. This is a highly visible role working directly with Engineering, Product, Infrastructure, SRE, Operations, and executive leadership to maximize utilization of one of the world's most advanced AI inference fleets.

Capacity planning and forecasting. Build and maintain the 6-, 12-, and 26-week rolling capacity model across every cluster. Work with the product team to translate customer contracts and sales pipeline asks into capacity requirements. Forecast model replicas, system-hours, and spares by customer and by model. Reconcile forecasts against actuals weekly. Maintain the source-of-truth document.

New datacenter capacity bring-up. Collaborate with datacenter infrastructure and operations teams to support new datacenter bring-up and ensure production readiness. Drive engineering efforts and related automation to ensure on-time, high-quality delivery.

Allocation and cluster placement. Partner closely with the SRE and product teams to run the weekly capacity review across customers, models, and clusters. Decide model placement and rebalancing: which customer tenants land where, which clusters absorb new launches, which freezes are in effect, and so on. Produce the weekly capacity and utilization report for Inference Service leadership. After capacity is allocated, drive the downstream tasks of deploying models across the allocated capacity with the SRE team.

Drive capacity planning tool adoption. Partner with the console engineering team to drive stakeholder adoption of the in-house capacity planning and allocation tool, including user acceptance testing, issue resolution, change tracking, pilot testing, and deployment. More broadly, contribute to continuous process improvement and the development of internal capacity management tools.

Incident tracking and postmortems. Proactively identify and mitigate capacity bottlenecks, risks, and dependencies. When an SLA drops because of capacity misallocation, drive the resolution and the postmortem.

Key Responsibilities

Run weekly capacity planning and daily capacity and deployment tracking with the Engineering, Product, and Operations teams, and own fleet utilization reporting and forecasting
Drive capacity planning for new customer deployments and major model launches
Drive continuous improvement and stakeholder adoption of the new capacity management platform
Drive org-level strategic initiatives related to capacity expansion, improving fleet efficiency, and maximizing effective utilization of available systems
Lead planning around major infrastructure events that impact capacity and fleet utilization, including but not limited to new customer commits, new model releases, and changes to DC/cluster architecture, and update capacity plans and forecasts accordingly
Maintain Jira epics and Confluence pages related to capacity planning, reporting, and change management to ensure execution transparency across teams

Qualifications

5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning
Experience leading large cross-functional programs involving Engineering, Product, and Operations
Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, accelerator scheduling
Strong data fluency: SQL, Grafana, and basic Python or Flux, enough to pull your own numbers without waiting for an analyst
Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team (LT)
Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we've reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Our simple, non-corporate work culture that respects individual beliefs.

Find out more about what it's like to work at Cerebras here!

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.


Vacancy posted 3 days ago
