TPM - Inference Capacity
Cerebras Systems
Technical Program Manager
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation. Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.
As demand for AI continues to accelerate, intelligent capacity management becomes one of the company's most strategic challenges. Every customer commitment, model launch, and infrastructure investment depends on making the right capacity decisions at the right time.
We're looking for an experienced Technical Program Manager to lead capacity planning and fleet strategy for our Inference Service organization. This is a highly visible role working directly with Engineering, Product, Infrastructure, SRE, Operations, and executive leadership to maximize utilization of one of the world's most advanced AI inference fleets.
Capacity planning and forecasting. Build and maintain the 6 / 12 / 26-week rolling capacity model across every cluster. Work with the product team to translate customer contracts and sales pipeline asks into capacity requirements. Forecast model replicas, system-hours, and spares by customer and by model. Reconcile against actuals weekly. Maintain the source-of-truth doc.
New datacenter capacity bring-up. Collaborate with datacenter infrastructure and operations teams to support new datacenter bring-up and ensure production readiness. Drive engineering efforts and related automation to ensure on-time, high-quality delivery.
Allocation and cluster placement. Partner closely with the SRE and product teams to run the weekly capacity review across customers, models, and clusters. Decide model placement and rebalancing: which customer tenants land where, which clusters absorb new launches, which freezes are in effect, and so on. Produce the weekly capacity and utilization report for Inference Service leadership. After capacity is allocated, work with the SRE team to drive the downstream tasks of deploying models across the allocated capacity.
Drive capacity planning tool adoption. Partner with the console engineering team to drive stakeholder adoption of the in-house capacity planning and allocation tool, including user acceptance testing, issue resolution, change tracking, pilot testing, and deployment. More broadly, contribute to continuous process improvement and to the development of internal capacity management tools.
Incident tracking and postmortems. Proactively identify and mitigate capacity bottlenecks, risks, and dependencies. When an SLA drops because of capacity misallocation, drive the resolution and the postmortem.
Key Responsibilities
- Run weekly capacity planning and daily capacity and deployment tracking with the Engineering, Product, and Operations teams. Own fleet utilization reporting and forecasting.
- Drive capacity planning for new customer deployments and major model launches
- Drive continuous improvement and stakeholder adoption of the new capacity management platform
- Drive org-level strategic initiatives related to capacity expansion, improving fleet efficiency, and maximizing effective utilization of available systems
- Lead planning around major infrastructure events that impact capacity and fleet utilization, including but not limited to new customer commits, new model releases, and changes to DC/cluster architecture. Update capacity plans and forecasts accordingly.
- Maintain Jira epics and Confluence pages related to capacity planning, reporting, and change management to ensure execution transparency across teams
Qualifications
- 5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning
- Experience leading large cross-functional programs involving Engineering, Product, and Operations
- Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, accelerator scheduling
- Strong data fluency: SQL, Grafana, and basic Python or Flux, so you can pull your own numbers without waiting for an analyst
- Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team (LT)
- Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium
Why Join Cerebras
People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we've reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
- Build a breakthrough AI platform beyond the constraints of the GPU.
- Publish and open source their cutting-edge AI research.
- Work on one of the fastest AI supercomputers in the world.
- Enjoy job stability with startup vitality.
- Be part of a simple, non-corporate work culture that respects individual beliefs.
Find out more about what it's like to work at Cerebras here!
Apply today and join the forefront of groundbreaking advancements in AI!
Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
