Sr./Staff TPM - Inference Capacity
Cerebras Systems
Technical Program Manager
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds: over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, enabling real-time iteration and increasing intelligence through additional agentic computation. Cerebras works with leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, transforming key workloads with ultra-high-speed inference.
As demand for AI continues to accelerate, intelligent capacity management becomes one of the company's most strategic challenges. Every customer commitment, model launch, and infrastructure investment depends on making the right capacity decisions at the right time.
We're looking for an experienced Technical Program Manager to lead capacity planning and fleet strategy for our Inference Service organization. This is a highly visible role working directly with Engineering, Product, Infrastructure, SRE, Operations, and executive leadership to maximize utilization of one of the world's most advanced AI inference fleets.
Capacity planning and forecasting. Build and maintain the 6/12/26-week rolling capacity model across every cluster. Work with the product team to translate customer contracts and sales pipeline requests into capacity requirements. Forecast model replicas, system-hours, and spares by customer and by model. Reconcile forecasts against actuals weekly. Maintain the source-of-truth document.
New datacenter capacity bring-up. Collaborate with the datacenter infrastructure and operations teams to support new datacenter bring-up and ensure production readiness. Drive the engineering efforts and related automation needed for on-time, high-quality delivery.
Allocation and cluster placement. Partner closely with the SRE and product teams to run the weekly capacity review across customers, models, and clusters. Decide model placement and rebalancing: which customer tenants land where, which clusters absorb new launches, and which freezes are in effect. Run the weekly capacity and utilization report for Inference Service leadership. Once capacity is allocated, work with the SRE team to drive the downstream tasks for deploying models onto the allocated capacity.
Drive capacity planning tool adoption. Partner with the console engineering team to drive stakeholder adoption of the in-house capacity planning and allocation tool, including user acceptance testing, issue resolution, change tracking, pilot testing, and deployment. More broadly, contribute to continuous process improvement and to the development of internal capacity management tools.
Incident tracking and postmortems. Proactively identify and mitigate capacity bottlenecks, risks, and dependencies. When an SLA drops because of capacity misallocation, drive the resolution and the postmortem.
Key Responsibilities
- Run weekly capacity planning and daily capacity and deployment tracking with the Engineering, Product, and Operations teams. Own fleet utilization reporting and forecasting
- Drive capacity planning for new customer deployments and major model launches
- Drive continuous improvement and stakeholder adoption of new capacity management platform
- Drive org-level strategic initiatives related to capacity expansion, improving fleet efficiency, and maximizing effective utilization of available systems
- Lead planning around major infrastructure events that impact capacity and fleet utilization, such as new customer commits, new model releases, and changes to DC/cluster architecture. Update capacity plans and forecasts accordingly.
- Maintain Jira epics and Confluence pages related to capacity planning, reporting, and change management to ensure execution transparency across teams
Qualifications
- 5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning
- Experience leading large cross-functional programs involving Engineering, Product, and Operations
- Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, accelerator scheduling
- Strong data fluency: SQL, Grafana, basic Python or Flux to pull your own numbers without waiting for an analyst
- Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team (LT)
- Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium
Why Join Cerebras
People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we've reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
- Build a breakthrough AI platform beyond the constraints of the GPU.
- Publish and open source their cutting-edge AI research.
- Work on one of the fastest AI supercomputers in the world.
- Enjoy job stability with startup vitality.
- Enjoy a simple, non-corporate work culture that respects individual beliefs.
Find out more about what it's like to work at Cerebras here!
Apply today and join us at the forefront of groundbreaking advancements in AI!
Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

