Staff TPM: Cluster Orchestration & AI Training
CoreWeave
CoreWeave is looking for a Staff Technical Program Manager to steer complex and cross-functional programs focusing on AI/ML Platform Services. The candidate will coordinate between teams to improve workload interaction and manage training platforms, ensuring operational excellence and fostering innovation. Applicants should have a solid technical background, including expertise in scheduling systems like Kubernetes and hands-on experience in AI training workflows. This role requires strong program management skills coupled with excellent communication abilities.
- General Motors is seeking a Staff Technical Program Manager to lead their autonomous driving... ...infrastructure, ensuring efficient model training and operational reliability. The ideal... ...a strong background in ML operations and AI infrastructure. You will work collaboratively...
Training
- Responsibilities: CoreWeave is seeking a Staff Technical Program Manager to lead... ...cross-functional programs across Cluster Orchestration and Applied Training within our AI/ML Platform Services organization... ...cross‑functional role for a TPM who combines strong technical...
Training, Permanent employment
$193k - $234k
...intelligence. As the only vertically integrated AI infrastructure company built from the... ...’s Cloud Product team is seeking a Staff Technical Program Manager to bridge... .../OS bundles and automating component orchestration. As a Staff TPM, you are the connective tissue between...
Suggested, Temporary work
$159.4k - $245k
A leading automotive manufacturer is seeking a Staff Technical Program Manager for Embodied AI to oversee model development from research to production. This role involves cross-functional leadership, driving execution of AI initiatives, and ensuring performance and safety...
Suggested
$163.8k - $226.22k
42dot Inc. is seeking a Sr. Staff Technical Project Manager to lead complex projects for software-defined vehicles. This role involves cross-functional collaboration, ensuring technical milestones, and managing vendor relationships. The ideal candidate has over 6 years...
Suggested
$272k - $425.5k
...leader to manage our Server Software Technical Program Management (TPM) team. This role is at the cross-section of execution and... ...GPUs, NVLink, InfiniBand networking, Grace CPUs, and our optimized AI/HPC software stack. This deep technical leadership role focused on...
- Crusoe in Sunnyvale, California is searching for a Technical Program Manager to drive the Managed Inference platform for AI workloads. The role focuses on program delivery and team alignment across various functions while managing the product lifecycle. The ideal candidate...
- ...Acer is expanding its portfolio of high-performance computing products to meet the growing demand for AI development, training, and enterprise deployment. As the Business Manager for AI Workstations & Servers, you will own the commercial success, channel strategy, and...
Training
$152.91k - $172.02k
...Staff User Experience Researcher, Foundations and Enablement Mountain View, CA / Remote (... ...an expert who can lead our transition into AI‑enhanced research operations—acting as the... ...both qual and quant) by providing playbooks, training, and mentorship necessary to maintain...
Training, Immediate start, Remote work, Worldwide, Monday to Friday, Shift work, Weekend work
- Rhoda AI in Mountain View is seeking a Research Engineer to build and maintain a training platform that powers model development. The successful candidate will develop tooling... ...training processes across large-scale GPU clusters. This role offers high visibility, direct...
Training
- Crusoe Energy Systems is seeking a Staff Technical Program Manager in Sunnyvale, California, to drive technical projects in AI infrastructure. The ideal candidate will manage technical programs across various teams, proactively identifying risks and aligning product and...
- ...Role In this role, you will be the security czar for Cerebras’s AI cluster product. Such AI clusters have hundreds of wafer‑scale... ...deployment to the suite of software that enables multi‑tenant training and inference services on these extensive clusters. Your role...
Training
$262k - $365k
...experience working on Artificial Intelligence/Machine Learning (AI/ML) recommendations. 8 years of industry experience in the recommendations... ..., rich user model generation, and knowledge distillation for training more compact models. This role specializes in state-of-the-art...
Training
$185k - $275k
...is The Essential Cloud for AI™. Built for pioneers by pioneers... ...the role As part of the Cluster Orchestration team, you will play a key role... ...foundation that powers AI training and inference at scale. This... ...AI. What You'll Do As a Staff Engineer, you will be a technical...
Training, Permanent employment, Temporary work, Casual work, Work at office, Flexible hours
$262k - $365k
...including job-related skills, experience, and relevant education or training. US: $262000 - $365000 (USD) + 25% bonus target + equity +... ...Lead the charge in defining and developing the next generation of AI-powered shopping features within YouTube Ads. This is a unique...
Training
$262k - $365k
Senior Staff Technical Lead, Google Ads Recommendations, Google, Mountain... ...: Experience building and scaling AI-driven product features using Large Language... ...skills, experience, and relevant education or training. US: $262000 - $365000 (USD) + 25% bonus...
Training, Shift work
$139k - $204k
...CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave... .... About the role As part of the Cluster Orchestration team, you will play a key role in advancing... ...-native foundation that powers AI training and inference at scale. This is an opportunity...
Training, Permanent employment, Temporary work, Casual work, Work at office, Flexible hours
- A leading AI infrastructure company is seeking a Member of Technical Staff to focus on foundation model architecture and AI systems engineering. You will drive architectural... ...ownership, product reliability, and scalable training while deploying AI solutions in industrial...
Training, Full time
- Databricks is seeking a Staff TPM to lead complex, cross-functional programs at their Mountain View location. This pivotal role involves defining program structures, aligning engineering and business teams, and ensuring successful product launches. Ideal candidates will...
- ...and developer‑facing products to support AI and ML innovation across teams such as Embodied... .... ML Compute: Streamlines large‑scale ML training and inference across cloud and on‑prem... ...infrastructures. Position Overview As a Staff AI/ML Full‑Stack Engineer, you will design...
Training
$172.42k - $297.5k
...that supports the end‑to‑end AI lifecycle of ML pipelines - from... ...and large‑scale training to evaluation, lineage tracking... ...and ML initiatives. Role As a Staff AI/ML Full‑Stack Engineer, you... ...Experience with open‑source orchestration platforms such as Kubeflow, Flyte...
Training, Local area, Remote work, Relocation, Relocation package, Flexible hours
- ...innovative accelerated computing platforms for AI and HPC. Because of our work, scientists,... ...Senior Solutions Architect to join the Cluster Design and Architecture team with a focus... ...communication patterns in distributed training as it pertains to networking patterns and...
Training
- Build and Deploy AI the right way, anywhere. The FlexAI Compute... ...users with "enterprise-grade orchestration, security, and automation"... ...— enabling large-scale model training, inference, and orchestration... ...with Kubernetes, Docker, and cluster orchestration Familiarity with...
Training, Work at office
$203.3k - $305.6k
Staff/Sr. Software Engineer, AI, Search & Knowledge Platforms Santa Clara, California, United States Machine Learning and AI Our team builds the... ...Apple Music to the App Store and beyond. We develop agentic orchestration frameworks, configuration management infrastructure,...
Worldwide, Relocation
$272k - $431.25k
...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure... ..., GPFS, BeeGFS), scheduling & orchestration (e.g., Slurm, Kubernetes, LSF),... ...substantial distributed training operations using PyTorch (DDP, FSDP...
Training
$224k - $356.5k
...tapping into the unlimited potential of AI to define the next era of computing. An era... ...with teams throughout the company on the cluster architecture, at-scale bringup, and... ...applications, including multi‑GPU and multi‑node training and inference workloads Expertise with...
Training, Remote work
$224k - $356.5k
...team that’s revolutionizing the field of AI with data center scale solutions? We are... ...reference architectures and libraries through training and workshops; we help them develop... ...design, high‑speed interconnect InfiniBand, cluster storage and scheduling related design and...
Training
$184k - $287.5k
...optimizing the performance of world‑class AI, deep learning, and HPC ecosystems. Come... ...benchmarking suites to stress-test high-performance clusters and establish performance baselines.... ...experience optimizing distributed AI training workloads, LLMs, or large-scale high-...
Training
- ...development with a cohesive, end-to-end platform. This platform consists of three core pillars: NVIDIA DGX systems for massively parallel AI training in the data center; NVIDIA Omniverse on OVX systems for physically accurate simulation, synthetic data generation, and validation...
Training
- ...HCL Domino Servers from version v9.0.1 to latest version 12.x. Train staff for Domino server administrative tasks and provide support... ...experience. Experience configuring multiple Domino servers with clustering and replication environment. Experience supporting and...
Training, Permanent employment, Contract work


