Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model
$181.1k - $318.4k
Apple Oakbrook
Santa Clara, California, United States
Machine Learning and AI

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something; you’ll add something!

Description

As a Senior/Staff Engineer on the Foundation Model Compute Infrastructure team, you will lead the design and development of scheduling and orchestration systems for large‑scale TPU workloads across multi‑region clusters. You will work on distributed systems that manage thousands of accelerators and enable reliable, efficient execution of large‑scale training and inference jobs. This role spans scheduling algorithms, cluster lifecycle management, workload orchestration, reliability engineering, and performance optimization.

Responsibilities

- Design and evolve large-scale scheduling systems for TPU‑based training and inference workloads across multi‑region clusters
- Build topology‑aware, quota‑aware, and fault‑tolerant schedulers to improve utilization, fairness, startup latency, and reliability
- Develop orchestration systems for distributed ML workloads running on Kubernetes and accelerator infrastructure
- Improve cluster efficiency and operational scalability through automation of provisioning, resource management, quota workflows, and recovery handling
- Collaborate closely with foundation model teams to support advanced distributed training and inference frameworks such as Pathways, Ray, and JAX‑based workloads
- Mentor engineers and influence architectural direction across Apple’s distributed AI compute platform

Minimum Qualifications

- 7+ years of industry experience building large‑scale distributed systems or cloud infrastructure
- Strong programming skills in Python, Go, C++, or similar systems languages
- Extensive experience with compute infrastructure and workload scheduling
- Strong expertise in distributed systems, scalability, reliability, and performance engineering
- Experience with Kubernetes, container orchestration, or large‑scale cluster management systems
- Experience designing backend services or infrastructure platforms operating at production scale
- Strong communication and collaboration skills across engineering and research teams
- Bachelor’s degree in Computer Science, Engineering, or related field

Preferred Qualifications

- Experience building schedulers, resource managers, or orchestration systems for distributed workloads
- Experience with accelerator infrastructure such as TPU, GPU
- Experience with distributed ML training or inference systems
- Familiarity with frameworks such as JAX, PyTorch, TensorFlow, Ray, Pathways
- Experience operating large‑scale multi‑tenant infrastructure in cloud or hybrid environments
- Background in performance optimization, fault tolerance, or resource efficiency for large distributed systems
- MS or PhD in Computer Science, Engineering, or related field

Compensation and Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. Other benefits include comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other legally protected characteristics.


