Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model

$181.1k - $318.4k

Apple Oakbrook

Santa Clara, California, United States

Machine Learning and AI

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something; you’ll add something!

Description

As a Senior/Staff Engineer on the Foundation Model Compute Infrastructure team, you will lead the design and development of scheduling and orchestration systems for large‑scale TPU workloads across multi‑region clusters. You will work on distributed systems that manage thousands of accelerators and enable reliable, efficient execution of large‑scale training and inference jobs. This role spans scheduling algorithms, cluster lifecycle management, workload orchestration, reliability engineering, and performance optimization.

Responsibilities

  • Design and evolve large-scale scheduling systems for TPU‑based training and inference workloads across multi‑region clusters
  • Build topology‑aware, quota‑aware, and fault‑tolerant schedulers to improve utilization, fairness, startup latency, and reliability
  • Develop orchestration systems for distributed ML workloads running on Kubernetes and accelerator infrastructure
  • Improve cluster efficiency and operational scalability through automation of provisioning, resource management, quota workflows, and recovery handling
  • Collaborate closely with foundation model teams to support advanced distributed training and inference frameworks such as Pathways, Ray, and JAX‑based workloads
  • Mentor engineers and influence architectural direction across Apple’s distributed AI compute platform

Minimum Qualifications

  • 7+ years of industry experience building large‑scale distributed systems or cloud infrastructure
  • Strong programming skills in Python, Go, C++, or similar systems languages
  • Extensive experience with compute infrastructure and workload scheduling
  • Strong expertise in distributed systems, scalability, reliability, and performance engineering
  • Experience with Kubernetes, container orchestration, or large‑scale cluster management systems
  • Experience designing backend services or infrastructure platforms operating at production scale
  • Strong communication and collaboration skills across engineering and research teams
  • Bachelor’s degree in Computer Science, Engineering, or related field

Preferred Qualifications

  • Experience building schedulers, resource managers, or orchestration systems for distributed workloads
  • Experience with accelerator infrastructure such as TPU, GPU
  • Experience with distributed ML training or inference systems
  • Familiarity with frameworks such as JAX, PyTorch, TensorFlow, Ray, Pathways
  • Experience operating large‑scale multi‑tenant infrastructure in cloud or hybrid environments
  • Background in performance optimization, fault tolerance, or resource efficiency for large distributed systems
  • MS or PhD in Computer Science, Engineering, or related field

Compensation and Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. Other benefits include comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other legally protected characteristics.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model in Santa Clara, CA vacancy
  • Apple Inc. is seeking a Senior/Staff Engineer in Santa Clara, California, to lead the design of scheduling systems for TPU workloads. The ideal candidate will have over 7 years of...  ...orchestration systems for distributed ML workloads and mentoring engineers. Benefits... 
    Senior

    Apple

    Santa Clara, CA
    2 days ago
  • $181.1k - $318.4k

     ...multimodal models that power Siri...  ...modeling engineers train models...  ...underneath foundation model training...  ...pipelines for ML models where...  ...(GCP TPU, AWS GPU clusters...  ..., debugging schedulers, working around...  ...with infrastructure-as-code, CLI...  ...Bachelors degree in Computer Science or... 
    Senior
    Foundation
    Relocation

    Apple

    Cupertino, CA
    1 day ago
  • $181.1k - $318.4k

    ML Compute Efficiency Automation Engineer, Infrastructure & Planning Cupertino, California, United States...  ...AIML organization. Foundation models are central to Apple's...  ..., to shaping workload scheduling and capacity allocation...  ..., GPU and TPU utilization, training... 
    Foundation
    Relocation

    Apple Inc.

    Cupertino, CA
    1 day ago
  •  ...) efforts. We provide an infrastructure platform for teams developing...  ...(SOTA) machine learning models with an emphasis on...  ...prioritizing high‑impact, ML‑centric use cases. About...  ...Senior ML Infrastructure Engineer to build and scale robust compute platforms for simulation... 
    Senior
    Local area
    Work from home

    General Motors

    Sunnyvale, CA
    2 days ago
  • $195k - $298k

     ...Center - Cole Engineering Center Podium...  ...the Team The ML Compute Platform is part...  ...organization within Infrastructure Platforms. Our...  ...learning models with a focus...  ...are seeking a Staff ML Engineer to...  ...can discover, schedule, and debug jobs...  ...Experience with GPU/TPU optimizations.... 
    Suggested
    Local area
    Relocation package
    Flexible hours

    General Motors

    Mountain View, CA
    2 days ago
  •  ...proud to serve as the infrastructure platform for teams...  ...SOTA) machine learning models, with a focus on performance...  ...high-impact, ML-centric use cases. About...  ...We are seeking a Staff ML Infrastructure engineer to help build and scale robust Compute platforms for Simulation... 

    General Motors

    Sunnyvale, CA
    4 days ago
  • $138k - $198k

    Google Inc. seeks an ASIC Design Verification Engineer in Sunnyvale, CA, to shape future AI/ML hardware acceleration. You will drive TPU technology for cutting-edge applications and be responsible for verifying complex digital designs. Ideal candidates hold a Bachelor'... 
    Full time

    Google Inc.

    Sunnyvale, CA
    5 days ago
  • $116k - $166k

     ...Software Integration Engineer, TPU Cloud...  ...Electrical Engineering, Computer Engineering, Computer...  ...the future of AI/ML hardware acceleration...  ...meet chip and system schedules and achieve readiness...  ...environments. The AI and Infrastructure team is redefining... 
    Worldwide

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $147k - $211k

     ...developing large‑scale infrastructure, distributed systems or...  ...networks, or experience with compute technologies, storage...  ...job Google's software engineers develop the next-...  ...Tensor Processing Units (TPU) are Google’s custom-built...  ...machine learning (ML) workloads. TPU are designed... 
    Full time

    Google Inc.

    Sunnyvale, CA
    2 days ago
  •  ...leading technology company in Sunnyvale is seeking a CVML Engineer to drive innovation in 3D and 4D computer vision and health applications. The role requires a...  ...be proficient in Python and PyTorch, with a strong foundation in linear algebra and optimization principles.... 
    Foundation

    Apple

    Sunnyvale, CA
    2 days ago
  • $138k - $198k

     ...Qualifications Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or a...  ..., you’ll work to shape the future of AI/ML hardware acceleration. You will have an opportunity to drive cutting‑edge TPU (Tensor Processing Unit) technology that... 
    Worldwide

    Google Inc.

    Sunnyvale, CA
    5 days ago
  • $152k - $241.5k

    NVIDIA Gruppe is seeking an experienced engineer to join the Scheduling team to design and enhance GPU compute clusters for AI/ML workloads. Candidates should have a Bachelor's degree in Computer Science and 5+ years of relevant experience in system programming and batch... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...creating the digital infrastructure needed to bring...  ...employees to manage their schedules responsibly. This...  ...pretraining world-action foundation model with various world...  ...Scientists and Engineers on high quality research...  ...machine learning, computer vision and graphics,... 
    Foundation
    For contractors
    For subcontractor
    Casual work
    Internship
    Work at office
    Immediate start
    Remote work
    Day shift

    Applied Intuition

    Sunnyvale, CA
    3 days ago
  • $189.3k - $290.7k

     ...world scenarios. As a Staff ML Infra Engineer, you will drive the...  ...advanced Autonomous Driving models. From enabling large foundation driving models to...  ...systems on modern cloud infrastructure‑performance. End‑to‑...  ...potential. BS, MS, or PhD in Computer Science, Mathematics,... 
    Foundation
    Local area
    Remote work
    Relocation
    Relocation package
    Flexible hours

    Israelvcforum

    Sunnyvale, CA
    2 days ago
  •  ...machine‑learning infrastructure that enables rapid...  ...perception models. Own the integration...  ...CD pipelines for ML systems, including...  ...Partner with ML engineers, researchers, and...  ...ML and scientific‑computing libraries such as...  ...including GPUs, cluster scheduling, and performance... 
    Local area
    Remote work
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • $181.1k - $318.4k

     ...add something! We are a team of computer vision and machine learning (CVML) engineers building real-time 3D perception...  ...engineer with both 3D geometry and ML background, optimally with experience...  ...industry experience Strong foundation in computer vision in one or multiple... 
    Senior
    Foundation
    Relocation

    Apple

    Sunnyvale, CA
    3 days ago
  • $224k - $356.5k

     ...for a senior or principal engineer who specializes in building cutting‑edge infrastructure for large‑scale foundation model training in the...  ...see Bachelor's degree in Computer Science, Robotics, Engineering...  ...HPC environments, and job scheduling/orchestration tools (e.g.... 
    Senior
    Foundation
    Full time

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $147.4k - $272.1k

     ...Computer Vision & Machine Learning Engineer Apple is where individual imaginations gather together, committing...  ..., facial analysis, and behavioral modeling. You will build motion synthesis...  ...research ideas Strong mathematical foundations in machine learning, computer... 
    Foundation
    Relocation

    Apple

    Sunnyvale, CA
    5 days ago
  • $150k

     ...Machine Learning Engineer About the Institute of Foundation Models: We are a dedicated research lab for building...  ...global hub for high-performance computing in deep learning, driving...  ...Machine Learning Engineer focused on ML infrastructure and MLOps to design and operate... 
    Foundation
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    1 day ago
  • A major automotive company in Sunnyvale, California is seeking a Staff ML Infrastructure engineer to enhance AI validation efforts. This role focuses on building and scaling compute platforms for simulation and data workflows. Ideal candidates should have extensive experience... 

    General Motors

    Sunnyvale, CA
    4 days ago
  • Google Inc. is seeking a Staff Software Engineer in Machine Learning Compiler for its Mountain...  ...developing open-source compiler infrastructure and High-Performance Computing using LLVM and MLIR frameworks for the Tensor Processing Unit (TPU) family. The successful candidate... 

    Google Inc.

    Mountain View, CA
    2 days ago
  • $163k - $237k

     ...work to shape the future of AI/ML hardware acceleration. You...  ...opportunity to drive cutting‑edge TPU (Tensor Processing Unit)...  ...accelerators that drive the computational workloads behind Google's most...  ...Bachelor's degree in Electrical Engineering, Computer Engineering,... 
    Senior
    Worldwide

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $184k - $287.5k

     ...implementations of Python APIs for numerical computing. In the last decade, Python has become...  ...of runtime systems that underlay the foundation of multi‑GPU computing at NVIDIA What...  ...Science, Applied Math, Electrical Engineering or related field (or equivalent experience... 
    Senior
    Foundation

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $180k - $300k

    MixMode is seeking a Principal Software ML Test Engineer to lead testing for the d-Matrix AI compute engine in Santa Clara, California. This role involves overseeing test planning, automation, and execution, while collaborating closely with software development teams.... 
    Senior

    MixMode

    Santa Clara, CA
    1 day ago
  • $272k - $431.25k

     ...a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA...  ...join our Hardware Infrastructure team. As an Engineer...  ...similar background in Computer Science or related...  ...Lustre, GPFS, BeeGFS), scheduling & orchestration (e.g...  ...data processing, model training, and inference... 

    NVIDIA Gruppe

    Santa Clara, CA
    21 hours ago
  • A leading technology firm located in Sunnyvale, California, is seeking a Machine Learning Engineer to innovate machine learning algorithms for computational photography and computer vision. The role involves collaborating with cross-functional teams, designing algorithms... 

    Apple Inc.

    Sunnyvale, CA
    4 days ago
  • $152k - $241.5k

     ...Intelligence, High-Performance Computing, and Visualization....  ...now looking for a ML Platform Engineer to help accelerate...  ...high-performance ML infrastructure using modern...  ...the most advanced ML models on some of the world...  ...orchestration, resource scheduling, and platform... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...Sunnyvale, CA, is looking for a Senior Staff Software Engineer focused on TPU Performance. This role is crucial...  ...benchmarks and optimizing ML performance for Google's Cloud services...  ...experience in software development and ML infrastructure, with proficiency in Python and C++... 
    Senior

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $128.7k - $261.3k

     ...enables repeatable, high‑velocity model deployments through...  ...developers, deployment, and infrastructure engineers to ship numerically robust,...  ...Qualifications Bachelor’s degree in Computer Science, Electrical...  ..., Mathematics, Data Science/ML, or a closely related quantitative... 
    Senior
    Local area
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    4 days ago
  •  ...production‑grade ML systems with end‑...  ...ownership of the model lifecycle from conception...  ...to shape the foundations of the AI stack,...  ...of the ML infrastructure and processes for...  ...Master’s degree in Computer Science, Machine...  ...experience in ML engineering. Strong programming... 
    Foundation
    Full time

    Catalyst Labs, LLC

    Sunnyvale, CA
    3 days ago
