AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - Pre-training Infrastructure

$181.1k - $318.4k

Apple Oakbrook

San Francisco Bay Area, California, United States
Machine Learning and AI

Description

As an engineer on the ML Compute team, your work will include:
  • Drive large‑scale pre‑training initiatives to support cutting‑edge foundation models, focusing on resiliency, efficiency, scalability, and resource optimization.
  • Enhance distributed training techniques for foundation models.
  • Research and implement new patterns and technologies to improve system performance, maintainability, and design.
  • Optimize execution and performance of workloads built with JAX, PyTorch, XLA and CUDA on large distributed systems.
  • Leverage high‑performance networking technologies such as NCCL for GPU collectives and TPU interconnect (ICI/Fabric) for large‑scale distributed training.
  • Architect a robust MLOps platform to streamline and automate pre‑training operations.
  • Operationalize large‑scale ML workloads on Kubernetes, ensuring distributed training jobs are robust, efficient, and fault‑tolerant.
  • Lead complex technical projects, defining requirements and tracking progress with team members.
  • Collaborate with cross‑functional engineers to solve large‑scale ML training challenges.
  • Mentor engineers in areas of your expertise, fostering skill growth and knowledge sharing.
  • Cultivate a team centered on collaboration, technical excellence, and innovation.

Minimum Qualifications
  • Bachelor's degree in Computer Science, engineering, or a related field
  • 6+ years of hands‑on experience building scalable backend systems for training and evaluation of machine learning models
  • Proficient in relevant programming languages, such as Python or Go
  • Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms
  • Proficient in cloud computing infrastructure and tools: Kubernetes, Ray, PySpark
  • Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find

Preferred Qualifications
  • Advanced degree in Computer Science, engineering, or a related field
  • Proficient in working with and debugging accelerators, such as GPU, TPU, and AWS Trainium
  • Proficient in ML training and deployment frameworks, such as JAX, TensorFlow, PyTorch, TensorRT, and vLLM

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here: in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.

Apple accepts applications to this posting on an ongoing basis.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - Pre-training Infrastructure in San Francisco, CA vacancy
  •  ...A technology company in San Francisco is seeking an experienced ML Infrastructure Engineer to develop platforms for machine learning jobs and to lead cross-functional initiatives. The ideal candidate will have experience with continuous integration and deployment models... 
    Suggested

    Delphina

    San Francisco, CA
    9 hours ago
  •  ...Accelerated AI Server Engineer Sygaldry Technologies is building quantum-...  ...speed up training and inference for AI...  .... They need compute infrastructure that stays out of their...  ...manage the compute platform this team runs on. The...  ...training) Python-based ML and scientific... 
    Training
    Casual work
    Local area
    Visa sponsorship

    Sygaldry

    San Francisco, CA
    3 days ago
  •  ...advanced hardware engineering and AI solutions....  ...cutting-edge technologies that restore autonomy...  ...Machine Learning Infrastructure Engineer to join...  ...modeling, and analysis platforms, playing a...  ...production-grade ML ecosystem to support...  ...distributed training pipelines, with support... 
    Training
    Flexible hours

    Echo Neurotechnologies

    San Francisco, CA
    1 day ago
  •  ...The AI Infrastructure team at Zensors builds the engine that powers our visual sensing platform. We provide the tools to automate the lifecycle...  ...Learning Engineer in ML Runtime & Optimization , you will develop technologies to accelerate the training and inference of... 
    Training

    Zensors

    San Francisco, CA
    8 hours ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure team, you will build the "Operating...  ...a high-performance training platform that handles the immense...  ...and integrate emerging technologies in the CNCF and AI ecosystem...  ...on orchestrating ML workloads at scale (100... 
    Training
    Full time

    Scale AI

    San Francisco, CA
    7 days ago
  • Principal Engineer, AI Platform & Infrastructure About the Role SPREEAI is building the future...  .... This role spans ML platform engineering, deployment...  ...You'll Own ML Platform & Training Enablement Build and...  ...lifelike photorealistic try‑on technology and hyper‑personalized... 
    Training

    SpreeAI

    San Francisco, CA
    1 day ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving...  ..., and relevant education or training. Scale employees in eligible...  ...quality data and full-stack technologies that power the world's... 
    Training
    Full time

    Scale AI

    San Francisco, CA
    7 days ago
  • $212k - $318.4k

     ...A leading technology company in San Francisco is seeking a Software Engineer to join its Applied Machine Learning team. This role focuses on designing and building a robust ML platform and infrastructure to support enterprise-level initiatives. Candidates should have at... 

    Apple

    San Francisco, CA
    1 day ago
  •  ...read on. What You'll Do Training Automation: Design and...  ...requirement. Evaluation Infrastructure: Build scalable evaluation...  ...degree in Computer Science, Engineering, or equivalent practical experience...  ...Engineering, MLOps, or ML Infrastructure ~ Strong Python... 
    Training
    Immediate start
    Relocation package
    Night shift

    AGI

    San Francisco, CA
    4 days ago
  • $180k - $250k

     ...AI is building the pre-model intelligence...  ...the context engine layer that solves...  ...come from better infrastructure around models: Better...  ...PhD in Robotics and ML. Clark Zhang, CTO...  ...years of Enterprise Technology experience. About...  ..., such as training pipelines, inference... 
    Training
    Full time

    Graphon.AI

    San Francisco, CA
    1 day ago
  •  .... Our AI-powered platform was purpose-built...  ...enterprise-grade technology transforms patient...  ...and engineers working together...  ...The Role As an ML Infrastructure Engineer Model Inference...  ...model inference and training Develop optimize...  ...usage. ~ Pre-tax Benefits: Access... 
    Training
    Hourly pay
    Full time
    Flexible hours

    Abridge

    San Francisco, CA
    10 days ago
  •  ...A progressive technology company in San Francisco is looking for a Data Infrastructure Engineer to design and operate data and ML infrastructure on AWS. The ideal candidate will have strong software engineering fundamentals and experience building production systems, particularly... 

    Matter Intelligence

    San Francisco, CA
    1 day ago
  • $250k - $350k

     ...This one builds what makes them actually work. We’re hiring ML Infrastructure Engineers to tackle a hard, real-world problem, understanding what’s...  ...video pipelines handling millions of hours of data Training and inference systems for multimodal / LLM-based models GPU... 
    Training

    Trades Workforce Solutions

    San Francisco, CA
    9 hours ago
  • $200k - $280k

     ...Engineering San Francisco Full-time $200,000 - $280,000 About the Role Join our ML Infrastructure team to build the systems that train, deploy, and serve our AI models at scale. You'll work at the intersection of machine learning and systems engineering. What You Will... 
    Training
    Full time
    Work at office

    Lattice

    San Francisco, CA
    9 hours ago
  •  ...ML Infrastructure Engineer Spectral Labs is a spatial intelligence company building reasoning models for engineering physical systems. Our...  ...models better. Responsibilities Optimize distributed training & RL across our GPU cluster of hundreds of H100 GPUs (FSDP... 
    Training

    Spectral Labs

    San Francisco, CA
    3 days ago
  •  ...A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional... 

    Abridge

    San Francisco, CA
    1 day ago
  • $320k - $405k

     ...committed researchers, engineers, policy experts,...  ...Machine Learning Infrastructure Engineer to join...  ...safety, developing the platforms and tools that...  ...and implement ML infrastructure that...  ...combination of education, training, and/or experience...  ..., we expect all staff to be in one of... 
    Training
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    2 days ago
  •  ...We are seeking a Data Infrastructure Engineer to build and operate the infrastructure...  ...standards that keep the platform trustworthy as data volume,...  ...and operate scalable data and ML infrastructure on AWS,...  ...to support perception model training and evaluation workflows, enabling... 
    Training
    Permanent employment
    Full time

    Matter Intelligence

    San Francisco, CA
    2 days ago
  • $300k - $430k

     ...conversational AI platform empowering every brand...  ...experiences. Our technology enables industry-defining...  ...About the Team The ML Infrastructure team builds the...  ...for model training, the infrastructure...  ...Role We're hiring a Staff ML Infrastructure Engineer to own the platforms... 
    Training
    Work at office

    Decagon

    San Francisco, CA
    9 hours ago
  •  ...on scaling and optimizing ML training systems. Key responsibilities...  ...owning the training infrastructure, improving performance, and...  ...will have strong software engineering foundations, hands-on experience...  ...and familiarity with cloud platforms. This position provides a unique... 
    Training

    Physical Intelligence

    San Francisco, CA
    4 days ago
  •  ...research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate...  ...FSDP and DeepSpeed. Proficiency in cloud platforms and containerization is essential. Join us to... 
    Training

    Causal Labs

    San Francisco, CA
    3 days ago
  • $200.2k - $357.5k

     ...Cloud, which is a platform that enables...  ...are the infrastructure of our planet,...  ...We’re hiring a Staff / Senior Staff...  ...Infrastructure Engineer to lead the design...  ...end-to-end ML platform powering...  ...and mature technologies driving scalable...  ...ML platform (training, experimentation... 
    Training
    Work at office
    Remote work
    Flexible hours

    Samsara

    San Francisco, CA
    1 day ago
  •  ...Ltd. is seeking a Machine Learning Engineer with expertise in high-performance...  ...systems to manage and optimize their infrastructure for ML model training and deployment. The ideal candidate...  ...Science and experience with cloud platforms like AWS and Google Cloud. Responsibilities... 
    Training
    Full time

    Ipro Networks Pte. Ltd.

    San Francisco, CA
    8 hours ago
  • $292k - $417.2k

     ...Director, ML Engineering & Infrastructure San Francisco, CA; Los Angeles, CA;...  ...feature engineering, model training, evaluation, and serving....  ...latency services, streaming platforms, and large-scale serving....  ...) and ML infrastructure technologies. ~ Track record of... 
    Training
    Full time
    Temporary work
    Local area
    Remote work
    Flexible hours

    Tubi

    San Francisco, CA
    7 days ago
  • $212k - $318.4k

     ...Software Engineer, ML platform and Infrastructure San Francisco Bay Area, California, United States Software...  ...integration of cutting‑edge open‑source technologies and building innovative internal...  ...for data processing and model training/fine‑tuning workflows. Hands‑on experience... 
    Training
    Relocation

    Apple

    San Francisco, CA
    9 hours ago
  •  ...Workshop Labs Job Posting Build the infrastructure to serve personal AI models privately and...  ...ever seeing your data. Our core ML systems challenge: how do we serve the...  ...SNP, or related confidential computing technologies. We encourage speculative applications... 
    Remote work
    Shift work

    Workshop Labs

    San Francisco, CA
    4 days ago
  • $230.77k - $323.08k

     ...seeking a Principal Software Engineer to join our Vehicle Platforms team for our...  ...the foundational platform infrastructure-the "nervous system"-that...  ...transportation/shipping training. Required for certain...  ...Credential, which includes pre-employment and random drug... 
    Training
    Permanent employment
    Temporary work
    Local area

    Blue Origin

    San Francisco, CA
    4 days ago
  •  ...Machine Learning Engineer, Training Infrastructure Job Title: Machine Learning Engineer...  ...: We are looking for an ML Engineer with 3+ YOE in high...  ...Science, Information Technology, or a related field, with...  ...Experience with cloud computing platforms such as Amazon Web... 
    Training
    Full time
    Work experience placement

    Ipro Networks Pte. Ltd.

    San Francisco, CA
    1 day ago
  • $200k - $300k

     ...for a Machine Learning Infrastructure Engineer to join our AI Platform team. This is a high-leverage...  ...work closely with our ML, data, and product teams...  ...workloads – whether training pipelines, evaluation systems...  ...with cutting edge AI technology, on a product that dramatically... 
    Training
    Work at office
    3 days per week

    Ambience Healthcare

    San Francisco, CA
    9 hours ago
  • A leading livestream shopping platform is looking for an AI/ML Platform Engineer to shape the future of AI and ML systems. This role involves designing the infrastructure that powers machine learning applications, working alongside experts to deploy models at scale. Candidates... 
    Remote work
    Flexible hours

    Whatnot

    San Francisco, CA
    1 day ago
