Machine Learning Infrastructure Engineer
Institute of Foundation Models
$150k
About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems. You’ll work side-by-side with world-class researchers and engineers to:
- Extend distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
- Implement distributed optimizers from mathematical specs
- Build robust config + launch systems across multi-node, multi-GPU clusters
- Own experiment tracking, metrics logging, and job monitoring for external visibility
- Improve training system reliability, maintainability, and performance

While much of the work will support large-scale pre-training, pre-training experience is not required. Strong infrastructure and systems experience is what we value most.

Key Responsibilities
- Distributed Framework Ownership: Extend or modify training frameworks (e.g., DeepSpeed, FSDP) to support new use cases and architectures.
- Optimizer Implementation: Translate mathematical optimizer specs into distributed implementations.
- Launch Config & Debugging: Create and debug multi-node launch scripts with flexible batch sizes, parallelism strategies, and hardware targets.
- Metrics & Monitoring: Build systems for experiment tracking, job monitoring, and logging usable by collaborators and researchers.
- Infra Engineering: Write production-quality code and tests for ML infra in PyTorch or JAX; ensure reliability and maintainability at scale.

Qualifications

Must-Haves:
- 5+ years of experience in ML systems, infra, or distributed training
- Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
- Strong software engineering fundamentals (Python, systems design, testing)
- Proven multi-node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/GLOO)
- Ability to implement algorithms across GPUs/nodes based on mathematical specs
- Experience working on an ML platform/infrastructure and/or distributed inference optimization team
- Experience with large-scale machine learning workloads (strong ML fundamentals)

Nice-to-Haves:
- Exposure to mixed-precision training (e.g., bf16, fp8) with accuracy validation
- Familiarity with performance profiling, kernel fusion, or memory optimization
- Open-source contributions or published research (MLSys, ICML, NeurIPS)
- CUDA or Triton kernel experience
- Experience with large-scale pre-training
- Experience building custom training pipelines at scale and modifying them for custom needs
- Deep familiarity with training infrastructure and performance tuning

$150,000 - $450,000 a year

Benefits
- Comprehensive medical, dental, and vision
- 401(k) program
- Generous PTO, sick leave, and holidays
- Paid parental leave and family-friendly benefits
- On-site amenities and perks: complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station


