Engineering Manager - ML Platform and Infrastructure
$204k - $343k
Applied Intuition
Engineering Manager, ML Platform Team
Applied Intuition, Inc. is powering the future of physical AI. Founded in 2017 and now valued at $15 billion, the Silicon Valley company is creating the digital infrastructure needed to bring intelligence to every moving machine on the planet. The company serves the automotive, defense, trucking, construction, mining, and agriculture industries in three core areas: tools and infrastructure, operating systems, and autonomy. Applied Intuition is headquartered in Sunnyvale, California, with offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo.
We are an in-office company, and our expectation is that employees primarily work from their Applied Intuition office 5 days a week. However, we also recognize the importance of flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving earlier when needed to accommodate family commitments.
As an Engineering Manager on the ML Platform team, you'll lead a world-class group of engineers focused on building the infrastructure that powers Physical AI at scale. Your team will own three critical areas: Training & Inference Orchestration, where we build frameworks to efficiently schedule and run massive jobs across thousands of GPUs; GPU Cluster Architecture, where we design and scale what will be the largest GPU cluster for Physical AI in the industry; and Performance Optimization, where we push the limits of hardware utilization, throughput, and cost efficiency for large-scale training and inference workloads. You'll work at the intersection of systems engineering and ML, partnering directly with stack development and research teams to remove bottlenecks and accelerate the path from experimentation to production.
At Applied Intuition, you will:
- Grow and manage a team of world-class infrastructure and systems engineers with the goal of delivering a best-in-class ML platform for Physical AI
- Own the design and evolution of frameworks for orchestrating distributed training and inference jobs across thousands of GPUs
- Drive the buildout and scaling of our GPU cluster infrastructure, making critical decisions on architecture, scheduling, networking, and resource management
- Lead efforts to optimize training and inference performance, including throughput, fault tolerance, GPU utilization, and cost efficiency at scale
- Set team goals and roadmap in alignment with research milestones, model development timelines, and production deployment requirements
- Partner closely with research, stack development, and infrastructure teams to understand their workflows and accelerate their iteration speed
- Drive hiring, mentoring, and growth for a high-performing, mission-driven team
We're looking for someone who has:
- 3+ years of engineering management experience, ideally leading infrastructure or platform teams
- Passion for building and leading high-performing teams that operate at the frontier of scale
- Deep experience with distributed systems, GPU computing, or large-scale ML infrastructure
- Direct experience building or operating large GPU clusters (1,000+ GPUs)
- Strong understanding of distributed training frameworks (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed, FSDP) and job orchestration at scale
- Familiarity with GPU cluster management, high-performance networking (InfiniBand, RDMA), and resource scheduling (Slurm, Kubernetes)
- Track record of building and operating systems that run reliably at massive scale
Nice to have:
- Background in training optimization techniques such as mixed-precision training, pipeline/tensor/data parallelism, or checkpointing strategies
- Experience with inference optimization (batching, model serving, quantization, compiler-level optimizations)
- Familiarity with Physical AI domains such as autonomous driving, robotics, or simulation
- Contributions to open-source ML infrastructure projects
Compensation at Applied Intuition for eligible roles includes base salary, equity, and benefits. Base salary is a single component of the total compensation package, which may also include equity in the form of options and/or restricted stock units, comprehensive health, dental, vision, life and disability insurance coverage, 401k retirement benefits with employer match, learning and wellness stipends, and paid time off. Note that benefits are subject to change and may vary based on jurisdiction of employment.
Applied Intuition pay ranges reflect the minimum and maximum intended target base salary for new hire salaries for the position. The actual base salary offered to a successful candidate will additionally be influenced by a variety of factors including experience, credentials & certifications, educational attainment, skill level requirements, interview performance, and the level and scope of the position.
Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in the location listed is: $204,000 - $343,000 USD annually.
Don't meet every single requirement? If you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
Applied Intuition is an equal opportunity employer and federal contractor or subcontractor. Consequently, the parties agree that, as applicable, they will abide by the requirements of 41 CFR 60-1.4(a), 41 CFR 60-300.5(a) and 41 CFR 60-741.5(a) and that these laws are incorporated herein by reference. These regulations prohibit discrimination against qualified individuals based on their status as protected veterans or individuals with disabilities, and prohibit discrimination against all individuals based on their race, color, religion, sex, sexual orientation, gender identity or national origin. These regulations require that covered prime contractors and subcontractors take affirmative action to employ and advance in employment individuals without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status or disability. The parties also agree that, as applicable, they will abide by the requirements of Executive Order 13496 (29 CFR Part 471, Appendix A to Subpart A), relating to the notice of employee rights under federal labor laws.


