Inference Optimization ML Engineer

Rhoda AI

Inference Optimization MLE

At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models across cloud and on-robot deployment targets. You'll work closely with research and robotics teams to close the gap between training and real-world deployment.

What You'll Do

Own inference performance end-to-end — diagnose and improve latency, throughput, and efficiency of large foundation models in production
Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations

What We're Looking For

3+ years of experience in inference optimization, ML systems, or a closely related field
Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
Familiarity with inference serving frameworks (e.g., Triton Inference Server, TensorRT, vLLM, TorchServe)
Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)

GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
Experience with speculative decoding, continuous batching, or other LLM serving optimizations
Background in streaming or low-latency systems relevant to real-time robot control

Why This Role

Direct leverage on research velocity and real-world robot performance — every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world — high ownership, high impact, small elite team

Vacancy posted 2 days ago
