Inference Optimization ML Engineer
Rhoda AI
At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.
We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models across cloud and on-robot deployment targets. You will work closely with research and robotics teams to close the gap between training and real-world deployment.
What You'll Do
- Own inference performance end-to-end — diagnose and improve latency, throughput, and efficiency of large foundation models in production
- Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
- Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
- Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
- Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
- Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
- Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations
What We're Looking For
- 3+ years of experience in inference optimization, ML systems, or a closely related field
- Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
- Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
- Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
- Familiarity with inference serving frameworks (e.g., Triton Inference Server, TensorRT, vLLM, TorchServe)
- Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
- High ownership mindset and comfort in a fast-moving environment
Nice to Have (But Not Required)
- GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
- Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
- Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
- Experience with speculative decoding, continuous batching, or other LLM serving optimizations
- Background in streaming or low-latency systems relevant to real-time robot control
Why This Role
- Direct leverage on research velocity and real-world robot performance — every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
- Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world — high ownership, high impact, small elite team

