Inference Optimization ML Engineer

Rhoda AI

Inference Optimization MLE

At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models across cloud and on-robot deployment targets. You will work closely with research and robotics teams to close the gap between training and real-world deployment.

What You'll Do
  • Own inference performance end-to-end — diagnose and improve latency, throughput, and efficiency of large foundation models in production
  • Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
  • Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
  • Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
  • Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
  • Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
  • Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations

What We're Looking For
  • 3+ years of experience in inference optimization, ML systems, or a closely related field
  • Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
  • Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
  • Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
  • Familiarity with inference serving frameworks (e.g., Triton Inference Server, TensorRT, vLLM, TorchServe)
  • Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
  • High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)
  • GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
  • Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
  • Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
  • Experience with speculative decoding, continuous batching, or other LLM serving optimizations
  • Background in streaming or low-latency systems relevant to real-time robot control

Why This Role
  • Direct leverage on research velocity and real-world robot performance — every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
  • Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world — high ownership, high impact, small elite team
Vacancy posted 2 days ago