Inference Optimization ML Engineer

Rhoda AI

At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale‑up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models, across cloud and on‑robot deployment targets. You will work closely with research and robotics teams to close the gap between training and real‑world deployment.

What You'll Do
  • Own inference performance end‑to‑end: diagnose and improve latency, throughput, and efficiency of large foundation models in production
  • Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
  • Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
  • Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
  • Work with kernel‑level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
  • Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
  • Collaborate closely with research engineers to translate model innovations into optimized, deployment‑ready implementations

What We're Looking For
  • 3+ years of experience in inference optimization, ML systems, or a closely related field
  • Deep hands‑on experience with modern ML stacks (PyTorch required; JAX a plus)
  • Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
  • Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
  • Familiarity with inference serving frameworks (e.g., Triton, TensorRT, vLLM, TorchServe)
  • Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
  • High ownership mindset and comfort in a fast‑moving environment

Nice to Have (But Not Required)
  • GPU kernel or compiler‑level experience (CUDA, Triton, graph capture, operator fusion)
  • Experience with multimodal or video model inference (variable‑length sequences, packing/bucketing)
  • Familiarity with edge/cloud hybrid deployment patterns and on‑robot inference constraints
  • Experience with speculative decoding, continuous batching, or other LLM serving optimizations
  • Background in streaming or low‑latency systems relevant to real‑time robot control

Why This Role
  • Direct leverage on research velocity and real‑world robot performance: every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
  • Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world: high ownership, high impact, small elite team

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Inference Optimization ML Engineer in Mountain View, CA vacancy
  • $278.1k - $347.6k

     ...within that runtime. As our Principal Engineer for On-Device AI Inference & Systems, you will be the foremost...  ...leaves research, through export, optimization, and kernel-level tuning, to a shipped...  ...and own the integration between the ML runtime and the game engine: real-time... 
    Suggested
    Work at office
    Worldwide
    Relocation package

    Unity

    Mountain View, CA
    4 days ago
  • Decisive Point is seeking a Software Engineer in Sunnyvale, California, with expertise in optimizing machine learning models for embedded systems. This role involves...  ...embedded compute platforms, collaborating with ML engineers, and requires strong software development... 
    Suggested

    Decisive Point

    Sunnyvale, CA
    2 days ago
  • A tech company in Mountain View seeks talented engineers for a role emphasizing high-performance systems, inference optimization, and model acceleration. You will thrive in ambiguity, tackle unclear problems, and design impactful solutions. The position offers a competitive... 
    Suggested

    Inworld

    Mountain View, CA
    4 days ago
  • Rhoda ai in Palo Alto is seeking an Inference Infrastructure Engineer to help power their model deployment stack for humanoid robots. This role...  ...focus on Kubernetes deployment pipelines and resource optimization across GPU clusters, you will play a crucial role in scaling... 
    Suggested

    Rhoda ai

    Palo Alto, CA
    9 hours ago
  •  ...California, is seeking an experienced ML Framework Engineer to join their Server ML Frameworks...  ...built server hardware for distributed inference. The ideal candidate will have a strong...  ..., with responsibilities including optimizing ML frameworks and collaborating on GPU... 
    Suggested

    Apple

    Cupertino, CA
    4 days ago
  • $128.7k - $261.3k

    The Model Deployment & Inference Solutions team in GM AV deploys machine...  ...is two-fold: build the ML deployment platform that makes...  ...rollouts fast and predictable, and optimize models so they meet the real-...  ...equivalent) as part of your engineering workflow. Experience... 
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  •  ...Mountain View is seeking a Machine Learning Engineer to build and optimize the infrastructure for its Intelligence...  ...The role involves designing and deploying ML models for multimodal data understanding, optimizing inference pipelines, and collaborating with teams to... 

    Corvic

    Mountain View, CA
    2 days ago
  • $155.42k - $395.9k

    Job Description About the Team: The ML Inference Platform is part of the AV ML Infrastructure...  ...innovation and feature development by optimizing for high-priority, ML-centric use cases...  ...are seeking a Senior ML Infrastructure engineer to help build and scale robust... 
    Local area
    Remote work
    Relocation
    Relocation package
    Flexible hours

    Israelvcforum

    Mountain View, CA
    4 days ago
  •  ...is hiring a Machine Learning Systems Engineer in Cupertino, California. You will collaborate...  ...with Siri modeling teams to optimize model training and inference on Apple's custom Silicon. The ideal candidate has strong experience in ML models, with proficiency in Python and... 

    Apple

    Cupertino, CA
    1 day ago
  • $152k - $287.5k

     ...Gruppe is seeking a Senior Machine Learning Applications and Compiler Engineer in Santa Clara, California. This role involves developing algorithms for their LPX inference and compiler stack, optimizing the performance of neural network workloads on NVIDIA platforms.... 

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • Modular Mailing Systems, Inc. is seeking an experienced Performance Engineer to optimize LLM inference on their cloud platform. This pivotal role involves building optimization infrastructures and collaborating with teams to enhance performance across GPUs and ASICs. The... 
    Remote job
    Flexible hours

    Modular Mailing Systems, Inc.

    Los Altos, CA
    3 days ago
  • $198k - $286k

     ...how we work and what we value. About the role At Modular, we optimize inference from kernel to cloud on one unified stack. We are building a...  ...apply the latest optimizations across kernels, the inference engine, and distributed systems so that customer workloads stay on the... 
    Remote job
    Work experience placement
    Work at office
    Local area
    Flexible hours

    Modular Mailing Systems, Inc.

    Los Altos, CA
    3 days ago
  • A robotics innovation company focused on home robotics seeks a Software Engineer to develop machine learning infrastructure. You will own training systems, optimize data pipelines, and work on real-time robot control systems. Ideal candidates have a strong software engineering... 

    Sunday

    Mountain View, CA
    3 days ago
  • $195k - $298k

     ...relocation assistance. About the Team The ML Inference Platform is part of the AI Compute...  ...products. We enable rapid innovation by optimizing for high‑priority, ML‑centric use cases...  ...are seeking a Staff ML Infrastructure Engineer to build and scale robust compute platforms... 
    Local area
    Relocation package
    Flexible hours

    Israelvcforum

    Sunnyvale, CA
    1 day ago
  • A leading automotive company is seeking a Staff ML Infrastructure Engineer to build robust compute platforms for machine learning workflows in...  ...strong coding skills in Go, Python or C++, and expertise in ML inference. The position offers a hybrid work model and competitive... 

    General Motors

    Sunnyvale, CA
    9 hours ago
  • jobr.pro is seeking a Staff Machine Learning Engineer to join our Vector Bidding Science team in Mountain View, California. In this critical...  ...vision and develop advanced bidding systems using AI and optimization frameworks. You will design algorithms, analyze marketplace... 

    Jobr

    Mountain View, CA
    4 days ago
  •  ...how businesses learn from and optimize in‑person customer...  ...and deploy production‑grade ML systems with end‑to‑end ownership...  ...model training, deployment, inference, and monitoring in production...  ...professional experience in ML engineering. Strong programming skills in... 
    Full time

    Catalyst Labs, LLC

    Sunnyvale, CA
    1 day ago
  •  ...thinking Principal Machine Learning Engineer to lead the design and...  ...Machine Learning training and inference systems. What you’ll do Architect robust, modular ML pipelines for model experimentation...  ...development, evaluation, and deployment. Optimize models for latency, memory, and... 

    Tensec

    Palo Alto, CA
    3 days ago
  • $170.5k - $315.49k

     ...of AI should belong to the people it serves Role Summary Make models fast on the hardware people actually own. You optimize inference engines (llama.cpp, vLLM) for constrained local and edge environments - GPU/iGPUs, Vulkan backends - not datacenter H100 environment... 
    Internship
    Local area
    Immediate start
    Shift work

    Intel

    Santa Clara, CA
    2 days ago
  • $170.5k - $315.49k

    Inference Optimization Engineer (local / edge runtime). Locations: US, California, Santa Clara; US, Oregon, Hillsboro; US, California, Folsom; US, Arizona, Phoenix. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: JR0284871. Job Details: Job...
    Internship
    Local area
    Immediate start
    Shift work

    Intel

    Santa Clara, CA
    3 days ago
  • $19 - $65 per hour

    PlusAI is seeking a Machine Learning Infrastructure Engineer Intern to work on high-performance kernels for BEV model training. In this...  ..., and Triton. This internship also explores the use of LLMs to optimize code generation and performance profiling. The hourly pay ranges... 
    Hourly pay
    Internship

    PlusAI

    Santa Clara, CA
    3 days ago
  • $158k - $241.9k

     ...a global scale. Role: As a Senior AI/ML Engineer within the Onboard Embodied AI organization...  ...-to-end solutions capable of real-time inference and robust autonomous driving...  ...training methodologies, and inference optimization strategies suited for real-time onboard... 
    Relocation package
    Flexible hours

    General Motors

    Mountain View, CA
    4 days ago
  • $147.4k - $272.1k

     ...search and data platform, and the primary inference platform that enable next generation...  ...accomplished and driven Machine Learning Engineer who has a robust understanding of Large...  ...inference stack, ensuring performance optimization and alignment with broader business... 
    Relocation

    Apple

    Cupertino, CA
    5 days ago
  • $181.1k - $318.4k

     ...commitment to environmental sustainability and optimal resource utilization. This team plays a...  ...at scale. This team also focuses on ML-driven forecasting, capacity planning, resource...  ...scale services. As a Sr. ML Optimization Engineer, you will work at the intersection of... 
    Relocation

    Apple

    Cupertino, CA
    3 days ago
  • $158k - $241.9k

     ...on a global scale. Role As a Senior AI/ML Engineer within the Onboard Embodied AI...  ...‑to‑end solutions capable of real‑time inference and robust autonomous driving performance...  ...training methodologies, and inference optimization strategies suited for real‑time onboard... 
    Local area
    Relocation package
    Flexible hours

    Israelvcforum

    Mountain View, CA
    4 days ago
  • $250k - $350k

     ...model innovation and systems engineering with a design-minded product...  ...Machine Learning training and inference systems. You'll work cross...  ...Architect robust, modular ML pipelines for model experimentation...  ..., and deployment. Optimize models for latency, memory, and... 

    Sanas

    Palo Alto, CA
    1 day ago
  • Israelvcforum is looking for a Senior ML Infrastructure Engineer in Mountain View, California. This position aims to build and scale robust platforms for ML inference workflows supporting GM’s AI efforts. You will collaborate with ML engineers and researchers to implement... 
    Remote job

    Israelvcforum

    Mountain View, CA
    4 days ago
  • General Motors is seeking a Senior ML Infrastructure Engineer to build and scale a robust platform for machine learning inference workflows. You will design backend software components, collaborate with ML engineers, and lead initiatives across GM's ML ecosystem. With... 
    Remote job

    General Motors

    Sunnyvale, CA
    1 day ago
  • NVIDIA is seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal...  ...and 3+ years of experience in deep learning, specifically in inference. This role involves profiling, analyzing bottlenecks, and... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • Intel Corporation in Santa Clara seeks an Inference Optimization Engineer to optimize AI models for local and edge environments. Candidates should possess over 5 years of experience in software development, proficient in C++ and Python, and comfortable with performance... 
    Local area

    Intel Corporation

    Santa Clara, CA
    4 days ago
