Inference Optimization ML Engineer

Rhoda AI

At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models, across cloud and on-robot deployment targets. You will work closely with research and robotics teams to close the gap between training and real-world deployment.

What You'll Do

- Own inference performance end-to-end: diagnose and improve latency, throughput, and efficiency of large foundation models in production
- Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
- Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
- Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
- Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
- Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
- Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations

What We're Looking For

- 3+ years of experience in inference optimization, ML systems, or a closely related field
- Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
- Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
- Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
- Familiarity with inference serving frameworks (e.g., Triton, TensorRT, vLLM, TorchServe)
- Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
- High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)

- GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
- Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
- Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
- Experience with speculative decoding, continuous batching, or other LLM serving optimizations
- Background in streaming or low-latency systems relevant to real-time robot control

Why This Role

- Direct leverage on research velocity and real-world robot performance: every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
- Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world: high ownership, high impact, small elite team

Vacancy posted 2 days ago

Location: Mountain View, CA
