Inference Optimization ML Engineer
Rhoda AI
At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models across cloud and on-robot deployment targets. You will work closely with research and robotics teams to close the gap between training and real-world deployment.

What You'll Do

- Own inference performance end-to-end: diagnose and improve latency, throughput, and efficiency of large foundation models in production
- Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
- Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
- Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
- Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
- Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
- Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations

What We're Looking For

- 3+ years of experience in inference optimization, ML systems, or a closely related field
- Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
- Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
- Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
- Familiarity with inference serving frameworks (e.g., Triton, TensorRT, vLLM, TorchServe)
- Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
- High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)

- GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
- Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
- Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
- Experience with speculative decoding, continuous batching, or other LLM serving optimizations
- Background in streaming or low-latency systems relevant to real-time robot control

Why This Role

- Direct leverage on research velocity and real-world robot performance: every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
- Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world: high ownership, high impact, small elite team

