Senior AI Inference Engineer - Model Optimization & Deployment

Zoox Inc.

Job Description

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch frameworks and compiled edge binaries.
Develop and optimize custom ML OPs and TensorRT Plugins with efficient CUDA kernels to minimize latency and maximize memory bandwidth on AI accelerators.
Write production-level, low-latency, memory-safe C++ and CUDA code for real-time inference on vehicle systems.

Qualifications:

Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference frameworks (INT8, FP8, FP4, BF16/FP16).
Proven experience optimizing large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs/VLAs) utilizing Efficient Attention mechanisms (e.g., FlashAttention, Linear Attention), KV-cache optimization (e.g., PagedAttention) and Speculative Decoding.
Extensive experience with model conversion/compilation pipelines (e.g., ONNX, TensorRT, torch.compile) and with rigorous latency benchmarking and model-quality parity evaluation.
Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML OPs and TensorRT Plugins with efficient CUDA kernel implementations.
Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).
Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Base Salary Range

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to your assigned recruiter or use the contact email address shown on ziprecruiter.com.

A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Vacancy posted 8 days ago
