Senior AI Inference Engineer - Model Optimization & Deployment

Full-time

Zoox Inc.

Job Description

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.
Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch reference models and compiled edge binaries.
Write and optimize custom CUDA kernels and TensorRT Plugins to maximize memory bandwidth and minimize latency on AI accelerators.
Write production-level, highly concurrent, and memory-safe C++ and Python code for real-time inference on vehicle SoCs.

Qualifications:

Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).
Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache optimization (e.g., PagedAttention), speculative decoding, and efficient attention mechanisms (FlashAttention, linear attention).
Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and performing rigorous parity/latency benchmarking.
Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT Plugins.
Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
Familiarity with autonomous driving perception stacks (temporal 3D object detection, BEV, 3D Occupancy Networks) and processing multi-modal sensor streams (Vision, LiDAR, Radar).
Understanding of end-to-end autonomous driving paradigms (VLA models, closed-loop simulation validation).

Base Salary Range

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to the accommodations email address (not shown in this listing) or your assigned recruiter.

A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.


Vacancy posted a month ago
