Senior AI Inference Engineer - Model Optimization & Deployment

$242k - $290k

Zoox Inc.

Model Optimization & Deployment Engineer

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) on power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

  • Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
  • Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch models and compiled edge binaries.
  • Develop and optimize custom ML OPs and TensorRT Plugins with efficient CUDA kernels to minimize latency and maximize memory-bandwidth utilization on AI accelerators.
  • Write production-level, low-latency, memory-safe C++ and CUDA code for real-time inference on vehicle systems.

Qualifications:

  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference frameworks (INT8, FP8, FP4, BF16/FP16).
  • Proven experience optimizing large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs/VLAs) using efficient attention mechanisms (e.g., FlashAttention, Linear Attention), KV-cache optimization (e.g., PagedAttention), and speculative decoding.
  • Extensive experience with model conversion/compilation pipelines (e.g., ONNX, TensorRT, torch.compile) and with rigorous latency benchmarking and model-quality parity evaluation.
  • Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML OPs and TensorRT Plugins with efficient CUDA kernel implementations.
  • Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

  • Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).
  • Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
  • Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Base Salary Range: $242,000 - $290,000 a year

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We're looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to the accommodations email address (not shown in this listing) or your assigned recruiter.

A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

Vacancy posted 1 day ago