Senior AI Inference Engineer - Model Optimization & Deployment
$242k - $290k
Zoox Inc.
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.
As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.
In this role, you will:
- Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
- Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
- Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch reference models and compiled edge binaries.
- Develop and optimize custom ML OPs and TensorRT Plugins with efficient CUDA kernels to minimize latency and maximize memory-bandwidth utilization on AI accelerators.
- Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.
Qualifications:
- Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference frameworks (INT8, FP8, FP4, BF16/FP16).
- Proven experience optimizing large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs/VLAs) utilizing Efficient Attention mechanisms (e.g., FlashAttention, Linear Attention), KV-cache optimization (e.g., PagedAttention) and Speculative Decoding.
- Extensive experience with model conversion/compilation pipelines (e.g., ONNX, TensorRT, torch.compile) and with rigorous latency benchmarking and model-quality parity evaluation.
- Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML OPs and TensorRT Plugins with efficient CUDA kernel implementations.
- Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.
- Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).
- Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
- Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).
Vacancy posted 1 day ago

