Staff Machine Learning Engineer - Model Optimization & Quantization

$158.4k - $237.6k

Qualcomm

Staff Software Engineer

Join the Qualcomm AI Hub team and help developers integrate machine learning into their products and experiences.

In this role you will develop tools to help developers optimize and deploy machine learning models on edge and mobile hardware. AIMET is Qualcomm's open-source library for state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows, pushing the boundary of what's possible on resource-constrained hardware. Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs.

For this role we are seeking a talented and motivated Staff Software Engineer with expertise in optimizing and deploying ML models, especially for edge devices.

What You'll Do

Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)
Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at large scale
Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub's growing model catalog

Minimum Qualifications:

Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience. OR
PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Preferred Qualifications:

3+ years of industry experience in machine learning, deep learning, or AI infrastructure
Strong proficiency in Python, with hands-on experience in PyTorch, ONNX and/or TensorFlow
Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
Hands-on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar quantization frameworks is a strong plus
Experience working with ONNX, TFLite / LiteRT, or other model interchange formats
Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
Experience collaborating across teams or BUs to drive technical alignment and model delivery
Proficiency with git and software development best practices
Strong written and verbal communication skills — ability to write clean APIs, documentation, and engage directly with external developers
Experience with C++ for performance-critical components is a bonus
Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
Experience with automated evaluation pipelines and model benchmarking at scale is a plus

Level of Responsibility

Works independently with minimal supervision
Provides technical guidance and mentorship to other team members
Decision-making is significant and affects work beyond the immediate team
Requires strong communication skills to convey complex quantization concepts to varied audiences — from hardware engineers and BU partners to external researchers and application developers
Has meaningful influence on the AIMET product roadmap, AI Hub model catalog, and cross-BU quantization strategy
Tasks are open-ended; planning, prioritization, and problem-solving are core to the role

Pay Range and Other Compensation & Benefits:

$158,400.00 - $237,600.00

The above pay scale reflects the broad, minimum to maximum, pay scale for this job code for the location for which it has been posted. Even more importantly, please note that salary is only one component of total compensation at Qualcomm. We also offer a competitive annual discretionary bonus program and opportunity for annual RSU grants (employees on sales-incentive plans are not eligible for our annual bonus). In addition, our highly competitive benefits package is designed to support your success at work, at home, and at play. Your recruiter will be happy to discuss all that Qualcomm has to offer – and you can review more details about our US benefits at this link.

Vacancy posted 2 days ago
