Staff Machine Learning Engineer - Model Optimization & Quantization

$158.4k - $237.6k

Qualcomm

Staff Software Engineer

Join the Qualcomm AI Hub team and help developers integrate machine learning into their products and experiences.

In this role, you will develop tools that help developers optimize and deploy machine learning models on edge and mobile hardware. Central to this work is AIMET, Qualcomm's open-source library of state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows that push the boundary of what's possible on resource-constrained hardware. Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs.

For this role, we are seeking a talented and motivated Staff Software Engineer with expertise in optimizing and deploying ML models, especially for edge devices.

What You'll Do

  • Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework, such as post-training quantization (PTQ), quantization-aware training (QAT), mixed-precision quantization, and AdaScale
  • Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
  • Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
  • Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
  • Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
  • Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at scale
  • Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
  • Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
  • Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub's growing model catalog

Minimum Qualifications:

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience, OR
  • Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience, OR
  • PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience

Preferred Qualifications:

  • 3+ years of industry experience in machine learning, deep learning, or AI infrastructure
  • Strong proficiency in Python, with hands-on experience in PyTorch, ONNX and/or TensorFlow
  • Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
  • Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
  • Hands-on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
  • Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar quantization frameworks is a strong plus
  • Experience working with ONNX, TFLite / LiteRT, or other model interchange formats
  • Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
  • Experience collaborating across teams or BUs to drive technical alignment and model delivery
  • Proficiency with Git and software development best practices
  • Strong written and verbal communication skills — ability to write clean APIs, documentation, and engage directly with external developers
  • Experience with C++ for performance-critical components is a bonus
  • Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
  • Experience with automated evaluation pipelines and model benchmarking at scale is a plus

Level of Responsibility

  • Works independently with minimal supervision
  • Provides technical guidance and mentorship to other team members
  • Decision-making is significant and affects work beyond the immediate team
  • Requires strong communication skills to convey complex quantization concepts to varied audiences — from hardware engineers and BU partners to external researchers and application developers
  • Has meaningful influence on the AIMET product roadmap, AI Hub model catalog, and cross-BU quantization strategy
  • Tasks are open-ended; planning, prioritization, and problem-solving are core to the role

Pay Range and Other Compensation & Benefits:

$158,400.00 - $237,600.00

The above pay scale reflects the broad, minimum to maximum, pay scale for this job code for the location for which it has been posted. Even more importantly, please note that salary is only one component of total compensation at Qualcomm. We also offer a competitive annual discretionary bonus program and opportunity for annual RSU grants (employees on sales-incentive plans are not eligible for our annual bonus). In addition, our highly competitive benefits package is designed to support your success at work, at home, and at play. Your recruiter will be happy to discuss all that Qualcomm has to offer – and you can review more details about our US benefits at this link.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Staff Machine Learning Engineer - Model Optimization & Quantization in Santa Clara, CA vacancy
  • $215.28k - $364.32k

     ...Staff Machine Learning Engineer – Autonomous Driving Model Quantization & Deployment Santa Clara, CA XPENG is a leading smart technology company at the forefront...  ...autonomous driving systems. You will lead the effort to optimize and deploy our VLA models onto vehicle-grade... 
    Full time

    XPENG

    Santa Clara, CA
    4 days ago
  • $124k

     ...not just training models, we're building the...  ...post-training quantization and quantization-aware...  ...massive deep learning models run lightning...  ...-making. You will optimize inference latency,...  ...compiler, inference engine, and silicon teams...  ...Computer Science, Machine Learning, Robotics... 
    Hourly pay
    Full time
    Temporary work
    Immediate start
    Flexible hours

    Tesla

    Palo Alto, CA
    1 day ago
  • $174.72k - $295.68k

     ...Senior Machine Learning Engineer - Foundation Model Santa Clara, CA XPENG is a leading smart technology company at the forefront of innovation...  ...tasks. Contribute to model deployment optimization, including quantization, export, and latency–accuracy trade-offs for onboard... 
    Full time

    XPENG

    Santa Clara, CA
    1 day ago
  • $170k - $216k

     ...Machine Learning Engineer, Model Optimization Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    2 days ago
  • $212.8k

     ...Convert and compile ML models for execution on edge NPUs, and apply quantization mechanisms. - Profile...  ...Apply hardware-aware optimization strategies, such as...  ...Science, Electrical Engineering, Computer Engineering...  ...industry experience in machine learning software engineering,... 
    Temporary work
    Local area

    ByteDance

    San Jose, CA
    1 day ago
  • $150k

     ...Institute of Foundation Models We are a...  ...data scientists, and engineers, tackling the most fundamental...  ...performance computing in deep learning, driving impactful...  ...models to unlock machine intelligence beyond lingual...  ...~ Knowledge of cost optimization, security, and... 
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    1 day ago
  • $181.1k - $318.4k

     ...Machine Learning Engineer, Foundation Model Services Do you feel you think differently, you are eager to break...  ...in life of people. You will have a chance to work on optimizing billions of parameter language and vision and speech... 
    Relocation

    Apple

    Santa Clara, CA
    1 day ago
  • $159.05k - $199.3k

     ...ML Runtime Optimization Engineer Sunnyvale, California, United...  ...to every moving machine on the planet. Applied...  ...Bangalore; Seoul; and Tokyo. Learn more at applied.co....  ...in optimizing ML models and deploying them on...  ...on model pruning and quantization, and support deployment... 
    Full time
    For contractors
    For subcontractor
    Casual work
    Work at office
    Remote work
    Day shift

    Applied Intuition

    Sunnyvale, CA
    2 days ago
  • $184k - $287.5k

    Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous...  ...seeking a high-caliber Deep Learning Engineer to bridge the gap...  ...techniques including Quantization (FP4/FP8), pruning, and knowledge...  ...PyTorch, JAX, or similar machine learning frameworks.*... 

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $278.1k - $347.6k

     ...experiences, deploying world models to mobile on-device. As our Principal Machine Learning Engineer, you will be the foremost...  ...decisions on model compression, quantization, pruning, and knowledge...  ...the team. Own the end-to-end optimization pipeline: from model export... 
    Work at office
    Worldwide
    Relocation package

    Unity

    Mountain View, CA
    1 day ago
  • $240k - $280k

     ...environments in the industry to learn just how fast you can grow,...  ...part of the journey. Machine learning lies at the core of...  ...streams. As a machine learning model engineer of the Samsung Ads Platform Intelligence...  ...learning pipelines Optimize and scale up existing machine... 
    Worldwide

    Samsung Electronics America North America

    Mountain View, CA
    3 days ago
  • $129.19k - $247.04k

     ...The Role The Foundation Model Team focuses on building large...  ...and generalizable deep learning systems that enable safe and...  ...intersection of large-scale machine learning, autonomous driving,...  ...high-quality data pipelines Optimize models for latency, reliability... 

    DiDi Labs

    San Jose, CA
    1 day ago
  • $230k - $280k

     ...environments in the industry to learn just how fast you can grow,...  ...part of the journey. Machine learning lies at the core of...  ...streams. As a machine learning model engineer of the Samsung Ads Platform...  ...models to achieve different optimization goals (e.g., app-install... 

    Samsung Electronics America North America

    Mountain View, CA
    3 days ago
  • $181.1k - $318.4k

     ...Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model Apple is where...  ..., and performance optimization. Responsibilities Design...  ...payments as well as relocation. Learn more about Apple Benefits Note... 
    Relocation

    Apple

    Santa Clara, CA
    4 days ago
  • $181.1k - $318.4k

     ...Staff/Sr. AI Infra Performance Engineer Scaling machine learning workloads across thousands of GPUs and TPUs creates challenges...  ...experience developing and optimizing training frameworks (e.g. PyTorch...  ...teams. Familiarity with model architectures and various training... 
    Relocation

    Apple

    Santa Clara, CA
    15 hours ago
  • $181.1k - $318.4k

     ...On-Device Machine Learning Engineer We're starting to see the incredible potential...  ...and large language models, and many applications in...  ...innovative techniques to optimize their performance, efficiency...  ...as pruning, distillation, quantization and weight clustering.... 
    Relocation

    Apple

    Sunnyvale, CA
    3 days ago
  • $156k - $387.6k

     ...businesses of the company. Currently, we are looking for Machine Learning Engineer in AI Compiler Optimization to join our team to support and advance that mission...  ..., and memory levels specifically for recommendation model scenarios, including but not limited to graph-... 
    Temporary work
    Local area

    ByteDance

    San Jose, CA
    15 hours ago
  •  ...Inference Optimization MLE At Rhoda AI, we're building the next generation...  ...of-the-art foundation world models that control our robots. Our...  ...techniques including quantization, pruning, distillation,...  ...Collaborate closely with research engineers to translate model... 

    Rhoda ai

    Palo Alto, CA
    3 days ago
  • $213k - $263k

     ...Machine Learning Engineer, Runtime & Optimization Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver...  ..., including feature and experiment management, model development, optimization and monitoring. These efforts... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    15 hours ago
  • $174.72k - $295.68k

     ...Senior Machine Learning Engineer - Ai Foundation Santa Clara, CA Xpeng is...  ...training very large foundation model and accelerating model...  ...Job Responsibilities: Optimize transformer-based LLMs for...  ...Implement and benchmark (Quantization, Knowledge distillation, structured... 
    Full time

    XPENG

    Santa Clara, CA
    3 days ago
  •  ...technologies in GenAI, Machine Learning, Deep Learning, and Engineering. We tackle complex problems...  ..., visualization, and model serving. We take pride in...  ...production Build and optimize high-throughput batch and...  ..., distillation, quantization, and serving strategies... 

    Walmart

    Sunnyvale, CA
    9 days ago
  • $181.1k - $318.4k

     ...Clara, California, United States Machine Learning and AI Apple is where individual...  ...something! Description As a Senior/Staff Engineer on the Foundation Model Compute Infrastructure team, you...  ...engineering, and performance optimization. Responsibilities Design and evolve... 
    Relocation

    Apple Inc.

    Santa Clara, CA
    2 days ago
  • $184k - $287.5k

     ...combines Small Language Models (SLMs), retrieval...  ...We work closely across engineering and product teams to ensure...  ...performance. Optimize local inference using llama.cpp , including quantization, memory usage, and performance...  ...real problems, learning as they go, and collaborating... 
    Local area

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $19 - $65 per hour

    PlusAI is seeking a Machine Learning Infrastructure Engineer Intern to work on high-performance kernels for BEV model training. In this role, you will analyze training bottlenecks...  ...also explores the use of LLMs to optimize code generation and performance profiling... 
    Hourly pay
    Internship

    PlusAI

    Santa Clara, CA
    1 day ago
  • $174k - $252k

    Senior Machine Learning Engineer, Performance Google Sunnyvale...  ..., and providing last-mile optimization where customization is...  ...directly impacts critical models, like DeepSeek, Qwen,...  ...techniques, such as sharding, quantization, and sparsity, to improve... 
    Full time

    Google Inc.

    Sunnyvale, CA
    15 hours ago
  • $147k - $211k

    Google Inc. is seeking a skilled ML Compiler Software Engineer for its Sunnyvale office. The position requires a Bachelor's degree,...  ...interaction. In this role, you will focus on developing compiler optimizations for Tensor Processing Units (TPUs), enhancing parallelization... 
    Full time
    Work at office

    Google Inc.

    Sunnyvale, CA
    4 days ago
  •  ...companies develop AI models for embedded, edge, and...  ...selection, training, optimization, and validation into...  ...a motivated ML Engineer to help advance our AutoML...  ...pipelines that combine deep-learning and conventional...  ...techniques: pruning, quantization, transfer learning,... 
    Remote work

    Nerdleveltech

    Sunnyvale, CA
    2 days ago
  • $174k - $252k

    Google Inc. is seeking a Senior Machine Learning Engineer in Sunnyvale, CA, to improve AI model performance and efficiency. Candidates should possess a Bachelor...  ...software development, testing, and performance optimization. Responsibilities include engaging with product... 

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $181.1k - $318.4k

     ...Santa Clara, California, is looking for an experienced Machine Learning engineer to optimize and build production-grade solutions serving millions in...  ...contributing directly to optimizing language and vision models. Applicants should have at least 5 years of industry experience... 

    Apple Inc.

    Santa Clara, CA
    4 days ago
  • $244.8k

     ...research groups dedicated to generative models for content creation, image generation...  ...Model Training and Inference Optimization Engineer with expertise in optimizing AI model...  ...training. - Benchmark and profile deep learning models to identify performance bottlenecks... 
    Temporary work
    Local area

    Tik Tok

    San Jose, CA
    4 days ago
