ML Engineer - Inference & Model Deployment

HiringCafe

Job discovery is broken. Indeed and LinkedIn want to keep it that way. HiringCafe is building a 100x better job search engine: fast, comprehensive, honest, and actually useful. We index millions of jobs, remove noise, rank what matters, and help people find real opportunities without dark patterns, ads, or pay-to-win placement.

We are looking for a founding ML engineer who can help us turn powerful AI and ML models into fast, reliable production systems. You will own the bridge between model development and real user-facing infrastructure: deploying models, optimizing inference latency and throughput, scaling serving systems, and making sure our models run efficiently in production. This is a hands-on engineering role for someone who loves the details of model performance, GPU utilization, inference architecture, and production reliability.

What You’ll Do

Deploy and integrate researcher-trained model checkpoints into our cloud infrastructure and production pipelines.
Profile and benchmark model performance to identify latency, throughput, memory, and compute bottlenecks.
Implement optimization techniques such as quantization, pruning, batching, caching, efficient attention, and precision trade-offs while preserving model quality.
Build scalable multi‑GPU inference systems for search, ranking, recommendations, agents, and other AI‑powered product experiences.
Design reliable model‑serving architecture that can support millions of users.
Develop efficient training and fine‑tuning workflows where needed, including distributed training, mixed precision, and parallelism strategies.
Work closely with our search & engineering teams to make model deployment a smooth part of our development workflow.

You May Be a Strong Fit If You

Have deployed and optimized deep learning models in production environments.
Have experience with large-scale model serving, multi‑GPU inference, or high-throughput inference systems.
Understand inference optimization techniques such as quantization, pruning, compilation, batching, caching, and memory optimization.
Have strong instincts for profiling, benchmarking, and debugging model performance.
Are familiar with efficient attention mechanisms, transformer optimization, or modern LLM/embedding/ranking model infrastructure.
Have worked with inference frameworks or serving stacks such as SGLang, vLLM, TensorRT, or equivalent.
Can write clean, production-quality code and integrate ML systems into backend infrastructure.
Are comfortable with cloud platforms, distributed systems, storage systems, and modern ML training or serving workflows.
Want ownership, leverage, and responsibility from day one.

Logistics

This role is based in Cupertino, where we work in person. We believe the best ideas come from being in the same room. We offer generous health, dental, and vision coverage, paid parental leave, and relocation support.

Don’t meet every single qualification? That’s okay. We care more about your trajectory than checking every box. If the role excites you and the mission resonates, we’d love to hear from you.

Vacancy posted 19 hours ago
