Principal Machine Learning Engineer, Mobile AI Inference Optimization

$278.1k - $347.6k

Unity

The opportunity
We are building the next generation of AI-driven game experiences, running generative models on-device, right where the players are — on phones, tablets, laptops, and desktops. Our games run inside a modern, browser-native runtime (built on technologies such as WebGPU and WebNN), so the models that power these experiences must be deployed and accelerated entirely within that runtime. As our Principal Engineer for On-Device AI Inference & Systems, you will be the foremost engineering authority on taking state-of-the-art multi-modal models (transformers and diffusion networks) and making them run fast, small, and reliably within that runtime, fully integrated into a production game engine.

This is a deeply hands-on, high-impact engineering role. You will own the inference and integration stack end-to-end — from the moment a trained checkpoint leaves research, through export, optimization, and kernel-level tuning, to a shipped feature running inside the engine at interactive frame rates within a fixed memory and power budget. You will set the engineering standards, drive the architecture of the runtime and integration layers, and mentor a team of senior and mid-level engineers. Your work directly determines the latency, quality, memory footprint, and battery profile of AI features experienced by players worldwide.

This role is for an engineer who is energized by the gap between a research model and a shipping AI-based product. If you love profilers, frame captures, op fusion, and shaving milliseconds and megabytes, this is the job for you.

What you'll be doing

Inference & On-Device Optimization
Own the end-to-end optimization pipeline: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific kernel tuning across NPU, mobile GPU, and desktop/laptop GPU.
Make authoritative decisions on quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets — and validate them against quality bars.
Drive low-level performance work: write and tune WebGPU compute shaders (WGSL) and, where relevant, native kernels (Metal, Vulkan/SPIR-V compute, D3D12, CUDA); profile with browser and platform tools (Chrome/Dawn GPU traces, PIX, Instruments/Metal System Trace, Snapdragon Profiler, Nsight, RenderDoc), and eliminate bottlenecks at the op and memory-bandwidth level.
Apply efficiency techniques — dynamic resolution, token reduction, cross-frame caching/reuse, reduced-step diffusion samplers — as engineering levers to meet budgets on target SKUs.

Runtime & Systems Integration
Evaluate, select, and drive adoption of WebGPU-targeted inference runtimes (ONNX Runtime Web, Transformers.js, WebLLM, TensorFlow.js) alongside native options (CoreML, ONNX Runtime, TFLite, ExecuTorch) — and extend or build runtime/glue code where off-the-shelf options fall short of our diffusion workloads.
Design and own the integration between the ML runtime and the game engine: real-time scheduling, threading, memory pooling, zero-copy buffer sharing between the inference path and the render path, and frame-budget management alongside the renderer.
Architect inference systems that handle diverse inputs — images, text, primitives, metadata — and produce pixel-level outputs with real-time performance, robust to the messy realities of production (cold starts, thermal throttling, device fragmentation, backgrounding).
Build the supporting engineering: model packaging and asset pipelines, on-device fallbacks and SKU-aware capability tiers, crash/quality telemetry, and automated on-device benchmarking in CI.

Research Productionization
Partner closely with research scientists to turn novel architectures into implementations that are deployable, debuggable, and fast on device.
Provide the feedback loop back into research: surface hardware constraints, op-support gaps, and cost models early so model design and deployment converge.
Track breakthroughs in efficient inference (efficient attention, distillation, reduced-step diffusion) and assess them pragmatically: what actually moves latency/memory/power on our target devices, and what is worth the engineering cost.

Engineering Leadership
Lead and mentor a team of engineers; set engineering best practices, code-review standards, performance-regression gates, and on-device benchmarking methodology.
Champion a culture of measurement: define and enforce KPIs for latency, quality, memory, and power, and ensure they are tracked rigorously across the device matrix.
Partner with platform engineers, product managers, and runtime teams to align ML capabilities with device-SKU constraints and product roadmaps.

What we're looking for

8+ years in software/ML engineering, with at least 4 years focused on on-device / edge inference or real-time, performance-critical systems.
Proven production deployment of transformer- and/or diffusion-based models (e.g., ViT, Stable Diffusion) on mobile, desktop, or embedded hardware — shipped, not just prototyped.
Hands-on experience deploying models through WebGPU — e.g., ONNX Runtime Web (WebGPU EP), Transformers.js, WebLLM, or TensorFlow.js — including writing/tuning WGSL compute shaders and working within WebGPU's adapter, device-limits, and binding model. Equivalent deep experience with a native GPU/compute API plus a clear path to WebGPU will also be considered.
Hands-on expertise with at least one major inference runtime (ONNX Runtime / ORT Web, CoreML, TFLite, ExecuTorch) and deep understanding of operator fusion, memory layout, and runtime scheduling.
Low-level performance engineering: strong command of at least one GPU/compute API — WebGPU/WGSL, Metal, Vulkan, D3D12, or CUDA — and the profiling tools to go with it. You can read a frame capture and a kernel trace and know where the time and memory go.
Working knowledge of model-optimization techniques — quantization (INT4/INT8/FP16), weight sharing, pruning, and distillation — and the practical judgment to apply them to hit latency and memory budgets. You don't need to be a research expert in these methods; you need to use them effectively as engineering tools.
Strong understanding of target hardware: mobile SoCs (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and desktop/laptop GPUs (Apple Silicon, NVIDIA, AMD, Intel) — and how to target each for peak throughput.
Proficiency in the core languages of a browser-native runtime — TypeScript/JavaScript and WGSL — plus solid Python for export pipelines and training-side tooling.
Working fluency with the models you deploy — enough to read an architecture, modify it for deployment, and reason about accuracy trade-offs.
Track record of technical leadership: setting engineering direction, influencing cross-functional partners, and growing engineers.

You might also have

Experience shipping world-model, neural-rendering, or real-time generative pipelines (NeRF, 3DGS, real-time diffusion, or similar) on device.
Deep game-engine or real-time-graphics background (Unity, Unreal, or a custom engine; Metal/Vulkan/D3D/OpenGL ES render pipelines) — especially integrating compute workloads alongside a renderer.
Contributions to open-source ML inference frameworks, runtimes, or GPU/compute libraries — especially in the WebGPU ecosystem (Dawn, wgpu, ORT Web, Transformers.js, WebLLM).
Familiarity with the WebGPU specification and its evolving compute features (subgroups, FP16/shader-f16, timestamp queries) and the trade-offs of running heavy diffusion workloads in the browser/web runtime.
Familiarity with compiler stacks (MLIR, TVM, IREE, XLA) for custom kernel generation and graph optimization.
Experience with on-device benchmarking infrastructure, performance-regression CI, and large device-farm matrices.

Additional information

International relocation support is not available for this position.
Work visa/immigration sponsorship is not available for this position.

Benefits
At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance.

Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status.

While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally: Comprehensive health, life, and disability insurance | Commute subsidy | Employee stock ownership | Competitive retirement/pension plans | Generous vacation and personal days | Support for new parents through leave and family-care programs | Office food and snacks | Mental health and wellbeing programs and support | Employee Resource Groups | Global Employee Assistance Program | Training and development programs | Volunteering and donation matching program

Life at Unity
Unity [NYSE: U] is the world’s leading game engine, powering play for more than 3 billion consumers each month. The top mobile games in the world, the most played PC indie titles, the most innovative console games, and virtually all of the top XR and Web Games are developed, deployed, and grown in Unity. Unity also enables teams across industries like automotive, manufacturing, and healthcare to design, simulate, and collaborate in 3D — closing the gap between ideas and reality. For more information, please visit

Unity is a proud equal opportunity employer. We are committed to fostering an inclusive, innovative environment and celebrate our employees across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, or any other protected status in accordance with applicable law. Our differences are strengths that enable us to support the growing and evolving needs of our customers, partners, and collaborators. If you have a disability that means there are preparations or accommodations we can make to help ensure you have a comfortable and positive interview experience, please fill out this form to let us know.

This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide and whose common language is English.

This posting is intended to fill an existing vacancy, and we are committed to providing applicants with updates throughout the hiring process in accordance with applicable law.

Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. Unity does not accept unsolicited headhunter and agency resumes. Unity will not pay fees to any third-party agency or company that does not have a signed agreement with Unity.

Your privacy is important to us. Please take a moment to review our Prospect Privacy Policy and Applicant Privacy Policy. Should you have any concerns about your privacy, please contact us.

*Note: This range reflects the anticipated base salary for this position. Beyond base salary, this role may be eligible for equity awards and participation in our company incentive plans (such as annual discretionary bonuses or sales commissions). The final offer amount will depend on several factors, including geographic location and the candidate’s relevant experience, professional background, and skill set. Gross pay salary $278,100—$347,600 USD

Vacancy posted 4 days ago
