
Principal Engineer, On-Device AI Inference & Systems

$278.1k - $347.6k
Full-time

Unity Technologies

The opportunity

We are building the next generation of AI-driven game experiences, running generative models on-device, right where the players are — on phones, tablets, laptops, and desktops. Our games run inside a modern, browser-native runtime (built on technologies such as WebGPU and WebNN), so the models that power these experiences must be deployed and accelerated entirely within that runtime. As our Principal Engineer for On-Device AI Inference & Systems, you will be the foremost engineering authority on taking state-of-the-art multi-modal models (transformers and diffusion networks) and making them run fast, small, and reliably within that runtime, fully integrated into a production game engine.

This is a deeply hands-on, high-impact engineering role. You will own the inference and integration stack end-to-end — from the moment a trained checkpoint leaves research, through export, optimization, and kernel-level tuning, to a shipped feature running inside the engine at interactive frame rates within a fixed memory and power budget. You will set the engineering standards, drive the architecture of the runtime and integration layers, and mentor a team of senior and mid-level engineers. Your work directly determines the latency, quality, memory footprint, and battery profile of AI features experienced by players worldwide.

This role is for an engineer who is energized by the gap between a research model and a shipping, AI-based product. If you love profilers, frame captures, op-fusion, and shaving milliseconds and megabytes, this is your role.

What you'll be doing

Inference & On-Device Optimization
  • Own the end-to-end optimization pipeline: model export, graph transformation, operator fusion, memory-layout planning, and hardware-specific kernel tuning across NPU, mobile GPU, and desktop/laptop GPU.
  • Make authoritative decisions on quantization (INT4/INT8/FP16), weight sharing, structured/unstructured pruning, and knowledge distillation to hit hard latency, memory, and power budgets — and validate them against quality bars.
  • Drive low-level performance work: write and tune WebGPU compute shaders (WGSL) and, where relevant, native kernels (Metal, Vulkan/SPIR-V compute, D3D12, CUDA); profile with browser and platform tools (Chrome/Dawn GPU traces, PIX, Instruments/Metal System Trace, Snapdragon Profiler, Nsight, RenderDoc), and eliminate bottlenecks at the op and memory-bandwidth level.
  • Apply efficiency techniques — dynamic resolution, token reduction, cross-frame caching/reuse, reduced-step diffusion samplers — as engineering levers to meet budgets on target SKUs.

Runtime & Systems Integration
  • Evaluate, select, and drive adoption of WebGPU-targeted inference runtimes (ONNX Runtime Web, Transformers.js, WebLLM, TensorFlow.js) alongside native options (CoreML, ONNX Runtime, TFLite, ExecuTorch) — and extend or build runtime/glue code where off-the-shelf options fall short of our diffusion workloads.
  • Design and own the integration between the ML runtime and the game engine: real-time scheduling, threading, memory pooling, zero-copy buffer sharing between the inference path and the render path, and frame-budget management alongside the renderer.
  • Architect inference systems that handle diverse inputs — images, text, primitives, metadata — and produce pixel-level outputs with real-time performance, robust to the messy realities of production (cold starts, thermal throttling, device fragmentation, backgrounding).
  • Build the supporting engineering: model packaging and asset pipelines, on-device fallbacks and SKU-aware capability tiers, crash/quality telemetry, and automated on-device benchmarking in CI.

Research Productionization
  • Partner closely with research scientists to turn novel architectures into implementations that are deployable, debuggable, and fast on device.
  • Provide the feedback loop back into research: surface hardware constraints, op-support gaps, and cost models early so model design and deployment converge.
  • Track breakthroughs in efficient inference (efficient attention, distillation, reduced-step diffusion) and assess them pragmatically: what actually moves latency/memory/power on our target devices, and what is worth the engineering cost.

Engineering Leadership
  • Lead and mentor a team of engineers; set engineering best practices, code-review standards, performance-regression gates, and on-device benchmarking methodology.
  • Champion a culture of measurement: define and enforce KPIs for latency, quality, memory, and power, and ensure they are tracked rigorously across the device matrix.
  • Partner with platform engineers, product managers, and runtime teams to align ML capabilities with device-SKU constraints and product roadmaps.

What we're looking for
  • 8+ years in software/ML engineering, with at least 4 years focused on on-device / edge inference or real-time, performance-critical systems.
  • Proven production deployment of transformer- and/or diffusion-based models (e.g., ViT, Stable Diffusion) on mobile, desktop, or embedded hardware — shipped, not just prototyped.
  • Hands-on experience deploying models through WebGPU — e.g., ONNX Runtime Web (WebGPU EP), Transformers.js, WebLLM, or TensorFlow.js — including writing/tuning WGSL compute shaders and working within WebGPU's adapter, device-limits, and binding model. Equivalent deep experience with a native GPU/compute API plus a clear path to WebGPU will also be considered.
  • Hands-on expertise with at least one major inference runtime (ONNX Runtime / ORT Web, CoreML, TFLite, ExecuTorch) and deep understanding of operator fusion, memory layout, and runtime scheduling.
  • Low-level performance engineering: strong command of at least one GPU/compute API — WebGPU/WGSL, Metal, Vulkan, D3D12, or CUDA — and the profiling tools to go with it. You can read a frame capture and a kernel trace and know where the time and memory go.
  • Working knowledge of model-optimization techniques — quantization (INT4/INT8/FP16), weight sharing, pruning, and distillation — and the practical judgment to apply them to hit latency and memory budgets. You don't need to be a research expert in these methods; you need to use them effectively as engineering tools.
  • Strong understanding of target hardware: mobile SoCs (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and desktop/laptop GPUs (Apple Silicon, NVIDIA, AMD, Intel) — and how to target each for peak throughput.
  • Proficiency in the core languages of a browser-native runtime — TypeScript/JavaScript and WGSL — plus solid Python for export pipelines and training-side tooling.
  • Working fluency with the models you deploy — enough to read an architecture, modify it for deployment, and reason about accuracy trade-offs.
  • Track record of technical leadership: setting engineering direction, influencing cross-functional partners, and growing engineers.

You might also have
  • Experience shipping world-model, neural-rendering, or real-time generative pipelines (NeRF, 3DGS, real-time diffusion, or similar) on device.
  • Deep game-engine or real-time-graphics background (Unity, Unreal, or a custom engine; Metal/Vulkan/D3D/OpenGL ES render pipelines) — especially integrating compute workloads alongside a renderer.
  • Contributions to open-source ML inference frameworks, runtimes, or GPU/compute libraries — especially in the WebGPU ecosystem (Dawn, wgpu, ORT Web, Transformers.js, WebLLM).
  • Familiarity with the WebGPU specification and its evolving compute features (subgroups, FP16/shader-f16, timestamp queries) and the trade-offs of running heavy diffusion workloads in the browser/web runtime.
  • Familiarity with compiler stacks (MLIR, TVM, IREE, XLA) for custom kernel generation and graph optimization.
  • Experience with on-device benchmarking infrastructure, performance-regression CI, and large device-farm matrices.

Additional information
  • International relocation support is not available for this position
  • Work visa/immigration sponsorship is not available for this position

Benefits

At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance. Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status. While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally:
  • Comprehensive health, life, and disability insurance
  • Commute subsidy
  • Employee stock ownership
  • Competitive retirement/pension plans
  • Generous vacation and personal days
  • Support for new parents through leave and family-care programs
  • Office food snacks
  • Mental Health and Wellbeing programs and support
  • Employee Resource Groups
  • Global Employee Assistance Program
  • Training and development programs
  • Volunteering and donation matching program

Life at Unity

Unity [NYSE: U] is the world’s leading game engine, powering play for more than 3 billion consumers each month. The top mobile games in the world, the most played PC indie titles, the most innovative console games, and virtually all of the top XR and Web Games are developed, deployed, and grown in Unity. Unity also enables teams across industries like automotive, manufacturing, and healthcare to design, simulate, and collaborate in 3D — closing the gap between ideas and reality.

Unity is a proud equal opportunity employer. We are committed to fostering an inclusive, innovative environment and celebrate our employees across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, or any other protected status in accordance with applicable law. Our differences are strengths that enable us to support the growing and evolving needs of our customers, partners, and collaborators.

If you have a disability that means there are preparations or accommodations we can make to help ensure you have a comfortable and positive interview experience, please fill out this form to let us know.

This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language, since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide whose common language is English.

This posting is intended to fill an existing vacancy, and we are committed to providing applicants with updates throughout the hiring process in accordance with applicable law.

Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. Unity does not accept unsolicited headhunter and agency resumes. Unity will not pay fees to any third-party agency or company that does not have a signed agreement with Unity.

Your privacy is important to us. Please take a moment to review our Prospect Privacy Policy and Applicant Privacy Policy. Should you have any concerns about your privacy, please contact us.


*Note: This range reflects the anticipated base salary for this position. Beyond base salary, this role may be eligible for equity awards and participation in our company incentive plans (such as annual discretionary bonuses or sales commissions). The final offer amount will depend on several factors, including geographic location and the candidate’s relevant experience, professional background, and skill set.

Gross pay salary

$278,100—$347,600 USD

Vacancy posted 1 day ago