Senior NPU Kernel Operator Engineer

Full-time

Overview

We are seeking a Senior NPU Kernel / Operator Engineer to lead the development and optimization of high-performance deep learning operators for a next-generation AI accelerator platform.

This role focuses on kernel design, hardware-aware performance tuning, and correctness validation across a broad range of neural network workloads.

The ideal candidate will have deep experience optimizing compute-intensive software on GPU, NPU, DSP, SIMD, embedded accelerators, compiler backends, or HPC systems, with the ability to reason from model-level requirements down to hardware execution efficiency.

Responsibilities

  • Design, implement, and optimize high-performance operators such as:
    • Normalization
    • Reduction
    • Transpose
    • Reshape
    • Gather / Scatter
    • Quantization / Dequantization
    • Fused elementwise kernels
  • Own performance optimization across key hardware constraints, including:
    • Memory bandwidth
    • SRAM utilization
    • Data reuse
    • DMA latency
    • Bank conflicts
    • Compute utilization
  • Develop advanced optimization strategies including:
    • Tiling
    • Blocking
    • Vectorization
    • Memory scheduling
  • Analyze and resolve bottlenecks related to:
    • Memory hierarchy
    • Synchronization overhead
    • Instruction scheduling
    • Data movement
  • Validate operator correctness and numerical precision against reference implementations (e.g. PyTorch, NumPy)
  • Benchmark and profile kernel performance across simulation, emulation, FPGA, or production silicon environments
  • Debug complex issues involving:
    • Tensor layouts
    • Precision loss
    • Memory access patterns
    • Performance regressions
  • Build performance models and optimize operators toward hardware roofline limits
  • Collaborate closely with compiler, runtime, hardware architecture, and ML model teams to improve operator APIs and execution efficiency
  • Document optimization strategies, tensor layouts, and performance improvements
  • Mentor junior engineers and help define engineering best practices

Requirements

  • BS / MS / PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field
  • 5+ years of experience in one or more of the following:
    • Accelerator programming
    • GPU / NPU development
    • Compiler backend engineering
    • Embedded systems
    • High-performance computing
    • Performance optimization
  • Strong programming skills in:
    • C/C++
    • Python
  • Deep understanding of:
    • Tensor computation
    • Neural network operators
  • Strong knowledge of computer architecture concepts:
    • Memory hierarchy
    • Bandwidth and latency analysis
    • Cache / SRAM behavior
    • Parallelism and synchronization
    • Data locality and vectorization
  • Proven experience optimizing performance-critical kernels or numerical compute pipelines
  • Ability to identify and resolve performance bottlenecks from algorithm through to hardware execution
  • Strong debugging, profiling, and analytical problem-solving skills

Preferred Experience

Experience with one or more of the following:

Frameworks / Tooling

  • CUDA
  • Triton
  • OpenCL
  • TVM
  • MLIR
  • Halide

Systems Experience

  • SIMD
  • DSP
  • Embedded C/C++
  • GPU / NPU programming
  • FPGA development
  • HPC systems

Advanced Optimization Techniques

  • Tiling and blocking
  • Vectorization
  • Memory access optimization
  • Instruction scheduling
  • Mixed-precision optimization

Numerical Formats

  • FP32
  • FP16
  • BF16
  • FP8
  • INT8 / INT4

AI Accelerator Architecture Familiarity

  • Matrix engines
  • Vector engines
  • Systolic arrays
  • DMA engines
  • SRAM / NoC / DRAM systems

Bonus

  • Experience with simulator, emulator, FPGA, or silicon bring-up

Opportunity

Join a highly technical team building cutting-edge AI compute infrastructure and contribute directly to the performance of next-generation machine learning hardware. This is an opportunity to work at the intersection of AI systems, compiler optimization, and hardware acceleration, with significant ownership and technical impact.

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.

Reece Waldon

Vacancy posted 14 days ago