Senior NPU Kernel / Operator Engineer
Overview
We are seeking a Senior NPU Kernel / Operator Engineer to lead the development and optimization of high-performance deep learning operators for a next-generation AI accelerator platform.
This role focuses on kernel design, hardware-aware performance tuning, and correctness validation across a broad range of neural network workloads.
The ideal candidate will have deep experience optimizing compute-intensive software on GPU, NPU, DSP, SIMD, embedded accelerators, compiler backends, or HPC systems, with the ability to reason from model-level requirements down to hardware execution efficiency.
Responsibilities
- Design, implement, and optimize high-performance operators such as:
  - Normalization
  - Reduction
  - Transpose
  - Reshape
  - Gather / Scatter
  - Quantization / Dequantization
  - Fused elementwise kernels
- Own performance optimization across key hardware constraints, including:
  - Memory bandwidth
  - SRAM utilization
  - Data reuse
  - DMA latency
  - Bank conflicts
  - Compute utilization
- Develop advanced optimization strategies including:
  - Tiling
  - Blocking
  - Vectorization
  - Memory scheduling
- Analyze and resolve bottlenecks related to:
  - Memory hierarchy
  - Synchronization overhead
  - Instruction scheduling
  - Data movement
- Validate operator correctness and numerical precision against reference implementations (e.g., PyTorch, NumPy)
- Benchmark and profile kernel performance across simulation, emulation, FPGA, or production silicon environments
- Debug complex issues involving:
  - Tensor layouts
  - Precision loss
  - Memory access patterns
  - Performance regressions
- Build performance models and optimize operators toward hardware roofline limits
- Collaborate closely with compiler, runtime, hardware architecture, and ML model teams to improve operator APIs and execution efficiency
- Document optimization strategies, tensor layouts, and performance improvements
- Mentor junior engineers and help define engineering best practices
Requirements
- BS / MS / PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field
- 5+ years of experience in one or more of the following:
  - Accelerator programming
  - GPU / NPU development
  - Compiler backend engineering
  - Embedded systems
  - High-performance computing
  - Performance optimization
- Strong programming skills in:
  - C/C++
  - Python
- Deep understanding of:
  - Tensor computation
  - Neural network operators
- Strong knowledge of computer architecture concepts:
  - Memory hierarchy
  - Bandwidth and latency analysis
  - Cache / SRAM behavior
  - Parallelism and synchronization
  - Data locality and vectorization
- Proven experience optimizing performance-critical kernels or numerical compute pipelines
- Ability to identify and resolve performance bottlenecks from algorithm through to hardware execution
- Strong debugging, profiling, and analytical problem-solving skills
Preferred Experience
Experience with one or more of the following:
Frameworks / Tooling
- CUDA
- Triton
- OpenCL
- TVM
- MLIR
- Halide
Systems Experience
- SIMD
- DSP
- Embedded C/C++
- GPU / NPU programming
- FPGA development
- HPC systems
Advanced Optimization Techniques
- Tiling and blocking
- Vectorization
- Memory access optimization
- Instruction scheduling
- Mixed-precision optimization
Numerical Formats
- FP32
- FP16
- BF16
- FP8
- INT8 / INT4
AI Accelerator Architecture Familiarity
- Matrix engines
- Vector engines
- Systolic arrays
- DMA engines
- SRAM / NoC / DRAM systems
Bonus
- Experience with simulator, emulator, FPGA, or silicon bring-up
Opportunity
Join a highly technical team building cutting-edge AI compute infrastructure and contribute directly to the performance of next-generation machine learning hardware. This is an opportunity to work at the intersection of AI systems, compiler optimization, and hardware acceleration, with significant ownership and technical impact.
Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
Reece Waldon

