CUDA Kernel Engineer [Remote]

PRAGMATIKE

Remote job

Location: Remote US
Start date: ASAP
Languages: English (required)

About the Role

Pragmatike is hiring on behalf of a fast-growing AI startup, founded by MIT CSAIL researchers and recognized as a Top 10 GenAI company by GTM Capital.

We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.

This role is ideal for someone who deeply understands NVIDIA GPU architecture, memory hierarchy, warp-level execution, and profiling workflows, rather than someone coming from a generic hardware, FPGA, or non-NVIDIA compute background. You will directly influence the GPU efficiency, throughput, and scalability of mission-critical AI systems.

What You'll Do

Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing.
Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.

What We're Looking For

Proven track record building NVIDIA CUDA kernels from scratch, not just calling existing libraries.
Strong ability to optimize kernels (tiling strategies, occupancy tuning, shared memory design, warp scheduling).
Deep understanding of CUDA threads, warps, blocks, and grids; the GPU memory hierarchy and memory coalescing; and warp divergence (how to detect, analyze, and mitigate it).
Experience diagnosing PCIe bottlenecks and optimizing host-device transfers (pinned memory, streams, batching, overlap).
Familiarity with C++, CUDA runtime APIs, and GPU debugging/profiling tooling.

Bonus Points

Experience with multi-GPU or distributed GPU systems (NCCL, NVLink, MIG).
Background in GPU acceleration for ML frameworks or HPC workloads.
Knowledge of model inference optimization (TensorRT, CUDA Graphs, CUTLASS).
Exposure to compiler-level optimization or PTX/SASS analysis.
Startup experience or comfort working in fast-moving, ambiguous environments.

Why This Role Will Pivot Your Career

Research pedigree: MIT CSAIL founders recognized for breakthrough AI and systems contributions.
Customer impact: Deploy AI solutions powering Fortune 500 clients.
Industry momentum: Lab alumni have led high-value acquisitions (MosaicML by Databricks, Run:AI by Nvidia, W&B by CoreWeave).
Funding & growth: Oversubscribed seed round; next funding round in 2026.
Career growth & influence: Lead AI initiatives, optimize pipelines, and directly impact production AI systems at scale.
Culture & autonomy: Own critical systems while collaborating with world-class engineers.
Aspirational impact: Solve GPU/AI performance challenges few engineers ever face.

Benefits

Competitive salary & equity options
Sign-on bonus
Health, Dental, and Vision
401k

Pragmatike is an Equal Opportunity Employer and is committed to providing equal employment opportunities to all applicants without discrimination. We recruit on behalf of our clients and prohibit discrimination and harassment based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

We are committed to a fair and inclusive hiring process. We process your personal data solely for recruitment purposes, in accordance with applicable privacy laws, and maintain reasonable safeguards to protect your information. Your data may be shared with our client(s) for hiring consideration, but will not be disclosed to third parties outside of the recruitment process.

Vacancy posted a month ago

Similar jobs that could be interesting for you (based on the CUDA Kernel Engineer [Remote] in San Francisco, CA vacancy)

Remote CUDA Kernel Engineer
Mercor is seeking a CUDA Engineering Expert to analyze and optimize GPU kernels for performance in a remote role. The ideal candidate should be fluent in core C++ features through C++17, with working knowledge of Python and Git, and experience in GPU programming models...
Remote job
Mercor
San Francisco, CA
1 day ago
Senior CUDA Kernel Engineer - GPU Performance Lead
Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience...
Remote work
Relocation package
Pragmatike
San Francisco, CA
3 days ago
Senior GPU Kernel Engineer - Accelerate AI Training Systems
...MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for... .... The ideal candidate will have a strong background in CUDA or similar, with proven experience in kernel optimizations. Join...
MakerMaker
San Francisco, CA
4 days ago
Kernel Engineer - GPU
...Conviction. Join us and help build the platform engineers turn to to ship AI products. THE ROLE We’re seeking a GPU Kernel Engineer to join our team at the cutting edge... ...-experts routing Write and optimize code using CUDA, PTX assembly, and architecture‑specific techniques...
Flexible hours
Baseten
San Francisco, CA
4 days ago
GPU Kernel Engineer for AI Inference & Performance
...FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
FriendliAI
San Francisco, CA
5 days ago
GPU Kernel Engineer
...Sciforium GPU Kernel Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary... ...Design, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas. Profile and optimize end-...
Flexible hours
Sciforium
San Francisco, CA
2 days ago
Senior Engineer 2: GPU Kernel and Performance
$167.2k - $209k
...world. DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI... ...optimizations at the inference engine and GPU kernel layers, ensuring our infrastructure... ...(NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton), advising...
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
1 day ago
KERNEL ENGINEER
...THE ROLE You’ll write and optimize the GPU kernels and supporting systems software that... ...our models actually use. We hire kernel engineers because the gap between "this works" and... ...YOU'LL DO Write and optimize GPU kernels (CUDA, ROCm, Triton, or similar) for training and...
Shift work
MakerMaker.AI
San Francisco, CA
4 days ago
GPU Kernel Engineer
$100k - $120k
...training and inference workloads grow, we need kernel‑level innovations to reduce latency,... ...Lead a team of kernel and system engineers focused on performance-critical code Design... ...compute kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators Find bottlenecks...
Coda Robotics
San Francisco, CA
4 days ago
Senior GPU Kernel Engineer for Autonomous Driving
$128.7k - $261.3k
...San Francisco is seeking an experienced developer for their AI Kernels & Compilers team to innovate in autonomous driving technology. The... .... The ideal candidate will have a strong background in CUDA programming and C++, with a minimum of 3 years of industry experience...
Israelvcforum
San Francisco, CA
5 days ago
GPU Kernel Engineer: Build Fast AI Inference at Scale
...leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will... ...computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of GPU architecture...
Baseten
San Francisco, CA
5 days ago
Kernel Engineer for High-Performance AI Systems
Acceler8 Talent is looking for a Kernel Engineer in San Francisco, California. The role involves designing and optimizing high-performance kernels... ...GPUs and familiarity with kernel optimization frameworks like CUDA. Compensation includes competitive salary, equity, health...
Flexible hours
Acceler8 Talent
San Francisco, CA
5 days ago
Founding GPU Kernel Engineer
$285k - $315k
About The Role We're looking for a Founding GPU Kernel Engineer who lives right at the boundary between hardware and software. Someone who thinks... .../SASS or GPU assembly Solid systems programming in C++ and CUDA (or ROCm/HIP) Good understanding of how high-level ML...
Full time
Work at office
Relocation package
SF Tensor
San Francisco, CA
5 days ago
Pioneering GPU Kernel Engineer for ML Performance
$285k - $315k
SF Tensor is looking for a Founding GPU Kernel Engineer in San Francisco, specializing in GPU architecture and kernel optimization for machine... ...-critical kernels, and strong programming skills in C++ and CUDA. This full-time position offers a competitive salary of $285,0...
Full time
Relocation package
SF Tensor
San Francisco, CA
5 days ago
Kernel Performance Engineer - AI Tooling & Systems
A leading AI research company in San Francisco is seeking a Systems Engineer focused on kernel optimization and AI-assisted workflows. You'll develop tooling to improve performance, debug infrastructures, and collaborate across teams to enhance production kernels. The...
OpenAI
San Francisco, CA
4 days ago
TPU Kernel Engineer — Lead Low-Latency ML Kernels (Hybrid)
$280k
Anthropic is looking for a TPU Kernel Engineer in San Francisco, California. In this role, you will identify and resolve performance issues across ML systems, particularly in research, training, and inference. You will design and optimize TPU kernels and provide critical...
Anthropic
San Francisco, CA
3 days ago
Remote Low-Level Engineer: Kernel & Inference Optimization
$90 - $125 per hour
A cutting-edge AI company is looking for Low-Level Engineers to design RL environments that optimize kernel development and systems programming. Candidates should have strong Python skills and a solid understanding of LLMs. This remote contractor role offers an hourly...
Remote job
Hourly pay
For contractors
Open Data Science
San Francisco, CA
2 days ago
Kernel AI Engineer (CUDA/GPU) — Flexible Hours
...critical computing systems. You'll work with AI agents to improve performance and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer competitive compensation ranging from $150K to $250K, full health coverage,...
Flexible hours
Asari AI
San Francisco, CA
4 days ago
TPU Kernel Engineer
$280k
...whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the Role As a TPU Kernel Engineer, you'll be responsible for identifying and addressing...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
4 days ago
Staff Engineer: GPU Kernels & AI Performance
...a Member of Technical Staff focused on kernels and GPU performance. This role involves... .... Ideal candidates have strong software engineering foundations and experience with performance... ...systems. Familiarity with tools like CUDA and performance profiling is preferred....
Gimlet Labs
San Francisco, CA
3 days ago
Senior GenAI Research Engineer - Optimization and Kernels
$166k - $225k
...available to all. Job Description As a research engineer on the Scaling team, you will be... ...advanced optimization techniques including kernel fusion, mixed precision, memory layout optimization... ...hands‑on experience writing and tuning CUDA kernels for ML training applications, or...
Worldwide
Cacheflow
San Francisco, CA
2 days ago
Systems Research Engineer, GPU Programming
$160k - $230k
...Systems Research Engineer, GPU Programming San Francisco About the Role As a Systems... ...and optimizing GPU-accelerated kernels and algorithms for ML/AI applications. Working... ...programming and parallel computing, such as CUDA and/or Triton. Knowledge of ML/AI applications...
Full time
Remote work
Together AI
San Francisco, CA
4 days ago
Software Engineer - GPU Kernel
...About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large... ...., GEMM, attention, routing) Develop and maintain GPU code in CUDA and C++, including low-level assembly when needed Implement...
Flexible hours
FriendliAI
San Francisco, CA
5 days ago
GPU Kernel Engineer — Fast ML Training
MakerMaker.AI in San Francisco is seeking a skilled Software Engineer to write and optimize GPU kernels. You will work on deep low-level tasks that directly impact the performance of machine learning models. The ideal candidate has over 4 years of experience with GPU kernels...
MakerMaker.AI
San Francisco, CA
2 days ago
Edge Inference Engineer: Optimize On-Device AI Kernels
...seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming experience with...
Flexible hours
Liquid AI
San Francisco, CA
2 days ago
Kernel Engineer for High-Performance AI Kernels
$225k
Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA...
Magic
San Francisco, CA
5 days ago
TPU Kernel Engineer
$280k
About The Role As a TPU Kernel Engineer, you'll be responsible for identifying and addressing performance issues across many different ML systems, including research, training, and inference. A significant portion of this work will involve designing and optimizing kernels...
Visa sponsorship
Anthropic
San Francisco, CA
3 days ago
Research Engineer - AI Performance & Kernel Optimization
...based in San Francisco, California. The Role: As a Research Engineer - AI Performance & Kernel Optimization, you will improve and optimize the... ...workloads, using any level of the stack from PTX/assembly to CUDA, HIP, Triton, or other GPU DSLs Performance tuning for training...
Work at office
Relocation package
Zyphra
San Francisco, CA
1 day ago
Distributed Systems Engineer, Data & Inference Platform
...you honest about both. Researchers and ML engineers will hand you workloads that barely run; you... ...knowledge of the GPU/accelerator stack: CUDA fundamentals, NCCL, mixed precision, memory layout. You don't need to write kernels, but you should know why a workload is bound...
Flexible hours
Adaption
San Francisco, CA
20 days ago
Sr. Systems Performance Engineer
$500 per month
...shoot down drones. We're a small team of engineers, former US military operators, and... ...drone that gets closer. Every inefficient kernel is a target that gets away. Your job is to... ...and I/O boundaries Develop and optimize CUDA kernels for high-throughput, low-latency...
Permanent employment
Work at office
Monday to Friday
Flexible hours
Night shift
Weekend work
Aurelius Systems
San Francisco, CA
5 days ago
