CUDA Kernel Engineer [Remote]
PRAGMATIKE
- Remote job
Location: Remote US
Start date: ASAP
Languages: English (required)
About the Role
Pragmatike is hiring on behalf of a fast-growing AI startup recognized as a Top 10 GenAI company by GTM Capital, founded by MIT CSAIL researchers.
We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.
This role is ideal for someone who deeply understands NVIDIA GPU architecture, memory hierarchy, warp-level execution, and profiling workflows, not someone coming from generic hardware, FPGA, or non-NVIDIA compute backgrounds. You will directly influence the GPU efficiency, throughput, and scalability of mission-critical AI systems.
What You'll Do
- Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
- Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
- Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
- Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing.
- Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
- Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.
What We're Looking For
- Proven track record building NVIDIA CUDA kernels from scratch, not just calling existing libraries.
- Strong ability to optimize kernels (tiling strategies, occupancy tuning, shared memory design, warp scheduling).
- Deep understanding of CUDA threads, warps, blocks, and grids; the GPU memory hierarchy and memory coalescing; and warp divergence (how to detect, analyze, and mitigate it).
- Experience diagnosing PCIe bottlenecks and optimizing host-device transfers (pinned memory, streams, batching, overlap).
- Familiarity with C++, CUDA runtime APIs, and GPU debugging/profiling tooling.
Bonus Points
- Experience with multi-GPU or distributed GPU systems (NCCL, NVLink, MIG).
- Background in GPU acceleration for ML frameworks or HPC workloads.
- Knowledge of model inference optimization (TensorRT, CUDA Graphs, CUTLASS).
- Exposure to compiler-level optimization or PTX/SASS analysis.
- Startup experience or comfort working in fast-moving, ambiguous environments.
Why This Role Will Pivot Your Career
- Research pedigree: MIT CSAIL founders recognized for breakthrough AI and systems contributions.
- Customer impact: Deploy AI solutions powering Fortune 500 clients.
- Industry momentum: Lab alumni have led high-value acquisitions (MosaicML → Databricks, Run:AI → Nvidia, W&B → CoreWeave).
- Funding & growth: Oversubscribed seed round, next funding in 2026.
- Career growth & influence: Lead AI initiatives, optimize pipelines, and directly impact production AI systems at scale.
- Culture & autonomy: Own critical systems while collaborating with world-class engineers.
- Aspirational impact: Solve GPU/AI performance challenges few engineers ever face.
Benefits
- Competitive salary & equity options
- Sign-on bonus
- Health, Dental, and Vision
- 401k
Pragmatike is an Equal Opportunity Employer and is committed to providing equal employment opportunities to all applicants without discrimination. We recruit on behalf of our clients and prohibit discrimination and harassment based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.
We are committed to a fair and inclusive hiring process. We process your personal data solely for recruitment purposes, in accordance with applicable privacy laws, and maintain reasonable safeguards to protect your information. Your data may be shared with our client(s) for hiring consideration, but will not be disclosed to third parties outside of the recruitment process.

