CUDA Kernel Engineer [Remote]

PRAGMATIKE

Location: Remote US
Start date: ASAP
Languages: English (required)

About the Role

Pragmatike is hiring on behalf of a fast-growing AI startup founded by MIT CSAIL researchers and recognized as a Top 10 GenAI company by GTM Capital.

We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.

This role is ideal for someone who deeply understands NVIDIA GPU architecture, memory hierarchy, warp-level execution, and profiling workflows, not someone coming from generic hardware, FPGA, or non-NVIDIA compute backgrounds. You will directly influence the GPU efficiency, throughput, and scalability of mission-critical AI systems.

What You'll Do

  • Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
  • Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
  • Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
  • Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing.
  • Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
  • Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.

What We're Looking For

  • Proven track record building NVIDIA CUDA kernels from scratch, not just calling existing libraries.
  • Strong ability to optimize kernels (tiling strategies, occupancy tuning, shared memory design, warp scheduling).
  • Deep understanding of CUDA threads, warps, blocks, and grids; the GPU memory hierarchy and memory coalescing; and warp divergence (how to detect, analyze, and mitigate it).
  • Experience diagnosing PCIe bottlenecks and optimizing host-device transfers (pinned memory, streams, batching, overlap).
  • Familiarity with C++, CUDA runtime APIs, and GPU debugging/profiling tooling.

Bonus Points

  • Experience with multi-GPU or distributed GPU systems (NCCL, NVLink, MIG).
  • Background in GPU acceleration for ML frameworks or HPC workloads.
  • Knowledge of model inference optimization (TensorRT, CUDA Graphs, CUTLASS).
  • Exposure to compiler-level optimization or PTX/SASS analysis.
  • Startup experience or comfort working in fast-moving, ambiguous environments.

Why This Role Will Pivot Your Career

  • Research pedigree: MIT CSAIL founders recognized for breakthrough AI and systems contributions.
  • Customer impact: Deploy AI solutions powering Fortune 500 clients.
  • Industry momentum: Lab alumni have led high-value acquisitions (MosaicML → Databricks, Run:AI → NVIDIA, W&B → CoreWeave).
  • Funding & growth: Oversubscribed seed round, next funding in 2026.
  • Career growth & influence: Lead AI initiatives, optimize pipelines, and directly impact production AI systems at scale.
  • Culture & autonomy: Own critical systems while collaborating with world-class engineers.
  • Aspirational impact: Solve GPU/AI performance challenges few engineers ever face.

Benefits

  • Competitive salary & equity options
  • Sign-on bonus
  • Health, Dental, and Vision
  • 401k

Pragmatike is an Equal Opportunity Employer and is committed to providing equal employment opportunities to all applicants without discrimination. We recruit on behalf of our clients and prohibit discrimination and harassment based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

We are committed to a fair and inclusive hiring process. We process your personal data solely for recruitment purposes, in accordance with applicable privacy laws, and maintain reasonable safeguards to protect your information. Your data may be shared with our client(s) for hiring consideration, but will not be disclosed to third parties outside of the recruitment process.

Vacancy posted a month ago