Software Engineer - GPU Kernels

Baseten

ABOUT BASETEN

Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products.

THE ROLE

We’re seeking a GPU Kernel Engineer to join our team at the cutting edge of AI acceleration, where your code directly impacts the performance of state-of-the-art machine learning models. As a GPU Kernel Engineer, you'll craft the foundation that powers modern AI workloads, optimizing every microsecond of computation to enable breakthrough applications. You'll work in a fast-paced, intellectually stimulating environment where technical excellence is paramount and your contributions directly influence production systems serving millions of users across numerous products. This role offers exceptional growth potential for engineers passionate about low-level optimization and high-impact systems work.

EXAMPLE INITIATIVES

You'll get to work on these types of projects as part of our Model Performance team:

- Baseten Embeddings Inference: The fastest embeddings solution available
- The Baseten Inference Stack
- Driving model performance optimization

RESPONSIBILITIES

Core Engineering Responsibilities

- Design and implement high-performance GPU kernels for key ML operations, including matrix multiplications, attention mechanisms, and mixture-of-experts routing
- Write and optimize code using CUDA, PTX assembly, and architecture-specific techniques
- Apply advanced performance optimization methods such as memory coalescing, warp-level programming, tensor core acceleration, and compute/memory overlap

Performance & Innovation

- Implement cutting-edge features like quantization (FP8/FP4), sparsity, and compute/communication overlap
- Identify and resolve performance bottlenecks using tools like Nsight Systems, Nsight Compute, and Torch Profiler
- Collaborate with research teams to productionize theoretical advancements

Impact & Collaboration

- Contribute to internal and open-source GPU libraries
- Present technical contributions at industry conferences (e.g., NVIDIA GTC, AWS re:Invent)

REQUIREMENTS

- Strong understanding of GPU architecture and programming paradigms:
  - Memory hierarchy (global, shared, registers, L1/L2 cache)
  - Thread/block/grid organization
  - Synchronization techniques and race condition mitigation
- Proficient in C++ and GPU performance profiling tools
- Knowledge of:
  - CUDA C++ API
  - Memory access patterns and bandwidth optimization
  - Numerical precision and quantization strategies
  - Modern GPU features (e.g., tensor cores, async operations)

NICE TO HAVE

- Experience with Transformer models and attention optimization (e.g., Flash Attention)
- Familiarity with GPU kernel libraries: CUTLASS, Triton, Thrust, CUB
- Background in GEMM tuning and distributed/multi-GPU compute
- Contributions to open-source GPU projects
- Research publications or conference presentations on GPU performance

BENEFITS

- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employees and dependents
- Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status. We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the Software Engineer - GPU Kernels in San Francisco, CA vacancy

Software Engineer - GPU Kernel
About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class inference speed across NVIDIA and AMD GPUs. With...
Suggested
Flexible hours
FriendliAI
San Francisco, CA
3 days ago
Lead GPU Kernel Engineer for High-Performance ML
$100k - $120k
Coda Robotics is looking for an experienced engineer to join their founding team, focusing on low-level compute kernels to enhance robotic foundation models. The ideal candidate... ...programming (C/C++, assembly), expertise in GPU optimizations, and familiarity with ML...
Suggested
Coda Robotics
San Francisco, CA
14 hours ago
Senior Real-Time Systems Architect: Low-Latency GPU/Kernel
...searching for a Sr. Systems Performance Software Engineer to own the architecture and performance... ...systems and drive performance across CPU, GPU, and memory boundaries. The ideal... ...experience in robotics software, possesses kernel-level coding skills, and a solid understanding...
Suggested
Aurelius Systems
San Francisco, CA
2 days ago
Systems Software Engineer — GPU Compute & Kernel
The San Francisco Compute Company is looking for a talented software engineer to develop their GPU market platform. The role requires familiarity with Rust, multi-threaded programming, and Linux systems. Responsibilities include provisioning servers and designing APIs....
Suggested
The San Francisco Compute Company
San Francisco, CA
3 days ago
GPU Kernel Engineer
...Sciforium GPU Kernel Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary... ...generation large-scale AI systems. You will work across the hardware–software stack, from low-level kernel development to integrating...
Suggested
Flexible hours
Sciforium
San Francisco, CA
5 days ago
Founding GPU Kernel Engineer
$285k - $315k
About The Role We're looking for a Founding GPU Kernel Engineer who lives right at the boundary between hardware and software. Someone who thinks in warps, occupancy, and memory hierarchies, and can squeeze every last FLOP out of a GPU. Your job is to go deeper than anyone...
Full time
Work at office
Relocation package
SF Tensor
San Francisco, CA
3 days ago
GPU Kernel Engineer for AI Inference & Performance
FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
FriendliAI
San Francisco, CA
3 days ago
Pioneering GPU Kernel Engineer for ML Performance
$285k - $315k
SF Tensor is looking for a Founding GPU Kernel Engineer in San Francisco, specializing in GPU architecture and kernel optimization for machine learning workloads. The ideal candidate has deep expertise, proven capabilities in hand-optimizing performance-critical kernels...
Full time
Relocation package
SF Tensor
San Francisco, CA
3 days ago
GPU Kernel Engineer: Build Fast AI Inference at Scale
A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal...
Baseten
San Francisco, CA
3 days ago
Senior Engineer 2: GPU Kernel and Performance
$167.2k - $209k
...world. DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI... ...at the inference engine and GPU kernel layers, ensuring our infrastructure extracts... ...modern GPU families (NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton...
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
1 day ago
GPU Kernel Engineer
$100k - $120k
...training and inference workloads grow, we need kernel‑level innovations to reduce latency,... ...Lead a team of kernel and system engineers focused on performance-critical code Design... ...compute kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators Find...
Coda Robotics
San Francisco, CA
1 day ago
GPU Kernel Engineer — Fast ML Training
MakerMaker.AI in San Francisco is seeking a skilled Software Engineer to write and optimize GPU kernels. You will work on deep low-level tasks that directly impact the performance of machine learning models. The ideal candidate has over 4 years of experience with GPU kernels...
MakerMaker.AI
San Francisco, CA
14 hours ago
Systems Research Engineer, GPU Programming
$160k - $230k
...Systems Research Engineer, GPU Programming San Francisco About the Role As a Systems... ...and optimizing GPU-accelerated kernels and algorithms for ML/AI applications.... ...systems. Collaborating with the hardware and software teams, you will contribute to the co-design...
Full time
Remote work
Together AI
San Francisco, CA
2 days ago
Software Engineer, Kernel Performance & AI Tooling
...generation of AI‑native silicon while working closely with software and research partners to co‑design hardware tightly... ...AI. About the Role We are looking for a systems‑minded engineer to help advance our kernel development, performance engineering, and hardware‑software...
OpenAI
San Francisco, CA
1 day ago
Software Engineer, GPU Infrastructure - HPC
$230k
...deployment over unchecked growth. About the role As a software engineer on the Fleet High Performance Computing (HPC) team, you will... ...tooling (e.g., PCIe, Infiniband, networking, power management, kernel perf tuning) Knowledge of hardware management protocols...
OpenAI
San Francisco, CA
1 day ago
Senior CUDA Kernel Engineer - GPU Performance Lead
Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience...
Remote work
Relocation package
Pragmatike
San Francisco, CA
1 day ago
Staff Software Engineer - GenAI Performance and Kernel
$190.9k - $232.8k
About This Role As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-...
Local area
Worldwide
Cacheflow
San Francisco, CA
3 days ago
Kernel AI Engineer (CUDA/GPU) — Flexible Hours
Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with AI agents to improve performance and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/...
Flexible hours
Asari AI
San Francisco, CA
2 days ago
AI Performance & Kernel Engineer for Frontier-Scale ML
...technology firm located in San Francisco is seeking a Research Engineer specializing in AI Performance & Kernel Optimization. The role involves enhancing the... ...candidates should have a strong engineering background in GPU kernel development and experience with ML workloads....
Zyphra
San Francisco, CA
4 days ago
Remote Security Engineer - Cloud GPU Platform
A leading AI infrastructure company is seeking a remote Security Engineer to ensure the security of its GPU cloud platform. The ideal candidate has over 5 years of experience in cloud security and strong programming skills. Responsibilities include designing secure architectures...
Remote job
Flexible hours
Runpod
San Francisco, CA
14 hours ago
Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
Perplexity is looking for an engineer to join their team in San Francisco. You will work... ...engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime... ...candidate has 3+ years of experience in software engineering with a focus on ML inference...
Perplexity
San Francisco, CA
2 days ago
Software Engineering - deployment services
$115k - $140k
...Software Engineer: Perception Los Angeles, US About Lodestar Lodestar's mission is to develop the first "Protect and Defend... ...neural networks and geometric algorithms using CUDA kernels, TensorRT, or other GPU acceleration frameworks Experience with distributed...
Permanent employment
Full time
Flexible hours
Lodestar
San Francisco, CA
7 days ago
Software Engineer, Workload Enablement
$293k
...responsible for the architectural and engineering backbone of OpenAI's... ...models. Our work spans system software, networking, platform... ...of a system, including CPU, GPU, memory subsystem, frontend,... ...Overlap of compute/communication, kernel-level bottlenecks, memory bandwidth...
OpenAI
San Francisco, CA
3 days ago
Software Engineer - GenAI inference
$142.2k - $204.6k
...P-1284 About This Role As a software engineer for GenAI inference, you will help design,... ...touch the full GenAI inference stack - from kernels and runtimes to orchestration and memory... ...etc. Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS,...
Local area
Worldwide
Databricks
San Francisco, CA
2 days ago
Senior GenAI Research Engineer - Optimization and Kernels
$166k - $225k
...Job Description As a research engineer on the Scaling team, you will... ...optimization techniques including kernel fusion, mixed precision,... ...and optimize high‑performance GPU kernels for training workloads... ...distributed workloads Strong software engineering skills in Python and...
Worldwide
Cacheflow
San Francisco, CA
14 hours ago
Cloud Platforms and Infrastructure Engineer, TPU/GPU
$152k - $222k
Cloud Platforms and Infrastructure Engineer, TPU/GPU Google San Francisco, CA, USA; Sunnyvale, CA, USA Qualifications Bachelor's degree in... ...Java, Go, C or C++) including data structures, algorithms, software design, Linux environments and Kubernetes orchestration. Experience...
Google Inc.
San Francisco, CA
1 day ago
Compute Platform Engineer - GPU & Multi-Cloud Infra
B Capital is seeking a Systems Engineer to join its Compute Platform team in San Francisco. This role involves maintaining a K8s-based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate has...
B Capital
San Francisco, CA
4 days ago
Software Engineer, Inference
...fail. We are a small, fast-growing team of engineers in San Francisco powering Fortune 100... ...with smart batching and caching Optimize kernels, tokenization, and model graphs Evaluate... ...plus C++ or CUDA exposure Experience with GPU profiling and model serving Nice to have...
Work at office
Visa sponsorship
Relocation package
Pulse
San Francisco, CA
1 day ago
Onboard AV Software Engineer
...don’t believe culture can be engineered - but when it falls into place... ...Overview We're looking for a software engineer to optimize and... ...—using TensorRT, custom CUDA kernels, and low-level systems engineering... ...throughput on embedded GPU platforms Collaborate with ML...
Local area
Humble Robotics
San Francisco, CA
4 days ago
Lead Kubernetes & GitOps Engineer for GPU Inference
...cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The role... ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise...
Jack & Jill/External ATS
San Francisco, CA
1 day ago
