GPU Kernel Engineer: Build Fast AI Inference at Scale
Baseten
A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of GPU architecture. This position offers competitive compensation, including equity, and comprehensive benefits including medical coverage and generous PTO.
- ...experience with GPU programming and... .../CuteDSL for kernels, You own problems... ...You do well in fast-moving... ...professional software engineering experience with... ...meaningful work on ML inference or high-... ...observability. Build dashboards, alerts... (Perplexity AI)
- FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
- ...seeking a talented software engineer to join their dynamic Inference team. This role involves... ...infrastructure for large-scale multimodal models, focusing... ...to push the boundaries of AI technology, ensuring reliable... .... If you thrive in fast-paced environments and enjoy...
- ...Team OpenAI’s Inference team ensures... ...reliably, and at scale. We build and optimize... ...stack - including kernels, communication... ...We’re hiring engineers to scale and optimize... ...emerging GPU platforms. You’... ...part of a small, fast-moving team building... ...OpenAI is an AI research and... (Full time)
- About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class inference speed across NVIDIA and AMD GPUs. With... (Flexible hours)
$160k - $320k
A leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills, HPC experience, and knowledge... .... Responsibilities include designing GPU kernels, optimizing performance, and...
- ...powers mission-critical inference for the world's most dynamic AI companies, like Cursor... .... Join us and help build the platform engineers turn to to ship AI products... ...We’re seeking a GPU Kernel Engineer to join our... .... You'll work in a fast-paced, intellectually... (Full time, Flexible hours)
- ...the Team We’re building high-performance... ...models at massive scale. As part of the inference team, you’ll be... ...GPUs by designing kernels, tuning memory layouts... ...a kernel-focused engineer to lead efforts... ..., and optimizing GPU kernels used in... ...OpenAI is an AI research and deployment... (Full time)
- ...About Us Most AI is frozen in... ...Our mandate is to build efficient... ...intelligence - the inference services that serve LLMs at scale and the data pipelines... ...and ML engineers will hand you workloads... ...heterogeneous GPU fleets. Batching... ...t need to write kernels, but you should... (Flexible hours)
$167.2k - $209k
...relentless in their drive to build the simplest scalable... ...are energized by the fast-paced environment of a... ...is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean... ...inference engine and GPU kernel layers, ensuring our... (Local area, Remote work, Worldwide, Flexible hours)
$167.2k - $209k
...in their drive to build the simplest... ...energized by the fast-paced environment... ...is expanding its AI Infrastructure layer... ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team.... ...delivering high-scale, resilient data plane... ...Understanding of GPU-level... (Local area, Remote work, Worldwide, Flexible hours)
- ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI... ...from AMD engineers the team is scaling rapidly to build the full stack powering frontier... ...used for large-scale training and inference. This role is ideal for someone... (Flexible hours)
- ...mission‑critical inference for the world's most dynamic AI companies, like Cursor... ...Join us and help build the platform engineers turn to to ship AI... ...‑modal workloads scale, the network is... ...engineers to lead our GPU Networking efforts... .... Optimize Kernels: You will work with... (Flexible hours)
- ...highly technical Inference Engine Engineer to optimize... ..., and optimizing GPU kernels and supporting infrastructure... ...and agentic AI workloads. Your... ...production-scale machine learning applications... ...FriendliAI is building the world's best... ...multi-modal models fast, efficient, and... (Worldwide, Flexible hours)
$142.2k - $204.6k
...As a software engineer for GenAI inference, you will help... ...serving systems are fast, scalable, and... ...stack - from kernels and runtimes... ...for large-scale LLMs inference... ...accelerators Build and maintain... ...experience with CUDA, GPU programming,... ...the data and AI company. More... (Local area, Worldwide)
$285k - $315k
...looking for a Founding GPU Kernel Engineer who lives right at the... ...different GPU architectures Build tools and methods for... ..., AMD, and emerging AI accelerators - understand... ...experience with large-scale scientific computing,... ...to know why things are fast or slow on the hardware... (Full time, Work at office, Relocation package)
- A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform... ...have a relevant degree, experience in fast-paced environments, and proficiency in... (Flexible hours)
- MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern...
$220k
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience...
$255k - $405k
...Lambda is the #1 GPU Cloud for ML/AI teams training... ...models, where engineers can easily,... ...and affordably build, test and... ...AI products at scale. Lambda’s product... ...clouds and managed inference services –... ...hardening, kernel integrity monitoring... ...Enjoy moving fast and making a... (Full time, Work at office, Local area, Work from home, Flexible hours)
$250k - $380k
...Full time Department Scaling Compensation $250K... ...LLM training and inference infrastructure... ...execution across vast GPU/accelerator fleets... ...looking for an engineer to design and implement... ...responsible for building standardized... ...OpenAI OpenAI is an AI research and deployment... (Full time, Work at office, Local area, Relocation package, Flexible hours)
$100k - $120k
Coda Robotics is scaling the compute infrastructure that powers... ...models. As training and inference workloads grow, we need kernel‑level innovations to... ...team of kernel and system engineers focused on performance-critical... ...for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware...
- ...Site Reliability Engineer - AI Infrastructure... ...to the kind of scaled AI... ...have been quietly building the systems, network... ...routes training and inference jobs across... ...debug large‑scale GPU infrastructure... ...network fabric - kernel - framework.... ...narrow it down fast. Strong Candidates... (Full time, Remote work)
$300k
GPU Optimisation Engineer - Real-Time Inference Want to push GPU performance to its... ...workloads? This team is building low-latency AI systems where... ...memory hierarchy, kernel launch overhead, occupancy... ..., profiling large-scale speech and... ...model ideas into fast, production-ready... (Relocation, Visa sponsorship, Free visa)
- Machine Learning Engineer, Inference Want to solve realtime... ...role is with a fast-growing voice AI company building the realtime speech... ...experiences used at massive scale across customer... ...scheduler design, GPU utilisation,... ...profiling and identifying kernel-level bottlenecks... (Remote work, Flexible hours)
- Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... (Flexible hours)
- ...hiring on behalf of a fast‑growing AI startup... ...searching for a CUDA Kernel Engineer who has hands‑on... ...will work on the GPU performance layer powering large-scale, high-throughput... ...Proven track record building NVIDIA CUDA kernels... ...of model inference optimization (TensorRT... (Remote job, Local area, Immediate start, Relocation package)
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
$190.9k - $232.8k
...a staff software engineer for GenAI Performance and Kernel, you will own the... ...high-performance GPU kernels powering our GenAI inference stack. You will lead... ...performance at scale. What You Will... ...ML systems Build and maintain profiling... ...is the data and AI company. More... (Local area, Worldwide)
- ...are hiring Software Engineers focused on AI Infrastructure to build the systems that enable... ...at production scale. This role exists because... ...engineering - including GPU orchestration, large-scale inference systems, performance... ...applied scientists to move fast without sacrificing... (Internship, Immediate start)


