GPU Kernel Engineer: Build Fast AI Inference at Scale
Baseten
A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of GPU architecture. This position offers competitive compensation, including equity, and comprehensive benefits including medical coverage and generous PTO.
- ...experience with GPU programming and... .../CuteDSL for kernels, You own problems... ...You do well in fast-moving... ...professional software engineering experience with... ...meaningful work on ML inference or high-... ...observability. Build dashboards, alerts... Perplexity AI
- ...seeking a talented software engineer to join their dynamic Inference team. This role involves... ...infrastructure for large-scale multimodal models, focusing... ...to push the boundaries of AI technology, ensuring reliable... .... If you thrive in fast-paced environments and enjoy...
- FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
- About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class inference speed across NVIDIA and AMD GPUs. With... Flexible hours
$160k - $320k
A leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills, HPC experience, and knowledge... .... Responsibilities include designing GPU kernels, optimizing performance, and...
- ...About Us Most AI is frozen in... ...Our mandate is to build efficient... ...intelligence - the inference services that serve LLMs at scale and the data pipelines... ...and ML engineers will hand you workloads... ...heterogeneous GPU fleets. Batching... ...t need to write kernels, but you should... Flexible hours
$167.2k - $209k
...in their drive to build the simplest... ...energized by the fast-paced environment... ...is expanding its AI Infrastructure layer... ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team.... ...delivering high-scale, resilient data plane... ...Understanding of GPU-level... Local area, Remote work, Worldwide, Flexible hours
$167.2k - $209k
...relentless in their drive to build the simplest scalable... ...are energized by the fast-paced environment of a... ...is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean... ...inference engine and GPU kernel layers, ensuring our... Local area, Remote work, Worldwide, Flexible hours
- ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI... ...from AMD engineers the team is scaling rapidly to build the full stack powering frontier... ...used for large-scale training and inference. This role is ideal for someone... Flexible hours
- ...mission‑critical inference for the world's most dynamic AI companies, like Cursor... ...Join us and help build the platform engineers turn to to ship AI... ...‑modal workloads scale, the network is... ...engineers to lead our GPU Networking efforts... .... Optimize Kernels: You will work with... Flexible hours
$142.2k - $204.6k
...As a software engineer for GenAI inference, you will help... ...serving systems are fast, scalable, and... ...stack - from kernels and runtimes... ...for large-scale LLMs inference... ...accelerators Build and maintain... ...experience with CUDA, GPU programming,... ...the data and AI company. More... Local area, Worldwide
- ...highly technical Inference Engine Engineer to optimize... ..., and optimizing GPU kernels and supporting infrastructure... ...and agentic AI workloads. Your... ...production-scale machine learning applications... ...FriendliAI is building the world's best... ...multi-modal models fast, efficient, and... Worldwide, Flexible hours
$285k - $315k
...looking for a Founding GPU Kernel Engineer who lives right at the... ...different GPU architectures Build tools and methods for... ..., AMD, and emerging AI accelerators - understand... ...experience with large-scale scientific computing,... ...to know why things are fast or slow on the hardware... Full time, Work at office, Relocation package
- ...powers mission‑critical inference for the world's most dynamic AI companies, like Cursor... .... Join us and help build the platform engineers turn to to ship AI products... ...ROLE We’re seeking a GPU Kernel Engineer to join our... .... You’ll work in a fast‑paced, intellectually... Flexible hours
- A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform... ...have a relevant degree, experience in fast-paced environments, and proficiency in... Flexible hours
- MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern...
$220k
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience...
$250k - $380k
...Full time Department Scaling Compensation $250K... ...LLM training and inference infrastructure... ...execution across vast GPU/accelerator fleets... ...looking for an engineer to design and implement... ...responsible for building standardized... ...OpenAI OpenAI is an AI research and deployment... Full time, Work at office, Local area, Relocation package, Flexible hours
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
- Machine Learning Engineer, Inference Want to solve realtime... ...role is with a fast-growing voice AI company building the realtime speech... ...experiences used at massive scale across customer... ...scheduler design, GPU utilisation,... ...profiling and identifying kernel-level bottlenecks... Remote job, Flexible hours
- ...hiring on behalf of a fast‑growing AI startup... ...searching for a CUDA Kernel Engineer who has hands‑on... ...will work on the GPU performance layer powering large-scale, high-throughput... ...Proven track record building NVIDIA CUDA kernels... ...of model inference optimization (TensorRT... Remote job, Local area, Immediate start, Relocation package
- Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... Flexible hours
- ...Site Reliability Engineer - AI Infrastructure... ...to the kind of scaled AI... ...have been quietly building the systems, network... ...routes training and inference jobs across... ...debug large‑scale GPU infrastructure... ...network fabric - kernel - framework.... ...narrow it down fast. Strong Candidates... Full time, Remote work
$300k
GPU Optimisation Engineer - Real-Time Inference Want to push GPU performance to its... ...workloads? This team is building low-latency AI systems where... ...memory hierarchy, kernel launch overhead, occupancy... ..., profiling large-scale speech and... ...model ideas into fast, production-ready... Relocation, Visa sponsorship, Free visa
$190.9k - $232.8k
...a staff software engineer for GenAI Performance and Kernel, you will own the... ...high-performance GPU kernels powering our GenAI inference stack. You will lead... ...performance at scale. What You Will... ...ML systems Build and maintain profiling... ...is the data and AI company. More... Local area, Worldwide
- ...are hiring Software Engineers focused on AI Infrastructure to build the systems that enable... ...at production scale. This role exists because... ...engineering - including GPU orchestration, large-scale inference systems, performance... ...applied scientists to move fast without sacrificing... Internship, Immediate start
- ...Sciforium is an AI infrastructure company... ...from AMD engineers the team is scaling rapidly to build the full stack powering... ...Training and Inference Engineer to... ...training systems are fast, scalable,... ...across multi-node GPU/accelerator clusters... ...failures, and kernel-level... Full time, Flexible hours
$225k
...mission is to build safe AGI that accelerates... ...frontier‑scale pre‑training,... ...context, and inference‑time compute to... ...As a Software Engineer on the Inference... ...scale RL iteration fast and reliable.... ...across GPU, networking, and... ...Collaborate with Kernels and Research to... Relocation, Visa sponsorship
$185k
About the Role The Engineering Acceleration team builds and operates the foundational systems... ...integration systems for a fast-growing engineering... ...bottlenecks. Use modern AI tools to rethink CI failure... ...or operated CI systems at scale, especially in environments... Local area, Remote work
- A leading AI technology firm located in San Francisco is seeking a Research Engineer specializing in AI Performance & Kernel Optimization. The role involves enhancing... ...performance of large-scale AI systems, optimizing kernels... ...background in GPU kernel development and experience...