GPU Kernel Engineer: Build Fast AI Inference at Scale

Baseten

A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of GPU architecture. This position offers competitive compensation with equity, plus comprehensive benefits such as medical coverage and generous PTO.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the GPU Kernel Engineer: Build Fast AI Inference at Scale in San Francisco, CA vacancy
  •  ...experience with GPU programming and...  .../CuteDSL for kernels, You own problems...  ...You do well in fast-moving...  ...professional software engineering experience with...  ...meaningful work on ML inference or high-...  ...observability. Build dashboards, alerts...
    Suggested

    Perplexity AI

    San Francisco, CA
    14 hours ago
  • FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Suggested

    FriendliAI

    San Francisco, CA
    2 days ago
  •  ...seeking a talented software engineer to join their dynamic Inference team. This role involves...  ...infrastructure for large-scale multimodal models, focusing...  ...to push the boundaries of AI technology, ensuring reliable...  .... If you thrive in fast-paced environments and enjoy... 
    Suggested

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...Team OpenAI’s Inference team ensures...  ...reliably, and at scale. We build and optimize...  ...stack - including kernels, communication...  ...We’re hiring engineers to scale and optimize...  ...emerging GPU platforms. You’...  ...part of a small, fast-moving team building...  ...OpenAI is an AI research and... 
    Suggested
    Full time

    OpenAI

    San Francisco, CA
    23 hours ago
  • About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class inference speed across NVIDIA and AMD GPUs. With... 
    Suggested
    Flexible hours

    FriendliAI

    San Francisco, CA
    2 days ago
  • $160k - $320k

    A leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills, HPC experience, and knowledge...  .... Responsibilities include designing GPU kernels, optimizing performance, and... 

    Vast.ai

    San Francisco, CA
    2 days ago
  •  ...powers mission-critical inference for the world's most dynamic AI companies, like Cursor...  .... Join us and help build the platform engineers turn to to ship AI products...  ...We’re seeking a GPU Kernel Engineer to join our...  .... You'll work in a fast-paced, intellectually... 
    Full time
    Flexible hours

    Baseten

    San Francisco, CA
    23 hours ago
  •  ...the Team We’re building high-performance...  ...models at massive scale. As part of the inference team, you’ll be...  ...GPUs by designing kernels, tuning memory layouts...  ...a kernel-focused engineer to lead efforts...  ..., and optimizing GPU kernels used in...  ...OpenAI is an AI research and deployment... 
    Full time

    OpenAI

    San Francisco, CA
    23 hours ago
  •  ...About Us Most AI is frozen in...  ...Our mandate is to build efficient...  ...intelligence - the inference services that serve LLMs at scale and the data pipelines...  ...and ML engineers will hand you workloads...  ...heterogeneous GPU fleets. Batching...  ...t need to write kernels, but you should... 
    Flexible hours

    Adaption

    San Francisco, CA
    7 days ago
  • $167.2k - $209k

     ...relentless in their drive to build the simplest scalable...  ...are energized by the fast-paced environment of a...  ...is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean...  ...inference engine and GPU kernel layers, ensuring our... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    3 days ago
  • $167.2k - $209k

     ...in their drive to build the simplest...  ...energized by the fast-paced environment...  ...is expanding its AI Infrastructure layer...  ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team....  ...delivering high-scale, resilient data plane...  ...Understanding of GPU-level... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    1 day ago
  •  ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI...  ...from AMD engineers the team is scaling rapidly to build the full stack powering frontier...  ...used for large-scale training and inference. This role is ideal for someone... 
    Flexible hours

    Sciforium

    San Francisco, CA
    4 days ago
  •  ...mission‑critical inference for the world's most dynamic AI companies, like Cursor...  ...Join us and help build the platform engineers turn to to ship AI...  ...‑modal workloads scale, the network is...  ...engineers to lead our GPU Networking efforts...  .... Optimize Kernels: You will work with... 
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  •  ...highly technical Inference Engine Engineer to optimize...  ..., and optimizing GPU kernels and supporting infrastructure...  ...and agentic AI workloads. Your...  ...production-scale machine learning applications...  ...FriendliAI is building the world's best...  ...multi-modal models fast, efficient, and... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    23 hours ago
  • $142.2k - $204.6k

     ...As a software engineer for GenAI inference, you will help...  ...serving systems are fast, scalable, and...  ...stack - from kernels and runtimes...  ...for large-scale LLMs inference...  ...accelerators Build and maintain...  ...experience with CUDA, GPU programming,...  ...the data and AI company. More... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    1 day ago
  • $285k - $315k

     ...looking for a Founding GPU Kernel Engineer who lives right at the...  ...different GPU architectures Build tools and methods for...  ..., AMD, and emerging AI accelerators - understand...  ...experience with large-scale scientific computing,...  ...to know why things are fast or slow on the hardware... 
    Full time
    Work at office
    Relocation package

    SF Tensor

    San Francisco, CA
    2 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform...  ...have a relevant degree, experience in fast-paced environments, and proficiency in... 
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern... 

    MakerMaker

    San Francisco, CA
    23 hours ago
  • $220k

    Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... 

    Perplexity

    San Francisco, CA
    1 day ago
  • $255k - $405k

     ...Lambda is the #1 GPU Cloud for ML/AI teams training...  ...models, where engineers can easily,...  ...and affordably build, test and...  ...AI products at scale. Lambda’s product...  ...clouds and managed inference services –...  ...hardening, kernel integrity monitoring...  ...Enjoy moving fast and making a... 
    Full time
    Work at office
    Local area
    Work from home
    Flexible hours

    Lambda

    San Francisco, CA
    23 hours ago
  • $250k - $380k

     ...Full time Department Scaling Compensation $250K...  ...LLM training and inference infrastructure...  ...execution across vast GPU/accelerator fleets...  ...looking for an engineer to design and implement...  ...responsible for building standardized...  ...OpenAI OpenAI is an AI research and deployment... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    23 hours ago
  • $100k - $120k

    Coda Robotics is scaling the compute infrastructure that powers...  ...models. As training and inference workloads grow, we need kernel‑level innovations to...  ...team of kernel and system engineers focused on performance-critical...  ...for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware... 

    Coda Robotics

    San Francisco, CA
    23 hours ago
  •  ...Site Reliability Engineer - AI Infrastructure...  ...to the kind of scaled AI...  ...have been quietly building the systems, network...  ...routes training and inference jobs across...  ...debug large‑scale GPU infrastructure...  ...network fabric - kernel - framework....  ...narrow it down fast. Strong Candidates... 
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    23 hours ago
  • $300k

    GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its...  ...workloads? This team is building low-latency AI systems where...  ...memory hierarchy, kernel launch overhead, occupancy...  ..., profiling large-scale speech and...  ...model ideas into fast, production-ready... 
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    4 days ago
  • Machine Learning Engineer, Inference Want to solve realtime...  ...role is with a fast-growing voice AI company building the realtime speech...  ...experiences used at massive scale across customer...  ...scheduler design, GPU utilisation,...  ...profiling and identifying kernel-level bottlenecks... 
    Remote work
    Flexible hours

    Trades Workforce Solutions

    San Francisco, CA
    14 hours ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... 
    Flexible hours

    Liquid AI

    San Francisco, CA
    4 days ago
  •  ...hiring on behalf of a fast‑growing AI startup...  ...searching for a CUDA Kernel Engineer who has hands‑on...  ...will work on the GPU performance layer powering large-scale, high-throughput...  ...Proven track record building NVIDIA CUDA kernels...  ...of model inference optimization (TensorRT... 
    Remote job
    Local area
    Immediate start
    Relocation package

    Pragmatike

    San Francisco, CA
    23 hours ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 

    Anyscale

    San Francisco, CA
    2 days ago
  • $190.9k - $232.8k

     ...a staff software engineer for GenAI Performance and Kernel, you will own the...  ...high-performance GPU kernels powering our GenAI inference stack. You will lead...  ...performance at scale. What You Will...  ...ML systems Build and maintain profiling...  ...is the data and AI company. More... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    3 days ago
  •  ...are hiring Software Engineers focused on AI Infrastructure to build the systems that enable...  ...at production scale. This role exists because...  ...engineering - including GPU orchestration, large-scale inference systems, performance...  ...applied scientists to move fast without sacrificing... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    2 days ago
