
GPU Kernel Engineer: Build Fast AI Inference at Scale

Baseten

A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of GPU architecture. This position offers competitive compensation, including equity, and comprehensive benefits such as medical coverage and generous PTO.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the GPU Kernel Engineer: Build Fast AI Inference at Scale in San Francisco, CA vacancy
  •  ...seeking a talented software engineer to join their dynamic Inference team. This role involves...  ...infrastructure for large-scale multimodal models, focusing...  ...to push the boundaries of AI technology, ensuring reliable...  .... If you thrive in fast-paced environments and enjoy... 
    Suggested

    Jobleads-US

    San Francisco, CA
    4 days ago
  • $325k

    About the Team Our Inference team brings...  ...state-of-the-art AI models,...  ...Role We're hiring engineers to scale and optimize OpenAI...  ...emerging GPU platforms. You'...  ...from low-level kernel performance to...  ...partner teams to build, integrate and...  ...part of a small, fast-moving team building... 
    Suggested

    Centaur Labs

    San Francisco, CA
    3 days ago
  •  ...powers mission‑critical inference for the world's most dynamic AI companies, like Cursor...  .... Join us and help build the platform engineers turn to to ship AI products...  ...ROLE We’re seeking a GPU Kernel Engineer to join our...  .... You’ll work in a fast‑paced, intellectually... 
    Suggested
    Flexible hours

    The Consensus

    San Francisco, CA
    3 days ago
  •  ...mission‑critical inference for the world's most dynamic AI companies, like Cursor...  ...Join us and help build the platform engineers turn to to ship AI...  ...‑modal workloads scale, the network is...  ...engineers to lead our GPU Networking efforts...  .... Optimize Kernels: You will work with... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    4 days ago
  • $115k - $140k

     ...We Are Lightning AI is the company...  ...Founded in 2019, we build an end-to-end...  ...efficient, large-scale compute. Teams...  ...and production inference, with security,...  ...’re looking for engineers who understand the...  ...systems, and scaling GPU workloads in...  ...Comfortable operating in fast moving, highly... 
    Suggested
    Work at office
    Work from home
    Monday to Friday
    Flexible hours
    2 days per week

    Neura Market

    San Francisco, CA
    19 hours ago
  • $285k - $315k

     ...looking for a Founding GPU Kernel Engineer who lives right at the...  ...different GPU architectures Build tools and methods for...  ..., AMD, and emerging AI accelerators - understand...  ...experience with large-scale scientific computing,...  ...to know why things are fast or slow on the hardware... 
    Full time
    Work at office
    Relocation package

    SF Tensor

    San Francisco, CA
    4 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform...  ...have a relevant degree, experience in fast-paced environments, and proficiency in... 
    Flexible hours

    Baseten

    San Francisco, CA
    9 days ago
  • $167.2k - $209k

     ...in their drive to build the simplest...  ...energized by the fast-paced environment...  ...is expanding its AI Infrastructure layer...  ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team....  ...delivering high-scale, resilient data plane...  ...Understanding of GPU‑level... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    3 days ago
  • $167.2k - $209k

     ...relentless in their drive to build the simplest scalable...  ...are energized by the fast-paced environment of a...  ...is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean...  ...inference engine and GPU kernel layers, ensuring our... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    2 days ago
  •  ...seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on...  ...hardware architectures and building robust infrastructure. The ideal...  ...health. Experience with GPU infrastructure is a plus.

    Linuxcareers

    San Francisco, CA
    5 days ago
  • $220k

    Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... 

    Perplexity

    San Francisco, CA
    3 days ago
  • $300k

     ...Job Description GPU Optimisation Engineer - Real-Time Inference Want to push...  ...? This team is building low-latency AI systems where milliseconds...  ...memory hierarchy, kernel launch overhead,...  ..., profiling large-scale speech and...  ...model ideas into fast, production-ready... 
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    2 days ago
  •  ...Site Reliability Engineer - AI Infrastructure...  ...to the kind of scaled AI...  ...have been quietly building the systems, network...  ...routes training and inference jobs across...  ...debug large‑scale GPU infrastructure...  ...network fabric - kernel - framework....  ...narrow it down fast. Strong Candidates... 
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    2 days ago
  •  ...COMPANY We're building autonomous research...  ...operate the inference systems that serve...  .... This is an engineering role, not a...  ...make their work fast and reliable in...  ...quantization, custom kernels, scheduling...  ...-grade, large-scale serving infrastructure...  ...with GPU-accelerated inference... 

    MakerMaker

    San Francisco, CA
    1 day ago
  • Wilder Wealthy & Wise in San Francisco is seeking candidates to develop and optimize GPU-accelerated kernels for machine learning and AI applications. You will work closely with the modeling and algorithm team to enhance the performance of AI systems. The ideal candidate... 

    Wilder Wealthy & Wise

    San Francisco, CA
    2 days ago
  • Sciforium is an AI infrastructure company...  ...from AMD engineers the team is scaling rapidly to build the full stack powering...  ...Training and Inference Engineer to...  ...training systems are fast, scalable,...  ...across multi-node GPU/accelerator clusters...  ...failures, and kernel-level... 
    Flexible hours

    Sciforium

    San Francisco, CA
    3 days ago
  • The Consensus is looking for a GPU Kernel Engineer to optimize machine learning performance. The ideal candidate will design high-performance GPU kernels and collaborate on cutting-edge projects in the AI field. This role offers substantial growth opportunities in an inclusive... 

    The Consensus

    San Francisco, CA
    4 days ago
  • Virio is seeking a Founding Infrastructure Engineer to build and scale its foundational systems. Located in San Francisco, this full-time role emphasizes ownership over critical infrastructure, ensuring performance and reliability as the company grows. You will architect... 
    Full time

    Virio

    San Francisco, CA
    5 days ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 

    Anyscale

    San Francisco, CA
    4 days ago
  • MakerMaker.AI in San Francisco is seeking a skilled Software Engineer to write and optimize GPU kernels. You will work on deep low-level tasks that directly impact the performance of machine learning models. The ideal candidate has over 4 years of experience with GPU kernels... 

    MakerMaker.AI

    San Francisco, CA
    1 day ago
  • $100k - $120k

    Coda Robotics is scaling the compute infrastructure that powers...  ...models. As training and inference workloads grow, we need kernel‑level innovations to...  ...team of kernel and system engineers focused on performance-critical...  ...for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware... 

    Coda Robotics

    San Francisco, CA
    2 days ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... 
    Flexible hours

    Liquid AI

    San Francisco, CA
    1 day ago
  • Software Engineer Intern (AI Infrastructure / Training / Inference) About the Role We are hiring...  ...Infrastructure to build the systems that enable...  ...at production scale. This role exists because...  ...— including GPU orchestration, large...  ...scientists to move fast without sacrificing... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    4 days ago
  • $405k

     ...role Anthropic's Inference organization serves...  ...that frontier AI demands. We build across GPUs, TPUs...  ...looking for a Staff Engineer to be a technical...  ...serving, scaling, and accelerator...  ...management - across GPU, TPU, and Trainium...  ...baseline comparison, fast rollback... 
    Work at office
    Visa sponsorship
    Flexible hours

    jobr.pro

    San Francisco, CA
    3 days ago
  • A leading AI technology firm located in San Francisco is seeking a Research Engineer specializing in AI Performance & Kernel Optimization. The role involves enhancing...  ...performance of large-scale AI systems, optimizing kernels...  ...background in GPU kernel development and experience... 

    Zyphra

    San Francisco, CA
    5 days ago
  • $160k - $225k

    Cacheflow is seeking a Senior Software Engineer for AI Runtime at Databricks, located in San Francisco. You will be instrumental in building and scaling systems for large-scale GPU training, ensuring high throughput and resilience in training across expansive fleets of... 

    Cacheflow

    San Francisco, CA
    1 day ago
  •  ...Hybrid Department Inference Model Serving Who...  ...Our mission is to scale intelligence to serve...  ...who are building AI systems to power magical...  ...work hard and move fast to do what’s best...  ...team of researchers, engineers, designers, and...  ...with Kubernetes, and GPU workloads on those... 
    Full time
    Work experience placement
    Work at office
    Remote work
    Flexible hours

    Jaide Health

    San Francisco, CA
    2 days ago
  • $218.4k - $273k

     ...Scale's Physical AI business unit is dedicated...  ...an ML Systems Engineer on the Physical...  ...will design and build platforms for...  ...tracking of model inference. Lead: Own...  ...implementation, in a fast-paced, cross-...  ..., including GPU-level...  ...(e.g., CUDA, kernel tuning). Programming... 
    Full time

    Scale AI

    San Francisco, CA
    1 day ago
  • Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while focusing... 

    Sail Research

    San Francisco, CA
    4 days ago
  •  ...foundation for AI teams. With instant GPU access, sub-second...  ...low-latency inference. Companies like...  ...infrastructure. We're a fast-growing team...  ...Lovable, Scale AI, Substack, and...  ...and experienced engineering and product leaders...  ...The Role: Modal builds AI infrastructure... 
    Work at office

    Modal

    San Francisco, CA
    4 days ago
