
Software Engineer, Inference - CUDA / Kernels

OpenAI

About the Team

We’re building high-performance infrastructure to serve OpenAI’s frontier models at massive scale. As part of the inference team, you’ll be responsible for unlocking every last FLOP from our GPUs by designing kernels, tuning memory layouts, and optimizing model execution at the lowest levels of the stack.

About the Role

We are looking for a kernel-focused engineer to lead efforts in writing, porting, and optimizing GPU kernels used in inference workloads. This role requires deep familiarity with CUDA or equivalent kernel programming environments, and a strong intuition for performance tuning across modern GPU architectures.

In this role, you will:
  • Design, implement, and optimize CUDA kernels for inference-critical operations (e.g., fused matmuls, custom activation functions, memory layout transforms).
  • Analyze performance bottlenecks and optimize kernel execution for throughput and latency.
  • Contribute to and extend internal GPU libraries and runtime tools.
  • Work closely with hardware-specific profiling tools (e.g., Nsight, nvprof) to guide improvements.
  • Collaborate with researchers to port or re-architect new model operations for production use.

You might thrive in this role if you:
  • Have deep expertise in CUDA, and have written high-performance GPU code used in production systems.
  • Understand GPU memory hierarchies, warp scheduling, and kernel-level tuning.
  • Have experience debugging low-level perf issues with Nsight, cupti, or ROCm tools.
  • Are excited to build optimized compute primitives that scale across fleets of accelerators.
  • Enjoy working closely with model researchers to bring new operators to life.

Nice to have:
  • Contributions to GPU kernel libraries or frameworks (Triton, cuBLAS, cutlass).
  • Experience with mixed precision or tensor core optimization.
  • Familiarity with MIG, multi-instance GPU configurations, or NUMA-aware execution.
About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Vacancy posted 2 days ago