Software Engineer, Inference - CUDA / Kernels

OpenAI

About the Team

We’re building high-performance infrastructure to serve OpenAI’s frontier models at massive scale. As part of the inference team, you’ll be responsible for unlocking every last FLOP from our GPUs by designing kernels, tuning memory layouts, and optimizing model execution at the lowest levels of the stack.

About the Role

We are looking for a kernel-focused engineer to lead efforts in writing, porting, and optimizing GPU kernels used in inference workloads. This role requires deep familiarity with CUDA or equivalent kernel programming environments, and a strong intuition for performance tuning across modern GPU architectures.

In this role, you will:

- Design, implement, and optimize CUDA kernels for inference-critical operations (e.g., fused matmuls, custom activation functions, memory layout transforms).
- Analyze performance bottlenecks and optimize kernel execution for throughput and latency.
- Contribute to and extend internal GPU libraries and runtime tools.
- Work closely with hardware-specific profiling tools (e.g., Nsight, nvprof) to guide improvements.
- Collaborate with researchers to port or re-architect new model operations for production use.

You might thrive in this role if you:

- Have deep expertise in CUDA, and have written high-performance GPU code used in production systems.
- Understand GPU memory hierarchies, warp scheduling, and kernel-level tuning.
- Have experience debugging low-level performance issues with Nsight, CUPTI, or ROCm tools.
- Are excited to build optimized compute primitives that scale across fleets of accelerators.
- Enjoy working closely with model researchers to bring new operators to life.

Nice to have:

- Contributions to GPU kernel libraries or frameworks (Triton, cuBLAS, CUTLASS).
- Experience with mixed precision or tensor core optimization.
- Familiarity with MIG (Multi-Instance GPU) configurations or NUMA-aware execution.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.

For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Vacancy posted 2 days ago
