Software Engineer, Inference - CUDA / Kernels
OpenAI
About the Team

We’re building high-performance infrastructure to serve OpenAI’s frontier models at massive scale. As part of the inference team, you’ll be responsible for unlocking every last FLOP from our GPUs by designing kernels, tuning memory layouts, and optimizing model execution at the lowest levels of the stack.

About the Role

We are looking for a kernel-focused engineer to lead efforts in writing, porting, and optimizing GPU kernels used in inference workloads. This role requires deep familiarity with CUDA or equivalent kernel programming environments, and a strong intuition for performance tuning across modern GPU architectures.

In this role, you will:

- Design, implement, and optimize CUDA kernels for inference-critical operations (e.g., fused matmuls, custom activation functions, memory layout transforms).
- Analyze performance bottlenecks and optimize kernel execution for throughput and latency.
- Contribute to and extend internal GPU libraries and runtime tools.
- Work closely with hardware-specific profiling tools (e.g., Nsight, nvprof) to guide improvements.
- Collaborate with researchers to port or re-architect new model operations for production use.

You might thrive in this role if you:

- Have deep expertise in CUDA, and have written high-performance GPU code used in production systems.
- Understand GPU memory hierarchies, warp scheduling, and kernel-level tuning.
- Have experience debugging low-level performance issues with Nsight, CUPTI, or ROCm tools.
- Are excited to build optimized compute primitives that scale across fleets of accelerators.
- Enjoy working closely with model researchers to bring new operators to life.

Nice to have:

- Contributions to GPU kernel libraries or frameworks (Triton, cuBLAS, CUTLASS).
- Experience with mixed precision or tensor core optimization.
- Familiarity with MIG (Multi-Instance GPU) configurations or NUMA-aware execution.
About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

