Software Engineer, Inference - TL

Full-time

OpenAI

About the Team

Our team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to access state-of-the-art AI models, unlocking new capabilities across productivity, creativity, and more. We focus on high-performance model inference and accelerating research through efficient and reliable infrastructure.

About the Role

We’re looking for a hands-on Tech Lead to drive the design, optimization, and scaling of our inference systems. In this role, you’ll lead engineering efforts to ensure our largest models run with exceptional efficiency in high-throughput, low-latency environments. You’ll be responsible for shaping our CUDA strategy, driving performance at the kernel level, and collaborating across teams to deliver end-to-end production readiness.

In this role, you will:

Lead the design and implementation of core inference infrastructure for serving frontier AI models in production.
Own and optimize CUDA-based systems and kernels to maximize performance across our fleet.
Partner with researchers to integrate novel model architectures into performant, scalable inference pipelines.
Build tooling and observability to detect bottlenecks, guide system tuning, and ensure stable deployment at scale.
Collaborate cross-functionally to align technical direction across research, infra, and product teams.
Mentor engineers on GPU performance, CUDA development, and distributed inference best practices.

You may thrive in this role if you:

Have deep expertise in CUDA, including writing and optimizing high-performance kernels for inference or training workloads.
Have experience leading complex engineering efforts, particularly at the systems and performance layer of large-scale ML infrastructure.
Understand the full inference stack, from model loading and memory management to communication libraries and deployment orchestration.
Are comfortable working in large, distributed GPU environments and debugging performance issues across hardware and software layers.
Have strong familiarity with PyTorch and NVIDIA’s GPU platform (NCCL, NVLink, MIG, etc.).
Take a systems-level view, but aren’t afraid to dive into low-level code when performance is on the line.

Bonus:

Experience with inference frameworks like TensorRT, vLLM, SGLang, or custom model parallelism infrastructure.
Familiarity with TPU, AMD GPUs, ROCm, HIP, TensorRT-LLM, Ray Serve, Megatron, MPI, or Horovod.
Familiarity with profiling tools (Nsight, nvprof, or custom observability stacks).
Background in HPC or large-scale distributed systems engineering.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US-Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Vacancy posted 4 hours ago
