Software Engineer, Inference - TL
OpenAI
About the Team
Our team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to access state-of-the-art AI models - unlocking new capabilities across productivity, creativity, and more. We focus on high-performance model inference and accelerating research through efficient and reliable infrastructure.
About the Role
We’re looking for a hands-on Tech Lead to drive the design, optimization, and scaling of our inference systems. In this role, you’ll lead engineering efforts to ensure our largest models run with exceptional efficiency in high-throughput, low-latency environments. You’ll be responsible for shaping our CUDA strategy, driving performance at the kernel level, and collaborating across teams to deliver end-to-end production readiness.
In this role, you will:
Lead the design and implementation of core inference infrastructure for serving frontier AI models in production.
Own and optimize CUDA-based systems and kernels to maximize performance across our fleet.
Partner with researchers to integrate novel model architectures into performant, scalable inference pipelines.
Build tooling and observability to detect bottlenecks, guide system tuning, and ensure stable deployment at scale.
Collaborate cross-functionally to align technical direction across research, infra, and product teams.
Mentor engineers on GPU performance, CUDA development, and distributed inference best practices.
You may thrive in this role if you:
Have deep expertise in CUDA, including writing and optimizing high-performance kernels for inference or training workloads.
Have experience leading complex engineering efforts, particularly at the systems and performance layer of large-scale ML infrastructure.
Understand the full inference stack - from model loading and memory management to communication libraries and deployment orchestration.
Are comfortable working in large, distributed GPU environments and debugging performance issues across hardware and software layers.
Have strong familiarity with PyTorch and NVIDIA’s GPU software stack (NCCL, NVLink, MIG, etc.).
Take a systems-level view, but aren’t afraid to dive into low-level code when performance is on the line.
Bonus:
Experience with inference frameworks like TensorRT, vLLM, SGLang, or custom model parallelism infrastructure.
Familiarity with TPU, AMD GPUs, ROCm, HIP, TensorRT-LLM, Ray Serve, Megatron, MPI, or Horovod.
Familiarity with profiling tools (Nsight, nvprof, or custom observability stacks).
Background in HPC or large-scale distributed systems engineering.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.


