Software Engineer, Model Inference

OpenAI

About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things they've never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.

About the Role

We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.

In this role, you will:

Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
Work alongside researchers to enable advanced research through awesome engineering.
Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.
Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.

You might thrive in this role if you:

Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
Have at least 5 years of professional software engineering experience.
Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.
Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.
Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
Are self-directed and enjoy figuring out the most important problem to work on.
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other applicable legally protected characteristic.

For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.


Vacancy posted 4 days ago

