Software Engineer, Model Inference
OpenAI
About the Team
Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they've never been able to before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.
About the Role
We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment. In this role, you will:
- Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
- Work alongside researchers to enable advanced research through awesome engineering.
- Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
- Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.
- Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.
You might thrive in this role if you:
- Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
- Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
- Have at least 5 years of professional software engineering experience.
- Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs, and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, and NVLink.
- Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.
- Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
- Are self-directed and enjoy figuring out the most important problem to work on.
- Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.
Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.
To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Vacancy posted 4 days ago


