Software Engineer, Model Inference

OpenAI

About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they've never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model inference.


About the Role

We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.

In this role, you will:
  • Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
  • Work alongside researchers to enable advanced research through awesome engineering.
  • Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
  • Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.
  • Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.

You might thrive in this role if you:
  • Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
  • Have at least 5 years of professional software engineering experience.
  • Have, or can quickly gain, familiarity with PyTorch, NVIDIA GPUs, and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, and NVLink.
  • Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.
  • Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
  • Are self-directed and enjoy figuring out the most important problem to work on.
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.


We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.


For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Vacancy posted 4 days ago
Location: San Francisco, CA
