Software Engineer, Model Inference

$325k

Centaur Labs

About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they've never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.

About the Role

We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.

In this role, you will

Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
Work alongside researchers to enable advanced research through awesome engineering.
Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
Build tools to give us visibility into our bottlenecks and sources of instability, and then design and implement solutions to address the highest-priority issues.
Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.

You might thrive in this role if you

Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
Have at least 5 years of professional software engineering experience.
Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.
Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.
Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
Are self-directed and enjoy figuring out the most important problem to work on.
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.

For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation Range: $325K - $490K

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Software Engineer, Model Inference in San Francisco, CA vacancy

AI Engineer — Model Performance & Inference Optimizer
Pantera Capital is looking for a Model Performance Engineer in San Francisco, California to optimize model inference speed, cost, and reliability. You will build fine-tuning infrastructure that accelerates the AI team’s processes. The role covers optimizing serving frameworks...
Suggested
Pantera Capital
San Francisco, CA
1 day ago
Senior Software Engineer - Model Performance
$220k - $320k
...Help us make inference blazingly fast. If you love squeezing every... ...and hosts specialized language models for companies that need frontier... ...-funded ten-person team of engineers who work in-person in... ...has founded and run their own software companies. We are high-agency...
Suggested
Work at office
Inference
San Francisco, CA
22 hours ago
Software Engineer - Model APIs
...ABOUT BASETEN Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence... ...frontier of AI to bring cutting-edge models into production. With our recent $150M... ...contributions to open-source inference engines (vLLM, TensorRT-LLM, SGLang, TGI)...
Suggested
Flexible hours
Baseten
San Francisco, CA
22 hours ago
Senior Software Engineer, AI Model LifeCycle
$300 per month
...Location Type On-site Department Cloud Engineering Crusoe's mission is to accelerate... .... About this role The Senior Software Engineer for the Model LifeCycle team will contribute to building... ...components (training, inference). Preferred Qualifications Proficiency...
Suggested
Full time
Temporary work
Epoch Biodesign
San Francisco, CA
22 hours ago
Software Engineer - Model Performance
...Baseten powers mission‑critical inference for the world's most dynamic... ...of AI to bring cutting‑edge models into production. We're... ...and help build the platform engineers turn to to ship AI products.... ...intelligence? We are looking for a Software Engineer focused on ML performance...
Suggested
Flexible hours
Baseten
San Francisco, CA
6 days ago
Senior Software Engineer, Model Serving
$166k - $225k
...to improve their business. Databricks’ Model Serving product provides enterprises with... .... It offers real-time, low-latency inference, governance, monitoring, and lineage. As... ...SLAs and cost efficiency. As a Senior Engineer, you’ll play a critical role in shaping...
Local area
Worldwide
Cacheflow
San Francisco, CA
3 days ago
Software Engineer, Productivity - Model Performance
$230k - $385k
...results, or market conditions. About the Team We’re hiring software engineers to make OpenAI’s Model Performance teams more productive. These teams work on... ...model performance across OpenAI’s training and inference workloads at frontier scale. About the Role We’re looking...
Full time
Work at office
Local area
Relocation package
Flexible hours
Slope
San Francisco, CA
2 days ago
LLM Inference & Model-Performance Engineer
...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
Baseten
San Francisco, CA
1 day ago
Senior AI Model Serving Engineer — Low-Latency Inference
A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates...
Menlo Ventures
San Francisco, CA
3 days ago
AI Inference & Model Routing Lead
Anysphere is looking for an experienced leader for the Model Routing & Inference team in San Francisco. This role involves owning the inference... ...has a strong background in high-throughput systems and software engineering fundamentals, combined with leadership skills to mentor...
Anysphere
San Francisco, CA
4 days ago
Engineering Manager, Model Routing & Inference Engineering
...combination of inventive research, design, and engineering. Our organization is very flat, and... .... About the Role You will lead the Model Routing & Inference team at Cursor, owning the inference... ...information. You have strong software engineering fundamentals and enjoy shipping...
Anysphere
San Francisco, CA
4 days ago
Engineering Manager, Model Inference
...powered products are transforming the practice of medicine—and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The Inference team owns the end-to-end...
Hourly pay
Full time
Flexible hours
AI Chopping Block, Inc.
San Francisco, CA
1 day ago
Staff Engineer - ML Inference & Model Efficiency
A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have...
Remote work
Cohere
San Francisco, CA
1 day ago
Real-Time Inference & Model Serving Engineer (Equity)
$220k - $320k
ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models...
3 days per week
Trades Workforce Solutions
San Francisco, CA
4 days ago
Software Engineer, Productivity - Inference Runtime
$230k - $385k
About the Team We're hiring a Developer Productivity engineer to support OpenAI's Inference Runtime teams. These teams own the systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. We're hiring...
Slope
San Francisco, CA
12 hours ago
Software Engineer, Inference - AMD GPU Enablement
$325k
About the Team Our Inference team brings OpenAI's most capable research and technology to... ...use and access our state-of-the-art AI models, allowing them to do things that they've... ...inference. About the Role We're hiring engineers to scale and optimize OpenAI's inference...
Centaur Labs
San Francisco, CA
22 hours ago
Software Engineer, Inference
...schema mapping with fine-tuned extraction models where legacy OCR and other parsing tools... .... We are a small, fast-growing team of engineers in San Francisco powering Fortune 100... ...Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own...
Work at office
Visa sponsorship
Relocation package
Trypulse
San Francisco, CA
4 days ago
Software Engineer (AI Infrastructure / Training / Inference)
Software Engineer (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build the systems that... ...role exists because modern generative and vision models require infrastructure beyond traditional backend...
SpreeAI
San Francisco, CA
6 days ago
Staff ML Inference Engineer — Model Efficiency (Remote)
Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while... ...strong skills in C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and inclusive work culture...
Remote job
Jaide Health
San Francisco, CA
4 days ago
ML Infrastructure Engineer - Model Inference & Scale
A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional...
Abridge
San Francisco, CA
4 days ago
Software Engineer (Model Evaluation & Benchmarking)
Software Engineer (Model Evaluation & Benchmarking) About the Role We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently, and predictably as it moves from research into production. This position...
SpreeAI
San Francisco, CA
1 day ago
Model Performance Software Engineer, Claude Code
$320k
...growing group of committed researchers, engineers, policy experts, and business leaders working... .... About The Role We’re looking for a Software Engineer to work at the intersection of... ...build evaluation systems that measure model capabilities across diverse coding tasks...
Work experience placement
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
2 days ago
Staff Software Engineer, Model LifeCycle
$300 per month
...and intelligence. We’re crafting the engine that powers a world where people can... ...role About this role: The Staff Software Engineer for the Model LifeCycle team will play a key role... ...infrastructure, including training, inference. Preferred Qualifications: Proficiency...
Temporary work
Crusoe Energy Systems LLC
San Francisco, CA
4 days ago
Staff Software Engineer, Foundational Model Serving
$192k - $260k
...insights to improve their business. Foundation Model Serving is the API Product for hosting and serving frontier AI model inference for open source models like Llama, Qwen,... ...experience is necessary. We’re looking for engineers who have owned high‑scale operational...
Local area
Worldwide
Databricks
San Francisco, CA
1 day ago
Staff Software Engineer, Model Serving
$192k - $260k
...to improve their business. Databricks’ Model Serving product provides enterprises with... .... It offers real-time, low-latency inference, governance, monitoring, and lineage. As... ...strong SLAs and cost efficiency. As a Staff Engineer, you’ll play a critical role in shaping...
Local area
Worldwide
Cacheflow
San Francisco, CA
3 days ago
AI Engineer - Model Performance
...every day. ROLE OVERVIEW We're hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine‑tuning infrastructure... ...The opportunity to shape the foundational software services of a growing company. A role...
Full time
Remote work
Pantera Capital
San Francisco, CA
12 hours ago
Applied AI Inference Engineer
...Baseten powers mission‑critical inference for the world's most dynamic... ...of AI to bring cutting‑edge models into production. We're... ...and help build the platform engineers turn to to ship AI products.... ...enjoy working across product, software development, performance engineering...
Work experience placement
Flexible hours
Baseten
San Francisco, CA
1 day ago
Staff Engineer: Foundation Model API & GPU Inference
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
Databricks Inc.
San Francisco, CA
1 day ago
Senior Model Inference Engineer for Production-Scale AI
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience...
OpenAI
San Francisco, CA
1 day ago
Model API Engineer: Fast, Scalable AI Inference
A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience...
Baseten
San Francisco, CA
6 days ago
