
Software Engineer, Model Inference

$325k

Centaur Labs

About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they've never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model inference.

About the Role

We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.

In this role, you will:
  • Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
  • Work alongside researchers to enable advanced research through awesome engineering.
  • Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
  • Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.
  • Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.

You might thrive in this role if you:
  • Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
  • Have at least 5 years of professional software engineering experience.
  • Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.
  • Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.
  • Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
  • Are self-directed and enjoy figuring out the most important problem to work on.
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.

For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation Range: $325K - $490K

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Software Engineer, Model Inference in San Francisco, CA vacancy
  • Pantera Capital is looking for a Model Performance Engineer in San Francisco, California to optimize model inference speed, cost, and reliability. You will build fine-tuning infrastructure that accelerates the AI team’s processes. The role covers optimizing serving frameworks... 
    Suggested

    Pantera Capital

    San Francisco, CA
    1 day ago
  • $220k - $320k

     ...Help us make inference blazingly fast. If you love squeezing every...  ...and hosts specialized language models for companies that need frontier...  ...-funded ten-person team of engineers who work in-person in...  ...has founded and run their own software companies. We are high-agency... 
    Suggested
    Work at office

    Inference

    San Francisco, CA
    22 hours ago
  •  ...ABOUT BASETEN Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence...  ...frontier of AI to bring cutting-edge models into production. With our recent $150M...  ...contributions to open-source inference engines (vLLM, TensorRT-LLM, SGLang, TGI)... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    22 hours ago
  • $300 per month

     ...Location Type On-site Department Cloud Engineering Crusoe's mission is to accelerate...  .... About this role The Senior Software Engineer for the Model LifeCycle team will contribute to building...  ...components (training, inference). Preferred Qualifications Proficiency... 
    Suggested
    Full time
    Temporary work

    Epoch Biodesign

    San Francisco, CA
    22 hours ago
  •  ...Baseten powers mission‑critical inference for the world's most dynamic...  ...of AI to bring cutting‑edge models into production. We're...  ...and help build the platform engineers turn to to ship AI products....  ...intelligence? We are looking for a Software Engineer focused on ML performance... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    6 days ago
  • $166k - $225k

     ...to improve their business. Databricks’ Model Serving product provides enterprises with...  .... It offers real-time, low-latency inference, governance, monitoring, and lineage. As...  ...SLAs and cost efficiency. As a Senior Engineer, you’ll play a critical role in shaping... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    3 days ago
  • $230k - $385k

     ...results, or market conditions. About the Team We’re hiring software engineers to make OpenAI’s Model Performance teams more productive. These teams work on...  ...model performance across OpenAI’s training and inference workloads at frontier scale. About the Role We’re looking... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    2 days ago
  •  ...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... 

    Baseten

    San Francisco, CA
    1 day ago
  • A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates... 

    Menlo Ventures

    San Francisco, CA
    3 days ago
  • Anysphere is looking for an experienced leader for the Model Routing & Inference team in San Francisco. This role involves owning the inference...  ...has a strong background in high-throughput systems and software engineering fundamentals, combined with leadership skills to mentor... 

    Anysphere

    San Francisco, CA
    4 days ago
  •  ...combination of inventive research, design, and engineering. Our organization is very flat, and...  .... About the Role You will lead the Model Routing & Inference team at Cursor, owning the inference...  ...information. You have strong software engineering fundamentals and enjoy shipping... 

    Anysphere

    San Francisco, CA
    4 days ago
  •  ...powered products are transforming the practice of medicine—and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The Inference team owns the end-to-end... 
    Hourly pay
    Full time
    Flexible hours

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have... 
    Remote work

    Cohere

    San Francisco, CA
    1 day ago
  • $220k - $320k

    ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models... 
    3 days per week

    Trades Workforce Solutions

    San Francisco, CA
    4 days ago
  • $230k - $385k

    About the Team We're hiring a Developer Productivity engineer to support OpenAI's Inference Runtime teams. These teams own the systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. We're hiring... 

    Slope

    San Francisco, CA
    12 hours ago
  • $325k

    About the Team Our Inference team brings OpenAI's most capable research and technology to...  ...use and access our state-of-the-art AI models, allowing them to do things that they've...  ...inference. About the Role We're hiring engineers to scale and optimize OpenAI's inference... 

    Centaur Labs

    San Francisco, CA
    22 hours ago
  •  ...schema mapping with fine-tuned extraction models where legacy OCR and other parsing tools...  .... We are a small, fast-growing team of engineers in San Francisco powering Fortune 100...  ...Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own... 
    Work at office
    Visa sponsorship
    Relocation package

    Trypulse

    San Francisco, CA
    4 days ago
  • Software Engineer (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build the systems that...  ...role exists because modern generative and vision models require infrastructure beyond traditional backend... 

    SpreeAI

    San Francisco, CA
    6 days ago
  • Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while...  ...strong skills in C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and inclusive work culture... 
    Remote job

    Jaide Health

    San Francisco, CA
    4 days ago
  • A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional... 

    Abridge

    San Francisco, CA
    4 days ago
  • Software Engineer (Model Evaluation & Benchmarking) About the Role We are hiring Engineers focused on AI Model Evaluation to build the systems that ensure multimodal AI behaves reliably, consistently, and predictably as it moves from research into production. This position... 

    SpreeAI

    San Francisco, CA
    1 day ago
  • $320k

     ...growing group of committed researchers, engineers, policy experts, and business leaders working...  .... About The Role We’re looking for a Software Engineer to work at the intersection of...  ...build evaluation systems that measure model capabilities across diverse coding tasks... 
    Work experience placement
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    2 days ago
  • $300 per month

     ...and intelligence. We’re crafting the engine that powers a world where people can...  ...role About this role: The Staff Software Engineer for the Model LifeCycle team will play a key role...  ...infrastructure, including training, inference. Preferred Qualifications: Proficiency... 
    Temporary work

    Crusoe Energy Systems LLC

    San Francisco, CA
    4 days ago
  • $192k - $260k

     ...insights to improve their business. Foundation Model Serving is the API Product for hosting and serving frontier AI model inference for open source models like Llama, Qwen,...  ...experience is necessary. We’re looking for engineers who have owned high‑scale operational... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    1 day ago
  • $192k - $260k

     ...to improve their business. Databricks’ Model Serving product provides enterprises with...  .... It offers real-time, low-latency inference, governance, monitoring, and lineage. As...  ...strong SLAs and cost efficiency. As a Staff Engineer, you’ll play a critical role in shaping... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    3 days ago
  •  ...every day. ROLE OVERVIEW We're hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine‑tuning infrastructure...  ...The opportunity to shape the foundational software services of a growing company. A role... 
    Full time
    Remote work

    Pantera Capital

    San Francisco, CA
    12 hours ago
  •  ...Baseten powers mission‑critical inference for the world's most dynamic...  ...of AI to bring cutting‑edge models into production. We're...  ...and help build the platform engineers turn to to ship AI products....  ...enjoy working across product, software development, performance engineering... 
    Work experience placement
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  • $192k - $260k

    A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams... 

    Databricks Inc.

    San Francisco, CA
    1 day ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience... 

    OpenAI

    San Francisco, CA
    1 day ago
  • A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience... 

    Baseten

    San Francisco, CA
    6 days ago
