Staff Engineer - ML Inference & Model Efficiency

Cohere

A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5 years of coding experience in C++ or Python and a solid understanding of the LLM inference environment. This position offers a remote-friendly work model, a competitive salary, and extensive benefits including a generous vacation policy.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Staff Engineer - ML Inference & Model Efficiency vacancy in San Francisco, CA
  • Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across...  ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and... 
    Suggested
    Remote job

    Jaide Health

    San Francisco, CA
    16 hours ago
  •  ...and deploying frontier models for developers and enterprises...  ...a team of researchers, engineers, designers, and more,...  ...systems can do — but inference is still the bottleneck. The Model Efficiency team is responsible for...  ...locations. As a Staff Research Engineer, you... 
    Suggested
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    16 hours ago
  •  ...research company in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will develop and implement...  ...and have experience with model architecture and inference optimization. Join a diverse team committed to... 
    Suggested
    Remote work

    Cohere

    San Francisco, CA
    2 days ago
  • Member of Technical Staff, Model Efficiency Who are we? Our mission is to scale intelligence...  ...is a team of researchers, engineers, designers, and more, who are...  ...focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop... 
    Suggested
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    2 days ago
  • $295k

     ...About the Team Our Inference team brings OpenAI's...  ...our state-of-the-art AI models, allowing them to do things...  ...on performant and efficient model inference, as well...  ...We are looking for an engineer who wants to take the world...  ...of modern ML architectures and an intuition... 
    Suggested

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...seeks candidates with expertise in AI simulation development. The role emphasizes optimizing training efficiency, enhancing GPU performance, and ensuring low-latency inference. Applicants should be proficient in methodologies for gradient checkpointing, Nsight profiling,... 

    Embedding VC

    San Francisco, CA
    16 hours ago
  •  ...purpose AI systems that run efficiently across deployment...  ...The Opportunity Our Edge Inference team compiles Liquid Foundation Models into optimized machine code...  ...understanding of both ML architectures and hardware...  ...Embedded software engineering experience or work on resource... 

    Liquid AI

    San Francisco, CA
    4 days ago
  • $192k - $260k

    A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams... 

    Databricks Inc.

    San Francisco, CA
    2 days ago
  •  ...ComfyUI. You'll be the person who takes the newest open-source models (image, video, 3D, audio, multimodal...) and brings them into ComfyUI...  ...-the-art open-source models to run natively in the ComfyUI core engine Design and build the native nodes that expose new model... 

    ComfyUI

    San Francisco, CA
    4 days ago
  • A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional... 

    Abridge

    San Francisco, CA
    16 hours ago
  • $150k - $300k

     ...systems as part of a hybrid team. This role focuses on developing efficient architecture for serving LLMs and optimizing performance using...  ...infrastructure tools. Ideal candidates will have significant experience with ML systems, ensuring robust performance and scalability. The... 
    Remote job

    Prime-Intellect

    San Francisco, CA
    16 hours ago
  • Acceler8 Talent is looking for a Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves...  ...workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency... 

    Acceler8 Talent

    San Francisco, CA
    1 day ago
  •  ...practice of medicine—and the inference systems that power them...  .... We’re looking for an Engineering Manager to lead and grow our Model Inference team. The...  ...engineers, partner closely with ML Research and the broader...  ...are operating at peak efficiency and reliability. What... 
    Hourly pay
    Full time
    Flexible hours

    AI Chopping Block, Inc.

    San Francisco, CA
    2 days ago
  • A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... 

    Baseten

    San Francisco, CA
    2 days ago
  • $217k - $312.2k

     ...their business. Databricks’ Model Serving product provides...  ...to deploy and manage AI/ML models — from traditional...  ...offers real-time, low-latency inference, governance, monitoring,...  ...with strong SLAs and cost efficiency. As a Senior Engineering Manager, you will lead the... 
    Local area

    I did my part and supported the Regular Toilet

    San Francisco, CA
    16 hours ago
  • $227.2k - $324.5k

     ...About the Role: This Software Engineering team works closely with Machine Learning...  ...platform. The team’s efforts take inference systems to the next level of low‑...  ...of the online feature store for efficiency and low latency. Work with ML engineers to understand their... 
    Full time
    Flexible hours

    Tubi Tv

    San Francisco, CA
    1 day ago
  •  ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack...  ...Good If You Touched Any Of ML compilers and framework internals:... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  • $220k - $320k

     ...Help us make inference blazingly fast. If you love squeezing...  ...specialized language models for companies that need...  ...ten-person team of engineers who work in-person in downtown...  ...stack as fast and efficient as possible. Your work...  ...Collaborate with applied ML engineers to ensure... 
    Work at office

    Inference

    San Francisco, CA
    3 days ago
  • $205k - $272.5k

     ...long-tail scenarios, and model errors that matter most. Omnitag, our ML-powered multimodal data...  ...mining framework, is the engine that powers this discovery. As a Staff Machine Learning...  ...learning loops to hyper-efficient production inference. You will own system-level... 
    Work at office
    Remote work

    Motional

    San Francisco, CA
    10 days ago
  • $185.1k - $335.3k

     ...software that can run efficiently and reliably on...  ...new approaches to model export, kernel...  ..., and performance engineering so that every cycle...  ...into fast, reliable inference across GPUs powering...  ...The Role As a Staff Compiler Engineer...  ...and effortless for ML engineers across... 
    Local area
    Remote work
    Work from home
    Relocation package
    Flexible hours

    General Motors

    San Francisco, CA
    4 days ago
  • $300 per month

     ...site Department Cloud Engineering Crusoe's mission is...  ...Software Engineer for the Model LifeCycle team will...  ...failure recovery, and cost-efficient scaling. Implement...  ..., networking). AI/ML Expertise Familiarity...  ...components (training, inference). Preferred Qualifications... 
    Full time
    Temporary work

    Epoch Biodesign

    San Francisco, CA
    1 day ago
  •  ...BASETEN Baseten powers inference for the world's most dynamic...  ...to bring cutting-edge models into production. With...  ..., reliable, and cost‑efficient. As part of this team,...  ...open-source inference engines (vLLM, TensorRT-LLM, SGLang...  ...and curiosity. ML experience is a plus, but... 
    Flexible hours

    Baseten

    San Francisco, CA
    4 days ago
  • $192k - $260k

     ...business. Databricks' Model Serving product provides...  ...to deploy and manage AI/ML models - from traditional ML...  ...offers real-time, low-latency inference, governance, monitoring,...  ...with strong SLAs and cost efficiency. As a Staff Engineer, you'll play a critical role... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    2 days ago
  •  ...who loves optimizing model inference to join us in building...  ...bleeding-edge part of our engine. You'll be working on...  ...run faster and more efficiently than anyone thought possible...  ...the current state of ML deployment could be...  ...Member of Technical Staff, it’s long and silly for... 

    Comfy

    San Francisco, CA
    16 hours ago
  • Senior Staff Machine Learning Engineer, Post Training Remote - USA Airbnb...  ...we rely on ML to ensure that guests...  ...enhances various AI models, ML services and tools...  ...enhanced performance and efficiency. Hands‑on prototype...  ...models and inference run‑time Post‑training... 
    Work experience placement
    Remote work

    airbnb, Inc.

    San Francisco, CA
    3 days ago
  • $231k - $340k

    Harvey is seeking a Senior AI Engineer in San Francisco, CA, to design and enhance their AI platform, focusing on model integration, evaluation, and shared infrastructure. Candidates...  ...of backend systems experience, including AI/ML engineering, and a proven track record of... 

    Harvey

    San Francisco, CA
    2 days ago
  • $192k - $260k

     ...their business. Databricks’ Model Serving product provides...  ...platform to deploy and manage AI/ML models — from traditional...  ...real-time, low-latency inference, governance, monitoring, and...  ...with strong SLAs and cost efficiency. As a Staff Engineer, you’ll play a critical role... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    4 days ago
  • $300 per month

     ...We’re crafting the engine that powers a world...  ...About this role: The Staff Software Engineer for the Model LifeCycle team will...  ...recovery, and cost-efficient scaling. Implement...  ...database, etc. AI/ML Expertise: Experience...  ...including training, inference. Preferred Qualifications... 
    Temporary work

    Crusoe Energy Systems LLC

    San Francisco, CA
    16 hours ago
  • $207k - $290k

     ...an experienced AI Engineer with deep expertise...  ...team as a Senior Staff Architect. In this...  ...driven AI reasoning models and systems that...  ...systems are resilient, efficient, explainable and...  ...experience in AI/ML engineering,...  ...techniques, including inference-time search, chain... 
    Worldwide
    Flexible hours

    JazzX AI

    San Francisco, CA
    29 days ago
  • $227.33k - $312.58k

     ...We're looking for a Staff ML Data Engineer to join Procore's AI & Frontier Models organization. In this role, you'll be responsible...  ..., observability, and cost efficiency across AI data pipelines....  ...learning training, evaluation, or inference workflows. ~ Solid understanding... 
    Work at office
    Local area
    Immediate start
    3 days per week

    ProCore CPA

    San Francisco, CA
    14 days ago
