Staff Engineer - ML Inference & Model Efficiency

Cohere

A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5 years of coding experience in C++ or Python and a solid understanding of the LLM inference environment. This position offers a remote-friendly work model, a competitive salary, and extensive benefits including a generous vacation policy.


Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the Staff Engineer - ML Inference & Model Efficiency in San Francisco, CA vacancy

Staff ML Inference Engineer — Model Efficiency (Remote)
Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across... ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and...
Suggested
Remote job
Jaide Health
San Francisco, CA
1 day ago
Staff Research Engineer, Model Efficiency
...and deploying frontier models for developers and enterprises... ...a team of researchers, engineers, designers, and more,... ...systems can do — but inference is still the bottleneck. The Model Efficiency team is responsible for... ...locations. As a Staff Research Engineer, you...
Suggested
Full time
Work at office
Remote work
Flexible hours
Cohere
San Francisco, CA
1 day ago
Staff Research Engineer: AI Model Efficiency & Speed
...research company in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will develop and implement... ...and have experience with model architecture and inference optimization. Join a diverse team committed to...
Suggested
Remote work
Cohere
San Francisco, CA
3 days ago
Member of Technical Staff, Model Efficiency
Who are we? Our mission is to scale intelligence... ...is a team of researchers, engineers, designers, and more, who are... ...focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop...
Suggested
Full time
Work at office
Remote work
Flexible hours
Cohere
San Francisco, CA
3 days ago
Software Engineer, Model Inference
$295k
...About the Team Our Inference team brings OpenAI's... ...our state-of-the-art AI models, allowing them to do things... ...on performant and efficient model inference, as well... ...We are looking for an engineer who wants to take the world... ...of modern ML architectures and an intuition...
Suggested
OpenAI
San Francisco, CA
2 days ago
Staff ML Engineer: Efficient ML & Low-Latency AI
...seeks candidates with expertise in AI simulation development. The role emphasizes optimizing training efficiency, enhancing GPU performance, and ensuring low-latency inference. Applicants should be proficient in methodologies for gradient checkpointing, Nsight profiling,...
Embedding VC
San Francisco, CA
1 day ago
Member of Technical Staff - Edge Inference Engineer
...purpose AI systems that run efficiently across deployment... ...The Opportunity Our Edge Inference team compiles Liquid Foundation Models into optimized machine code... ...understanding of both ML architectures and hardware... ...Embedded software engineering experience or work on resource...
Liquid AI
San Francisco, CA
14 hours ago
Staff Engineer: Foundation Model API & GPU Inference
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
Databricks Inc.
San Francisco, CA
3 days ago
Senior/Staff ML Engineer, Model Integration
...ComfyUI. You'll be the person who takes the newest open-source models (image, video, 3D, audio, multimodal...) and brings them into ComfyUI... ...-the-art open-source models to run natively in the ComfyUI core engine Design and build the native nodes that expose new model...
ComfyUI
San Francisco, CA
5 days ago
ML Infrastructure Engineer - Model Inference & Scale
A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional...
Abridge
San Francisco, CA
1 day ago
Staff ML Inference Engineer - Scalable LLM Serving (Remote)
$150k - $300k
...systems as part of a hybrid team. This role focuses on developing efficient architecture for serving LLMs and optimizing performance using... ...infrastructure tools. Ideal candidates will have significant experience with ML systems, ensuring robust performance and scalability. The...
Remote job
Prime-Intellect
San Francisco, CA
1 day ago
Staff ML Inference Systems Engineer - Scalable GPU Infra (SF)
Acceler8 Talent is looking for a Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves... ...workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency...
Acceler8 Talent
San Francisco, CA
2 days ago
Engineering Manager, Model Inference
...practice of medicine—and the inference systems that power them... ...We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...engineers, partner closely with ML Research and the broader... ...are operating at peak efficiency and reliability. What...
Hourly pay
Full time
Flexible hours
AI Chopping Block, Inc.
San Francisco, CA
3 days ago
LLM Inference & Model-Performance Engineer
A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
Baseten
San Francisco, CA
3 days ago
Senior Manager, Engineering - Model Serving
$217k - $312.2k
...their business. Databricks’ Model Serving product provides... ...to deploy and manage AI/ML models — from traditional... ...offers real-time, low-latency inference, governance, monitoring,... ...with strong SLAs and cost efficiency. As a Senior Engineering Manager, you will lead the...
Local area
I did my part and supported the Regular Toilet
San Francisco, CA
1 day ago
Member of Technical Staff (AI Inference Engineer)
...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack... ...Good If You Touched Any Of ML compilers and framework internals:...
Perplexity AI
San Francisco, CA
5 days ago
Staff, ML Infrastructure Engineer
$227.2k - $324.5k
...About the Role: This Software Engineering team works closely with Machine Learning... ...platform. The team’s efforts take inference systems to the next level of low‑... ...of the online feature store for efficiency and low latency. Work with ML engineers to understand their...
Full time
Flexible hours
Tubi Tv
San Francisco, CA
2 days ago
Senior Software Engineer - Model Performance
$220k - $320k
...Help us make inference blazingly fast. If you love squeezing... ...specialized language models for companies that need... ...ten-person team of engineers who work in-person in downtown... ...stack as fast and efficient as possible. Your work... ...Collaborate with applied ML engineers to ensure...
Work at office
Inference
San Francisco, CA
4 days ago
Staff Machine Learning Engineer
$205k - $272.5k
...long-tail scenarios, and model errors that matter most. Omnitag, our ML-powered multimodal data... ...mining framework, is the engine that powers this discovery. As a Staff Machine Learning... ...learning loops to hyper-efficient production inference. You will own system-level...
Work at office
Remote work
Motional
San Francisco, CA
11 days ago
Staff ML Compiler Engineer
$185.1k - $335.3k
...software that can run efficiently and reliably on... ...new approaches to model export, kernel... ..., and performance engineering so that every cycle... ...into fast, reliable inference across GPUs powering... ...The Role As a Staff Compiler Engineer... ...and effortless for ML engineers across...
Local area
Remote work
Work from home
Relocation package
Flexible hours
General Motors
San Francisco, CA
5 days ago
Software Engineer - Model APIs
...BASETEN Baseten powers inference for the world's most dynamic... ...to bring cutting-edge models into production. With... ..., reliable, and cost‑efficient. As part of this team,... ...open-source inference engines (vLLM, TensorRT-LLM, SGLang... ...and curiosity. ML experience is a plus, but...
Flexible hours
Baseten
San Francisco, CA
5 days ago
Senior Software Engineer, AI Model LifeCycle
$300 per month
...site Department Cloud Engineering Crusoe's mission is... ...Software Engineer for the Model LifeCycle team will... ...failure recovery, and cost-efficient scaling. Implement... ..., networking). AI/ML Expertise Familiarity... ...components (training, inference). Preferred Qualifications...
Full time
Temporary work
Epoch Biodesign
San Francisco, CA
2 days ago
Staff Software Engineer, Model Serving
$192k - $260k
...business. Databricks' Model Serving product provides... ...to deploy and manage AI/ML models - from traditional ML... ...offers real-time, low-latency inference, governance, monitoring,... ...with strong SLAs and cost efficiency. As a Staff Engineer, you'll play a critical role...
Local area
Worldwide
Databricks
San Francisco, CA
3 days ago
Senior/Staff ML Engineer, Performance Optimization
...who loves optimizing model inference to join us in building... ...bleeding-edge part of our engine. You'll be working on... ...run faster and more efficiently than anyone thought possible... ...the current state of ML deployment could be... ...Member of Technical Staff, it’s long and silly for...
Comfy
San Francisco, CA
1 day ago
Senior Staff Machine Learning Engineer, Post Training
Remote - USA Airbnb... ...we rely on ML to ensure that guests... ...enhances various AI models, ML services and tools... ...enhanced performance and efficiency. Hands‑on prototype... ...models and inference run‑time Post‑training...
Work experience placement
Remote work
airbnb, Inc.
San Francisco, CA
4 days ago
Staff AI Platform Engineer: Agent Infra & Model Routing
$231k - $340k
Harvey is seeking a Senior AI Engineer in San Francisco, CA, to design and enhance their AI platform, focusing on model integration, evaluation, and shared infrastructure. Candidates... ...of backend systems experience, including AI/ML engineering, and a proven track record of...
Harvey
San Francisco, CA
3 days ago
Staff Software Engineer, Model Serving
$192k - $260k
...their business. Databricks’ Model Serving product provides... ...platform to deploy and manage AI/ML models — from traditional... ...real-time, low-latency inference, governance, monitoring, and... ...with strong SLAs and cost efficiency. As a Staff Engineer, you’ll play a critical role...
Local area
Worldwide
Cacheflow
San Francisco, CA
14 hours ago
Staff Software Engineer, Model LifeCycle
$300 per month
...We’re crafting the engine that powers a world... ...About this role: The Staff Software Engineer for the Model LifeCycle team will... ...recovery, and cost-efficient scaling. Implement... ...database, etc. AI/ML Expertise: Experience... ...including training, inference. Preferred Qualifications...
Temporary work
Crusoe Energy Systems LLC
San Francisco, CA
1 day ago
Senior Staff AI Engineer
$207k - $290k
...an experienced AI Engineer with deep expertise... ...team as a Senior Staff Architect. In this... ...driven AI reasoning models and systems that... ...systems are resilient, efficient, explainable and... ...experience in AI/ML engineering,... ...techniques, including inference-time search, chain...
Worldwide
Flexible hours
JazzX AI
San Francisco, CA
a month ago
Staff ML Data Engineer (Datagrid)
$227.33k - $312.58k
...We're looking for a Staff ML Data Engineer to join Procore's AI & Frontier Models organization. In this role, you'll be responsible... ..., observability, and cost efficiency across AI data pipelines.... ...learning training, evaluation, or inference workflows. Solid understanding...
Work at office
Local area
Immediate start
3 days per week
ProCore CPA
San Francisco, CA
15 days ago
