Staff Engineer - ML Inference & Model Efficiency
Cohere
A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5 years of coding experience in C++ or Python and a solid understanding of the LLM inference environment. This position offers a remote-friendly work model, a competitive salary, and extensive benefits including a generous vacation policy.
- Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across... ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and... Suggested, Remote job
- ...and deploying frontier models for developers and enterprises... ...a team of researchers, engineers, designers, and more,... ...systems can do — but inference is still the bottleneck. The Model Efficiency team is responsible for... ...locations. As a Staff Research Engineer, you... Suggested, Full time, Work at office, Remote work, Flexible hours
- ...research company in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will develop and implement... ...and have experience with model architecture and inference optimization. Join a diverse team committed to... Suggested, Remote work
- Member of Technical Staff, Model Efficiency Who are we? Our mission is to scale intelligence... ...is a team of researchers, engineers, designers, and more, who are... ...focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop... Suggested, Full time, Work at office, Remote work, Flexible hours
$295k
...About the Team Our Inference team brings OpenAI's... ...our state-of-the-art AI models, allowing them to do things... ...on performant and efficient model inference, as well... ...We are looking for an engineer who wants to take the world... ...of modern ML architectures and an intuition... Suggested
- ...seeks candidates with expertise in AI simulation development. The role emphasizes optimizing training efficiency, enhancing GPU performance, and ensuring low-latency inference. Applicants should be proficient in methodologies for gradient checkpointing, Nsight profiling,...
- ...purpose AI systems that run efficiently across deployment... ...The Opportunity Our Edge Inference team compiles Liquid Foundation Models into optimized machine code... ...understanding of both ML architectures and hardware... ...Embedded software engineering experience or work on resource...
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
- ...ComfyUI. You'll be the person who takes the newest open-source models (image, video, 3D, audio, multimodal...) and brings them into ComfyUI... ...-the-art open-source models to run natively in the ComfyUI core engine Design and build the native nodes that expose new model...
- A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional...
$150k - $300k
...systems as part of a hybrid team. This role focuses on developing efficient architecture for serving LLMs and optimizing performance using... ...infrastructure tools. Ideal candidates will have significant experience with ML systems, ensuring robust performance and scalability. The... Remote job
- Acceler8 Talent is looking for a Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves... ...workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency...
- ...practice of medicine—and the inference systems that power them... .... We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...engineers, partner closely with ML Research and the broader... ...are operating at peak efficiency and reliability. What... Hourly pay, Full time, Flexible hours
- A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
$217k - $312.2k
...their business. Databricks’ Model Serving product provides... ...to deploy and manage AI/ML models — from traditional... ...offers real-time, low-latency inference, governance, monitoring,... ...with strong SLAs and cost efficiency. As a Senior Engineering Manager, you will lead the... Local area
$227.2k - $324.5k
...About the Role: This Software Engineering team works closely with Machine Learning... ...platform. The team’s efforts take inference systems to the next level of low‑... ...of the online feature store for efficiency and low latency. Work with ML engineers to understand their... Full time, Flexible hours
- ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack... ...Good If You Touched Any Of ML compilers and framework internals:...
$220k - $320k
...Help us make inference blazingly fast. If you love squeezing... ...specialized language models for companies that need... ...ten-person team of engineers who work in-person in downtown... ...stack as fast and efficient as possible. Your work... ...Collaborate with applied ML engineers to ensure... Work at office
$205k - $272.5k
...long-tail scenarios, and model errors that matter most. Omnitag, our ML-powered multimodal data... ...mining framework, is the engine that powers this discovery. As a Staff Machine Learning... ...learning loops to hyper-efficient production inference. You will own system-level... Work at office, Remote work
$185.1k - $335.3k
...software that can run efficiently and reliably on... ...new approaches to model export, kernel... ..., and performance engineering so that every cycle... ...into fast, reliable inference across GPUs powering... ...The Role As a Staff Compiler Engineer... ...and effortless for ML engineers across... Local area, Remote work, Work from home, Relocation package, Flexible hours
$300 per month
...site Department Cloud Engineering Crusoe's mission is... ...Software Engineer for the Model LifeCycle team will... ...failure recovery, and cost-efficient scaling. Implement... ..., networking). AI/ML Expertise Familiarity... ...components (training, inference). Preferred Qualifications... Full time, Temporary work
- ...BASETEN Baseten powers inference for the world's most dynamic... ...to bring cutting-edge models into production. With... ..., reliable, and cost‑efficient. As part of this team,... ...open-source inference engines (vLLM, TensorRT-LLM, SGLang... ...and curiosity. ML experience is a plus, but... Flexible hours
$192k - $260k
...business. Databricks' Model Serving product provides... ...to deploy and manage AI/ML models - from traditional ML... ...offers real-time, low-latency inference, governance, monitoring,... ...with strong SLAs and cost efficiency. As a Staff Engineer, you'll play a critical role... Local area, Worldwide
- ...who loves optimizing model inference to join us in building... ...bleeding-edge part of our engine. You'll be working on... ...run faster and more efficiently than anyone thought possible... ...the current state of ML deployment could be... ...Member of Technical Staff, it’s long and silly for...
- Senior Staff Machine Learning Engineer, Post Training Remote - USA Airbnb... ...we rely on ML to ensure that guests... ...enhances various AI models, ML services and tools... ...enhanced performance and efficiency. Hands‑on prototype... ...models and inference run‑time Post‑training... Work experience placement, Remote work
$231k - $340k
Harvey is seeking a Senior AI Engineer in San Francisco, CA, to design and enhance their AI platform, focusing on model integration, evaluation, and shared infrastructure. Candidates... ...of backend systems experience, including AI/ML engineering, and a proven track record of...
$192k - $260k
...their business. Databricks’ Model Serving product provides... ...platform to deploy and manage AI/ML models — from traditional... ...real-time, low-latency inference, governance, monitoring, and... ...with strong SLAs and cost efficiency. As a Staff Engineer, you’ll play a critical role... Local area, Worldwide
$300 per month
...We’re crafting the engine that powers a world... ...About this role: The Staff Software Engineer for the Model LifeCycle team will... ...recovery, and cost-efficient scaling. Implement... ...database, etc. AI/ML Expertise: Experience... ...including training, inference. Preferred Qualifications... Temporary work
$207k - $290k
...an experienced AI Engineer with deep expertise... ...team as a Senior Staff Architect. In this... ...driven AI reasoning models and systems that... ...systems are resilient, efficient, explainable and... ...experience in AI/ML engineering,... ...techniques, including inference-time search, chain... Worldwide, Flexible hours
$227.33k - $312.58k
...We're looking for a Staff ML Data Engineer to join Procore's AI & Frontier Models organization. In this role, you'll be responsible... ..., observability, and cost efficiency across AI data pipelines.... ...learning training, evaluation, or inference workflows. ~ Solid understanding... Work at office, Local area, Immediate start, 3 days per week