Machine Learning Infrastructure Engineer - Model Inference

$179k - $248k

Abridge

San Francisco, CA

Base pay range

$179,000.00/yr - $248,000.00/yr

About Abridge

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI‑powered platform was purpose‑built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most—their patients. Our enterprise‑grade technology transforms patient‑clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose‑built, auditable AI, we are the only company that maps AI‑generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems.

The Role

As a Senior Machine Learning Systems Engineer at Abridge, you’ll play a pivotal role in building and optimizing the core infrastructure that powers our machine learning models. Your work will be instrumental in enhancing the scalability, efficiency, and performance of our AI‑driven solutions. You will work with our Infrastructure and Research teams to build, deploy, optimize, and orchestrate our AI models.

What You’ll Do

Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training.
Develop, optimize, and maintain ML model serving and training infrastructure, ensuring high performance and low latency.
Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency.
Optimize compute‑heavy workflows and enhance GPU utilization for ML workloads.
Build a robust model API orchestration system.
Collaborate with leadership to define and implement strategies for scaling infrastructure as the company grows, ensuring long‑term efficiency and performance.

What You’ll Bring

Strong experience in building and deploying machine learning models in production environments.
Deep understanding of container orchestration and distributed systems architecture.
Expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management.
Experience developing APIs and managing distributed systems for both batch and real‑time workloads.
Excellent communication skills, with the ability to interface between research and product engineering.

Bonus Points

Expertise with model serving frameworks such as NVIDIA Triton Inference Server, vLLM, and TensorRT‑LLM.
Expertise with ML toolchains such as PyTorch and TensorFlow, or with distributed training and inference libraries.
Familiarity with GPU cluster management and CUDA optimization.
Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices.
Experience with container registries, image optimization, and multi‑stage builds for ML workloads.
Experience orchestrating across ASR models or LLMs to build GenAI applications.

Why Work at Abridge?

At Abridge, we’re transforming healthcare delivery experiences with generative AI, enabling clinicians and patients to connect in deeper, more meaningful ways. Our mission is clear: to power deeper understanding in healthcare. We’re driving real, lasting change, with millions of medical conversations processed each month.

Joining Abridge means stepping into a fast‑paced, high‑growth startup where your contributions truly make a difference. Our culture requires extreme ownership—every employee has the ability to (and is expected to) make an impact on our customers and our business.

Beyond individual impact, you will have the opportunity to work alongside a team of curious, high‑achieving people in a supportive environment where success is shared, growth is constant, and feedback fuels progress. At Abridge, it’s not just what we do—it’s how we do it. Every decision is rooted in empathy, always prioritizing the needs of clinicians and patients.

We’re committed to supporting your growth, both professionally and personally. Whether it's flexible work hours, an inclusive culture, or ongoing learning opportunities, we are here to help you thrive and do the best work of your life.

How we take care of Abridgers

Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees.
Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full‑time employees and their families.
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA.
Paid Parental Leave: Generous paid parental leave for all full‑time employees.
Family Forming Benefits: Resources and financial support to help you build your family.
401(k) Matching: Contribution matching to help invest in your future.
Personal Device Allowance: Tax‑free funds for personal device usage.
Pre‑tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits.
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more.
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals.
Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment.
Compensation and Equity: Competitive compensation and equity grants for full‑time employees.
… and much more!

Equal Opportunity Employer

Abridge is an equal opportunity employer and considers all qualified applicants equally without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability.

Staying safe - Protect yourself from recruitment fraud

We are aware of individuals and entities fraudulently representing themselves as Abridge recruiters and/or hiring managers. Abridge will never ask for financial information or payment, or for personal information such as bank account number or social security number during the job application or interview process. Any emails from the Abridge recruiting team will come from an @abridge.com email address. You can learn more about how to protect yourself from these types of fraud by referring to this article. Please exercise caution and cease communications if something feels suspicious about your interactions.


Vacancy posted 2 days ago
