LLM Inference & Model-Performance Engineer

Baseten

A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT. The ideal candidate will have a relevant degree and strong programming skills in Python or C++. Competitive compensation, generous PTO, and full medical benefits are offered. Join us in shaping the future of AI!


Vacancy posted 2 days ago

Similar jobs that could be interesting for you

Based on the LLM Inference & Model-Performance Engineer in San Francisco, CA vacancy

LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...inference for large language models (LLMs). Our mission is to... ...infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency...
Performance
Full time
Together AI
San Francisco, CA
24 days ago
Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
Performance
Anyscale
San Francisco, CA
2 days ago
Distributed LLM Inference Engineer
...without needing to be a distributed systems expert. About the Role As a Distributed LLM Inference Engineer, you will help build systems and optimizations that push the boundaries of performance for inference at large scale. This is an incredibly critical role to Anyscale as it...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
System Engineering In
Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
Model API Engineer: Fast, Scalable AI Inference
...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed...
Performance
Baseten
San Francisco, CA
2 days ago
Speech LLM Model Evaluations Engineer - Hybrid
$180k - $270k
...for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering skills, particularly in...
Performance
Plaud
San Francisco, CA
4 days ago
LLM Inference & Optimization Engineer
Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed... ...inference engines that support multimodal models, optimizing frameworks for low-...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
4 days ago
Staff Engineer - ML Inference & Model Efficiency
...AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5...
Performance
Remote work
Cohere
San Francisco, CA
2 days ago
Engineering Manager, Model Routing & Inference Engineering
...inventive research, design, and engineering. Our organization is very... ...the Role You will lead the Model Routing & Inference team at Cursor, owning the... ..., optimizing for cost and performance. Designing routing... ...frameworks (vLLM, TensorRT‑LLM, TGI), load balancing, or building...
Performance
Anysphere
San Francisco, CA
5 days ago
Engineering Manager, Model Inference
...practice of medicine—and the inference systems that power them... ...class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers...
Performance
Hourly pay
Full time
Flexible hours
AI Chopping Block, Inc.
San Francisco, CA
2 days ago
Backend Integration Engineer (AI/Model Services)
$74.38 - $83.80 per hour
...Specialty Software Engineer - API Developer - GenAI... ...Rather than building models, you'll be responsible... ...Exposure to model serving or inference gateways... ...Kubernetes GenAI, LLM, or agent-based frameworks... ...and optimize service performance Contribute to architecture...
Performance
Full time
Contract work
Temporary work
Flexible hours
Motion Recruitment
San Francisco, CA
3 days ago
Staff ML Inference Engineer — Model Efficiency (Remote)
Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses... ...ML systems while enhancing core performance metrics across model execution. You'... ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity...
Performance
Remote job
Jaide Health
San Francisco, CA
5 days ago
Model Implementation Engineer
...Model Implementation Engineer Sciforium is an AI infrastructure company developing next-generation... ...models across modalities, ensuring high performance and reliability from day one. You... ...with large-scale model training or inference systems. Contributions to open-...
Performance
Flexible hours
Sciforium
San Francisco, CA
9 days ago
Senior Inference Performance Engineer — GPU & CUDA
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2... ...in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation of $220,000...
Performance
Local area
Inference
San Francisco, CA
1 day ago
Senior Engineer 2: AI Inference Engine Systems
$167.2k - $209k
...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,... ...can deploy and scale their models with industry-leading performance and reliability. This is a... ...inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve....
Performance
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
6 days ago
AI Inference & Model Routing Lead
...looking for an experienced leader for the Model Routing & Inference team in San Francisco. This role... ...routing, cluster management, and performance. The ideal candidate has a strong background... ...-throughput systems and software engineering fundamentals, combined with...
Performance
Anysphere
San Francisco, CA
5 days ago
System Performance Engineering
$350k
...committed researchers, engineers, policy experts, and business... ...Role Anthropic's inference fleet serves Claude to... ...: accelerator kernels, model servers, distributed... ...measure how the fleet performs against its theoretical... ...infrastructure or general LLM serving stacks. Direct...
Performance
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
5 days ago
Forward Deployed Engineer, Lead - LLM Post-training
...developing open weight models for individuals,... ...As a Forward Deployed Engineer Lead, Post-Training, you... ...that measure real-world performance. Own the data pipeline... ...infrastructure teams to ensure inference performance, cost... ...2+ years focused on LLM post-training in a...
Performance
Relocation package
Reflection
San Francisco, CA
2 days ago
Software Engineer, Model Inference
$295k
...About the Team Our Inference team brings OpenAI's most capable... ...access our state-of-the-art AI models, allowing them to do things... ...able to before. We focus on performant and efficient model inference... ...Role We are looking for an engineer who wants to take the world's...
Performance
OpenAI
San Francisco, CA
1 day ago
Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco
$180k - $270k
...to elevate productivity and performance through note‑taking solutions... ...on. Possess strong software engineering skills (especially in Python)... ...can run at scale against live model checkpoints. Can deeply partner... ..." looks like for a Speech LLM, translating capabilities (like...
Performance
Full time
Work at office
Worldwide
Plaud
San Francisco, CA
4 days ago
Machine Learning Engineer: LLM Interpretability & Systems
...inherent to generative models. CTGT is the... ...reliable, controllable, and performant in practice. Our mission... ...the gap between LLM capabilities and domain... ...Senior Machine Learning Engineer will operate deep within... ...policy enforcement at inference time. Who You Are...
Performance
CTGT
San Francisco, CA
5 days ago
Founding Compiler Engineer - AI/ML Model Optimizer
Slope is seeking a Founding Compiler Engineer in San Francisco, responsible for designing core compiler infrastructure and optimizing AI models. You will write CUDA kernels and conduct performance reviews, contributing to Luminal's mission of making AI workloads portable...
Performance
Full time
Slope
San Francisco, CA
1 day ago
Inference Engineer
...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems... ..., and making state-of-the-art speech models actually behave correctly in realtime... ...The team already operates beyond the performance of most publicly available realtime...
Performance
Remote work
Flexible hours
techire ai
San Francisco, CA
2 days ago
Inference Engineer
...do. We're pioneering the model architectures that will make... ...model innovation and systems engineering paired with a design-minded... ...the Role We're hiring an Inference Engineer to advance our... ...systems with high demands on performance, reliability, and observability...
Performance
Work at office
Visa sponsorship
Flexible hours
Cartesia, Inc.
San Francisco, CA
2 days ago
Senior Inference Performance Engineer - GPU & CUDA
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
Performance
inference.net
San Francisco, CA
2 days ago
LLM Evaluation & Benchmark Engineer
PetsApp is looking for strong engineers to evaluate and benchmark LLM models at their San Francisco office. The role involves analyzing model performance and working closely with both open-source and closed-source model labs. Candidates should have significant Python experience...
Performance
Work at office
Relocation package
PetsApp
San Francisco, CA
3 days ago
Real-Time GPU Inference Optimization Engineer
$300k
...firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate... ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive base salary of up to ~$300...
Performance
Visa sponsorship
Relocation package
Trades Workforce Solutions
San Francisco, CA
5 days ago
GPU Kernel Engineer for AI Inference & Performance
FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
Performance
FriendliAI
San Francisco, CA
2 days ago
Edge Inference Engineer: Optimize On-Device AI Kernels
Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize... ...kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming experience...
Performance
Flexible hours
Liquid AI
San Francisco, CA
4 days ago
INFERENCE ENGINEER
...THE ROLE You build and operate the inference systems that serve our models in production. The work spans... ...running real workloads. This is an engineering role, not a research role. You'll... ...models at high throughput Own the performance characteristics of those systems end...
Performance
MakerMaker.AI
San Francisco, CA
4 days ago
