INFERENCE ENGINEER

MakerMaker.AI

ABOUT THE COMPANY

We're building autonomous research agents for recursive self‑improvement (multi‑agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, on‑site.

ABOUT THE ROLE

You build and operate the inference systems that serve our models in production. The work spans serving infrastructure, runtime optimization, and the long tail of production infrastructure that comes with running real workloads.

This is an engineering role, not a research role. You'll measure, profile, debug, and ship. You'll work alongside researchers, but your job is to make their work fast and reliable in production. Real ownership, real autonomy.

WHAT YOU'LL DO

Build, operate, and harden production inference systems serving large models at high throughput
Own the performance characteristics of those systems end‑to‑end: throughput, latency, cost‑per‑token, reliability under load
Profile real workloads to identify bottlenecks; ship fixes that move the metric you set out to improve
Implement and integrate inference optimizations from the research team (quantization, custom kernels, scheduling improvements, memory management) into production
Design observability into the inference layer: metrics, tracing, and alerting that surface regressions before users notice them
Run capacity planning, autoscaling, and load testing for varied workload shapes (batch, online, mixed, agentic)
Diagnose and resolve production incidents; write postmortems that turn bugs into systemic fixes

WHAT WE'RE LOOKING FOR

Senior ML systems engineer with 3+ years building production‑grade, large‑scale serving infrastructure
Strong distributed systems experience; you've been on‑call for systems that matter
Performance profiling and optimization fluency: you read flame graphs, you are analytical, and you measure before you change anything
Experience with GPU‑accelerated inference at scale (multi‑GPU, multi‑node, batched and streaming workloads), preferably with experience on AMD GPUs
Fluent Python; comfortable reading and writing systems‑level code in at least one of the following: C++, CUDA, ROCm, or Triton
Track record of shipping production infrastructure, preferably surfaces serving millions of requests across diverse workloads
Good written communication; you can write a runbook that someone else can follow at 3am

NICE TO HAVE

Open‑source contributions to inference / serving frameworks
Experience with mixed cloud and on‑premises deployments
Familiarity with hardware‑aware optimization (memory hierarchy, NCCL/RDMA, NUMA)
Background in compilers, runtimes, or accelerator software stacks

THIS ROLE IS PROBABLY NOT FOR YOU IF

You're primarily a researcher; the work here is building, not exploring
You want to focus narrowly on one component; this role spans the stack
Production responsibility (incidents, on‑call, ownership of running systems) isn't appealing

Vacancy posted 2 days ago
