Engineer, Inference & Model serving

$220k - $320k

techire ai

Job Description

ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time?

You'll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

They're building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You'll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits: extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.

This is not about training models. It's about making them fast, efficient, and production-ready.

What you'll work on:

Building high-performance serving systems for LLM, speech, and vision models
Scaling inference to production workloads with strict latency requirements
Optimising GPU utilisation and execution efficiency
Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
Profiling and debugging performance across GPU, memory, and system layers

What you'll bring:

Strong experience with ML inference or model serving systems
Deep understanding of latency and throughput optimisation in production
Solid Python and PyTorch skills, plus a systems or performance engineering mindset
Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You'll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.

There's real ownership here. You'll help define how next-generation AI systems are served.

Package:
$220,000 - $320,000 base + equity
San Francisco, onsite 3 days per week

If you're interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.

All applicants will receive a response.


Vacancy posted 5 days ago

