Senior AI Model Serving Engineer Low-Latency Inference

Jobleads-US

A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates will have a strong foundation in algorithms and system design, along with a passion for mentoring others. The position offers a competitive salary and generous benefits.

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Senior AI Model Serving Engineer Low-Latency Inference in San Francisco, CA vacancy

Senior Model Serving Engineer - Low-Latency AI Platform
A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal...
Senior
Menlo Ventures
San Francisco, CA
2 days ago
Real-Time Inference & Model Serving Engineer (Equity)
$220k - $320k
ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models...
Suggested
3 days per week
Trades Workforce Solutions
San Francisco, CA
2 days ago
Senior Model Inference Engineer for Production-Scale AI
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience...
Senior
Jobleads-US
San Francisco, CA
4 days ago
Senior Software Engineer, Model Serving
$166k - $225k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides... ...models. It offers real-time, low-latency inference, governance, monitoring, and... ...and cost efficiency. As a Senior Engineer, you’ll play a critical role...
Senior
Local area
Worldwide
Cacheflow
San Francisco, CA
1 day ago
Senior ML Inference Engineer Production Systems
MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability... ...in production-grade serving infrastructure, be fluent in Python...
Senior
MakerMaker.AI
San Francisco, CA
1 day ago
Senior Engineer 2: AI Inference Engine Systems
$167.2k - $209k
...DigitalOcean is expanding its AI Infrastructure layer to... .... We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role... ...can deploy and scale their models with industry-leading performance... ...distributed inference serving frameworks such as llm‑d,...
Senior
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
3 days ago
Senior Software Engineer - Model Performance
$220k - $320k
...Help us make inference blazingly fast. If you... ...specialized language models for companies that... ...frontier-quality AI at a fraction of... ...ten-person team of engineers who work in-person... ...with the goal of serving models faster and... ...inference performance: latency, throughput, cost...
Senior
Work at office
Inference
San Francisco, CA
3 days ago
Speech LLM Inference Engineer — Ultra-Low Latency Serving
$200k
Plaud is seeking skilled AI engineers to join their core SpeechLLM lab in San Francisco. You will play a crucial role in building high-throughput inference engines for conversational AI and optimizing GPU performance while collaborating with various teams. The position...
Work at office
Plaud
San Francisco, CA
4 days ago
Model Implementation Engineer
Sciforium is an AI infrastructure... ...multimodal AI models and a proprietary... ...-efficiency serving platform. Backed... ...support from AMD engineers the team is... ...internal performance, latency, and efficiency... ...with low‑level performance... ...model training or inference systems. Contributions...
Flexible hours
Sciforium
San Francisco, CA
4 days ago
Senior Inference & RL Systems Engineer
$225k
...generation to improve models and solve alignment more... ...ultra‑long context, and inference‑time compute to achieve... ...The Role As a Software Engineer on the Inference & RL... ...distributed systems that serve our models in... ...that determine inference latency, throughput, stability,...
Senior
Relocation
Visa sponsorship
Magic
San Francisco, CA
17 hours ago
Senior Research Scientist - Machine Learning Systems & Efficiency Engineer
$142.7k - $270.95k
...is seeking a Senior researcher - Machine... ...& Efficiency Engineer to join our R&... ...in inference performance, latency, and cost efficiency... ...intersection of model architecture,... ...Intelligence (AI), ML systems,... ...Responsibilities Inference & Serving Optimization:... ...-throughput, low-latency...
Senior
Full time
Temporary work
Local area
Worldwide
Adobe
San Francisco, CA
2 days ago
Senior Infrastructure Engineer: Scale, Low Latency, DevEx
A leading AI platform company in San Francisco is looking for a Senior Infrastructure Engineer to design and operate production infrastructure for high-scale, low-latency systems. Your focus will be on critical services, improving reliability, and enhancing developer velocity...
Senior
Decagon
San Francisco, CA
3 days ago
Engineering Manager, Model Inference
...Role Our generative AI-powered products are... ...of medicine—and the inference systems that power them... ...We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...of how our models are served: from architecting low-latency, high-throughput infrastructure...
Hourly pay
Full time
Flexible hours
AI Chopping Block, Inc.
San Francisco, CA
4 days ago
Engineering Manager, Model Routing & Inference Engineering
...research, design, and engineering. Our organization... ...will lead the Model Routing & Inference team at Cursor,... ...that powers every AI interaction in... ...calls that balance latency, cost, reliability... ...high‑throughput, low‑latency... ...especially in inference serving, traffic routing,...
Anysphere
San Francisco, CA
2 days ago
Senior Manager, AI Engineering (People Leader) (Gen AI Platform Services)
$250.8k - $286.2k
...responsible and reliable AI systems, changing... ...science and engineering teams to deliver our... ...reimagine how we serve our customers and... ...customers. Our AI models and platforms empower... ...language model inference, similarity search... ...scalability, cost, latency, throughput — of...
Senior
Full time
Part time
Local area
Capital One
San Francisco, CA
3 days ago
Software Engineer - Model APIs
...BASETEN Baseten powers inference for the world's most dynamic AI companies, like... ...bring cutting-edge models into production. With... ...systems, model serving, and developer experience... ...record of owning low‑latency, reliable backend... ...open-source inference engines (vLLM, TensorRT-LLM...
Flexible hours
Baseten
San Francisco, CA
3 days ago
Senior AI/ML Infra & SRE Engineer
Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer... ...with strict latency and reliability requirements... ...and real-time inference serving across multiple regions... ...industries. Lead - AI/ML Stack Infrastructure... ...containerized AI models. The engineer will lead...
Senior
Temporary work
AI Chopping Block, Inc.
San Francisco, CA
4 days ago
Senior Civil Engineer for AI Model Training (Remote)
YO IT Consulting is seeking an experienced Senior Civil Engineer specializing in evaluating AI-generated content. This remote role involves ensuring technical accuracy, challenging AI models with real-world engineering scenarios, and shaping AI communication standards....
Senior
Remote job
YO IT Consulting
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
$166.9k - $225.9k
...operates as both a central engineering function and an embedded reliability... ...with peers—while also serving as the dedicated reliability... ...Experience with AIOps—using AI/ML‑based tooling for anomaly... ...‑backed services (e.g., LLM inference latency, non‑determinism, prompt...
Senior
Flexible hours
Drata
San Francisco, CA
3 days ago
Senior Staff Engineer - AI Safety & Model Evaluation
Xcede is looking for a Member of Technical Staff focused on AI Safety to lead red-teaming efforts and ensure the robustness of next... ...should have deep expertise in LLM safety, strong software engineering skills, and relevant academic qualifications in AI or related fields...
Senior
Xcede
San Francisco, CA
3 days ago
ML Infrastructure Engineer - Model Inference & Scale
...technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional teams. Ideal...
Abridge
San Francisco, CA
2 days ago
Software Engineer, Model Inference
$325k
About the Team Our Inference team brings OpenAI's most capable research... ...access our state-of-the-art AI models, allowing them to do things... ...Role We are looking for an engineer who wants to take the world'... ...for use in a high-volume, low-latency, and high-availability...
Centaur Labs
San Francisco, CA
4 days ago
Senior Staff Systems Engineer, Rust Eng.
...Company: Sequen AI is leading the... ...building frontier ranking models for search and... ...frontier AI models serve production traffic at... ...for a Core Systems Engineer with a deep mastery... ...and compute complex inference logic with ultra-low latency. You will replace high...
Senior
Sequen
San Francisco, CA
17 hours ago
Model API Engineer: Fast, Scalable AI Inference
A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience...
Baseten
San Francisco, CA
4 days ago
Senior ML Inference Systems Engineer
A tech startup focused on AI workloads is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and... ...Ideal candidates should have strong software engineering skills and experience with ML inference...
Senior
Gimlet Labs
San Francisco, CA
2 days ago
Staff Software Engineer, Foundational Model Serving
$192k - $260k
...world’s best data and AI infrastructure platform... ...business. Foundation Model Serving is the API Product for... ...frontier AI model inference for open source models... ...necessary. We’re looking for engineers who have owned high‑... ...high‑throughput, low‑latency inference on GPU workloads...
Local area
Worldwide
Databricks
San Francisco, CA
4 days ago
Staff Software Engineer, Model Serving
$192k - $260k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides enterprises... .... It offers real-time, low-latency inference, governance, monitoring, and... ...cost efficiency. As a Staff Engineer, you’ll play a critical role...
Local area
Worldwide
Cacheflow
San Francisco, CA
1 day ago
Senior Inference Engineer - AI Infrastructure
$250k
...Ready to architect AI infrastructure... ...a serverless inference platform, beginning... ...expanding into low-latency, real-time inference and custom model hosting. This is... ...chance to join as a Senior Inference Platform Engineer at an early... ...latest models, serving frameworks, and...
Senior
Permanent employment
San Francisco, CA
more than 2 months ago
AI Engineer — Model Performance & Inference Optimizer
Pantera Capital is looking for a Model Performance Engineer in San Francisco, California to optimize model inference speed, cost, and reliability.... ...infrastructure that accelerates the AI team’s processes. The role covers optimizing serving frameworks and ensuring...
Pantera Capital
San Francisco, CA
4 days ago
Sr. Manager, Engineering - Model Serving
$217k - $312.2k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides... ...models. It offers real‑time, low‑latency inference, governance, monitoring, and... ...and cost efficiency. As a Senior Engineering Manager, you will lead the...
Senior
Local area
Worldwide
Databricks Inc.
San Francisco, CA
3 days ago
