Senior AI Model Serving Engineer — Low-Latency Inference
Jobleads-US
A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and in collaborating across teams to optimize performance and reliability. Ideal candidates will have a strong foundation in algorithms and system design, along with a passion for mentoring others. The position offers a competitive salary and generous benefits.
- A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal... (Senior)
$325k
- A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience... (Senior, $166k - $225k)
- ...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides... ...models. It offers real-time, low-latency inference, governance, monitoring, and... ...and cost efficiency. As a Senior Engineer, you’ll play a critical role... (Senior, Local area, Worldwide)
- ...understanding in healthcare. Our AI-powered platform was... ..., technologists, and engineers working together to... ...Engineer, Model Inference at Abridge, you’ll play... ...and maintain ML model serving infrastructure, ensuring... ...high-performance and low-latency. Collaborate with ML... (Suggested, Hourly pay, Full time, Flexible hours)
$220k - $320k
- ...Help us make inference blazingly fast. If you... ...specialized language models for companies that... ...frontier-quality AI at a fraction of... ...ten-person team of engineers who work in-person... ...with the goal of serving models faster and... ...inference performance: latency, throughput, cost... (Senior, Work at office)
- Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills and... (Flexible hours)
- Genesis AI is seeking an experienced individual to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, implementing efficient low-level code such as CUDA and Triton...
- ...Model Implementation Engineer. Sciforium is an AI infrastructure company developing next... ..., high-efficiency serving platform. Backed... ...internal performance, latency, and efficiency... ...Familiarity with low-level performance... ...model training or inference systems. Contributions... (Flexible hours)
$160k - $250k
- ...Senior Backend Engineer, Inference Platform San Francisco About... ...Role Together AI is building the Inference... ...generative AI models to the world. Our... ...from optimizing latency down to the last... ..., ensuring low-latency load balancing... ...throughput to serve diverse workloads... (Senior, Full time, Local area)
- Nooks in San Francisco is seeking a Senior Engineer to develop low-latency Voice AI systems. You will collaborate with a world-class team to innovate in voice technology, providing crucial insights from customer interactions. Your role includes making foundational technical... (Senior)
- A leading AI platform company in San Francisco is looking for a Senior Infrastructure Engineer to design and operate production infrastructure for high-scale, low-latency systems. Your focus will be on critical services, improving reliability, and enhancing developer velocity... (Senior)
- ...About the Team Our Inference team brings OpenAI's most capable... ...access our state-of-the-art AI models, allowing them to do things... ...We are looking for an engineer who wants to take the world'... ...them for use in a high-volume, low-latency, and high-availability production...
- YO IT Consulting is seeking an experienced Senior Civil Engineer specializing in evaluating AI-generated content. This remote role involves ensuring technical accuracy, challenging AI models with real-world engineering scenarios, and shaping AI communication standards... (Senior, Remote job)
$192k - $260k
- ...world's best data and AI infrastructure platform... .... Foundation Model Serving is the API Product for... ...frontier AI model inference for open source models... ...necessary. We're looking for engineers who have owned high... ...high-throughput, low-latency inference on GPU workloads... (Local area, Worldwide, $166.9k - $225.9k)
- ...operates as both a central engineering function and an embedded reliability... ...with peers, while also serving as the dedicated reliability... ...Experience with AIOps, using AI/ML-based tooling for anomaly... ...-backed services (e.g., LLM inference latency, non-determinism, prompt... (Senior, Flexible hours)
- ...BASETEN Baseten powers inference for the world's most dynamic AI companies, like... ...bring cutting-edge models into production. With... ...systems, model serving, and developer experience... ...record of owning low-latency, reliable backend... ...open-source inference engines (vLLM, TensorRT-LLM... (Flexible hours)
- ...technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional teams. Ideal...
$250.8k - $286.2k
- ...responsible and reliable AI systems, changing... ...science and engineering teams to deliver our... ...reimagine how we serve our customers and... ...customers. Our AI models and platforms empower... ...language model inference, similarity search... ...scalability, cost, latency, throughput, of... (Senior, Full time, Part time, Local area)
- A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience...
$216k - $270k
- ...As a Software Engineer on the ML Infrastructure team,... ...reliable, and efficient serving of LLMs. Our platform powers... ...integrate and optimize models for production and... ...LLM, or text-generation-inference. Compensation packages... ...is to develop reliable AI systems for the world's... (Senior, Full time)
- A tech startup focused on AI workloads is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and... ...Ideal candidates should have strong software engineering skills and experience with ML inference... (Senior)
$192k - $260k
- ...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides enterprises... .... It offers real-time, low-latency inference, governance, monitoring, and... ...cost efficiency. As a Staff Engineer, you’ll play a critical role... (Local area, Worldwide, $204k - $348k)
- ...Principal/ Principal Software Engineer, AI Lab Execution System... ...We are seeking a Senior Principal or Principal... ...this role, you will serve as a technical leader... ...Design systems that model scientific intent, experiment... ...high availability, low latency, observability, fault... (Senior, Full time, Work at office, Local area, Flexible hours, $160k - $230k)
- ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam... ...At Together.ai, we are building state... ...for large language models (LLMs). Our mission... ...role will focus on low-latency, high-throughput inference... ...high-performance serving. Apply CUDA... (Full time)
- ...Machine Learning Engineer, Inference Want to solve... ...fast-growing voice AI company building the... ...under production latency constraints. Think... ...of-the-art speech models actually behave... ...inference systems behind low-latency... ...Runtime, and custom serving systems Managing... (Remote work, Flexible hours)
- Cubiq Recruitment is seeking a Robotics Software Engineer in San Francisco, California. This technical role focuses on building and optimizing low-latency systems that power advanced robotics applications and AI systems. The ideal candidate will have strong expertise in...
- ...Software Engineer Opportunity at Abridge Abridge's services and engineering... ...to identify performance and latency bottlenecks across all of our... ...as service templates and self-serve infrastructure. Work with... ...research as we pioneer new AI-first cloud-native-first... (Senior)
- ...responsibilities including performance optimization, systems debugging, and research. The role requires top-tier C++ skills, a strong background in low-level systems, and leadership potential. Candidates will work in a high-pressure, customer-facing environment. This is a full-time,... (Senior, Full time)
- ...Next-Generation Model Serving Platform Architect... ...Sciforium is an AI infrastructure company... ...support from AMD engineers the team is... ...to market. As a senior technical leader,... ...and distributed inference systems. Develop... ...models and ensure low-latency, scalable inference... (Work at office, Flexible hours)
- ...infrastructure, and AI. Our compute-to-... ...About the Role The inference layer is the... ...critical path between a model and the image a... ...sees. As Inference Engineer, you will own that... ...layer end-to-end: serving architecture, batching... ...will orchestrate low-latency highly performant... (Work at office, Immediate start, Remote work, Flexible hours)
- machine learning ai engineer San Francisco, CA
- senior ai engineer San Francisco, CA
- ai engineer remote San Francisco, CA
- ai ml engineer San Francisco, CA
- ai engineer San Francisco, CA
- ai developer San Francisco, CA
- ai research engineer San Francisco, CA
- ai prompt engineer San Francisco, CA
- senior development executive San Francisco, CA
- senior technical manager San Francisco, CA

