Senior AI Model Serving Engineer — Low-Latency Inference

Jobleads-US

A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates will have a strong foundation in algorithms and system design, along with a passion for mentoring others. The position offers a competitive salary and generous benefits.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Senior AI Model Serving Engineer — Low-Latency Inference in San Francisco, CA vacancy
  • A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal... 
    Senior

    Menlo Ventures

    San Francisco, CA
    2 days ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience... 
    Senior

    OpenAI

    San Francisco, CA
    4 days ago
  • $166k - $225k

     ...the world's best data and AI infrastructure platform so...  ...their business. Databricks’ Model Serving product provides...  ...models. It offers real-time, low-latency inference, governance, monitoring, and...  ...and cost efficiency. As a Senior Engineer, you’ll play a critical role... 
    Senior
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    1 day ago
  •  ...understanding in healthcare. Our AI-powered platform was...  ..., technologists, and engineers working together to...  ...Engineer, Model Inference at Abridge, you’ll play...  ...and maintain ML model serving infrastructure, ensuring...  ...high-performance and low-latency. Collaborate with ML... 
    Suggested
    Hourly pay
    Full time
    Flexible hours

    Abridge

    San Francisco, CA
    2 days ago
  • $220k - $320k

     ...Help us make inference blazingly fast. If you...  ...specialized language models for companies that...  ...frontier-quality AI at a fraction of...  ...ten-person team of engineers who work in-person...  ...with the goal of serving models faster and...  ...inference performance: latency, throughput, cost... 
    Senior
    Work at office

    Inference

    San Francisco, CA
    7 hours ago
  • Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills and... 
    Flexible hours

    Cartesia

    San Francisco, CA
    2 days ago
  • Genesis AI is seeking an experienced individual to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, implementing efficient low-level code such as CUDA and Triton... 

    Genesis AI

    San Francisco, CA
    4 days ago
  •  ...Model Implementation Engineer Sciforium is an AI infrastructure company developing next...  ..., high-efficiency serving platform. Backed...  ...internal performance, latency, and efficiency...  ...Familiarity with low-level performance...  ...model training or inference systems. Contributions... 
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform San Francisco About...  ...Role Together AI is building the Inference...  ...generative AI models to the world. Our...  ...from optimizing latency down to the last...  ..., ensuring low-latency load balancing...  ...throughput to serve diverse workloads... 
    Senior
    Full time
    Local area

    Together AI

    San Francisco, CA
    1 day ago
  • Nooks in San Francisco is seeking a Senior Engineer to develop low-latency Voice AI systems. You will collaborate with a world-class team to innovate in voice technology, providing crucial insights from customer interactions. Your role includes making foundational technical... 
    Senior

    nooks

    San Francisco, CA
    3 days ago
  • A leading AI platform company in San Francisco is looking for a Senior Infrastructure Engineer to design and operate production infrastructure for high-scale, low-latency systems. Your focus will be on critical services, improving reliability, and enhancing developer velocity... 
    Senior

    Decagon

    San Francisco, CA
    3 days ago
  •  ...About the Team Our Inference team brings OpenAI's most capable...  ...access our state-of-the-art AI models, allowing them to do things...  ...We are looking for an engineer who wants to take the world'...  ...them for use in a high-volume, low-latency, and high-availability production... 

    OpenAI

    San Francisco, CA
    3 days ago
  • YO IT Consulting is seeking an experienced Senior Civil Engineer specializing in evaluating AI-generated content. This remote role involves ensuring technical accuracy, challenging AI models with real-world engineering scenarios, and shaping AI communication standards.... 
    Senior
    Remote job

    YO IT Consulting

    San Francisco, CA
    2 days ago
  • $192k - $260k

     ...world's best data and AI infrastructure platform...  .... Foundation Model Serving is the API Product for...  ...frontier AI model inference for open source models...  ...necessary. We're looking for engineers who have owned high...  ...high-throughput, low-latency inference on GPU workloads... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    7 hours ago
  • $166.9k - $225.9k

     ...operates as both a central engineering function and an embedded reliability...  ...with peers—while also serving as the dedicated reliability...  ...Experience with AIOps—using AI/ML‑based tooling for anomaly...  ...‑backed services (e.g., LLM inference latency, non‑determinism, prompt... 
    Senior
    Flexible hours

    Drata

    San Francisco, CA
    3 days ago
  •  ...BASETEN Baseten powers inference for the world's most dynamic AI companies, like...  ...bring cutting-edge models into production. With...  ...systems, model serving, and developer experience...  ...record of owning low‑latency, reliable backend...  ...open-source inference engines (vLLM, TensorRT-LLM... 
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  •  ...technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional teams. Ideal... 

    Abridge

    San Francisco, CA
    2 days ago
  • $250.8k - $286.2k

     ...responsible and reliable AI systems, changing...  ...science and engineering teams to deliver our...  ...reimagine how we serve our customers and...  ...customers. Our AI models and platforms empower...  ...language model inference, similarity search...  ...scalability, cost, latency, throughput — of... 
    Senior
    Full time
    Part time
    Local area

    Capital One

    San Francisco, CA
    2 days ago
  • A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience... 

    Baseten

    San Francisco, CA
    4 days ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team,...  ...reliable, and efficient serving of LLMs. Our platform powers...  ...integrate and optimize models for production and...  ...LLM, or text-generation-inference. Compensation packages...  ...is to develop reliable AI systems for the world's... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    1 day ago
  • A tech startup focused on AI workloads is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and...  ...Ideal candidates should have strong software engineering skills and experience with ML inference... 
    Senior

    Gimlet Labs

    San Francisco, CA
    2 days ago
  • $192k - $260k

     ...the world's best data and AI infrastructure platform so...  ...their business. Databricks’ Model Serving product provides enterprises...  .... It offers real-time, low-latency inference, governance, monitoring, and...  ...cost efficiency. As a Staff Engineer, you’ll play a critical role... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    1 day ago
  • $204k - $348k

     ...Principal/ Principal Software Engineer, AI Lab Execution System...  ...We are seeking a Senior Principal or Principal...  ...this role, you will serve as a technical leader...  ...Design systems that model scientific intent, experiment...  ...high availability, low latency, observability, fault... 
    Senior
    Full time
    Work at office
    Local area
    Flexible hours

    Lila Sciences

    San Francisco, CA
    4 days ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam...  ...At Together.ai, we are building state...  ...for large language models (LLMs). Our mission...  ...role will focus on low-latency, high-throughput inference...  ...high-performance serving. Apply CUDA... 
    Full time

    Together AI

    San Francisco, CA
    1 day ago
  •  ...Machine Learning Engineer, Inference Want to solve...  ...fast-growing voice AI company building the...  ...under production latency constraints. Think...  ...of-the-art speech models actually behave...  ...inference systems behind low-latency...  ...Runtime, and custom serving systems Managing... 
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    4 days ago
  • Cubiq Recruitment is seeking a Robotics Software Engineer in San Francisco, California. This technical role focuses on building and optimizing low-latency systems that power advanced robotics applications and AI systems. The ideal candidate will have strong expertise in... 

    Cubiq Recruitment

    San Francisco, CA
    5 days ago
  •  ...Software Engineer Opportunity at Abridge Abridge's services and engineering...  ...to identify performance and latency bottlenecks across all of our...  ...as service templates and self-serve infrastructure. Work with...  ...research as we pioneer new AI-first cloud-native-first... 
    Senior

    Abridge

    San Francisco, CA
    3 days ago
  •  ...responsibilities including performance optimization, systems debugging, and research. The role requires top-tier C++ skills, a strong background in low-level systems, and leadership potential. Candidates will work in a high-pressure, customer-facing environment. This is a full-time,... 
    Senior
    Full time

    Thunder Compute

    San Francisco, CA
    2 days ago
  •  ...Next-Generation Model Serving Platform Architect...  ...Sciforium is an AI infrastructure company...  ...support from AMD engineers the team is...  ...to market. As a senior technical leader,...  ...and distributed inference systems. Develop...  ...models and ensure low-latency, scalable inference... 
    Work at office
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  •  ...infrastructure, and AI. Our compute-to-...  ...About the Role The inference layer is the...  ...critical path between a model and the image a...  ...sees. As Inference Engineer, you will own that...  ...layer end-to-end: serving architecture, batching...  ...will orchestrate low-latency highly performant... 
    Work at office
    Immediate start
    Remote work
    Flexible hours

    Midjourney

    San Francisco, CA
    2 days ago
