Senior AI Model Serving Engineer Low-Latency Inference

Jobleads-US

A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates will have a strong foundation in algorithms and system design, along with a passion for mentoring others. The position offers a competitive salary and generous benefits.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Senior AI Model Serving Engineer Low-Latency Inference in San Francisco, CA vacancy
  • A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal... 
    Senior

    Menlo Ventures

    San Francisco, CA
    2 days ago
  • $220k - $320k

    ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models... 
    Suggested
    3 days per week

    Trades Workforce Solutions

    San Francisco, CA
    2 days ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience... 
    Senior

    Jobleads-US

    San Francisco, CA
    4 days ago
  • $166k - $225k

     ...the world's best data and AI infrastructure platform so...  ...their business. Databricks’ Model Serving product provides...  ...models. It offers real-time, low-latency inference, governance, monitoring, and...  ...and cost efficiency. As a Senior Engineer, you’ll play a critical role... 
    Senior
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    1 day ago
  • MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability...  ...in production-grade serving infrastructure, be fluent in Python... 
    Senior

    MakerMaker.AI

    San Francisco, CA
    1 day ago
  • $167.2k - $209k

     ...DigitalOcean is expanding its AI Infrastructure layer to...  .... We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role...  ...can deploy and scale their models with industry-leading performance...  ...distributed inference serving frameworks such as llm‑d,... 
    Senior
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    3 days ago
  • $220k - $320k

     ...Help us make inference blazingly fast. If you...  ...specialized language models for companies that...  ...frontier-quality AI at a fraction of...  ...ten-person team of engineers who work in-person...  ...with the goal of serving models faster and...  ...inference performance: latency, throughput, cost... 
    Senior
    Work at office

    Inference

    San Francisco, CA
    3 days ago
  • $200k

    Plaud is seeking skilled AI engineers to join their core SpeechLLM lab in San Francisco. You will play a crucial role in building high-throughput inference engines for conversational AI and optimizing GPU performance while collaborating with various teams. The position... 
    Work at office

    Plaud

    San Francisco, CA
    4 days ago
  • Sciforium is an AI infrastructure...  ...multimodal AI models and a proprietary...  ...-efficiency serving platform. Backed...  ...support from AMD engineers the team is...  ...internal performance, latency, and efficiency...  ...with low‑level performance...  ...model training or inference systems. Contributions... 
    Flexible hours

    Sciforium

    San Francisco, CA
    4 days ago
  • $225k

     ...generation to improve models and solve alignment more...  ...ultra‑long context, and inference‑time compute to achieve...  ...The Role As a Software Engineer on the Inference & RL...  ...distributed systems that serve our models in...  ...that determine inference latency, throughput, stability,... 
    Senior
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    17 hours ago
  • $142.7k - $270.95k

     ...is seeking a Senior researcher - Machine...  ...& Efficiency Engineer to join our R&...  ...in inference performance, latency, and cost efficiency...  ...intersection of model architecture,...  ...Intelligence (AI), ML systems,...  ...Responsibilities Inference & Serving Optimization:...  ...-throughput, low-latency... 
    Senior
    Full time
    Temporary work
    Local area
    Worldwide

    Adobe

    San Francisco, CA
    2 days ago
  • A leading AI platform company in San Francisco is looking for a Senior Infrastructure Engineer to design and operate production infrastructure for high-scale, low-latency systems. Your focus will be on critical services, improving reliability, and enhancing developer velocity... 
    Senior

    Decagon

    San Francisco, CA
    3 days ago
  •  ...Role Our generative AI-powered products are...  ...of medicine—and the inference systems that power them...  ...We’re looking for an Engineering Manager to lead and grow our Model Inference team. The...  ...of how our models are served: from architecting low-latency, high-throughput infrastructure... 
    Hourly pay
    Full time
    Flexible hours

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  •  ...research, design, and engineering. Our organization...  ...will lead the Model Routing & Inference team at Cursor,...  ...that powers every AI interaction in...  ...calls that balance latency, cost, reliability...  ...high‑throughput, low‑latency...  ...especially in inference serving, traffic routing,... 

    Anysphere

    San Francisco, CA
    2 days ago
  • $250.8k - $286.2k

     ...responsible and reliable AI systems, changing...  ...science and engineering teams to deliver our...  ...reimagine how we serve our customers and...  ...customers. Our AI models and platforms empower...  ...language model inference, similarity search...  ...scalability, cost, latency, throughput — of... 
    Senior
    Full time
    Part time
    Local area

    Capital One

    San Francisco, CA
    3 days ago
  •  ...BASETEN Baseten powers inference for the world's most dynamic AI companies, like...  ...bring cutting-edge models into production. With...  ...systems, model serving, and developer experience...  ...record of owning low‑latency, reliable backend...  ...open-source inference engines (vLLM, TensorRT-LLM... 
    Flexible hours

    Baseten

    San Francisco, CA
    3 days ago
  • Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer...  ...with strict latency and reliability requirements...  ...and real-time inference serving across multiple regions...  ...industries. Lead - AI/ML Stack Infrastructure...  ...containerized AI models. The engineer will lead... 
    Senior
    Temporary work

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  • YO IT Consulting is seeking an experienced Senior Civil Engineer specializing in evaluating AI-generated content. This remote role involves ensuring technical accuracy, challenging AI models with real-world engineering scenarios, and shaping AI communication standards.... 
    Senior
    Remote job

    YO IT Consulting

    San Francisco, CA
    2 days ago
  • $166.9k - $225.9k

     ...operates as both a central engineering function and an embedded reliability...  ...with peers—while also serving as the dedicated reliability...  ...Experience with AIOps—using AI/ML‑based tooling for anomaly...  ...‑backed services (e.g., LLM inference latency, non‑determinism, prompt... 
    Senior
    Flexible hours

    Drata

    San Francisco, CA
    3 days ago
  • Xcede is looking for a Member of Technical Staff focused on AI Safety to lead red-teaming efforts and ensure the robustness of next...  ...should have deep expertise in LLM safety, strong software engineering skills, and relevant academic qualifications in AI or related fields... 
    Senior

    Xcede

    San Francisco, CA
    3 days ago
  •  ...technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional teams. Ideal... 

    Abridge

    San Francisco, CA
    2 days ago
  • $325k

    About the Team Our Inference team brings OpenAI's most capable research...  ...access our state-of-the-art AI models, allowing them to do things...  ...Role We are looking for an engineer who wants to take the world'...  ...for use in a high-volume, low-latency, and high-availability... 

    Centaur Labs

    San Francisco, CA
    4 days ago
  •  ...Company: Sequen AI is leading the...  ...building frontier ranking models for search and...  ...frontier AI models serve production traffic at...  ...for a Core Systems Engineer with a deep mastery...  ...and compute complex inference logic with ultra-low latency. You will replace high... 
    Senior

    Sequen

    San Francisco, CA
    17 hours ago
  • A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience... 

    Baseten

    San Francisco, CA
    4 days ago
  • A tech startup focused on AI workloads is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and...  ...Ideal candidates should have strong software engineering skills and experience with ML inference... 
    Senior

    Gimlet Labs

    San Francisco, CA
    2 days ago
  • $192k - $260k

     ...world’s best data and AI infrastructure platform...  ...business. Foundation Model Serving is the API Product for...  ...frontier AI model inference for open source models...  ...necessary. We’re looking for engineers who have owned high‑...  ...high‑throughput, low‑latency inference on GPU workloads... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    4 days ago
  • $192k - $260k

     ...the world's best data and AI infrastructure platform so...  ...their business. Databricks’ Model Serving product provides enterprises...  .... It offers real-time, low-latency inference, governance, monitoring, and...  ...cost efficiency. As a Staff Engineer, you’ll play a critical role... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    1 day ago
  • $250k

     ...Ready to architect AI infrastructure...  ...a serverless inference platform, beginning...  ...expanding into low-latency, real-time inference and custom model hosting. This is...  ...chance to join as a Senior Inference Platform Engineer at an early...  ...latest models, serving frameworks, and... 
    Senior
    Permanent employment
    San Francisco, CA
    more than 2 months ago
  • Pantera Capital is looking for a Model Performance Engineer in San Francisco, California to optimize model inference speed, cost, and reliability....  ...infrastructure that accelerates the AI team’s processes. The role covers optimizing serving frameworks and ensuring... 

    Pantera Capital

    San Francisco, CA
    4 days ago
  • $217k - $312.2k

     ...the world's best data and AI infrastructure platform so...  ...their business. Databricks’ Model Serving product provides...  ...models. It offers real‑time, low‑latency inference, governance, monitoring, and...  ...and cost efficiency. As a Senior Engineering Manager, you will lead the... 
    Senior
    Local area
    Worldwide

    Databricks Inc.

    San Francisco, CA
    3 days ago
