Real-Time LLM Inference & Speech Serving Engineer

$180k - $270k

Plaud

Plaud is seeking talented individuals for AI infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K, along with equity and comprehensive benefits including unlimited PTO and hybrid work options. Join Plaud as an early foundational member and make a significant impact in the AI domain.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Real-Time LLM Inference & Speech Serving Engineer in San Francisco, CA vacancy
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together...  ...parallelism for high-performance serving. Apply CUDA graph optimizations...  ...base salary range for this full-time position is: $160,000 - $230,000 +... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    1 day ago
  •  ...progress of AI applications out into the real world. With Anyscale, we're building the...  ...date. About the role As a Distributed LLM Inference Engineer, you will help with systems and...  ...consistent. As the market data changes over time, the target salary for this role may be... 
    Suggested
    Work at office

    Anyscale

    San Francisco, CA
    4 days ago
  •  ...Description Machine Learning Engineer, Inference Want to solve realtime...  ...building the realtime speech infrastructure layer behind...  ...reliably, and fast enough for real human interaction. Your work...  ..., ONNX Runtime, and custom serving systems Managing KV cache... 
    Suggested
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    4 days ago
  •  ...and is scaling a world-class engineering team across inference, distributed systems,...  ...modern models end-to-end under real production constraints. Responsibilities...  ...owning inference or model serving infrastructure end‑to‑end...  ...Experience with TensorRT-LLM, vLLM, or custom inference... 
    Suggested

    Acceler8 Talent

    San Francisco, CA
    2 days ago
  •  ...Tech firm. About the Role The inference layer is the critical path between...  ...image a user sees. As Inference Engineer, you will own that layer end-to-end: serving architecture, batching, queue...  ...subsecond image generation and real-time experiences. The work involves... 
    Suggested
    Work at office
    Immediate start
    Remote work
    Flexible hours

    Midjourney

    San Francisco, CA
    2 days ago
  •  ...Building a new kind of platform for real-time generative media, enabling...  ...unicorn founders and senior engineers with deep expertise in 3D,...  ...for a Founding Engineer, ML Inference with deep expertise in high-performance...  ...You'll work across the model-serving stack, designing novel... 
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    9 hours ago
  • Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills... 
    Flexible hours

    Cartesia

    San Francisco, CA
    2 days ago
  •  ...intelligence that evolves in real-time. Our vision is AI systems...  ...useful intelligence - the inference services that serve LLMs at scale and the data...  ...about both. Researchers and ML engineers will hand you workloads...  ...: hands-on experience with LLM inference engines (vLLM, SGLang... 
    Flexible hours

    Adaption

    San Francisco, CA
    3 days ago
  •  ...GPU Kernel Engineer Sciforium is an AI infrastructure...  ..., high-efficiency serving platform. Backed by multi...  ...AI models and real-time applications. About...  ...large-scale training and inference. This role is ideal...  ...focus on large-scale LLM training and inference... 
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • $350k

     ...committed researchers, engineers, policy experts, and...  ...the Role Anthropic's inference fleet serves Claude to millions of...  ...regression from request timing down through routing and...  ...or general LLM serving stacks. Direct...  ...signals reliably catch real model-output regressions... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    1 hour ago
  • Introduction A Customer Success Engineer Co-Op (CSE) focused on IBM...  ...eligible and available for full time work (40 hours a week) during...  ...experience with model deployment (e.g. serving Hugging Face models via APIs) and LLM inference (batch vs. real‑time). Understanding of model... 
    Full time

    IBM

    San Francisco, CA
    3 days ago
  • An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer... 

    EchoTwin AI

    San Francisco, CA
    2 days ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 

    Anyscale

    San Francisco, CA
    4 days ago
  • $180k - $270k

     ...This role involves building advanced audio and speech models and includes responsibilities related to research and engineering. Successful candidates will earn a competitive...  ...comprehensive healthcare, 401(k) plans, and unlimited paid time off.
    Work at office

    Plaud

    San Francisco, CA
    1 day ago
  • $153k - $169k

    Uber Technologies, Inc. is looking for a Software Engineer I in San Francisco, CA, with a focus on developing and optimizing algorithms for real-time supply and demand matching for drivers and riders. The salary range is $153,000 to $169,000 annually. The position offers... 
    Remote work

    Wilder Wealthy & Wise

    San Francisco, CA
    5 days ago
  •  ...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should... 

    EchoTwin AI, Inc.

    San Francisco, CA
    9 hours ago
  •  ...Model Implementation Engineer Sciforium is an AI...  ...proprietary, high-efficiency serving platform. Backed by...  ...AI models and real-time applications. We are...  ...domains (e.g., NLP, vision, speech, generative models)....  ...scale model training or inference systems.... 
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • A cutting-edge AI company in San Francisco is seeking a Software Engineer to advance real-time multimodal intelligence. The role focuses on designing low latency model inference systems and collaborating with research teams. Strong engineering skills in technologies like... 
    Work at office

    Cartesia

    San Francisco, CA
    9 hours ago
  • 1mind is seeking an AI Systems Engineer to design and build the technical foundation for Superhumans, real-time multimodal AI beings. You'll integrate large language, speech, and rendering models into a low-latency system, enabling human-like interactions. This role requires... 
    Remote job
    Work at office

    1mind

    San Francisco, CA
    2 days ago
  •  ...will help us develop an intelligence system that dynamically interacts with a live marketplace, pricing in real time. With 2-3 years of experience shipping LLM products, you’ll play a key role in transforming how the construction industry operates, leveraging technology... 

    Arbor

    San Francisco, CA
    1 day ago
  • etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should... 

    etc.

    San Francisco, CA
    4 days ago
  •  ...Employment Type Full time Location Type On-site...  ...building our Forward Deployed Engineering function. This team...  ...‑edge AI research with real‑world enterprise deployments...  ...orchestrating complex LLM workflows, integrating...  ...internal codebase for inference, fine‑tuning, and... 
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    9 hours ago
  • $160k - $200k

    A leading technology company in San Francisco seeks a Software Engineer to design scalable, event-driven billing systems. You will integrate with Stripe and Orb for real-time usage tracking and payments. Responsibilities include developing microservices on Kubernetes, collaborating... 

    Fal

    San Francisco, CA
    1 day ago
  • $170k - $240k

    Chef Robotics, based in San Francisco, is seeking a Perception Engineer to develop robust AI and robotics solutions. This role involves designing deep learning models and optimizing them for real-time performance, specifically in food robotics. Candidates should have 5+... 
    Flexible hours

    Chef Robotics

    San Francisco, CA
    3 days ago
  • MLabs Ltd is seeking a talented engineer to design and implement core systems for a real-time distributed platform. Based in New York, the role demands expertise in Rust and extensive experience in building distributed systems. Candidates will have the opportunity for significant... 
    Remote job

    MLabs Ltd

    San Francisco, CA
    2 days ago
  •  ...produces auditable decisions in real time. While we sit on the edge...  ...By bridging the gap between LLM capabilities and domain-...  ...CTGT's Senior Machine Learning Engineer will operate deep within the...  ...deterministic policy enforcement at inference time. Who You Are... 

    CTGT

    San Francisco, CA
    2 days ago
  • A tech company specializing in computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate holds a Master's degree and has over 5 years of experience in software engineering... 

    Hayden AI

    San Francisco, CA
    4 days ago
  • Hayden AI Technologies, Inc. in San Francisco is seeking a Senior State Estimation Engineer to derive and implement novel algorithms for real-time pose estimation. This role involves collaborating on high-impact projects, developing software in C++ and Python, and mentoring... 

    Hayden AI Technologies, Inc.

    San Francisco, CA
    4 days ago
  • AI Chopping Block, Inc. is seeking a Senior State Estimation Engineer to develop novel real-time pose estimation algorithms and solve large-scale mapping challenges. Candidates should have a Master's degree in a relevant field and over five years of experience in software... 

    AI Chopping Block, Inc.

    San Francisco, CA
    9 hours ago
  •  ...cloud infrastructure serving tens of thousands...  .... The Field Engineering team is a group of...  ...the team, dig into real customer optimization...  ...experience in ML inference, model optimization...  ...Familiarity with LLM inference optimization...  ...) Flexible paid time off plan that we... 
    Hourly pay
    Summer work
    Internship
    Work at office
    Local area
    Flexible hours

    Lambda Inc.

    San Francisco, CA
    1 day ago
