Real-Time LLM Inference & Speech Serving Engineer

$180k - $270k

Plaud

Plaud is seeking talented individuals for AI infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K, along with equity and comprehensive benefits including unlimited PTO and hybrid work options. Join Plaud as an early foundational member and make a significant impact in the AI domain.

Vacancy posted 2 days ago

Similar jobs that could be interesting for you
Based on the Real-Time LLM Inference & Speech Serving Engineer in San Francisco, CA vacancy

LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...parallelism for high-performance serving. Apply CUDA graph optimizations... ...base salary range for this full-time position is: $160,000 - $230,000 +...
Suggested
Full time
Together AI
San Francisco, CA
1 day ago
Distributed LLM Inference Engineer
...progress of AI applications out into the real world. With Anyscale, we're building the... ...date. About the role As a Distributed LLM Inference Engineer, you will help with systems and... ...consistent. As the market data changes over time, the target salary for this role may be...
Suggested
Work at office
Anyscale
San Francisco, CA
4 days ago
Inference Engineer
...Description Machine Learning Engineer, Inference Want to solve realtime... ...building the realtime speech infrastructure layer behind... ...reliably, and fast enough for real human interaction. Your work... ..., ONNX Runtime, and custom serving systems Managing KV cache...
Suggested
Remote work
Flexible hours
techire ai
San Francisco, CA
4 days ago
Inference Engineer
...and is scaling a world-class engineering team across inference, distributed systems,... ...modern models end-to-end under real production constraints. Responsibilities... ...owning inference or model serving infrastructure end‑to‑end... ...Experience with TensorRT-LLM, vLLM, or custom inference...
Suggested
Acceler8 Talent
San Francisco, CA
2 days ago
Inference Engineer
...Tech firm. About the Role The inference layer is the critical path between... ...image a user sees. As Inference Engineer, you will own that layer end-to-end: serving architecture, batching, queue... ...subsecond image generation and real-time experiences. The work involves...
Suggested
Work at office
Immediate start
Remote work
Flexible hours
Midjourney
San Francisco, CA
2 days ago
Founding Engineer, ML Inference
...Building a new kind of platform for real-time generative media, enabling... ...unicorn founders and senior engineers with deep expertise in 3D,... ...for a Founding Engineer, ML Inference with deep expertise in high-performance... ...You'll work across the model-serving stack, designing novel...
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
9 hours ago
Realtime Inference Engineer — Scalable AI Serving
Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills...
Flexible hours
Cartesia
San Francisco, CA
2 days ago
Distributed Systems Engineer, Data & Inference Platform
...intelligence that evolves in real-time. Our vision is AI systems... ...useful intelligence - the inference services that serve LLMs at scale and the data... ...about both. Researchers and ML engineers will hand you workloads... ...: hands-on experience with LLM inference engines (vLLM, SGLang...
Flexible hours
Adaption
San Francisco, CA
3 days ago
GPU Kernel Engineer
...GPU Kernel Engineer Sciforium is an AI infrastructure... ..., high-efficiency serving platform. Backed by multi... ...AI models and real-time applications. About... ...large-scale training and inference. This role is ideal... ...focus on large-scale LLM training and inference...
Flexible hours
Sciforium
San Francisco, CA
1 day ago
Performance Engineer, Inference Systems
$350k
...committed researchers, engineers, policy experts, and... ...the Role Anthropic's inference fleet serves Claude to millions of... ...regression from request timing down through routing and... ...or general LLM serving stacks. Direct... ...signals reliably catch real model-output regressions...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
1 hour ago
Customer Success Engineer Co-op - Fall Sales Co-op Program 2026
Introduction A Customer Success Engineer Co-Op (CSE) focused on IBM... ...eligible and available for full time work (40 hours a week) during... ...experience with model deployment (e.g. serving Hugging Face models via APIs) and LLM inference (batch vs. real‑time). Understanding of model...
Full time
IBM
San Francisco, CA
3 days ago
Monocular SLAM Engineer for Real-Time 3D Mapping
An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer...
EchoTwin AI
San Francisco, CA
2 days ago
Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
Anyscale
San Francisco, CA
4 days ago
Speech LLM Engineer: Audio AI Training
$180k - $270k
...This role involves building advanced audio and speech models and includes responsibilities related to research and engineering. Successful candidates will earn a competitive... ...comprehensive healthcare, 401(k) plans, and unlimited paid time off.
Work at office
Plaud
San Francisco, CA
1 day ago
Real-Time Marketplace Engineer I
$153k - $169k
Uber Technologies, Inc. is looking for a Software Engineer I in San Francisco, CA, with a focus on developing and optimizing algorithms for real-time supply and demand matching for drivers and riders. The salary range is $153,000 to $169,000 annually. The position offers...
Remote work
Wilder Wealthy & Wise
San Francisco, CA
5 days ago
Monocular SLAM Engineer for Real-Time 3D Mapping
...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should...
EchoTwin AI, Inc.
San Francisco, CA
9 hours ago
Model Implementation Engineer
...Model Implementation Engineer Sciforium is an AI... ...proprietary, high-efficiency serving platform. Backed by... ...AI models and real-time applications. We are... ...domains (e.g., NLP, vision, speech, generative models).... ...scale model training or inference systems....
Flexible hours
Sciforium
San Francisco, CA
1 day ago
Real-Time Multimodal AI Systems Engineer
A cutting-edge AI company in San Francisco is seeking a Software Engineer to advance real-time multimodal intelligence. The role focuses on designing low latency model inference systems and collaborating with research teams. Strong engineering skills in technologies like...
Work at office
Cartesia
San Francisco, CA
9 hours ago
Remote AI Systems Engineer - Real-Time Multimodal AI
1mind is seeking an AI Systems Engineer to design and build the technical foundation for Superhumans, real-time multimodal AI beings. You'll integrate large language, speech, and rendering models into a low-latency system, enabling human-like interactions. This role requires...
Remote job
Work at office
1mind
San Francisco, CA
2 days ago
Production AI Engineer — Real-Time Market Intelligence
...will help us develop an intelligence system that dynamically interacts with a live marketplace, pricing in real time. With 2-3 years of experience shipping LLM products, you’ll play a key role in transforming how the construction industry operates, leveraging technology...
Arbor
San Francisco, CA
1 day ago
IR Vision Systems Engineer - Real-Time Space Embedded
etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should...
etc.
San Francisco, CA
4 days ago
Forward Deployed Engineer, Lead - LLM Post-training
...Employment Type Full time Location Type On-site... ...building our Forward Deployed Engineering function. This team... ...‑edge AI research with real‑world enterprise deployments... ...orchestrating complex LLM workflows, integrating... ...internal codebase for inference, fine‑tuning, and...
Full time
Relocation package
B Capital
San Francisco, CA
9 hours ago
Billing Systems Engineer: Real-Time Usage & Payments
$160k - $200k
A leading technology company in San Francisco seeks a Software Engineer to design scalable, event-driven billing systems. You will integrate with Stripe and Orb for real-time usage tracking and payments. Responsibilities include developing microservices on Kubernetes, collaborating...
Fal
San Francisco, CA
1 day ago
Senior Real-Time Robotic Vision Engineer (Equity)
$170k - $240k
Chef Robotics, based in San Francisco, is seeking a Perception Engineer to develop robust AI and robotics solutions. This role involves designing deep learning models and optimizing them for real-time performance, specifically in food robotics. Candidates should have 5+...
Flexible hours
Chef Robotics
San Francisco, CA
3 days ago
Remote Real-Time Distributed Systems Engineer
MLabs Ltd is seeking a talented engineer to design and implement core systems for a real-time distributed platform. Based in New York, the role demands expertise in Rust and extensive experience in building distributed systems. Candidates will have the opportunity for significant...
Remote job
MLabs Ltd
San Francisco, CA
2 days ago
Machine Learning Engineer: LLM Interpretability & Systems
...produces auditable decisions in real time. While we sit on the edge... ...By bridging the gap between LLM capabilities and domain-... ...CTGT's Senior Machine Learning Engineer will operate deep within the... ...deterministic policy enforcement at inference time. Who You Are...
CTGT
San Francisco, CA
2 days ago
Senior Real-Time Pose Estimation Engineer (SLAM & Sensors)
A tech company specializing in computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate holds a Master's degree and has over 5 years of experience in software engineering...
Hayden AI
San Francisco, CA
4 days ago
Senior Real-Time Pose Estimation Engineer (SLAM & Sensors)
Hayden AI Technologies, Inc. in San Francisco is seeking a Senior State Estimation Engineer to derive and implement novel algorithms for real-time pose estimation. This role involves collaborating on high-impact projects, developing software in C++ and Python, and mentoring...
Hayden AI Technologies, Inc.
San Francisco, CA
4 days ago
Senior Real-Time Pose Estimation Engineer (SLAM & Sensors)
AI Chopping Block, Inc. is seeking a Senior State Estimation Engineer to develop novel real-time pose estimation algorithms and solve large-scale mapping challenges. Candidates should have a Master's degree in a relevant field and over five years of experience in software...
AI Chopping Block, Inc.
San Francisco, CA
9 hours ago
Field Engineering Intern - Summer 2026
...cloud infrastructure serving tens of thousands... .... The Field Engineering team is a group of... ...the team, dig into real customer optimization... ...experience in ML inference, model optimization... ...Familiarity with LLM inference optimization... ...) Flexible paid time off plan that we...
Hourly pay
Summer work
Internship
Work at office
Local area
Flexible hours
Lambda Inc.
San Francisco, CA
1 day ago
