Real-Time LLM Inference & Speech Serving Engineer
$180k - $270k
Plaud
Plaud is seeking talented individuals for AI infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K, along with equity and comprehensive benefits including unlimited PTO and hybrid work options. Join Plaud as an early foundational member and make a significant impact in the AI domain.
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...parallelism for high-performance serving. Apply CUDA graph optimizations... ...base salary range for this full-time position is: $160,000 - $230,000 +... Full time
- ...progress of AI applications out into the real world. With Anyscale, we're building the... ...date. About the role As a Distributed LLM Inference Engineer, you will help with systems and... ...consistent. As the market data changes over time, the target salary for this role may be... Work at office
- ...Description Machine Learning Engineer, Inference Want to solve realtime... ...building the realtime speech infrastructure layer behind... ...reliably, and fast enough for real human interaction. Your work... ..., ONNX Runtime, and custom serving systems Managing KV cache... Remote work, Flexible hours
- ...and is scaling a world-class engineering team across inference, distributed systems,... ...modern models end-to-end under real production constraints. Responsibilities... ...owning inference or model serving infrastructure end-to-end... ...Experience with TensorRT-LLM, vLLM, or custom inference...
- ...Tech firm. About the Role The inference layer is the critical path between... ...image a user sees. As Inference Engineer, you will own that layer end-to-end: serving architecture, batching, queue... ...subsecond image generation and real-time experiences. The work involves... Work at office, Immediate start, Remote work, Flexible hours
- ...Building a new kind of platform for real-time generative media, enabling... ...unicorn founders and senior engineers with deep expertise in 3D,... ...for a Founding Engineer, ML Inference with deep expertise in high-performance... ...You'll work across the model-serving stack, designing novel... Relocation, Visa sponsorship, Relocation package
- Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills... Flexible hours
- ...intelligence that evolves in real-time. Our vision is AI systems... ...useful intelligence - the inference services that serve LLMs at scale and the data... ...about both. Researchers and ML engineers will hand you workloads... ...: hands-on experience with LLM inference engines (vLLM, SGLang... Flexible hours
- ...GPU Kernel Engineer Sciforium is an AI infrastructure... ..., high-efficiency serving platform. Backed by multi... ...AI models and real-time applications. About... ...large-scale training and inference. This role is ideal... ...focus on large-scale LLM training and inference... Flexible hours
$350k
...committed researchers, engineers, policy experts, and... ...the Role Anthropic's inference fleet serves Claude to millions of... ...regression from request timing down through routing and... ...or general LLM serving stacks. Direct... ...signals reliably catch real model-output regressions... Work at office, Visa sponsorship, Flexible hours
- Introduction A Customer Success Engineer Co-Op (CSE) focused on IBM... ...eligible and available for full time work (40 hours a week) during... ...experience with model deployment (e.g. serving Hugging Face models via APIs) and LLM inference (batch vs. real-time). Understanding of model... Full time
- An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer...
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
$180k - $270k
...This role involves building advanced audio and speech models and includes responsibilities related to research and engineering. Successful candidates will earn a competitive... ...comprehensive healthcare, 401(k) plans, and unlimited paid time off. Plaud. Work at office
$153k - $169k
Uber Technologies, Inc. is looking for a Software Engineer I in San Francisco, CA, with a focus on developing and optimizing algorithms for real-time supply and demand matching for drivers and riders. The salary range is $153,000 to $169,000 annually. The position offers... Remote work
- ...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should...
- ...Model Implementation Engineer Sciforium is an AI... ...proprietary, high-efficiency serving platform. Backed by... ...AI models and real-time applications. We are... ...domains (e.g., NLP, vision, speech, generative models).... ...scale model training or inference systems.... Flexible hours
- A cutting-edge AI company in San Francisco is seeking a Software Engineer to advance real-time multimodal intelligence. The role focuses on designing low latency model inference systems and collaborating with research teams. Strong engineering skills in technologies like... Work at office
- 1mind is seeking an AI Systems Engineer to design and build the technical foundation for Superhumans, real-time multimodal AI beings. You'll integrate large language, speech, and rendering models into a low-latency system, enabling human-like interactions. This role requires... Remote job, Work at office
- ...will help us develop an intelligence system that dynamically interacts with a live marketplace, pricing in real time. With 2-3 years of experience shipping LLM products, you’ll play a key role in transforming how the construction industry operates, leveraging technology...
- etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should...
- ...Employment Type Full time Location Type On-site... ...building our Forward Deployed Engineering function. This team... ...-edge AI research with real-world enterprise deployments... ...orchestrating complex LLM workflows, integrating... ...internal codebase for inference, fine-tuning, and... Full time, Relocation package
$160k - $200k
A leading technology company in San Francisco seeks a Software Engineer to design scalable, event-driven billing systems. You will integrate with Stripe and Orb for real-time usage tracking and payments. Responsibilities include developing microservices on Kubernetes, collaborating...
$170k - $240k
Chef Robotics, based in San Francisco, is seeking a Perception Engineer to develop robust AI and robotics solutions. This role involves designing deep learning models and optimizing them for real-time performance, specifically in food robotics. Candidates should have 5+... Flexible hours
- MLabs Ltd is seeking a talented engineer to design and implement core systems for a real-time distributed platform. Based in New York, the role demands expertise in Rust and extensive experience in building distributed systems. Candidates will have the opportunity for significant... Remote job
- ...produces auditable decisions in real time. While we sit on the edge... ...By bridging the gap between LLM capabilities and domain-... ...CTGT's Senior Machine Learning Engineer will operate deep within the... ...deterministic policy enforcement at inference time. Who You Are...
- A tech company specializing in computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate holds a Master's degree and has over 5 years of experience in software engineering...
- Hayden AI Technologies, Inc. in San Francisco is seeking a Senior State Estimation Engineer to derive and implement novel algorithms for real-time pose estimation. This role involves collaborating on high-impact projects, developing software in C++ and Python, and mentoring...
- AI Chopping Block, Inc. is seeking a Senior State Estimation Engineer to develop novel real-time pose estimation algorithms and solve large-scale mapping challenges. Candidates should have a Master's degree in a relevant field and over five years of experience in software...
- ...cloud infrastructure serving tens of thousands... .... The Field Engineering team is a group of... ...the team, dig into real customer optimization... ...experience in ML inference, model optimization... ...Familiarity with LLM inference optimization... ...) Flexible paid time off plan that we... Hourly pay, Summer work, Internship, Work at office, Local area, Flexible hours


