Real-Time Inference & Model Serving Engineer (Equity)

$220k - $320k

Trades Workforce Solutions

ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

They’re building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits, extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load. This is not about training models. It’s about making them fast, efficient, and production-ready.

What you’ll work on:
- Building high-performance serving systems for LLM, speech, and vision models
- Scaling inference to production workloads with strict latency requirements
- Optimising GPU utilisation and execution efficiency
- Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
- Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
- Profiling and debugging performance across GPU, memory, and system layers

What you’ll bring:
- Strong experience with ML inference or model serving systems
- Deep understanding of latency and throughput optimisation in production
- Solid Python and PyTorch skills, plus a systems or performance engineering mindset
- Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research. There’s real ownership here. You’ll help define how next-generation AI systems are served.

Package: $220,000 – $320,000 base + equity
San Francisco, onsite 3 days per week

If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring. All applicants will receive a response.


Vacancy posted 2 days ago

