Real-Time Inference & Model Serving Engineer (Equity)
$220k - $320k
Trades Workforce Solutions
ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

They’re building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits, extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load. This is not about training models. It’s about making them fast, efficient, and production-ready.

What you’ll work on:
- Building high-performance serving systems for LLM, speech, and vision models
- Scaling inference to production workloads with strict latency requirements
- Optimising GPU utilisation and execution efficiency
- Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
- Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
- Profiling and debugging performance across GPU, memory, and system layers

What you’ll bring:
- Strong experience with ML inference or model serving systems
- Deep understanding of latency and throughput optimisation in production
- Solid Python and PyTorch skills, plus a systems or performance engineering mindset
- Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research. There’s real ownership here. You’ll help define how next-generation AI systems are served.

Package: $220,000 – $320,000 base + equity
San Francisco, onsite 3 days per week

If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring. All applicants will receive a response.
$180k - $270k
...focusing on building high-performance inference engines for speech AI. Ideal candidates will have... ...experience in GPU architecture and real-time systems. This position offers a competitive... ...range of $180K - $270K, along with equity and comprehensive benefits including unlimited...Employment Equity
$300k
...Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal... ...and a knack for optimizing inference latency for large generative models. With a competitive base... ...to ~$300,000 and meaningful equity, this opportunity emphasizes...Employment EquityVisa sponsorshipRelocation package
- ...Model Implementation Engineer Sciforium is an AI infrastructure company... ..., high-efficiency serving platform. Backed by... ...AI models and real-time applications. We... ...scale model training or inference systems. Contributions... ...salary and equity Equal opportunity...Employment EquityFlexible hours
- Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design... ...and build scalable, low-latency model inference systems while collaborating... ...include competitive salary, equity, flexible PTO, and daily meals....Employment EquityFlexible hours
$98k - $140k
...it helps them save time and money. In-person... ...with product and engineering teams to build systems... ...'ll shape Notion’s model strategy and work... .... You'll have real ownership from day... ...data — You can self‑serve insights from large... ...cash compensation, equity, and benefits. The...Employment EquityLive inWork at officeLocal area
- ...kind of platform for real-time generative media,... ...and senior engineers with deep expertise... ...Founding Engineer, ML Inference with deep expertise... ...generative media models. You'll work across the model-serving stack, designing novel... ...meaningful early equity. • We sponsor...Employment EquityRelocationVisa sponsorshipRelocation package
- ...generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed... ...support from AMD engineers the team is... ...frontier AI models and real-time applications.... ...Distributed Training and Inference Engineer to build,... ...salary and equity Equal opportunity...Employment EquityFull timeFlexible hours
- Databricks is seeking a Senior Engineering Manager to lead the Model Serving team, responsible for both customer-facing capabilities and foundational infrastructure... ...in technical leadership, extensive knowledge of real-time APIs, and a strong background in distributed systems....
- ...technology company in San Francisco is looking for a Senior Engineering Manager to oversee the Model Serving product. This role involves leading a high-... ...scale distributed systems, and a strong background in real-time serving systems. This position offers a competitive...
- ...structured clinical notes in real-time with deep EMR... ...creatives, technologists and engineers working together to... ...Engineer Model Inference at Abridge you'll play... ...and maintain ML model serving infrastructure ensuring... ...Compensation and Equity: Competitive compensation...Employment EquityHourly payFull timeFlexible hours
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam... ...for large language models (LLMs). Our mission... ...high-performance serving. Apply CUDA... ...compensation, startup equity, health insurance and... ...for this full-time position is: $160,0...Employment EquityFull time
$167.2k - $209k
...are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In... ...deploy and scale their models with industry-... ...distributed inference serving frameworks such as llm... ...can spend more time creating software that... ...performance. We also provide equity compensation to...Employment EquityLocal areaRemote workWorldwideFlexible hours
- Machine Learning Engineer, Inference Want to solve realtime inference... ...-of-the-art speech models actually behave... ..., and fast enough for real human interaction. Your... ...ONNX Runtime, and custom serving systems Managing KV cache... ...strong salary, equity, and benefits. Location...Employment EquityRemote workFlexible hours
$170k - $240k
...Francisco, is seeking a Perception Engineer to develop robust AI and robotics solutions... ...involves designing deep learning models and optimizing them for real-time performance, specifically in food... ...benefits package including equity, medical insurance, and flexible PTO...Employment EquityFlexible hours
- ...design and integrate control systems, working on real hardware alongside a small, dedicated team.... ...robotics with hands-on experience in real-time control system design. The position offers competitive salary, meaningful equity, and relocation support. ...Employment EquityRelocation package
- ...defense technology company in San Francisco is seeking a Software Engineer to develop and optimize autonomous defense systems. The role... ...C++, Rust, and Python, along with a strong understanding of real-time performance and embedded systems. Candidates should have experience...Employment Equity
$350k
...committed researchers, engineers, policy experts,... ...Anthropic's inference fleet serves Claude to millions... ...kernels, model servers, distributed... ...regression from request timing down through... ...signals reliably catch real model-output... ...benefits, optional equity donation matching...Employment EquityWork at officeVisa sponsorshipFlexible hours
- ...involves designing production-grade voice features, creating responsive voice interactions, and building low-latency systems for real-time applications. Candidates should have extensive experience in software development, AI/ML, and voice technologies. The position offers...Employment EquityFlexible hours
$217k - $312.2k
...their business. Databricks’ Model Serving product provides... ...language models. It offers real-time, low-latency inference, governance, monitoring,... ...efficiency. As a Senior Engineering Manager, you will lead the... ...annual performance bonus, equity, and the benefits listed...Employment EquityLocal area
$192k - $260k
...business. Databricks' Model Serving product provides... ...language models. It offers real-time, low-latency inference, governance, monitoring,... ...efficiency. As a Staff Engineer, you'll play a critical role... ...performance bonus, equity, and the benefits listed...Employment EquityLocal areaWorldwide
- ...GPU Kernel Engineer Sciforium is an AI infrastructure... ...multimodal AI models and a proprietary, high-efficiency serving platform. Backed... ...AI models and real-time applications.... ...scale training and inference. This role is ideal... ...salary and equity Equal Opportunity...Employment EquityFlexible hours
- Sciforium in San Francisco is seeking a Model Implementation Engineer to implement and optimize state-of-the-art machine learning models. The role... ...collaborating closely with systems teams. Benefits include medical insurance, 401k, and flexible time off. SciforiumFlexible hours
$192k - $260k
...their business. Databricks’ Model Serving product provides... ...language models. It offers real-time, low-latency inference, governance, monitoring,... ...efficiency. As a Staff Engineer, you’ll play a critical role... ...annual performance bonus, equity, and the benefits listed...Employment EquityLocal areaWorldwide
- ...re pioneering the model architectures that... ...innovation and systems engineering paired with a... ...Role We're hiring an Inference Engineer to... ...mission of building real-time multimodal intelligence... ...inference and serving stack for our cutting... ...attractive equity package. Commuter...Employment EquityWork at officeVisa sponsorshipFlexible hours
- ...Sciforium's Next-Generation Model Serving Platform Architect... ...-on support from AMD engineers the team is scaling... ...AI models and real-time applications. About... ...scheduling, and distributed inference systems. Develop... ...salary and equity Equal Opportunity...Employment EquityWork at officeFlexible hours
- ...AI applications out into the real world. With Anyscale, we're building... ...role As a Distributed LLM Inference Engineer, you will help with systems... ...the market data changes over time, the target salary for this... ...to participate in Anyscale's Equity and Benefits offerings, including...Employment EquityWork at office
$216k - $270k
...As a Software Engineer on the ML Infrastructure... ...and efficient serving of LLMs. Our... ...and optimize models for production... ...-generation-inference. Compensation... ...base salary, equity, and benefits.... ...for this full-time position in the... ...applications that deliver real impact. We work...Employment EquityFull time
- ...involves We’re looking for exceptional storage & database engineers who want to join us on a new project to deeply... ...building xAI’s new storage tier that powers training, inference, recommendations, and real-time data extraction, Design, build, and launch to production...
$248.8k - $311k
...frontier of data and model evaluation for... ...As an ML Systems Engineer on the Physical AI... ...reliable, and efficient serving of foundation... ...ensuring low latency for real-time applications.... ...tracking of model inference. Lead: Own... ...include base salary, equity, and benefits. The...Employment EquityFull time
- ...years of professional software engineering experience building and... ...you have not been the primary model owner on a team, Collaboration... ..., (Desirable) Background in real-time decision engines or stateful... ..., Artificial Intelligence to serve as a technical architect for...


