Senior Inference Platform Engineer Low-Latency, Multi-Tenant
MongoDB HQ
A leading data platform company in Palo Alto seeks a Senior Engineer to develop a cutting-edge inference platform supporting semantic search and AI-native experiences. The ideal candidate will have over five years of experience in backend systems and proficiency in languages like Go, Rust, or Python. You'll work alongside ML researchers to enhance infrastructure for real-time inference, ensuring high performance and reliability. This hybrid role offers a unique opportunity to influence the next generation of developer solutions.
- ...We’re looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for... ...for real-time, low-latency, and high-scale inference — fully... ...Design and build components of a multi-tenant inference platform integrated... Senior, Local area, Worldwide
$100k - $120k
...Cerebras is seeking a Senior Engineer to join their Cloud Platform team in Mountain View, California. In this role, you will own the architecture of our multi-tenant SaaS platform, working at the intersection of backend engineering and cloud infrastructure. The ideal candidate... Senior
$128.7k - $261.3k
...The Model Deployment & Inference Solutions team in GM... ...build the ML deployment platform that makes model... ...they meet the real‑time latency and memory budgets required... ...performed manually by engineers. Build the developer... ...toolchains. Experience with low‑latency or real‑time... Senior, Local area, Remote work, Flexible hours, Shift work
$175k - $220k
ThoughtSpot, Inc. is seeking a Senior Engineer to join their Cloud Platform team in Mountain View, California. This role involves owning the architecture and evolution of their multi-tenant SaaS platform, driving architectural decisions, and mentoring engineers. The ideal... Senior
- ...Israelvcforum is looking for a Senior ML Infrastructure Engineer in Mountain View, California. This position aims to build and scale robust platforms for ML inference workflows supporting GM’s AI efforts. You will collaborate with ML engineers and researchers to implement... Senior, Remote work
$500 per month
...Senior Platform Engineer Palo-Alto (In-office); San Francisco, California (In-Office) Banking... ...implement core platforms that power multi-agent orchestration, real-time decisioning... ...to ensure reliability and low latency. Develop abstractions that allow AI... Senior, Work at office
- ...Senior Cloud Platform Engineer Own the production inferencing service reliability and scale Location: Palo Alto, California... .... Your primary focus will be ensuring our inference endpoints have exceptional uptime, low-latency response times, and efficient resource utilization... Senior, Immediate start, Flexible hours, Shift work
$236k - $339.25k
...accelerate your impact. We look for low-ego individuals who thrive in... ...Snowflake Machine Learning Platform team's mission is to enable... ...and proactively with senior architects, PMs, and team leadership... ...in serving LLMs using inference engines like vLLM, TensorRT-LLM, TEI,... Senior, Flexible hours
$200k - $287.5k
...Senior Software Engineer On Billing Platform At Snowflake, we are powering the... ...impact. We look for low-ego individuals who... ...auditability, and low-latency processing. This... ...engineer on large, multi-team initiatives.... ...including token-based inference, agent workflows,... Senior, Flexible hours
$100k - $120k
About the Role We are looking for a Senior Engineer to join our Cloud Platform team and take ownership of the architecture and evolution of our multi-tenant SaaS platform. You will work at... ...path components for high‑throughput, low‑latency workloads Build and enforce... Senior
$155.42k - $395.9k
...Description About the Team: The ML Inference Platform is part of the AV ML Infrastructure... .... About the Role: We are seeking a Senior ML Infrastructure engineer to help build and scale robust platforms... .... Ability to thrive in a dynamic, multi-tasking environment with ever-... Senior, Local area, Remote work, Relocation, Relocation package, Flexible hours
$248.71k - $292.6k
...delivers fast, efficient AI inference. Our LPU-based system powers... ...Build fast. Sr. Staff Software Engineer - High Performance GPU Inference... ...and implement scalable, low-latency runtime systems that coordinate... ...NVLink/Fabric topologies, and multi‑accelerator systems (... Senior
$137k - $270k
...leading developer data platform, MongoDB Atlas, is... ...globally distributed, multi-cloud database and is... ...re looking for a Lead Engineer, Inference Platform to join our... ...time, high-scale, and low-latency inference — all... ...scalability in a multi-tenant, cloud-native environment... Local area, Worldwide, Flexible hours
$256k - $414k
NVIDIA Gruppe is looking for a Senior Manager to lead high-performance networking operations for cloud gaming. This role will oversee complex data center interconnects, ensuring ultra-low latency and high reliability. The ideal candidate will have extensive networking... Senior
- Groq is seeking a Sr. Staff Software Engineer to enhance their High Performance GPU Inference Systems. Located in California, this position involves designing scalable, low-latency systems and optimizing GPU performance. The ideal candidate will have expertise in distributed... Senior
$196k - $310.5k
...organization is looking for a Senior Cybersecurity Engineer – Identity Platform & Access Management to... .... Partner with multi‑functional collaborators... ...with high availability and low latency. Deep understanding of delegated... ...in complex, multi‑tenant environments. Hands‑on application... Senior, Worldwide
- ...General Motors is seeking a Senior ML Infrastructure Engineer to build and scale a robust platform for machine learning inference workflows. You will design backend software components, collaborate with ML engineers, and lead initiatives across GM's ML ecosystem. With... Senior, Remote work
- A technology company in Mountain View is seeking a candidate with hands-on experience in building multi-tenant SaaS platforms. The role requires knowledge of Kubernetes, cloud services (AWS/GCP/Azure), and the ability to integrate AI tools for enhancing productivity. Responsibilities... Senior
$152k - $241.5k
...superchip. We are looking for expert engineers to come and help design rack... ...scaling AI supercomputing platforms. What you will be doing Drive... ..., optimize firmware for low latency APIs. Strong knowledge of analyzing... .... Experience with ML and multi‑variable optimization... Senior
$120.1k - $225.7k
...Entails End-to-End Inference Optimization: Lead the... ...and minimize latency. Heterogeneous Computing... ...Computer Science, Electronic Engineering, AI, or related fields;... ...techniques, including multi-level KV Cache... ...deep understanding of low-level programming models... Senior, Relocation package
$148.6k - $306.3k
...What you’ll do As a Senior Platform Engineer, contribute across design, coding, testing, operability, and quality assurance in a development... ...OAuth 2.0, mTLS, proxy or tunnel-based architectures, and multi-tenant service design. Experience with observability practices including... Senior
$184k - $287.5k
...features related to CUDA’s memory model and multi‑node scalability geared towards next‑... ...in Computer Science, Electrical Engineering or related field (or equivalent... ...experience with parallel computing, PyTorch, low‑latency AI inference Understanding of system‑level... Senior
- ...Senior Software Development Engineer, Annapurna Labs, Elastic Collectives job at Annapurna Labs (U.S.) Inc.. Cupertino, CA. DESCRIPTION Annapurna... ...on AWS. We are seeking an experienced engineer with low-level latency networking or interconnect expertise to optimize... Senior, Internship, Work from home, Flexible hours
- ...Senior Backend Engineer — AI Agents Platform HOAi is a fast-growing startup revolutionizing... ...executing complex, multi-step processes with... ..., and multi-tenant isolation.... ...optimize throughput/latency, queue backlogs, caching... ..., storage, and inference/tool-call costs; build... Senior, Work at office, Remote work, Flexible hours
- ...Senior Backend Engineer — AI Voice Agent Platform HOAi is a fast-growing startup revolutionizing... ...executing complex, multi-step processes with human... ...that supports real-time, low-latency conversations at scale.... ...idempotency, rate limits, and per-tenant budgets/quotas.... Senior, Work at office, Remote work, Flexible hours
- ...Time · Department: Backend Engineer · Work type: On-Site About... ...the world's first AI platform to bring AI into the real... ...critical services, optimize for latency and throughput, and... ...support high-throughput, low-latency AI model inference and data services. Partner... Senior, Full time
- ...’s best data and AI infrastructure platform so our customers can use deep data... ...improve their business. Founded by engineers — and customer obsessed — we leap at... ...performance‑sensitive systems (e.g., latency‑critical services, multi‑tenant platforms, large‑scale indexing... Senior
$152k - $241.5k
...NVIDIA is the platform upon which every new AI‑powered application... ...built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source... ...enables high‑throughput, low‑latency inference at scale. This... ...work. Improve multi‑GPU inference performance... Senior
$180k - $240k
...We are seeking a Senior Cloud Infrastructure Engineer to architect and manage... ...of our AI platform, ensuring that multi-GPU clusters, distributed... ...using Triton Inference Server, Ray Serve,... ...Optimization: Optimize low-level... ...RoCE v2) to minimize latency for 3D Gaussian Splatting... Senior, Odd job, Work at office
$119.8k - $234.7k
...Our converged AI fabric delivers inference capabilities for all LLMs... ..., Llama, and more. As a Senior Software Engineer, you will shape the... ...efficiently, and with ultra-low latency, enabling a rich set of AI-... ...large-scale AI services and platform capabilities that power... Senior, Ongoing contract, Local area

