LLM Inference Engineer: Scalable, Low-Latency Serving

Hippocratic AI

Hippocratic AI is seeking an experienced LLM Inference Engineer in Palo Alto to optimize large language model serving infrastructure. The role requires expertise in optimizing LLM inference systems at scale and deploying efficient, scalable LLM systems in production. The ideal candidate will design multi-node architectures, apply quantization techniques, and benchmark system performance. With a focus on safety in healthcare, this role offers the chance to work with a team of healthcare and AI leaders.

Vacancy posted 6 hours ago

Similar jobs that could be interesting for you

Based on the LLM Inference Engineer: Scalable, Low-Latency Serving in Palo Alto, CA vacancy

LLM Inference Engineer
$126 per hour
...healthcare‑only, safety‑focused LLM—a breakthrough platform... ...an experienced LLM Inference Engineer to optimize our large language model serving infrastructure. The... ...deploying efficient, scalable LLM systems in production... ...decoding and other latency‑optimization strategies...
Suggested
Work at office
Hippocratic AI
Palo Alto, CA
6 hours ago
Senior Software Development Engineer - LLM Kernel & Inference Systems
...senior member of the LLM inference framework team, you will... ...driving performance, scalability, and reliability, enabling... ...platform for LLM serving. This role sits at the... ...intersection of inference engines, distributed systems,... ...terms of throughput, latency, memory movement, and...
Suggested
Advanced Micro Devices
Santa Clara, CA
3 days ago
Senior Backend Engineer, ML Inference Systems
$135.8k - $237.05k
...View, CA, USA Senior Backend Engineer, ML Inference Systems Location Mountain View... ...performance, reliability, and scalability of inference systems. Join us... ...inference platform, with a focus on low-latency, high-throughput serving infrastructure Partner with ML...
Suggested
Work at office
Worldwide
Relocation package
Unity Technologies
Mountain View, CA
1 day ago
Engineering Manager, Agentic Systems - Moveworks
...Moveworks’ Reasoning Engine and natural... ...the long-term scalability of our core AI... ...the entire ML/LLM lifecycle. This... ...distributed training and inference, model... ...frameworks, and LLM latency optimization.... ...team builds serve as the foundation... ...to ultra-low-latency inference...
Suggested
Work at office
Remote work
Flexible hours
ServiceNow
Mountain View, CA
7 days ago
Senior Research Engineer, On-Device Inference, Robotics, DeepMind
$207k - $300k
...Gemini Robotics models for deployment in low-latency on-device applications. You will... ...that is representative of the users we serve, creating a culture of belonging, and...
Suggested
Full time
Google Inc.
Mountain View, CA
2 days ago
Senior Backend Engineer: Distributed Systems for AI Inference
...Department: Backend Engineer · Work type: On-Site... ...real-time multimodal LLM for real life,... ...building performant, scalable, and resilient distributed... ..., optimize for latency and throughput, and... ...support high-throughput, low-latency AI model inference and data services....
Full time
Neara
Palo Alto, CA
4 days ago
Senior Performance Engineer, Inference
...industry-leading training and inference speeds and empowers... ...a Senior Performance Engineer to join our Product... ...inference performance and will serve as our resident expert... ..., time to first token, latency under concurrency, and... ...vLLM, SGLang, TensorRT-LLM), GPU kernel-level...
Contract work
Shift work
CEREBRAS SYSTEMS INC.
Sunnyvale, CA
1 day ago
Runtime Engineer — Remote ML Inference & Systems
MatX is seeking a skilled software engineer to build custom silicon for AI language models in Mountain View, California. The role... ...building libraries for memory management and developing an LLM inference serving stack. Benefits include generous salary, equity offerings,...
Remote job
MatX
Mountain View, CA
4 days ago
Senior Inference Engineer, AIConfigurator for Dynamo
$184k - $287.5k
NVIDIA is recruiting a Senior Inference Engineer to advance AIConfigurator ( a... ...configurations for large-scale LLM inference. This role integrates GPU systems, model serving, performance modeling, and production... ...by optimizing efficiency, latency, parallelism, and resource...
Full time
NVIDIA
Santa Clara, CA
1 day ago
Senior Machine Learning Engineer, Agentic Systems - Moveworks
...Moveworks' Reasoning Engine and natural language capabilities... ...for building and serving LLMs at Moveworks. This... ...training and inference pipeline for large language... ...monitoring framework, LLM latency optimization, etc.... ...solving many challenges on scalability of services as well as...
Work at office
Remote work
Flexible hours
ServiceNow
Mountain View, CA
2 days ago
Senior Machine Learning Engineer, Agentic Systems - Moveworks
...for a Machine Learning Engineer to help build cutting... ...for building and serving LLMs at Moveworks. This... ...distributed training and inference pipeline for large... ...monitoring framework, LLM latency optimization, etc.... ...solving many challenges on scalability of services as well...
Moveworks
Mountain View, CA
16 hours ago
Robotics Software Engineer - Low-Latency Middleware & CI/CD
Garuda Ventures in Palo Alto is seeking Robotics Software Engineers to build high-performance middleware and runtime systems for robotic platforms. You will design low-latency execution frameworks and optimize inter-process communication. Successful candidates will develop...
Garuda Ventures
Palo Alto, CA
3 days ago
Runtime Engineer
$120k - $250k
...Runtime Engineer Mountain View, CA What MatX Is Building... ...silicon for large-language-model inference and training, with HW/SW co-... ...consumers Build the LLM inference serving stack — paged KV cache, continuous... ..., paged, arena) or other low-level memory work ML framework...
Full time
Contract work
Work experience placement
Local area
Remote work
Monday to Friday
Flexible hours
MatX
Mountain View, CA
2 days ago
LLM Infra Engineer
...that fuels it, recursively accelerating the path to artificial superintelligence. We are interested in best-in-class engineers to focus on a variety of challenges relating to scaling, low-level optimization, and core infrastructure for LLM training and inference....
Ricursive Intelligence
Palo Alto, CA
16 hours ago
Performance Engineer
...Performance Engineer RadixArk is hiring a Performance Engineer... ..., CA — someone who can push LLM inference and training systems to the... ...RadixArk infrastructure stack: latency, throughput, GPU utilization... ...cost-per-token and serving efficiency Partner with customers...
Flexible hours
RadixArk
Palo Alto, CA
16 hours ago
Senior Edge-LLM Real-Time Inference Engineer
NVIDIA Gruppe is looking for a skilled engineer to join their TensorRT Edge-LLM team in Santa Clara, California. The role involves developing a state-of-the-art inference framework for large language models and optimizing it for real-time performance on embedded platforms...
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Forward Deployed Engineer
...hiring a Forward Deployed Engineer to embed directly with... ..., sub‑100ms latency budgets, regulated data... ...targets. Tune streaming and inference paths to hit sub‑100ms... ...10 customers can self‑serve from. What You Bring 4+... ...LiveKit, Pipecat, SIP, or low‑latency audio/video. Experience...
Contract work
Work at office
Remote work
Visa sponsorship
39 Ai, Inc.
Mountain View, CA
3 days ago
Senior LLM Performance Engineer - GPU Inference
$184k - $356.5k
...leading AI computing company in California is seeking a Senior Deep Learning Software Engineer focused on performance optimization of LLM models. You will analyze and enhance LLM inference performance, working in cross-collaborative teams to implement cutting-edge...
Full time
NVIDIA Corporation
Santa Clara, CA
2 days ago
Founding Engineer - FlowGen Labs
$240k - $400k
...artificial intelligence. Role Summary As our Founding Engineer, you will own a zero‑to‑one product and its infrastructure... .... Familiarity with LangGraph is a plus. Stand up inference paths with low latency serving and token‑level observability Productionize prompt pipelines...
Visa sponsorship
Pear VC
Palo Alto, CA
4 days ago
Senior DL Algorithms Engineer - Inference Performance
$184k - $287.5k
..., fix bugs and deliver production code to TRT-LLM, NVIDIA’s open-source inference serving library. Profile and analyze bottlenecks across...
NVIDIA Corporation
Santa Clara, CA
2 days ago
Sr. AI Inference Systems Engineer
$120.1k - $225.7k
...End-to-End Inference Optimization: Lead... ...for Large Models (LLM, Multimodal); focus... ...throughput and minimize latency. Heterogeneous... ...Science, Electronic Engineering, AI, or related... ...understanding of low-level programming... ...allow us to better serve our users and the...
Relocation package
Tencent
Palo Alto, CA
2 days ago
Siri Runtime Systems Engineer — On‑Device, Low-Latency
$147.4k - $272.1k
Apple Inc. is seeking a Software Development Engineer for Siri Runtime Systems and Interaction in Cupertino, California. This role involves... ...and integrating next-generation Siri experiences, focusing on low-latency interactions and system performance optimization. Ideal...
Apple Inc.
Cupertino, CA
3 days ago
Staff Inference Systems Engineer
$180k
A cutting-edge AI firm in California is seeking engineers focused on optimizing AI model inference. Candidates should have experience with Python, Rust, and... ...optimizations. The role involves building reliable serving systems and contributing to innovative AI technologies...
Pantera Capital
Palo Alto, CA
1 day ago
ML Systems Engineer
$300k - $400k
...frontier model training and inference fast, efficient, and... ...Kubernetes, minimizing latency and maximizing... ...the RL loop tight and low-latency Build fast... ...shifting, scheduling, and serving architecture at production... ...best — the scientists, engineers, and problem-solvers who...
Visa sponsorship
Flexible hours
Shift work
Periodic Labs
Menlo Park, CA
4 days ago
GTM Engineer - ABM, Advertising and Growth Marketing
...your impact. We look for low-ego individuals who... ...how work gets done. GTM Engineer - ABM, Advertising and... ...platforms to build the scalable technical backbone powering... ...touchpoints at scale. Serve as the primary... ...emerging AI tools (e.g., new LLM capabilities, AI-native...
Snowflake
Menlo Park, CA
8 hours ago
Research Engineer, Frontier Safety Mitigations, DeepMind
$174k - $252k
...from research concepts to scalable production systems. Experience... ...ML projects, including LLM training, inference, and fine-tuning. Experience... ...are looking for a research engineer for the Frontier Safety... ...representative of the users we serve, creating a culture of...
Full time
Google Inc.
Mountain View, CA
1 day ago
Senior Backend Engineer - Recommendation Systems (Contractor)
...(App Store, Global Search, Game Center). You’ll build scalable, reliable systems that serve millions of daily users. Key Responsibilities:... ...services for recommendation and search with a focus on low latency and high availability. Implement multi-level caching...
For contractors
OPPO US Research Center
Palo Alto, CA
2 days ago
Senior ML Infra Engineer for Scalable LLM Systems
Moveworks is seeking a Machine Learning Engineer in Mountain View, California, to design and optimize scalable ML infrastructure for large language models. This pivotal role requires collaboration with cross-functional teams to enhance AI product scalability. The ideal...
Moveworks
Mountain View, CA
16 hours ago
Staff+ Production Engineer
...We are seeking a Production Engineer to take complete, end-to-end... ...on AWS to ensure exceptional scalability and fault tolerance. Lead... ...unique challenges of real-time, low-latency systems, particularly... ...designing and operating robust self-serve developer platforms,...
Sanas
Palo Alto, CA
16 hours ago
Senior Software Engineer, AI Inference Systems
$184k - $287.5k
...are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll... ...Desired Experience Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang). Hands‑on...
NVIDIA Gruppe
Santa Clara, CA
4 days ago
