LLM Inference Engineer

Dormont Manufacturing Co

About Us

Hippocratic AI is the leading generative AI company in healthcare. We have the only system that can have safe, autonomous, clinical conversations with patients. We have trained our own LLMs as part of our Polaris constellation, resulting in a system with over 99.9% accuracy.

About the Role

We're seeking an experienced LLM Inference Engineer to optimize our large language model (LLM) serving infrastructure. The ideal candidate has:
  • Extensive hands‑on experience with state‑of‑the‑art inference optimization techniques
  • A track record of deploying efficient, scalable LLM systems in production environments

What You'll Do
  • Design and implement multi-node serving architectures for distributed LLM inference
  • Optimize multi-LoRA serving systems
  • Apply advanced quantization techniques (FP4/FP6) to reduce model footprint while preserving quality
  • Implement speculative decoding and other latency optimization strategies
  • Develop disaggregated serving solutions with optimized caching strategies for prefill and decoding phases
  • Continuously benchmark and improve system performance across various deployment scenarios and GPU types

What You Bring

Must-Have:
  • Experience optimizing LLM inference systems at scale
  • Proven expertise with distributed serving architectures for large language models
  • Hands‑on experience implementing quantization techniques for transformer models
  • Strong understanding of modern inference optimization methods, including:
    • Speculative decoding techniques with draft models
    • Eagle speculative decoding approaches
  • Proficiency in Python and C++
  • Experience with CUDA programming and GPU optimization

Nice‑to‑Have:
  • Contributions to open‑source inference frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Experience with custom CUDA kernels
  • Track record of deploying inference systems in production environments
  • Deep understanding of performance optimization systems

  • Show us what you've built: Tell us about an LLM inference or training project that makes you proud! Whether you've optimized inference pipelines to achieve breakthrough performance, designed innovative training techniques, or built systems that scale to billions of parameters – we want to hear your story.
  • Open source contributor? Even better! If you've contributed to projects like vllm, sglang, lmdeploy or similar LLM optimization frameworks, we'd love to see your PRs. Your contributions to these communities demonstrate exactly the kind of collaborative innovation we value.
Join a team where your expertise won't just be appreciated; it will be celebrated and amplified. Help us shape the future of AI deployment at scale!

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the LLM Inference Engineer in Palo Alto, CA vacancy
  • Hippocratic AI is looking for an experienced LLM Inference Engineer based in Palo Alto, CA, to optimize their large language model (LLM) serving infrastructure. You'll design multi-node serving architectures and implement advanced optimization techniques while collaborating... 
    Suggested

    Hippocratic AI

    Palo Alto, CA
    1 day ago
  • ScOp Venture Capital is looking for an ML Systems Engineer to optimize LLM inference systems crucial for their AI platform. The role focuses on enhancing performance and efficiency via low-level systems optimization, directly impacting industry leader processes in semiconductor... 
    Suggested

    ScOp Venture Capital

    Santa Clara, CA
    4 days ago
  • NVIDIA Gruppe is looking for a skilled engineer to join their TensorRT Edge-LLM team in Santa Clara, California. The role involves developing a state-of-the-art inference framework for large language models and optimizing it for real-time performance on embedded platforms... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • MatX is seeking a skilled software engineer to build custom silicon for AI language models in Mountain View, California. The role requires...  ...building libraries for memory management and developing an LLM inference serving stack. Benefits include generous salary, equity... 
    Suggested
    Remote job

    MatX

    Mountain View, CA
    4 days ago
  • Modular Mailing Systems, Inc. is seeking an experienced Performance Engineer to optimize LLM inference on their cloud platform. This pivotal role involves building optimization infrastructures and collaborating with teams to enhance performance across GPUs and ASICs. The... 
    Suggested
    Remote job
    Flexible hours

    Modular Mailing Systems, Inc.

    Los Altos, CA
    3 days ago
  • $198k - $286k

     ...About the role At Modular, we optimize inference from kernel to cloud on one unified stack...  ...optimizations across kernels, the inference engine, and distributed systems so that customer...  ...direction of Modular Cloud, delivering LLM performance on the Pareto frontier for agentic... 
    Remote job
    Work experience placement
    Work at office
    Local area
    Flexible hours

    Modular Mailing Systems, Inc.

    Los Altos, CA
    3 days ago
  • $124.8k - $235k

     ...businesses. What the Role Entails End-to-End Inference Optimization: Lead the optimization of...  ...inference pipeline for Large Models (LLM, Multimodal); focus on KV Cache storage...  ...Ph.D. in Computer Science, Electronic Engineering, AI, or related fields; significant professional... 
    Full time
    Relocation package

    Tencent

    Palo Alto, CA
    1 day ago
  •  ...Frontier Group sits at the leading edge of what’s possible with LLM inference on heterogeneous hardware. Our charter spans the full stack:...  ...unique computational fabric. We are an applied research and engineering team that moves fast, ships real systems, and works directly... 
    Full time

    d-Matrix

    Santa Clara, CA
    3 days ago
  •  ...infrastructure company in California is seeking a Member of Technical Staff — Inference to design and optimize large-scale AI inference systems. The role demands 5+ years in systems engineering and expertise in large-scale inference systems. Successful candidates will... 
    Flexible hours

    RadixArk

    Palo Alto, CA
    2 days ago
  • $192.6k - $305.6k

     ...world impact converge at scale. We’re hiring a Staff Backend Engineer to build and operate the infrastructure those models depend on...  ...with a focus on the performance, reliability, and scalability of inference systems. Join us and help influence how billions of gaming... 
    Work at office
    Worldwide
    Relocation package

    Dormont Manufacturing Co

    Mountain View, CA
    1 day ago
  • $184k - $287.5k

     ...Overview We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme...  ...Desired Experience Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang). Hands‑on work... 

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $135k

    United States Digital Space LLC in Palo Alto seeks an Application Software Engineer to develop high-performance AI inference systems. This role emphasizes the design and optimization of large-scale systems used for mission-critical applications. The ideal candidate possesses... 
    Remote work

    United States Digital Space LLC

    Palo Alto, CA
    5 days ago
  •  ...financial institution is seeking a Senior Principal Software Engineer to provide engineering expertise within the Commercial & Investment...  ...solutions, implementing MLOps practices, and optimizing model inference for high-performance applications. The ideal candidate will... 

    JPMorgan Chase & Co.

    Palo Alto, CA
    1 day ago
  •  ...workflow automation with Moveworks’ Reasoning Engine and natural language capabilities, we...  ...scaling the end-to-end systems for the entire ML/LLM lifecycle. This includes our infrastructure for distributed training and inference, model evaluation frameworks, and LLM latency... 
    Work at office
    Remote work
    Flexible hours

    ServiceNow

    Mountain View, CA
    2 days ago
  • Poetiq is seeking an AI Engineer in Los Altos, California. In this role, you will build a robust platform for AI intelligence, relying on foundation models and complex LLM interactions. Your skills in designing high-performance infrastructure and working autonomously on... 

    Poetiq

    Los Altos, CA
    1 day ago
  • $190k - $250k

     ...Location Type Hybrid Department AI We are looking for an AI Inference engineer to join our growing team. Our current stack is Python, Rust,...  ...respond to system outages Explore novel research and implement LLM inference optimizations Qualifications Experience with ML... 
    Full time

    Kindredventures

    Palo Alto, CA
    3 days ago
  • A leading technology company in California is seeking an Engineering Manager to lead the development of cutting-edge LLM/VLM technologies. In this hands-on leadership role, you will manage a team responsible for optimizing runtime and frameworks, while collaborating with... 

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  •  ...we are building a mathematical reasoning engine that operates with absolute precision. While...  ...to our distributed training loops and inference engines. We are seeking engineers who view...  ...reinforcement learning and low-latency LLM production traffic. Tune the inference engine... 

    Harmonic

    Palo Alto, CA
    4 days ago
  •  ...to deliver industry-leading training and inference speeds; over 10 times faster than GPU-based...  ...Role We are hiring a Senior Performance Engineer to join our Product team. You are an...  ...source inference stacks (vLLM,SGLang,TensorRT-LLM), GPU kernel-level optimization... 
    Contract work
    Shift work

    Cerebras Systems, Inc.

    Sunnyvale, CA
    2 days ago
  • Sarah Smith Fund in Palo Alto is seeking an Inference Engineer to enhance the performance of AI-driven products. In this technical role, you will design and optimize inference pipelines, implementing cutting-edge acceleration techniques to improve model speed and efficiency... 

    Sarah Smith Fund

    Palo Alto, CA
    4 days ago
  • $152k - $241.5k

    We are looking for a Senior DL Algorithms Engineer for LLM/Omni model optimizations! Seeking senior engineers who are mindful of performance...  ...models (like Nemotron and Cosmos) on NVIDIA’s accelerated inference SW stack. Contribute new features, fix bugs and deliver... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     .... NVIDIA is seeking top‑tier AI Compiler Engineers to drive innovation within our world‑class...  ...problems for AI workloads (both inference and training) and successfully transition...  .../or custom AI accelerator architectures. LLM Knowledge: Deep understanding of Large Language... 

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • Cerebras is seeking a Staff Engineer to join their Inference Platform team in Sunnyvale, California. This role involves leading and contributing to projects focused on cloud and ML components, with responsibilities spanning design, mentoring, and coding. The ideal candidate... 

    Cerebras

    Sunnyvale, CA
    4 days ago
  • $170.5k - $315.49k

     ...models fast on the hardware people actually own. You optimize inference engines (llama.cpp, vLLM) for constrained local and edge environments...  ...Python; comfortable reading systems‑level code Understands how LLM inference works (attention, KV cache, decoding) Has profiled... 
    Local area
    Immediate start
    Shift work

    Intel

    Santa Clara, CA
    3 days ago
  • NVIDIA is seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal candidate...  ...3+ years of experience in deep learning, specifically in inference. This role involves profiling, analyzing bottlenecks, and... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • Pika is looking for a Senior Inference Engineer in Palo Alto to enhance the performance of AI-driven products. This pivotal role involves designing and optimizing inference pipelines, applying advanced techniques to improve model speed and efficiency. The ideal candidate... 

    Pika

    Palo Alto, CA
    3 days ago
  • Pika is seeking an experienced Inference Engineer to enhance the performance of its AI-driven products in Palo Alto, CA. The role encompasses optimizing inference pipelines, applying advanced acceleration techniques, and collaborating on AI model deployment. The ideal... 

    Pika

    Palo Alto, CA
    5 days ago
  • $128.7k - $261.3k

    About the Team The Model Deployment & Inference Solutions team in GM AV deploys machine learning...  ...currently performed manually by engineers. Build the developer experience that ML model...  ...Qualifications Experience building agentic or LLM‑powered developer tooling. Experience... 
    Local area
    Remote work
    Flexible hours
    Shift work

    General Motors

    Mountain View, CA
    3 days ago
  • $180k - $250k

    A leading AI infrastructure firm is seeking a TPU Systems Engineer to develop high-performance systems using JAX, XLA, and Pallas. This role involves pushing large-model workloads on TPU hardware and optimizing performance across the stack. Candidates should have at least... 

    RadixArk

    Palo Alto, CA
    2 days ago
  •  ...companies. As a Senior Principal Software Engineer at JPMorganChase within the Commercial &...  ...deployment and optimization using Model Inference servers such as Triton Inference Server and...  ...success architecting and deploying LLM & GNN solutions on AWS (e.g., SageMaker,... 

    TwinThread LLC

    Palo Alto, CA
    1 day ago
