Senior Software Engineer - ML/LLM Serving

$180k - $220k

Alldus

This range is provided by Alldus. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$180,000.00/yr - $220,000.00/yr

About the Role

We are seeking a senior/staff Machine Learning Serving Software Engineer who thrives in a fast-paced, customer-focused environment and can build robust, flexible infrastructure to serve a diverse range of ML models including both LLMs and classical ML.

The Role

As an ML Serving Engineer, you will design, implement, and optimize infrastructure that powers the deployment and inference of machine learning models across varied customer environments. You’ll work closely with product, research, and customer engineering teams to deliver low-latency, secure, and scalable ML serving solutions.

Responsibilities
  • Design and build scalable, high-performance ML serving infrastructure capable of handling diverse model types (LLMs, recommendation systems, etc.).
  • Optimize inference pipelines for latency, throughput, and cost efficiency.
  • Integrate with a wide range of customer environments, adapting serving strategies to fit their infrastructure and compliance needs.
  • Deploy, monitor, and maintain ML models in production using modern deployment stacks.
  • Collaborate with ML researchers to operationalize new models and ensure seamless integration into customer workflows.
  • Ensure security and privacy best practices are applied to model deployment and inference, aligning with enterprise-grade data security requirements.
  • Stay up to date with the latest serving technologies and frameworks, evaluating and integrating them where relevant.
Qualifications

Required

  • 5+ years of professional software engineering experience, with at least 3 years focused on ML serving, inference infrastructure, or similar domains.
  • Proven experience deploying and optimizing large language models (LLMs) in production.
  • Hands-on expertise with multiple ML serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, Ray Serve, vLLM).
  • Strong programming skills in Python, Go, or C++.
  • Experience with distributed systems and container orchestration tools (Kubernetes, Docker).
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and performance profiling for inference workloads.
  • Solid understanding of secure data handling and privacy-preserving ML practices.
  • Knowledge of cloud platforms (AWS, GCP, Azure) and hybrid/on-prem deployment scenarios.

Preferred

  • Prior experience serving multiple model types beyond LLMs, e.g., recommendation engines and classical ML models.
  • Exposure to model quantization, distillation, caching, and other optimization techniques for inference efficiency.
  • Experience working with enterprise customers or within compliance-heavy environments.
Details
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Software Development

Vacancy posted 3 days ago