Distributed LLM Inference Engineer

Full-time

Anyscale

About Anyscale

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with $250+ million raised to date.

About the role

As a Distributed LLM Inference Engineer, you will help build systems and optimizations that push the boundaries of performance for inference at large scale. This role is critical to Anyscale because it allows us to achieve a market-leading position in AI infrastructure.

As part of this role, you will

  • Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale, which will be used by open-source Ray users and Anyscale customers

  • Work across the stack, integrating Ray Data with LLM engines and providing optimizations that achieve low-cost solutions for large-scale ML inference

  • Integrate with open-source software like vLLM, work closely with the community to adopt these techniques in Anyscale solutions, and contribute improvements back to open source

  • Follow the latest state of the art in the open-source and research communities, implementing and extending best practices

We'd love to hear from you if you have

  • Familiarity with running ML inference at large scale with high throughput and low latency

  • Familiarity with deep learning and deep learning frameworks (e.g. PyTorch)

  • Solid understanding of distributed systems and ML inference challenges

Bonus points!

  • ML Systems knowledge

  • Experience using Ray 

  • Experience working closely with the community on LLM engines like vLLM and TensorRT-LLM

  • Contributions to deep learning frameworks (PyTorch, TensorFlow)

  • Contributions to deep learning compilers (Triton, TVM, MLIR)

  • Prior experience working on GPUs / CUDA

Compensation

At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. As the market data changes over time, the target salary for this role may be adjusted.

This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:

  • Stock Options

  • Healthcare plans, with premiums covered by Anyscale at 99% for both employees and dependents

  • 401k Retirement Plan

  • Education & Wellbeing Stipend

  • Paid Parental Leave

  • Fertility Benefits

  • Paid Time Off

  • Commute reimbursement

  • 100% of in-office meals covered

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.  

Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Vacancy posted 16 hours ago