Distributed LLM Inference Engineer

Full-time

Anyscale

About Anyscale

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

We're proud to be backed by Andreessen Horowitz, NEA, and Addition, with more than $250 million raised to date.

About the role

As a Distributed LLM Inference Engineer, you will help build systems and optimizations that push the boundaries of performance for inference at large scale. This role is critical to Anyscale because it will help us achieve a market-leading position in AI infrastructure.

As part of this role, you will:

Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale, used by open-source Ray users and Anyscale customers
Work across the stack, integrating Ray Data with LLM engines and providing optimizations that deliver low-cost solutions for large-scale ML inference
Integrate with open-source software such as vLLM, work closely with the community to bring its techniques into Anyscale solutions, and contribute improvements back to open source
Follow the latest state of the art in the open-source and research communities, implementing and extending best practices

We'd love to hear from you if you have:

Familiarity with running ML inference at large scale with high throughput and low latency
Familiarity with deep learning and deep learning frameworks (e.g. PyTorch)
Solid understanding of distributed systems and ML inference challenges

Bonus points!

ML systems knowledge
Experience using Ray
Experience working closely with the community on LLM engines such as vLLM and TensorRT-LLM
Contributions to deep learning frameworks (PyTorch, TensorFlow)
Contributions to deep learning compilers (Triton, TVM, MLIR)
Prior experience working with GPUs and CUDA

Compensation

At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. As the market data changes over time, the target salary for this role may be adjusted.

This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:

Stock Options
Healthcare plans, with premiums covered by Anyscale at 99% for both employees and dependents
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Paid Time Off
Commute reimbursement
100% of in-office meals covered

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.

Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Location: San Francisco, CA