Distributed LLM Inference Engineer
Anyscale
About Anyscale
At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.
With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.
We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with $250+ million raised to date.
About the role
As a Distributed LLM Inference Engineer, you will help build systems and optimizations that push the boundaries of performance for inference at large scale. This role is critical to Anyscale, as it allows us to achieve a market-leading position in AI infrastructure.
As part of this role, you will
Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale, which will be used by open-source Ray users and Anyscale customers
Work across the stack, integrating Ray Data with LLM engines and providing optimizations that deliver low-cost solutions for large-scale ML inference
Integrate with open-source software like vLLM, work closely with the community to adopt its techniques in Anyscale solutions, and contribute improvements back to open source
Follow the latest state of the art in the open-source and research communities, implementing and extending best practices
We'd love to hear from you if you have
Familiarity with running ML inference at large scale with high throughput and low latency
Familiarity with deep learning and deep learning frameworks (e.g. PyTorch)
Solid understanding of distributed systems and ML inference challenges
Bonus points!
ML Systems knowledge
Experience using Ray
Experience working closely with the community on LLM engines like vLLM and TensorRT-LLM
Contributions to deep learning frameworks (PyTorch, TensorFlow)
Contributions to deep learning compilers (Triton, TVM, MLIR)
Prior experience working on GPUs / CUDA
Compensation
At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. As the market data changes over time, the target salary for this role may be adjusted.
This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:
Stock Options
Healthcare plans, with premiums covered by Anyscale at 99% for both employees and dependents
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Paid Time Off
Commute reimbursement
100% of in-office meals covered
Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.
Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.


