Distributed LLM Inference Engineer - Scale AI at Speed

Anyscale

Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open-source technologies and contributing to community projects. Candidates should have a solid understanding of distributed systems and familiarity with deep learning frameworks, ideally with experience in PyTorch and Ray. Anyscale offers competitive compensation and extensive benefits, including healthcare coverage and stock options.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you, based on the Distributed LLM Inference Engineer - Scale AI at Speed vacancy in San Francisco, CA

System Engineering In
Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
Suggested
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
15 hours ago
Distributed LLM Inference Engineer
...on a mission to democratize distributed computing and make it accessible... ...accelerate the progress of AI applications out into the... ...developer or data scientist can scale an ML application from their... ...the Role As a Distributed LLM Inference Engineer, you will help systems and...
Suggested
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
15 hours ago
Staff Engineer, AI Inference & Distributed Systems
...San Francisco is seeking a talented engineer to design and implement robust systems... ...that ensure fast and cost-efficient AI inference at global scale. You will be responsible for... ...candidate has a strong background in distributed systems and is eager to engage in complex...
Suggested
Sail Research
San Francisco, CA
4 days ago
Multimodal Inference Engineer — Scale GPU AI Models
...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing... ...infrastructure for large-scale multimodal models, focusing on high-... ...product teams to push the boundaries of AI technology, ensuring reliable production...
Suggested
Jobleads-US
San Francisco, CA
4 days ago
LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...the Role At Together.ai, we are building... ...efficient and scalable inference for large language... ...and Optimization Engineer to design, develop, and optimize distributed inference engines that... ...language models at scale. This role will... ...shape the future of LLM inference infrastructure...
Suggested
Full time
Togetherai
San Francisco, CA
15 hours ago
GPU Systems Engineer - Scale AI Inference (On-site SF/LA)
Vast.ai Inc. is seeking a systems engineer with HPC or parallel programming experience to help scale AI inference. You will design and optimize GPU kernels and tensor libraries, leveraging CUDA/C++ and related frameworks to push the bleeding edge of AI performance. This...
Vast.ai Inc.
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
...Move at Drata Speed (Precision & Velocity... ...as both a central engineering function and an... ...stack to help Drata scale reliably for a... ...SLO tracking, and distributed tracing. Experience... ...with AIOps - using AI/ML-based tooling for... ...services (e.g., LLM inference latency, non-determinism...
Work at office
Immediate start
Worldwide
Monday to Friday
Flexible hours
Careers at Drata
San Francisco, CA
3 days ago
Research Scientist/Engineer - Post-training, Inference, & Safety and Security
About Virtue AI Virtue... ...Python, along with expertise in LLM libraries like PyTorch,... ...Experience in conducting large‑scale red‑teaming for LLMs and agents...
Full time
SupportFinity™
San Francisco, CA
2 days ago
Senior Model Inference Engineer for Production-Scale AI
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate... ...with ML architectures, and experience with distributed systems. This role involves collaboration with researchers...
Jobleads-US
San Francisco, CA
4 days ago
Software Systems Engineering
$153k - $376k
...designs into code, or iterating with AI. From idea to product, Figma... ...everything we build. As a Software Engineer on our Infrastructure team, you’... ...millions of people worldwide. We’re scaling fast, and we’re looking for experienced distributed systems engineers across a...
Full time
Remote work
Work from home
Worldwide
Figma
San Francisco, CA
2 days ago
Founding Voice AI Engineer: Scale LLM Calls
Health Harbor, located in San Francisco, is seeking experienced engineers to build and scale their Voice AI LLM and orchestration system. The role demands strong problem-solving skills and the ability to work under high pressure, with a commitment of about 70 hours a week...
Flexible hours
Health Harbor
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
$166.9k - $225.9k
...operates as both a central engineering function and an... ...native stack to help Drata scale reliably for a rapidly... ..., SLO tracking, and distributed tracing. Experience... ...Experience with AIOps—using AI/ML‑based tooling for... ...services (e.g., LLM inference latency, non‑determinism...
Flexible hours
Drata
San Francisco, CA
3 days ago
Software Engineer, Infrastructure
...experiences with AI. We are primarily... ...As a Software Engineer, Infrastructure at... ...teams to deliver with speed and confidence.... ...our platform and LLM inference serving as we rapidly... ...Architect and operate distributed systems that... ...SaaS services at scale, and agentic architecture...
Full time
Flexible hours
Sierra
San Francisco, CA
12 days ago
Senior Software Engineer, Full-Stack - Scale GP
$216.2k - $270.25k
Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative... ...for knowledge retrieval, inference, evaluation, and more. We... ...in Python, working with distributed systems, data pipelines, and ML/LLM components. Integrate...
Full time
Dormont Manufacturing Company
San Francisco, CA
2 days ago
Member of Technical Staff (AI Inference Engineer)
$220k
We build and run the inference engine behind every Perplexity query and deploy... ...of model architectures at scale with tight latency and cost... .... You understand modern LLM architectures and are able to... ...built and operated production distributed systems under real load -...
Perplexity
San Francisco, CA
4 days ago
Senior Real-Time AI Distributed Systems Engineer
Acceler8 Talent is looking for a Senior Distributed Systems Engineer with over 7 years of experience in... ...Francisco focuses on building systems for AI-powered clinical environments,... ...TypeScript, combined with experience in scaling systems and working with relational databases...
Acceler8 Talent
San Francisco, CA
2 days ago
Software Engineer, Applied AI
...Shepherd is an AI-native commercial... ...built for this speed: Complex... ...a Software Engineer focused on Applied... ...build, and deploy LLM-powered... ...model serving, inference, evaluation, and... ...scalable APIs and distributed systems Hands... ...AI workloads at scale A product...
Work experience placement
Work at office
Shepherd Labs Inc.
San Francisco, CA
1 day ago
Distributed Web Crawling Engineer for Large-Scale AI Data
Reflection in San Francisco is seeking an engineer to build and operate web-scale systems for data collection. This role involves working closely with AI researchers to optimize crawling... ...will have a strong background in distributed systems and web data processing. Competitive...
Reflection
San Francisco, CA
3 days ago
Senior Software Engineer, Full-Stack - Scale GP
$216k - $270k
Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative... ...knowledge retrieval, inference, evaluation, and... ...strong Senior Full-Stack Engineer to help us build, scale... ...Python, working with distributed systems, data pipelines, and ML/LLM components...
Full time
Scale AI
San Francisco, CA
15 hours ago
LLM Inference & Optimization Engineer
Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
1 day ago
Staff GenAI Inference Engineer: Optimize LLM Serving Latency
$190.9k - $232.8k
A leading data and AI company is seeking a Staff Software Engineer for GenAI inference to lead the architecture and optimization of the inference engine. The role... ...requires expertise in CUDA, GPU programming, and distributed systems design. Ideal candidates will have a...
Menlo Ventures
San Francisco, CA
1 day ago
Developer Relations Engineer
...infrastructure foundation for AI teams. With instant... ...and serve low-latency inference. Companies like Suno,... ..., including Lovable, Scale AI, Substack, and Suno... ..., and experienced engineering and product leaders with... ...primarily create and distribute technical content that...
Work at office
Modal
San Francisco, CA
4 days ago
API LLM Integration / ReactJS Engineer - Hybrid
$87.95k
...now. We are currently seeking an API LLM Integration / ReactJS Engineer - Hybrid to join our team in Santa... ...Integration Engineer will work with our AI team, software engineers, and... ...unmatched capabilities in enterprise-scale AI, cloud, security, connectivity, data...
Full time
Temporary work
Work experience placement
Work at office
Remote work
Flexible hours
NTT DATA Services
Brisbane, CA
1 day ago
GPU Systems Engineer — HPC & AI Inference (On-site)
Vast.ai is seeking a systems engineer to scale AI inference and optimize GPU performance at our San Francisco or Los Angeles offices. You will leverage your HPC background to push the bleeding edge of AI, working with CUDA/C++ and a modern tech stack. Ideal candidates...
Full time
Vast.ai
San Francisco, CA
2 days ago
Staff Product Security Engineer
$250k - $285k
...vertically integrated AI infrastructure... ...who believe in the scale of our ambition... ...Product Security Engineer with deep AI/ML security... ..., and distributed AI systems. This is... ...to‑end, including LLM pipelines, vector... ...including MLOps, inference architectures, vector...
Temporary work
Crusoe
San Francisco, CA
5 days ago
Founding Data Engineer
...with is building AI systems to... ...to add a Data Engineer to help build out... ...—designing and scaling pipelines, building... ...exposure to LLM-driven workflows... ...processing and inference pipelines Implement... ...that improve speed and efficiency... ...with distributed systems or large...
Glocomms
San Francisco, CA
2 days ago
Staff Data Quality Engineer, LLM Post-Training
B Capital is seeking a data engineer to ensure high data quality for training AI models. You will own the upstream data quality for LLM post-training and design automated QA methods in a collaborative environment. Ideal candidates will have strong engineering skills, a...
B Capital
San Francisco, CA
3 days ago
Senior AI Software Engineer
$149k - $240k
...IQ is HP’s new AI innovation lab.... ...with HP’s global scale, we’re building... ...‑class team—engineers, designers, researchers... ...real‑time AI inference and processing.... ...practices for distributed AI systems.... ...Proficient in LLM integration into... ...Work with the speed and focus of a...
Full time
Temporary work
Local area
Flexible hours
SupportFinity™
San Francisco, CA
2 days ago
Sr. Staff Production Engineer - Data Platform
$228.6k - $314.25k
...world's best data and AI infrastructure... ...business. Founded by engineers — and customer obsessed... ...with data to scaling our services and infrastructure... ...or SRE in highly distributed, multi-cloud... .... Familiarity with LLM infrastructure, training/inference pipelines, or...
Full time
Worldwide
Databricks
San Francisco, CA
1 day ago
Senior Security Operations Engineer
$192k - $240k
...Security Operations Engineer Brex is the intelligent finance... ...spend effortlessly. Brex's AI-native automation and world-... ...about building systems that scale with speed and intention. Our teams span... ...Experience with securing distributed systems in AWS, cloud and Kubernetes...
Work at office
Remote work
Work from home
Colorwave Inc
San Francisco, CA
2 days ago
