Distributed LLM Inference Engineer - Scale AI at Speed

Anyscale

Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open-source technologies and contributing to community projects. Candidates should have a solid understanding of distributed systems and familiarity with deep learning frameworks, ideally with experience in PyTorch and Ray. Anyscale offers competitive compensation and extensive benefits, including healthcare coverage and stock options.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Distributed LLM Inference Engineer - Scale AI at Speed in San Francisco, CA vacancy
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... 
    Suggested

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    3 days ago
  •  ...About Us Most AI is frozen in place...  ...intelligence - the inference services that serve LLMs at scale and the data pipelines...  ...Researchers and ML engineers will hand you...  ...Design and operate distributed inference systems for...  ...on experience with LLM inference engines (... 
    Suggested
    Flexible hours

    Adaption

    San Francisco, CA
    22 days ago
  •  ...on a mission to democratize distributed computing and make it accessible...  ...accelerate the progress of AI applications out into the...  ...developer or data scientist can scale an ML application from their...  ...the role As a Distributed LLM Inference Engineer, you will help with systems... 
    Suggested
    Work at office

    Anyscale

    San Francisco, CA
    2 days ago
  •  ...mission‑critical inference for the world's most dynamic AI companies, like...  ...build the platform engineers turn to to ship...  ...system for distributed, heterogeneous AI...  ...believe that as LLM and multi‑modal workloads scale, the network is...  ...operates at wire‑speed. In this role,... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About...  ...the Role At Together.ai, we are building state-of...  ..., develop, and optimize distributed inference engines that support...  ...and language models at scale. This role will focus on... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    24 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development... 
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • $167.2k - $209k

     ...DigitalOcean is expanding its AI Infrastructure layer...  ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In...  ...and delivering high-scale, resilient data...  ...the intersection of distributed systems and...  ...frameworks such as llm‑d, NVIDIA Dynamo, or... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    6 days ago
  • $146.5k

     ...team: The ML Data Engineering team powers metadata extraction...  ...operate at massive scale, supporting diverse...  ...data engineering, and distributed systems, collaborating...  ...deploy scalable ML and LLM-powered solutions in production...  ...-edge generative AI and metadata enrichment... 
    For contractors
    Local area
    Worldwide
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    2 days ago
  •  ...AI Systems Engineer Transluce is a fast-moving research...  ...systems that can scale to thousands of...  ...: Inference stacks that are as...  ...Behavior elicitation: Distributed RL training and roll...  ...internal tools to speed up the team Help...  ...Bonus: can set up LLM pipelines, e.g. multiple... 
    Flexible hours

    Transluce

    San Francisco, CA
    4 days ago
  •  ...Move at Drata Speed (Precision & Velocity...  ...as both a central engineering function and an...  ...stack to help Drata scale reliably for a...  ...SLO tracking, and distributed tracing. Experience...  ...with AIOps - using AI/ML-based tooling for...  ...services (e.g., LLM inference latency, non-determinism... 
    Work at office
    Immediate start
    Worldwide
    Monday to Friday
    Flexible hours

    Careers at Drata

    San Francisco, CA
    1 day ago
  • $142.2k - $204.6k

     ...Role As a software engineer for GenAI inference, you will help design...  ...language model (LLM) serving systems are...  ...optimized for large-scale LLMs inference Collaborate...  ...with federated, distributed inference infrastructure...  ...is the data and AI company. More than 10... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    1 day ago
  •  ...AI/ML Engineer (RL & Physical Systems) FLUIX is...  ...systems to power distribution, where milliseconds...  ...and real megawatt-scale infrastructure....  ...Support integration of LLM-based tools and...  ...knowledge distillation, inference orchestration, etc...  ...at startup speed. Bonus Points... 
    Weekend work

    Fluix AI

    San Francisco, CA
    4 days ago
  •  ...are hiring Software Engineers focused on AI Infrastructure to build...  ...at production scale. This role exists because...  ...orchestration, large-scale inference systems, performance...  ...pipelines Distributed GPU infrastructure...  ...and moves at lightning speed. You'll have the autonomy... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    2 days ago
  •  ...Staff+ Software Engineer, Inference Runtime Remote-Friendly...  ...interpretable, and steerable AI systems. We want AI to...  ...customers with the speed, reliability, and...  ...efforts spanning serving, scaling, and accelerator teams...  ..., large-scale distributed systems serving millions... 
    Work at office
    Remote work
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    7 days ago
  •  ...our platform delivers AI inference. Validating whether inference...  ...for a dedicated QA engineer who can own the...  ...strategies that account for LLM inference. Work...  ...Strong experience testing distributed systems with multiple...  ...in an early-stage or scaling environment.... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    12 hours ago
  •  ...Langfuse Open Source LLM Engineering Platform that helps teams build useful AI applications via...  ...mainly booth scans and swag distribution. It is not a pure...  ...ClickHouse for tracing at scale, S3 for file storage,...  ...develops at breakneck speed and our customers are at... 
    Part time
    Work at office
    Remote work

    Langfuse GmbH

    San Francisco, CA
    1 day ago
  • $180k - $310k

     ...the role You'll build and scale the application and data...  ...shipping velocity. As Software Engineer on the Platform team, you'll collaborate...  ...and implement scalable APIs, distributed systems, and data...  ...systems, event pipelines, or AI-powered applications (Nice to... 
    Full time
    Work at office
    Work from home

    Gamma

    San Francisco, CA
    5 days ago
  • Health Harbor, located in San Francisco, is seeking experienced engineers to build and scale their Voice AI LLM and orchestration system. The role demands strong problem-solving skills and the ability to work under high pressure, with a commitment of about 70 hours a week... 
    Flexible hours

    Health Harbor

    San Francisco, CA
    12 hours ago
  • $230k - $385k

     ...factors. We strive to seamlessly blend high-level AI capabilities with the constraints of physical systems...  ...' lives. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    4 days ago
  • $150k - $237.5k

     ...low-cost and large-scale energy storage and...  ...Senior Software Engineer, Energy Storage...  ...we build and the speed at which we build...  ...over time. Apply AI tools to accelerate...  ...scalable, and secure distributed systems...  ...information, and inferences drawn from your PI... 
    Full time

    Redwood Materials

    San Francisco, CA
    1 day ago
  • $139.2k - $174k

     ...DigitalOcean is expanding its AI Infrastructure...  ...seeking a Senior Engineer 2 to play a key...  ...running AI workloads— inference, training, fine‑tuning— at scale. In this high‑...  ...between high‑scale distributed systems and specialized...  ...orchestration for LLM inference and... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    3 days ago
  • $320k

     ...Staff + Sr. Software Engineer, Cloud Inference San Francisco, CA...  ...interpretable, and steerable AI systems. We want AI...  ...Cloud Inference team scales and optimizes Claude...  ..., large-scale distributed systems serving millions...  ...Are curious about LLM serving; prior inference... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    3 days ago
  •  ...Principal AI Engineer (LLM Agents & Orchestration) Job Title: Principal AI Engineer (...  ...GenAI companies in the world. We've scaled faster than most funded startups - with...  ...Latency & Reliability: Optimize inference pipelines for speed (streaming, token optimization) and... 

    Vyro

    San Francisco, CA
    3 days ago
  • $216k - $270k

     ...Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative...  ...knowledge retrieval, inference, evaluation, and more...  ...Senior Full-Stack Engineer to help us build, scale...  ...Python, working with distributed systems, data pipelines, and ML/LLM components.... 
    Full time

    Scale AI

    San Francisco, CA
    16 days ago
  • $255k - $405k

     ...on integrating multimodal functionalities into our AI products, ensuring they are reliable, user‑friendly...  ...broad societal benefit. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large‑scale... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    2 days ago
  •  ...Technical Lead for Inference & ML Performance...  ...next generation of AI products. We build...  ...production, and do it at scale without compromise...  ...fal's inference engine and ensure our...  ..., ML compilers, distributed inference) to build...  ...enhancing inference speed and scalability.... 

    Fal

    San Francisco, CA
    1 day ago
  • $192k - $260k

     ...Databricks Databricks is the data and AI company. More than 10,000...  ...: MS or PhD in databases, distributed systems. Comfortable working...  ...system that combines the scale and cost-efficiency of data lakes...  ...of real-world data engineering architecture. Delta Pipelines... 
    Worldwide

    Cacheflow

    San Francisco, CA
    2 days ago
  • $250k - $380k

     ...Full time Department Scaling Compensation $250K -...  ...and running OpenAI’s LLM training and inference infrastructure that powers...  ...are looking for an engineer to design and...  ...performance bottlenecks in distributed dataset loading (e.g....  ...OpenAI OpenAI is an AI research and deployment... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    12 hours ago
  •  ...Inference Engine Engineer We build and run the inference engine behind...  ...dozens of model architectures at scale with tight latency and cost...  ...You understand modern LLM architectures and are able to...  ...built and operated production distributed systems under real load -... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform San Francisco...  ...the Role Together AI is building the Inference...  ...and speech models at scale. If you get a thrill...  ...reduce model compute and speed up responses....  ...scale, fault-tolerant, distributed systems and API microservices... 
    Full time
    Local area

    Together AI

    San Francisco, CA
    4 days ago
