Distributed LLM Inference Engineer - Scale AI at Speed

Anyscale

Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open-source technologies and contributing to community projects. Candidates should have a solid understanding of distributed systems and familiarity with deep learning frameworks, ideally with experience in PyTorch and Ray. Anyscale offers competitive compensation and extensive benefits, including healthcare coverage and stock options.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you, based on the Distributed LLM Inference Engineer - Scale AI at Speed vacancy in San Francisco, CA
  •  ...About Us Most AI is frozen in place...  ...intelligence - the inference services that serve LLMs at scale and the data pipelines...  ...Researchers and ML engineers will hand you...  ...Design and operate distributed inference systems for...  ...on experience with LLM inference engines (... 
    Suggested
    Flexible hours

    Adaption

    San Francisco, CA
    1 day ago
  •  ...on a mission to democratize distributed computing and make it accessible...  ...accelerate the progress of AI applications out into the...  ...developer or data scientist can scale an ML application from their...  ...the role As a Distributed LLM Inference Engineer, you will help with systems... 
    Suggested
    Work at office

    Anyscale

    San Francisco, CA
    1 day ago
  •  ...mission‑critical inference for the world's most dynamic AI companies, like...  ...build the platform engineers turn to to ship...  ...system for distributed, heterogeneous AI...  ...believe that as LLM and multi‑modal workloads scale, the network is...  ...operates at wire‑speed. In this role,... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About...  ...the Role At Together.ai, we are building state-of...  ..., develop, and optimize distributed inference engines that support...  ...and language models at scale. This role will focus on... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    3 days ago
  •  ...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing...  ...infrastructure for large-scale multimodal models, focusing on high-...  ...product teams to push the boundaries of AI technology, ensuring reliable production... 
    Suggested

    OpenAI

    San Francisco, CA
    5 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development... 
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  •  ...looking for an individual to design and implement high-performance scheduling systems for AI inference processes. This role requires strong foundational knowledge in distributed systems and an eagerness to work closely with agent-based technologies. The ideal candidate... 

    Sail Research

    San Francisco, CA
    1 day ago
  • $197.3k - $225.1k

     ...Lead AI Engineer (FM Hosting, LLM Inference) Overview At Capital One, we are creating responsible and reliable AI systems, changing banking for...  ...performance - scalability, cost, latency, throughput - of large scale production AI systems. Contribute to the technical... 
    Full time
    Part time
    Local area

    Capital One Financial Corp

    San Francisco, CA
    7 days ago
  • $146.5k

     ...team: The ML Data Engineering team powers metadata extraction...  ...operate at massive scale, supporting diverse...  ...data engineering, and distributed systems, collaborating...  ...deploy scalable ML and LLM-powered solutions in production...  ...-edge generative AI and metadata enrichment... 
    Local area
    Worldwide
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    1 day ago
  •  ...to enable enterprises to implement AI workloads effectively. The role involves designing large-scale deployment architectures, solving AI inference challenges, and collaborating closely...  ...Terraform, and a solid understanding of distributed systems. Benefits include... 
    Flexible hours

    FriendliAI

    San Francisco, CA
    1 day ago
  • $165k

     ...partner with top AI labs,...  ...compute at the speed of light. We’re...  ...About the Role Inference is now the defining...  ...customers run on it: LLM serving...  ...intersection of distributed systems, model...  ...supports them at scale. Profile and resolve...  ...software engineering experience with... 
    Local area

    Fluidstack

    San Francisco, CA
    2 days ago
  • $180k - $275k

     ...this role matters now AI has dramatically lowered...  ...producing abusive content at scale no longer requires...  .... This means designing distributed systems for real-time...  ...’ll collaborate across engineering, product, and design to...  ...activity Leverage AI/LLM-based detection to stay... 
    Full time
    Work at office
    Work from home

    Gamma

    San Francisco, CA
    2 days ago
  • $146.5k - $228k

     ...the team: The ML Data Engineering team powers metadata extraction...  ...operate at massive scale, supporting diverse...  ...data engineering, and distributed systems, collaborating...  ...deploy scalable ML and LLM-powered solutions in production...  ...-edge generative AI and metadata enrichment... 
    Temporary work
    Local area
    Worldwide
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    2 days ago
  • $142.2k - $204.6k

     ...Role As a software engineer for GenAI inference, you will help design...  ...language model (LLM) serving systems are...  ...optimized for large-scale LLMs inference Collaborate...  ...with federated, distributed inference infrastructure...  ...is the data and AI company. More than 10... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    7 hours ago
  •  ...Senior Software Engineer, LLM Performance SF Bay Area...  ...Parasail is redefining AI infrastructure by enabling...  ...deployment across a distributed network of GPUs,...  ...efficiently at enterprise scale while driving continuous...  .... Contributions to inference engines such as vLLM are... 

    Parasail

    San Francisco, CA
    3 days ago
  •  ...AI/ML Engineer (RL & Physical Systems) FLUIX is...  ...systems to power distribution, where milliseconds...  ...and real megawatt-scale infrastructure....  ...Support integration of LLM-based tools and...  ...knowledge distillation, inference orchestration, etc...  ...at startup speed. Bonus Points... 
    Weekend work

    Fluix AI

    San Francisco, CA
    3 days ago
  •  ...Move at Drata Speed (Precision & Velocity...  ...as both a central engineering function and an...  ...stack to help Drata scale reliably for a...  ...SLO tracking, and distributed tracing. Experience...  ...with AIOps - using AI/ML-based tooling for...  ...services (e.g., LLM inference latency, non-determinism... 
    Work at office
    Immediate start
    Worldwide
    Monday to Friday
    Flexible hours

    Careers at Drata

    San Francisco, CA
    5 days ago
  • $300k

     ...interpretable, and steerable AI systems. We want...  ...researchers, engineers, policy experts,...  ...the role Our Inference team is...  ...We tackle complex, distributed systems challenges...  ...performance, large-scale distributed systems...  ...management systems LLM inference optimization... 
    Work at office
    Worldwide
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    3 days ago
  •  ...are hiring Software Engineers focused on AI Infrastructure to build...  ...at production scale. This role exists because...  ...orchestration, large-scale inference systems, performance...  ...pipelines Distributed GPU infrastructure...  ...and moves at lightning speed. You'll have the autonomy... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    7 hours ago
  •  ...our platform delivers AI inference. Validating whether inference...  ...for a dedicated QA engineer who can own the...  ...strategies that account for LLM inference. Work...  ...Strong experience testing distributed systems with multiple...  ...in an early-stage or scaling environment.... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    4 days ago
  •  ...—from edge devices to large-scale deployments. Our work spans...  ...scalable training, efficient inference, and real-world deployment....  ...seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model...  ..., implement, and optimize distributed training systems for large-scale... 

    PrismML

    San Francisco, CA
    5 days ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate...  ...with ML architectures, and experience with distributed systems. This role involves collaboration with researchers... 

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...Langfuse Open Source LLM Engineering Platform that helps teams build useful AI applications via...  ...mainly booth scans and swag distribution. It is not a pure...  ...ClickHouse for tracing at scale, S3 for file storage,...  ...develops at breakneck speed and our customers are at... 
    Part time
    Work at office
    Remote work

    Langfuse GmbH

    San Francisco, CA
    2 hours ago
  • $150k - $237.5k

     ...low-cost and large-scale energy storage and...  ...Senior Software Engineer, Energy Storage...  ...we build and the speed at which we build...  ...over time * Apply AI tools to accelerate...  ...scalable, and secure distributed systems *...  ...information, and inferences drawn from your PI... 
    Full time

    Redwood Materials

    San Francisco, CA
    7 hours ago
  •  ...Software Engineer Opportunity at Abridge Abridge's services and engineering...  ...by multiples. This is a distributed systems oriented role and is...  ..., will be under tremendous scale, and presents many opportunities...  ...and research as we pioneer new AI-first cloud-native-first... 

    Abridge

    San Francisco, CA
    7 hours ago
  • $180k - $310k

     ...the role You'll build and scale the application and data...  ...shipping velocity. As Software Engineer on the Platform team, you'll collaborate...  ...and implement scalable APIs, distributed systems, and data...  ...systems, event pipelines, or AI-powered applications (Nice to... 
    Full time
    Work at office
    Work from home

    Gamma

    San Francisco, CA
    3 days ago
  • $230k - $385k

     ...factors. We strive to seamlessly blend high-level AI capabilities with the constraints of physical systems...  ...' lives. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    3 days ago
  •  ...Principal AI Engineer (LLM Agents & Orchestration) Job Title: Principal AI Engineer (...  ...GenAI companies in the world. We've scaled faster than most funded startups - with...  ...Latency & Reliability: Optimize inference pipelines for speed (streaming, token optimization) and... 

    Vyro

    San Francisco, CA
    2 days ago
  • $216k - $270k

     ...Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative...  ...knowledge retrieval, inference, evaluation, and more...  ...Senior Full-Stack Engineer to help us build, scale...  ...Python, working with distributed systems, data pipelines, and ML/LLM components.... 
    Full time

    Scale AI

    San Francisco, CA
    3 days ago
  • $166k - $225k

     ...running the world's best data and AI infrastructure platform so...  ...their business. Founded by engineers — and customer obsessed — we...  ...for interfacing with data to scaling our services and infrastructure...  ...building the next generation distributed data storage and processing... 
    Local area
    Worldwide

    Databricks Inc.

    San Francisco, CA
    4 days ago
