Distributed Systems Engineer, Data & Inference Platform

Adaption

About Us

Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real time. Our vision is AI systems that are flexible, personalized, and accessible to everyone. We believe efficiency is what makes this possible - it's how we expand access and ensure innovation benefits the many, not the few. We believe in talent density: bringing together the best and most driven individuals to push the boundaries of continual adaptation. We're looking for builders and creative thinkers ready to shape the next era of intelligence.

The Role

You'll build and operate the systems that turn raw compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines that feed them. One week you're hunting a tail-latency regression in a production inference service handling millions of requests; the next you're redesigning a Ray Data pipeline so it stops melting down at petabyte scale. The work spans architecture, implementation, and the on-call pager that keeps you honest about both. Researchers and ML engineers will hand you workloads that barely run; you'll hand them back systems that run reliably, efficiently, and cheaply enough to matter.

Responsibilities
  • Serve Models at Scale: Design and operate distributed inference systems for LLMs, optimizing throughput, latency, and cost across heterogeneous GPU fleets. Batching, scheduling, KV cache management, autoscaling - you own the levers that make inference economical.
  • Move the Data: Build large-scale data pipelines (Ray Data, Spark, or equivalents) that ingest, transform, and curate the datasets behind training and evaluation. The bottleneck is rarely where people think it is; finding it is your job.
  • Debug the Undebuggable: Chase down the failure modes that only emerge under real production traffic - stragglers, head-of-line blocking, silent data corruption, GPU memory fragmentation - and write the postmortems that prevent the next ten. Define SLOs, build the observability to measure them, and own the on-call rotation that defends them.
  • Partner Across the Stack: Work directly with researchers and ML engineers to take experimental workloads from "runs on one node" to "runs in production." You're a systems partner, not a ticket queue.

Qualifications
  • 5+ years building and operating distributed systems in production.
  • Deep experience with at least one large-scale data or compute framework (Ray, Spark, Flink, Beam, Dask).
  • Strong fluency in Python and at least one systems language (Go, Rust, C++).
  • Working knowledge of the GPU/accelerator stack: CUDA fundamentals, NCCL, mixed precision, memory layout. You don't need to write kernels, but you should know why a workload is bound by what it's bound by.
  • Experience operating Kubernetes-based infrastructure, including custom operators or schedulers.
  • A track record of owning hard production incidents end-to-end - diagnosis, mitigation, and the durable fix.
  • Bonus: hands-on experience with LLM inference engines (vLLM, SGLang, TensorRT-LLM, TGI), modern lakehouse formats (Iceberg, Delta, Hudi), or open-source contributions to relevant projects.

Above all, we're looking for great teammates who make work feel lighter and aren't afraid to go out on a limb with bold ideas. You don't need to be perfect, but you do need to be adaptable. We encourage you to apply, even if you don't check every box.

Benefits
  • Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and team offsites.
  • Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
  • Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
  • Well-Being: Comprehensive medical benefits and generous paid time off.

Vacancy posted 21 days ago

Similar jobs that could be interesting for you, based on the Distributed Systems Engineer, Data & Inference Platform vacancy in San Francisco, CA:
  • $200k - $300k

     ...tech startup in San Francisco seeks a Lead Software Engineer to build and optimize foundational backend systems for a massive AI video dataset. You will lead...  ...years in backend engineering, strong experience with distributed systems, and is proficient in Go, Python, or Node... 
    Platform

    Troveo AI

    San Francisco, CA
    3 days ago
  • A leading tech company based in San Francisco is seeking a Software Engineer to enhance its data and AI platform. The role involves developing high-performance distributed data systems and delivering on ambitious projects such as Delta Lake and performance engineering.... 
    Platform

    Databricks Inc.

    San Francisco, CA
    4 days ago
  • $295k

     ...capabilities with the constraints of physical systems to improve peoples' lives. About the Role As a Research Engineer, Distributed Data Systems, you will design and scale the...  ..., and security. Ensure our data platform can scale by orders of magnitude while... 
    Platform
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    3 days ago
  • $139.2k - $174k

     ...are seeking a Senior Engineer 2 to play a key role...  ...running AI workloads— inference, training, fine‑tuning...  ...between high‑scale distributed systems and specialized AI inference...  ...to ensure our global platform remains simple,...  ...position is based on market data, relevant years of... 
    Platform
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    2 days ago
  • $192k - $260k

     ...Databricks Databricks is the data and AI company. More than 10...  ...Data Intelligence Platform to unify and democratize data...  ...Optional: MS or PhD in databases, distributed systems. Comfortable working towards...  ...complexity of real-world data engineering architecture. Delta Pipelines... 
    Platform
    Worldwide

    Cacheflow

    San Francisco, CA
    1 day ago
  • $255k - $405k

     ...of broad societal benefit. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that...  ...scalability, reliability, and security. Ensure our data platform can scale by orders of magnitude while remaining reliable... 
    Platform
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    1 day ago
  • Voiceflow is seeking a Software Engineer (Distributed Systems) in San Francisco. As a founding engineer, you will focus on building a real-time database...  ...processing, and prefers working in-person. Join us in shaping the future of data replication!

    Voiceflow

    San Francisco, CA
    9 hours ago
  •  ...Distributed Systems Engineer As a distributed systems engineer, you'll work across the stack to solve...  ...building the next, default storage platform in the cloud. Over the past 15 years...  ...the default way to store inactive data sets in the cloud, but the next-generation... 
    Platform
    Flexible hours

    Archil

    San Francisco, CA
    1 day ago
  • MLabs Ltd is seeking a talented engineer to design and implement core systems for a real-time distributed platform. Based in New York, the role demands expertise in Rust and extensive experience in building distributed systems. Candidates will have the opportunity for... 
    Platform
    Remote job

    MLabs Ltd

    San Francisco, CA
    4 days ago
  •  ...Tensorlake is to unlock your data wherever it is. We...  ...action. We're looking for engineers who want to build the operating system for AI Data Applications...  ...looking for experienced distributed systems engineers to...  ...primarily in DevOps, SRE, or platform operations (Terraform,... 
    Platform

    Tensorlake, Inc.

    San Francisco, CA
    5 days ago
  • $350k

     ...and steerable AI systems. We want AI to be...  ...committed researchers, engineers, policy experts,...  ...Anthropic's inference fleet serves Claude...  ...'s largest cloud platforms. The stack that...  ..., model servers, distributed routing, autoscaling...  ...t write Solid data analysis skills (... 
    Platform
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    4 days ago
  •  ...definitive tools catalog and tool-calling platform that will unlock AI's true potential....  ...authentication, integrations, distributed systems, and AI experts from Okta, Redis, Microsoft...  ...desire to ship. ~7+ years of software engineering experience comprising of: ~5+ years... 
    Platform
    Work at office
    Shift work

    Arcade AI, Inc

    San Francisco, CA
    2 days ago
  • $180k - $300k

     ...discovered, priced, and distributed in real time. The...  ...transparency, and efficiency to systems where value is...  ...distributed systems, including data ingestion, low-latency...  ...-stakes distributed platform. End-to-End Ownership...  ...while establishing engineering best practices and... 
    Platform
    Full time
    Remote work
    Flexible hours

    MLabs Ltd

    San Francisco, CA
    4 days ago
  • $142.6k - $261.5k

     ...The opportunity The Platforms Practice specializes in...  ...team of product leaders, data scientists, designers, and software engineers enable our clients to...  ...practices. Knowledgeable in system development lifecycle...  ...interest in cloud and distributed systems architectures... 
    Platform
    Summer holiday
    Flexible hours

    EY

    San Francisco, CA
    3 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses...  ...teams, integrating Ray Data and LLM engines, while keeping...  ...and knowledge of distributed systems is crucial.

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    2 days ago
  •  ...As a Research Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal training and evaluation at OpenAI. You’ll manage distributed data pipelines, collaborate closely with researchers to translate requirements... 

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...Tech Lead, Data & Inference Engineer Massachusetts, Massachusetts, United...  ...targetable audiences across platforms such as Meta, Google and YouTube...  ...world of intelligent systems. Location: San...  ...Comfortable in hybrid or distributed environments with strong ownership... 
    Platform
    Full time

    Catalyst Labs, LLC

    San Francisco, CA
    2 days ago
  •  ...B Capital is seeking a skilled software engineer in San Francisco to develop foundational AI systems. You will work on shared services and improve operational...  ...development, experience with APIs, and familiarity with distributed systems. This role offers top-tier compensation,... 
    Platform

    B Capital

    San Francisco, CA
    1 day ago
  • $255k - $405k

    Slope is seeking a Software Engineer for its team in San Francisco, CA. The role focuses...  .... Responsibilities include managing distributed data pipelines and collaborating closely with...  ...exhibit strong experience in distributed systems and possess excellent organizational... 

    Slope

    San Francisco, CA
    1 day ago
  •  ...company-wide foundations platform that accelerates every...  ..., and high-throughput data ingestion tooling...  ...production environments. These systems form the foundational...  ...-tenant isolation. Distributed Systems Architecture:...  ..., service reliability engineering. About You... 
    Platform
    Relocation package

    Reflection AI

    San Francisco, CA
    3 days ago
  • $230k - $310k

     ...the role You'll own Gamma's data infrastructure and...  ...pipeline architecture, designing distributed systems that handle massive scale with...  ...'ll solve the hardest data engineering challenges we face while...  ...processing) and event streaming platforms ~ Extensive hands-on... 
    Platform
    Full time
    Work at office
    Work from home

    Gamma

    San Francisco, CA
    3 days ago
  • $130k - $170k

     ...Analytics Engineer — Data Warehouse San Francisco About...  ...building high-performance inference compute and the software platform around it. We're...  ...referential integrity, distribution drift, anomaly detection...  ...open and transparent AI systems will drive innovation and... 
    Platform
    Full time
    Internship

    Together AI

    San Francisco, CA
    1 day ago
  •  ...by the inefficiency of the data that feeds it. At scale, each...  ...modeling , and distributed systems to design self-optimizing data...  ...represented and used by AI. This engineering team partners closely with...  ...development of distributed compute platforms that scale predictively and... 
    Platform
    Flexible hours

    Granica

    San Francisco, CA
    5 days ago
  •  ...ParadeDB Cloud Engineer ParadeDB is a Postgres-native...  ...eliminate ETL/change data capture tools, add...  ...environment. We're primarily distributed across the United...  ...looking for a distributed systems engineer to join our...  ...our managed database platform built on Kubernetes... 
    Platform
    Full time
    Work at office

    ParadeDB

    San Francisco, CA
    1 day ago
  • B Capital in San Francisco is looking for a Senior/Lead/Principal Distributed Systems Software Engineer. The role involves designing and maintaining a distributed systems engineering platform for public cloud environments. Candidates should have over 3 years of backend... 
    Platform

    B Capital

    San Francisco, CA
    4 days ago
  •  ...Product Infrastructure Engineer Truewind is...  ...across ERP and financial systems. To make this reliable...  ...engineer who can build the data foundation and...  ...the middle of a major platform transition. We are...  ...data infrastructure, or distributed systems ~ Strong experience... 
    Platform

    Truewind

    San Francisco, CA
    4 days ago
  •  ...Baseten powers mission‑critical inference for the world's most...  ...Join us and help build the platform engineers turn to to ship AI...  ...building the global operating system for distributed, heterogeneous AI hardware...  ...interconnects to ensure that data movement operates at wire‑... 
    Platform
    Flexible hours

    Baseten

    San Francisco, CA
    6 days ago
  • deCircle is seeking an engineer to design and implement core systems for its agentic AI platform. This role involves building production systems, ensuring reliable cloud...  ...has over 3 years of experience in backend or distributed systems engineering, strong skills in... 
    Platform

    deCircle

    San Francisco, CA
    5 days ago
  •  ...Job Description About the Role Join a startup building an agentic data lakehouse platform. As a Senior Software Engineer, Distributed Data Systems, you'll work on a greenfield project to build scalable data infrastructure that transforms enterprise... 
    Platform

    Clera

    San Francisco, CA
    14 days ago
  • Engineering Manager — Foundational Data Systems for AI Location: Downtown Mountain View, CA (office-based, 5 days/...  ...depends on. You’ll lead a globally distributed team of ~15-20 senior engineers...  ...compute, storage, or data platforms Experience building or operating... 
    Platform
    Work at office
    Flexible hours

    Granica

    San Francisco, CA
    1 day ago
