Distributed Systems Engineer, Data & Inference Platform

Adaption

About Us

Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real time. Our vision is AI systems that are flexible, personalized, and accessible to everyone. We believe efficiency is what makes this possible - it's how we expand access and ensure innovation benefits the many, not the few. We believe in talent density: bringing together the best and most driven individuals to push the boundaries of continual adaptation. We're looking for builders and creative thinkers ready to shape the next era of intelligence.

The Role

You'll build and operate the systems that turn raw compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines that feed them. One week you're hunting a tail-latency regression in a production inference service handling millions of requests; the next you're redesigning a Ray Data pipeline so it stops melting down at petabyte scale. The work spans architecture, implementation, and the on-call pager that keeps you honest about both. Researchers and ML engineers will hand you workloads that barely run; you'll hand them back systems that run reliably, efficiently, and cheaply enough to matter.

Responsibilities
  • Serve Models at Scale: Design and operate distributed inference systems for LLMs, optimizing throughput, latency, and cost across heterogeneous GPU fleets. Batching, scheduling, KV cache management, autoscaling - you own the levers that make inference economical.
  • Move the Data: Build large-scale data pipelines (Ray Data, Spark, or equivalents) that ingest, transform, and curate the datasets behind training and evaluation. The bottleneck is rarely where people think it is, and you find it.
  • Debug the Undebuggable: Chase down the failure modes that only emerge under real production traffic - stragglers, head-of-line blocking, silent data corruption, GPU memory fragmentation - and write the postmortems that prevent the next ten. Define SLOs, build the observability to measure them, and own the on-call rotation that defends them.
  • Partner Across the Stack: Work directly with researchers and ML engineers to take experimental workloads from "runs on one node" to "runs in production." You're a systems partner, not a ticket queue.

Qualifications
  • 5+ years building and operating distributed systems in production.
  • Deep experience with at least one large-scale data or compute framework (Ray, Spark, Flink, Beam, Dask).
  • Strong fluency in Python and at least one systems language (Go, Rust, C++).
  • Working knowledge of the GPU/accelerator stack: CUDA fundamentals, NCCL, mixed precision, memory layout. You don't need to write kernels, but you should know why a workload is bound by what it's bound by.
  • Experience operating Kubernetes-based infrastructure, including custom operators or schedulers.
  • A track record of owning hard production incidents end-to-end - diagnosis, mitigation, and the durable fix.
  • Bonus: hands-on experience with LLM inference engines (vLLM, SGLang, TensorRT-LLM, TGI), modern lakehouse formats (Iceberg, Delta, Hudi), or open-source contributions to relevant projects.

Above all, we're looking for great teammates who make work feel lighter and aren't afraid to go out on a limb with bold ideas. You don't need to be perfect, but you do need to be adaptable. We encourage you to apply, even if you don't check every box.

Benefits
  • Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and team offsites.
  • Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
  • Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
  • Well-Being: Comprehensive medical benefits and generous paid time off.
Vacancy posted 4 days ago