ML Platform & Infrastructure Engineer

AGI

What You’ll Do

  • Training Automation: Design and implement robust CI/CD pipelines for machine learning workflows. Automate nightly and on-demand training runs, including data ingestion, job orchestration, checkpointing, and artifact management, with reliability as a first-class requirement.
  • Evaluation Infrastructure: Build scalable evaluation harnesses that automatically benchmark models on every merge. Optimize latency and resource usage so experimentation stays fast and performance regressions are caught immediately.
  • Research Tooling: Develop internal SDKs, CLIs, and lightweight UIs (e.g., Streamlit, Retool) that empower researchers to:
      • Inspect trajectories and traces
      • Visualize model failures
      • Curate and manage datasets
      • Iterate without friction
    You’ll make experimentation ergonomic.
  • Observability & Performance: Implement comprehensive tracking for:
      • Model latency, throughput, and error rates
      • GPU utilization and cluster health
      • Inference cost and unit economics
    Build dashboards and alerting systems that give real-time visibility into system performance and reliability.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • 3+ years in Software Engineering, MLOps, or ML Infrastructure
  • Strong Python proficiency
  • Experience building internal developer tools, CLIs, or dashboards
  • Experience with cloud infrastructure (AWS or GCP) and containerization (Docker, Kubernetes)

Preferred Qualifications

  • Experience designing CI/CD pipelines specifically for ML workflows
  • Familiarity with LLM serving stacks such as vLLM or TGI
  • Experience managing GPU clusters and optimizing distributed workloads

Why This Role Matters

Great research without great infrastructure slows to a crawl. Great infrastructure multiplies the impact of every researcher. You will define how experiments scale, how reliability is measured, and how quickly we can ship improvements to real users.
The systems you build will directly shape the speed and quality of our progress toward everyday AGI.

Our Culture

  • All in, in person: work moves faster face-to-face
  • Ship by default: novel and polished can coexist; speed is the feature
  • One band, one sound: radical candor, zero politics, help each other win

Perks

  • Competitive company-sponsored medical, dental, and vision insurance
  • Top-tier relocation and immigration support

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the ML Platform & Infrastructure Engineer vacancy in San Francisco, CA
  •  ...Job Title Senior Platform and Infrastructure Engineer Company Description Lux Capital and General Catalyst backed AI startup Job Description As our first...  ...modern infrastructure tools and a deep interest in AI/ML infrastructure, Vercel deployments, and developer experience... 
    Suggested
    Live in

    Jack & Jill/External ATS

    San Francisco, CA
    2 days ago
  •  ...institutions in the world. Our platform is deployed in highly secure, high...  ..., performance, and strong engineering fundamentals are essential. As an Infrastructure Engineer at Brain Co., you will...  ...work closely with engineering, AI/ML, and product to design scalable... 
    Suggested

    BRAIN CORP

    San Francisco, CA
    3 days ago
  • A leading livestream shopping platform is looking for an AI/ML Platform Engineer to shape the future of AI and ML systems. This role involves designing the infrastructure that powers machine learning applications, working alongside experts to deploy models at scale. Candidates... 
    Suggested
    Remote work
    Flexible hours

    Whatnot

    San Francisco, CA
    3 days ago
  • $212k - $318.4k

     ...leading technology company in San Francisco is seeking a Software Engineer to join its Applied Machine Learning team. This role focuses on designing and building a robust ML platform and infrastructure to support enterprise-level initiatives. Candidates should have at... 
    Suggested

    Apple

    San Francisco, CA
    3 days ago
  •  ...Full time Location Type On-site Department Engineering Think Different. Build the Future. Our...  ...a first‑class requirement. Evaluation Infrastructure: Build scalable evaluation harnesses that...  ...years in Software Engineering, MLOps, or ML Infrastructure Strong Python proficiency... 
    Suggested
    Full time
    Work at office
    Immediate start
    Relocation package
    Night shift

    Pantera Capital

    San Francisco, CA
    2 days ago
  • $200k

     ...A global hedge fund is seeking an AI Platform Engineer to lead the development and management of cutting-edge AI infrastructure. The successful candidate will design and manage federated...  ...development experience and expertise in AI/ML infrastructure, cloud platforms, and... 

    Xcede

    San Francisco, CA
    2 days ago
  •  ...Requirements ~ Strong Software Engineering Foundation: Demonstrate...  ...code , ~ Data Infrastructure Expertise: Possess deep experience...  ...designing and building data platforms, including data warehouses,...  ...building infrastructure for AI/ML workloads or cloud computing... 
    Full time

    Crusoe Energy Systems

    San Francisco, CA
    2 days ago
  • $160k - $225k

     ...defense. Fable is the human risk platform that directly shapes employee...  ...scale the foundational data infrastructure powering a category‑defining product Work closely with engineering, data science, and product...  ...collaborate closely with data and ML teams and contribute to the... 
    Work experience placement
    Relocation package
    Flexible hours

    Fable Security LLP

    San Francisco, CA
    2 days ago
  •  ...Senior Software Engineer, Infrastructure & Platform Role Overview: As a Senior Software Engineer, Infrastructure & Platform at AfterQuery, you will design...  ...with AI infrastructure, LLM evaluation systems, or ML pipelines Experience working at high-growth startups or scaling... 

    AfterQuery

    San Francisco, CA
    2 days ago
  • $212k - $318.4k

     ...Software Engineer, ML platform and Infrastructure San Francisco Bay Area, California, United States Software and Services The Applied Machine Learning team has been at the forefront of accelerating digital transformation through machine learning across Apple's enterprise... 
    Relocation

    Apple

    San Francisco, CA
    2 days ago
  •  ...developer or data scientist can scale an ML application from their laptop to the...  ...Anyscale is looking for a Software Engineer to join the Infrastructure team. Anyscale aims to provide the...  ...infrastructure that powers Anyscale’s cloud platform. You will have the opportunity... 

    Anyscale

    San Francisco, CA
    2 days ago
  • $250k - $350k

     ...Senior Software Engineer – Infrastructure/Platform — AfterQuery Location: San Francisco, CA (Onsite) Compensation: $250,000 – $350,000 base + competitive...  ...at scale AI infrastructure, LLM evaluation systems, or ML platform infrastructure Human-in-the-loop or workflow orchestration... 
    Full time
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    3 days ago
  • $160k - $225k

     ...across the United States to help them hire. Software Engineer - Platform / Infrastructure Location: San Francisco, CA (Hybrid) Company Stage...  ...operational performance Collaborate closely with data and ML teams Support production data ingestion and serving... 
    Work at office
    Remote work
    Flexible hours

    Recruiting from Scratch

    San Francisco, CA
    7 days ago
  •  ...A technology company in San Francisco is seeking an experienced ML Infrastructure Engineer to develop platforms for machine learning jobs and to lead cross-functional initiatives. The ideal candidate will have experience with continuous integration and deployment models... 

    Delphina

    San Francisco, CA
    2 days ago
  •  ...A leading technology company in San Francisco is seeking an experienced Software Engineer for its Machine Learning Platform team. You will design and build services that support Apple’s machine learning and computer vision efforts. Responsibilities include leading systems... 

    Apple

    San Francisco, CA
    2 days ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure team, you will build the "Operating System" for our large...  ...a high-performance training platform that handles the immense complexity...  ...least 2 years focused on orchestrating ML workloads at scale (100+ GPU nodes)... 
    Full time

    Scale AI

    San Francisco, CA
    9 days ago
  • $181.1k - $318.4k

     ...AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - Pre-training Infrastructure San Francisco Bay Area, California, United States Machine Learning and AI Description As an engineer on ML Compute team, your work will include: Drive large‑scale pre‑training... 
    Relocation

    Apple

    San Francisco, CA
    3 days ago
  • Principal Engineer, AI Platform & Infrastructure About the Role SPREEAI is building the future of AI-powered commerce through photorealistic virtual try‑...  ...experiences for global retail partners. This role spans ML platform engineering, deployment systems, GPU infrastructure... 

    SpreeAI

    San Francisco, CA
    3 days ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various... 
    Full time

    Scale AI

    San Francisco, CA
    9 days ago
  •  ...Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time...  ...when and where it’s needed most. Our platform routes training and inference jobs across...  ..., networking, orchestration, and ML frameworks. Drive blameless post‑mortems... 
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    2 days ago
  •  ...Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco · Full-Time About...  ...when and where it’s needed most. Our platform routes training and inference jobs across...  ...incident response. Nice to Have Exposure to ML/AI infrastructure or GPU-based systems... 
    Full time
    Remote work

    Andromeda Cluster

    San Francisco, CA
    2 days ago
  • $250k

     ...opportunity? Join a seed-stage AI infrastructure company building large-scale training and inference platforms previously accessible only to...  ...workloads. Collaborate with ML, networking, and platform...  ..., DevOps, or Infrastructure Engineering roles supporting large-scale... 
    Immediate start

    Hamilton Barnes Associates Limited

    San Francisco, CA
    4 days ago
  • $232k - $319k

     ...secures AI by building the trusted, neutral infrastructure that enables organizations to safely...  ...too, let's talk. The Infrastructure Platform and Shared Services Team Okta...  ...partnership with architects and product engineering Build a world-class observability platform... 
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta, Inc.

    San Francisco, CA
    6 days ago
  • $200k - $265k

     ...healthcare technology firm is seeking a Senior Software Engineer to design and maintain the infrastructure that empowers healthcare providers. This role...  ...years in software engineering and experience with cloud platforms, containers, and databases. The position offers... 

    Ambience Healthcare, Inc.

    San Francisco, CA
    2 days ago
  •  ...Zyphra in San Francisco is hiring a Platform Engineer responsible for designing and maintaining robust infrastructure. You will collaborate with teams to enhance system observability...  ..., infrastructure as code, and managing ML systems. The company offers competitive... 

    Zyphra

    San Francisco, CA
    2 days ago
  • $200k - $260k

     ...is building the best inference infrastructure for voice applications. Our Voice AI platform powers production-grade, real-time...  ...looking for a Senior Platform Engineer to own the API and infrastructure...  ...provider. Collaborate with the ML engineering side of the team on... 
    Full time

    Together AI

    San Francisco, CA
    13 days ago
  • $213k - $263k

     ...applied to a range of vehicle platforms and product use cases. The...  ...experienced data-minded software engineers and data scientists to help...  ...high-quality Machine Learning (ML) and Evaluation datasets....  ...and implement robust tools and infrastructure to facilitate data mining, exploration... 
    Full time
    Remote work

    Waymo

    San Francisco, CA
    20 hours ago
  •  ...the largest livestream shopping platform in North America and Europe to...  ...Whatnot updates on our news and engineering blogs and join us as we enable...  ...to shape the future of AI and ML at Whatnot. You'll design and scale the core infrastructure that powers large language model... 
    Work experience placement
    Work at office
    Local area
    Remote work
    Work from home
    Home office
    Flexible hours

    Whatnot

    San Francisco, CA
    7 days ago
  •  ...About the Role You'll be the foundational engineer owning Known's core infrastructure and platform systems - the backbone that powers our AI-driven matching...  ...You'll work directly with the founding team (AI/ML, product, and design) to establish Known's technical... 

    pear.ai

    San Francisco, CA
    4 days ago
  •  ...time. We believe businesses deserve financial infrastructure tailored to how they actually operate. That...  ...and is on a mission to build the best engineering team in the world. We’re looking for a Senior Infrastructure/Platform Engineer focused on owning and scaling the foundation... 
    Work at office

    Slash Financial

    San Francisco, CA
    2 days ago
