ML Infrastructure Engineer — Large-Scale AI Systems

Causal Labs

A leading AI research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate will have a strong grasp of optimizing training workloads and experience with distributed training frameworks like FSDP and DeepSpeed. Proficiency in cloud platforms and containerization is essential. Join us to tackle unsolved problems and shape the future of AI.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the ML Infrastructure Engineer — Large-Scale AI Systems vacancy in San Francisco, CA
  • $320k - $405k

     ...interpretable, and steerable AI systems. We want AI to be...  ...researchers, engineers, policy experts,...  ...Machine Learning Infrastructure Engineer to join...  ...you'll build and scale the critical infrastructure...  ...machine learning, large-scale distributed...  ...and implement ML infrastructure... 
    Suggested
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    4 days ago
  •  ...We're looking for an experienced HPC infrastructure engineer to lead bringup, administration, and...  ...on is probably the largest anime AI training cluster in the world. You...  ...get to combine your love of anime and large-scale GPU systems. You're familiar with the modern... 
    Suggested
    Work at office
    Visa sponsorship

    Spellbrush

    San Francisco, CA
    5 days ago
  • $100k - $200k

    Coval Simulation & Evaluation that scales voice and chat AI agents ML‑Infrastructure Engineer Salary $100K - $200K Equity 0.2...  ...foundations, the queuing systems, the monitoring patterns are in...  ...tasks. You'll develop and architect large parts of our compute infrastructure... 
    Suggested
    Full time
    Live in
    Work at office

    Voiceflow

    San Francisco, CA
    2 days ago
  •  ...advanced hardware engineering and AI solutions. Our...  ...Machine Learning Infrastructure Engineer to join...  ...design, build, and scale infrastructure to...  ...production-grade ML ecosystem to support...  ...Design and build systems ML cloud...  ...software engineering, large-scale data infrastructure... 
    Suggested
    Flexible hours

    Echo Neurotechnologies

    San Francisco, CA
    5 days ago
  • A leading AI technology firm in San Francisco is seeking an experienced ML Systems Engineer focused on developing and optimizing machine learning pipelines for robotics and...  ...offers competitive compensation and a supportive work environment.
    Suggested

    Scale AI, Inc.

    San Francisco, CA
    4 days ago
  • A leading AI company in San Francisco is seeking a skilled ML Infrastructure Engineer to manage and optimize large-scale training systems. In this role, you will design and maintain infrastructure for model training, ensuring efficient GPU/TPU utilization while working... 

    Physical Intelligence

    San Francisco, CA
    5 days ago
  •  ...are seeking a Data Infrastructure Engineer to build and...  ...bar for production systems. You will define clear...  ...and product usage scale. What You'll Do...  ...scalable data and ML infrastructure on...  ...search, indexing, and large-scale querying...  ...sensing, data, and AI systems with real-... 
    Permanent employment
    Full time

    Matter Intelligence

    San Francisco, CA
    4 days ago
  •  ...ML Engineer At Krea, we are building next-generation AI creative tools. We are dedicated to making AI intuitive...  ...and recommendation systems from scratch. You'll...  ...points Experience with large-scale data systems and production ML infrastructure Prior work on or... 
    H1b

    Krea

    San Francisco, CA
    5 days ago
  • A pioneering AI firm based in San Francisco is seeking a Research Engineer, Distributed Data Systems. In this role, you will design and maintain infrastructure for large-scale multimodal training, ensuring scalability and reliability of data systems. Candidates should... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    4 days ago
  • A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal candidates... 
    Remote job

    Pluralis Research

    San Francisco, CA
    2 days ago
  • A leading AI technology company in San Francisco is...  ...looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning...  ...distributed training systems and optimize GPU...  ...years of experience in ML infrastructure and a strong... 

    Baseten

    San Francisco, CA
    3 days ago
  • $248.8k - $311k

     ...Scale's Physical AI business unit is dedicated to solving the...  ...AI and developing ML pipelines for processing...  ...Role As an ML Systems Engineer on the Physical AI team...  ...experience building large-scale, high-...  ...in machine learning infrastructure. Algorithm Optimization... 
    Full time

    Scale AI

    San Francisco, CA
    19 days ago
  • $250k - $325k

     ...runs on the same infrastructure: agreements between...  ...We're building the AI that finally...  ...last 12 months. Engineering at Ivo Engineers...  ...agentic RAG [2023] • Large-scale LLM-based legal...  ...strategies to isolate ML vs API workloads...  ...in distributed systems Experience managing... 
    Contract work
    Work at office
    Remote work

    IVO Inc

    San Francisco, CA
    5 days ago
  •  ...is looking for skilled engineers to work on autonomous R&D systems in machine learning. You...  ...design experiments, build infrastructure, and implement systems that perform reliably in large-scale ML settings. The ideal...  ...research background in AI or related fields. This... 
    Full time

    Thesis (YC F25)

    San Francisco, CA
    1 day ago
  • $189.6k - $237k

     ...Scale's ML platform (RLXF) team builds our internal distributed framework for large language model training and inference....  ...heart of the field of AI as an indispensable...  ...to optimize our ML system Ideally you'd have...  ...Strong software engineering skills, proficient in... 
    Full time

    Scale AI

    San Francisco, CA
    5 days ago
  • Xterraai is looking for an ML Software Engineer to help build innovative AI agents capable of tackling complex scientific challenges. The position involves designing and developing systems that support cutting-edge research in geospatial and geophysics intelligence. The... 

    Xterraai

    San Francisco, CA
    1 day ago
  •  ...Build the data infrastructure for robots operating...  ...be improved, engineers rely on data to...  ...and autonomous systems teams to ingest...  ...looking for a ML Platform...  ...design, deploy, and scale the systems that...  ...pipelines over large, heterogeneous...  ...vehicles, or physical AI workflows... 
    Remote work

    Foxglove Technologies, Inc

    San Francisco, CA
    1 day ago
  • Reducto, Inc. is seeking an Infrastructure Engineer to design, build, and maintain scalable infrastructure for AI and ML workloads. The role involves automating cloud infrastructure and implementing robust monitoring systems to ensure reliability. With a requirement of... 

    Reducto, Inc.

    San Francisco, CA
    2 days ago
  • $131.76k - $161.06k

     ...Software Engineer ESnet delivers high-bandwidth,...  ...s Integrated Research Infrastructure. As part of ESnet's Pilots...  ...into production systems, and independently delivering...  ...issues and simulating large-scale deployments....  ...experience in applying AI tools and agentic workflows... 
    Full time
    Work at office
    Remote work

    Berkeley Lab

    San Francisco, CA
    3 days ago
  • $250k - $380k

     ...time Department Scaling Compensation $2...  ...and inference infrastructure that powers frontier...  ...scale. Our systems unify how...  ...looking for an engineer to design and implement...  ...across large fleets of machines...  ...glamorous) part of the ML stack. Bonus...  ...OpenAI is an AI research and... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    1 day ago
  • A decentralized AI platform company in the United...  ...an experienced ML Training Platform Engineer to design and build robust infrastructure for ML training. The...  ...deployments and distributed systems. Responsibilities...  ...essential for enabling large-scale, collaborative AI... 

    Pluralis Research

    San Francisco, CA
    2 days ago
  •  ...ML Infrastructure Engineer Spectral Labs is a spatial intelligence company building...  ...for engineering physical systems. Our model SGS-1 is state-of...  ...the cutting edge of applied AI at Meta, Autodesk Research and...  ...multi-node training at scale Deep understanding of profiler... 

    Spectral Labs

    San Francisco, CA
    15 days ago
  •  ...blockchain analytics and AI solutions to help law...  ...build a safer financial system for billions of people around...  ...at an unprecedented scale. As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate...  ...serving patterns for large-scale models. Implement... 
    Worldwide

    TRM Labs

    San Francisco, CA
    2 days ago
  • Whatnot is seeking an AI/ML Platform Engineer to shape the future of machine learning within a fast-growing livestream shopping platform. In this role, you'll design and scale systems that support various business functions, prototype novel architectures, and build robust... 
    Remote job

    Whatnot

    San Francisco, CA
    3 days ago
  • Andiamo is seeking a Member of Technical Staff specializing in AI/ML Engineering in San Francisco. This role involves building intelligent systems to modernize financial operations, developing machine learning applications, and collaborating with cross-functional teams.... 
    Flexible hours

    Andiamo

    San Francisco, CA
    4 days ago
  • $117.2k - $313.7k

     ...Category Software Engineering Job Details About...  ...Salesforce is the #1 AI CRM, where humans with...  ...Salesforce. Distributed Systems Software Engineer -...  ...innovating and maintaining a large scale distributed systems...  ...Deliver cloud infrastructure automation tools, frameworks... 

    Salesforce

    San Francisco, CA
    3 days ago
  • $100k - $200k

    Voiceflow is seeking a skilled ML-Infrastructure Engineer in San Francisco to architect and operate auto-scaling systems for our voice AI simulation platform. The role includes optimizing GPU and compute infrastructure, ensuring high performance and reliability. Ideal... 
    Work at office

    Voiceflow

    San Francisco, CA
    3 days ago
  • $50 - $80 per hour

     ...Are you a network engineer who's curious about data...  ...at the intersection of infrastructure and intelligence? At...  ...integrated networking systems and now we're using the...  ...are built and used in AI systems. In this...  ...schemas and structure for large-scale data pipelines You'... 
    Hourly pay
    Contract work

    Meter Service

    San Francisco, CA
    1 day ago
  • A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional... 

    Abridge

    San Francisco, CA
    1 day ago
  • $250k

     ...Consulting Ltd is looking for a talented ML/AI Research Engineer to join their San Francisco team. You...  ...responsible for designing and managing the infrastructure that powers training, deployment, and governance of large-scale AI systems. The ideal candidate has a strong... 

    Alldus International Consulting Ltd

    San Francisco, CA
    1 day ago
