
Senior ML Infra Engineer - Large-Scale Training & Pipelines

Kindredventures

Responsibilities
  • Design, deploy, and maintain large distributed ML training and inference clusters
  • Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
  • Research and test various training approaches including parallelization techniques and numerical precision trade-offs across different model scales
  • Analyze, profile and debug low-level GPU operations to optimize performance
  • Stay up-to-date on research to bring new ideas to work

What we’re looking for
We value a relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains.
  • Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
  • Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
  • Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
  • Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
  • Background working on distributed task management systems and scalable model serving & deployment architectures
  • Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Senior ML Infra Engineer - Large-Scale Training & Pipelines in San Francisco, CA vacancy
  • $250k - $350k

     ...actually work. We’re hiring ML Infrastructure Engineers to tackle a hard, real-world...  ...sites using wearable devices, large-scale video, and AI. This isn’t clean...  ...: High-throughput video pipelines handling millions of hours of data Training and inference systems for multimodal... 
    Pipeline
    Training

    Trades Workforce Solutions

    San Francisco, CA
    2 days ago
  •  ...Francisco seeks a Machine Learning Engineer to work with the full ML stack, implementing advanced model...  ...and building extensive data pipelines for large datasets. The ideal candidate will...  ...principles, extensive experience in training models, and familiarity with distributed... 
    Pipeline
    Senior
    Training

    Kindredventures

    San Francisco, CA
    2 days ago
  •  ...role you will help scale and optimize our training systems and core...  ...for large-scale training,...  ...efficient JAX training pipelines. You’ll work closely...  ...and model engineers to translate ideas...  ...intersection of ML, software engineering...  ...needs into infra capabilities and... 
    Pipeline
    Training

    Physical Intelligence

    San Francisco, CA
    3 days ago
  •  ...seeking talent to build and optimize GPU infrastructure for large-scale model inference and training workloads. The ideal candidate will have hands-on...  ...synthetic data generation and reinforcement learning pipelines. This role offers top-tier compensation and comprehensive... 
    Pipeline
    Training

    Reflection

    San Francisco, CA
    2 days ago
  •  ...pioneering tech startup in neurotechnology is seeking a Senior Machine Learning Infrastructure Engineer to design and scale critical infrastructure powering ML applications. This role involves creating robust data pipelines and optimizing modeling processes, essential for... 
    Pipeline
    Senior

    Echo Neurotechnologies

    San Francisco, CA
    4 days ago
  • $250k

    Hamilton Barnes Associates Limited in San Francisco is seeking an experienced engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have over 7 years in SRE or DevOps, with strong skills in Kubernetes and Linux systems... 
    Senior
    Training

    Hamilton Barnes Associates Limited

    San Francisco, CA
    5 days ago
  •  ...Clera is seeking a Senior AI/ML Engineer to build production-grade ML infrastructure. You will design and ship end-to-end ML systems including data pipelines, training, and deployment. The role requires 4+ years of applied ML engineering experience in production settings... 
    Pipeline
    Senior
    Training
    Full time

    Clera

    San Francisco, CA
    17 hours ago
  •  ...Slope is seeking an experienced Software Engineer in San Francisco to build machine learning infrastructure for monetization systems. You will design large-scale data pipelines, work on model training platforms, and enhance system performance. The ideal candidate will... 
    Pipeline
    Training

    Slope

    San Francisco, CA
    3 days ago
  •  ...Francisco is seeking an experienced Software Engineer to develop machine learning...  ...systems. The role involves building data pipelines, creating training platforms, and collaborating with various...  ...in distributed systems and ML workflows. Join us in shaping the future... 
    Pipeline
    Senior
    Training

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • $204k - $259k

     ...autolabels at a massive scale, serving as the foundation for training and validating the...  ...are an advanced ML and engineering team that...  ...framework that integrates large foundation models...  ...with the ML Infra, Perception, Behavior...  ...scale data processing pipelines for ML training.... 
    Pipeline
    Senior
    Training
    Full time
    Remote work

    Waymo

    San Francisco, CA
    4 days ago
  • $200k - $300k

     ...Robotics, Chef is rapidly scaling with multiple multi-year...  ...Foundation Model. As a Senior ML Engineer, Foundation Models, you...  ...work at the frontier of large-scale robot learning: training and fine-tuning the Food...  ...fine-tuning, and alignment pipelines that improve the model's... 
    Pipeline
    Senior
    Training
    Flexible hours

    Alumni Ventures

    San Francisco, CA
    17 hours ago
  •  ...Senior ML/RL Engineer, Behavior Planning At Bot Auto, we are revolutionizing the...  ...Modeling: Develop and train diverse, conditioned policies...  ...comfort. Scalable Training Pipelines: Contribute to the optimization of our large-scale, high-throughput training environments... 
    Pipeline
    Senior
    Training
    Shift work

    Bot Auto

    San Francisco, CA
    4 days ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Senior
    Training

    Reflection AI

    San Francisco, CA
    4 days ago
  •  ...A leading AI technology company in San Francisco is seeking an engineering professional to develop and manage intelligent job scheduling systems for large-scale AI applications. This role focuses on ensuring efficient resource allocation across GPU and TPU clusters while... 
    Training

    Physical Intelligence

    San Francisco, CA
    3 days ago
  • ML Systems Engineer - Robotics & AI We are building the full-stack foundation for...  ...and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems...  ...sharded training, tensor/pipeline parallelism, gradient accumulation... 
    Pipeline
    Training

    Maxwell Bond

    San Francisco, CA
    1 day ago
  • $320k - $405k

     ...researchers, engineers, policy experts...  ..., Node Infra About the role...  ...quickly we can train new models, how...  ...we can scale Claude to millions...  ...alignment across senior stakeholders...  ...managing large scale compute...  ...provisioning pipelines Low-level systems...  ...distributed ML workloads.... 
    Pipeline
    Senior
    Training
    Visa sponsorship

    Menlo Ventures

    San Francisco, CA
    5 days ago
  •  ...believe culture can be engineered - but when it...  ...re looking for an ML infrastructure engineer...  ..., build, and scale the foundational systems...  ...stage of the ML training flywheel and be an...  ...curation to large-scale model training...  ...Develop batch compute pipelines for cataloging,... 
    Pipeline
    Training
    Local area

    Humble Robotics

    San Francisco, CA
    4 days ago
  • $250k - $400k

     ...experienced professionals to build and scale systems for AI-driven scientific discovery. The role involves developing training pipelines, supporting model deployment, and...  ...plus equity, with opportunities for ML Engineers, ML Infra, Research Engineers, and Research Scientists... 
    Pipeline
    Training
    Remote job

    Trades Workforce Solutions

    San Francisco, CA
    3 days ago
  •  ...technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation models. You will...  ...candidates have over 5 years of experience in ML infrastructure and a strong background in... 
    Senior
    Training

    Baseten

    San Francisco, CA
    4 days ago
  • $200k - $240k

     ...for all. The AI Engineering Team is chartered...  ...special focus on Large Language Models (...  ...to build robust pipelines, high-performance...  ...speed, safety, and scale. We manage petabyte...  ...the market. As a Senior or Staff ML Systems Engineer...  ...for model training, evaluation, and... 
    Pipeline
    Senior
    Training
    Remote work
    Worldwide

    TRM Labs

    San Francisco, CA
    3 days ago
  • A leading AI company in San Francisco is seeking a skilled ML Infrastructure Engineer to manage and optimize large-scale training systems. In this role, you will design and maintain infrastructure for model training, ensuring efficient GPU/TPU utilization while working... 
    Training

    Physical Intelligence

    San Francisco, CA
    1 day ago
  •  ...Define the ML strategy, raise the...  ...the systems that scale. You are a...  ...between research and engineering reality. You...  ...feature store, training infrastructure,...  ...registry, deployment pipelines, and...  ...stack. Mentor senior and mid-level engineers...  ...with multiple large‑scale... 
    Pipeline
    Senior
    Training

    Sierracorp

    San Francisco, CA
    3 days ago
  •  ...deliver reliable outcomes at scale.  About the Role We...  ...are looking for a visionary Senior ML Engineer who will bridge the gap...  ...Automate SDLC processes and CI/CD pipelines, elevate QA standards, and...  ...+ years of ML, specifically training or fine-tuning LLM models, embeddings... 
    Pipeline
    Senior
    Training
    Shift work

    Palm Venture Studios

    San Francisco, CA
    5 days ago
  •  ...firm in San Francisco is seeking an experienced ML Research Engineer to develop innovative training systems for atomistic simulation models. The...  ...distributed cloud setups. Responsibilities include scaling training pipelines, defining strategies for deep models, and... 
    Pipeline
    Senior
    Training

    Achira

    San Francisco, CA
    2 days ago
  •  ...Highlight AI We're a small, senior team building the...  ...Role We're hiring a Senior ML Engineer to help build the AI systems...  ...across the full ML stack: data pipelines, model training, retrieval, ranking, evals,...  ...Head of AI Engineering to scale our ML capabilities. You will... 
    Pipeline
    Senior
    Training
    Work at office
    Relocation
    Relocation package
    Flexible hours

    Highlight AI

    San Francisco, CA
    4 days ago
  • $230k - $332k

     ...Senior/Staff Machine Learning Engineer - Perception Offline Driving Intelligence...  ...multimodal large language models...  ...sophisticated training techniques,...  ...the overall large scale data we have at...  ...Drive end‑to‑end ML solutions from research...  ...extensive data pipelines and... 
    Pipeline
    Senior
    Training
    Full time

    jobs.frontdoordefense.com - Jobboard

    San Francisco, CA
    4 days ago
  • $200k - $400k

     ...generation data platform to train AI video models....  ...end-to-end data pipeline connects creators,...  ...strategic engineer to help us scale. Role Overview The Senior Machine Learning Engineer...  ...building, and optimizing large‑scale machine...  ...work across the full ML lifecycle, from... 
    Pipeline
    Senior
    Training
    Work experience placement

    Troveo AI

    San Francisco, CA
    3 days ago
  • $200k - $300k

     ...Amazon Robotics, Chef is rapidly scaling with multiple multi-year...  ...configurations. As a Senior ML Engineer, Manipulation, you will own...  ...data collection strategies and training policies, to deploying and debugging...  ...to build data collection pipelines using teleoperation,... 
    Pipeline
    Senior
    Training
    Flexible hours

    Alumni Ventures

    San Francisco, CA
    17 hours ago
  • $196k - $278k

    Senior Machine Learning Engineer - Mapping Develop algorithms for high-definition...  ...algorithms and ML models for 3D...  ...environments. Leverage our large‑scale ML infrastructure to...  .... Experience with training/deploying Deep...  ...production Machine Learning pipelines: dataset creation,... 
    Pipeline
    Senior
    Training
    Temporary work
    Relocation package

    jobs.frontdoordefense.com - Jobboard

    San Francisco, CA
    2 days ago
  • $225k - $325k

     ...calls that once required large teams of human agents...  ...investors, we have scaled to $60M ARR with a...  ...high-ownership role for ML engineers who want to build...  ...constraints. As a Founding Senior Machine Learning...  ...performance end-to-end, from training pipelines to post-deployment... 
    Pipeline
    Senior
    Training
    H1b
    Work at office

    Retell AI

    San Francisco, CA
    3 days ago
