Senior ML Infra Engineer - Large-Scale Training & Pipelines
Kindredventures
Responsibilities
- Design, deploy, and maintain large distributed ML training and inference clusters
- Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
- Research and test various training approaches, including parallelization techniques and numerical precision trade-offs across different model scales
- Analyze, profile, and debug low-level GPU operations to optimize performance
- Stay up to date on research to bring new ideas to work

What we’re looking for
We value a relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains.
- Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
- Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
- Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
- Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
- Background working on distributed task management systems and scalable model serving & deployment architectures
- Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.
$250k - $350k

