Senior ML Infrastructure Engineer: Scale GPU Training & HPC
Dyna Robotics
A cutting-edge robotics company based in California is looking for an experienced Machine Learning Infrastructure Engineer. This role involves designing scalable ML training platforms, optimizing high-performance computing systems, and ensuring robust job scheduling and reliability. Ideal candidates will have 7+ years in software with hands-on experience in ML model tuning and managing cloud environments. Join us to shape the future of AI-driven robotics.
$153.2k - $234.1k
...transportation on a global scale. Role Are you passionate... ...-world scenarios. As a Senior ML engineer, you will build critical infrastructure that powers every... ...supporting machine learning training and evaluation workflows... ...ML training across large GPU/CPU clusters or specialized...
Senior, Training, Remote work, Relocation package, Flexible hours
- ..., Inc. is looking for a skilled data engineer in San Carlos, California, to design, build, and maintain large-scale data pipelines for training and evaluation of robotics foundation... ...Responsibilities include managing core data infrastructure and collaborating with a dedicated...
Senior, Training
- ...Mountain View, California is looking for an experienced Data Engineer to design large-scale data pipelines and advanced data systems for autonomous... ...in Python, and experience working with large-scale data infrastructure. This role offers a competitive salary range and a...
Senior
$181k - $297k
...We are seeking an HPC Network Engineer to design, deploy, and... ...Ethernet fabrics for large-scale GPU clusters. The role... ...supporting AI/ML training, inference, and HPC... ...RDMA traffic. As a Senior Staff Software Engineer... ...Experience with infrastructure automation or configuration...
Senior, Training, For contractors, Work at office, Flexible hours
- The Mission: As a Senior Machine Learning Engineer, you will be responsible for building... ...processes for model training, fine-tuning, testing, and... ...Generative AI models at significant scale. Investigate, prototype and... ...with building and evolving ML Training and Inferencing...
Senior, Training, Local area
$128.7k - $261.3k
...learning models from training frameworks (e.g. PyTorch... ...two‑fold: build the ML deployment platform that... ...performed manually by engineers. Build the developer... ...production platform or infrastructure systems where reliability... ...with the NVIDIA GPU stack at the integration...
Senior, Training, Local area, Remote work, Flexible hours, Shift work
- ...team at GM builds core infrastructure that supports the end-... ...-end AI lifecycle of ML pipelines—from local experimentation... ...and large-scale training to evaluation, lineage... ..., enabling ML engineers and researchers to develop... ...environments. The Role: As a Senior AI/ML Engineer, you...
Senior, Training, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$148k - $247k
...diverse perspectives and teamwork. As a Senior AI/ML Platform Engineer, you will architect and scale the ML platform for data scientists and ML... ...model monitoring. Design and implement infrastructure for model training, hyperparameter tuning, experiment tracking...
Senior, Training, Full time, Part time, Immediate start, Flexible hours
$242.1k - $293.8k
...technical challenges at scale, and helping to create... ...ads machine learning infrastructure to deliver effective... ...advertisers. As a Senior Machine Learning Infrastructure Engineer in our Ads ML Infra team, you'll build... ...infrastructure including model training, data pipelines,...
Senior, Training, Full time, Work experience placement, H1b, Work at office, Local area, Visa sponsorship, Monday to Friday
- ...proud to serve as the infrastructure platform for teams developing... ...high-impact, ML-centric use cases. About... ...Role We are seeking a Senior ML Infrastructure engineer to help build and scale robust Compute platforms... ...performance computing (HPC). Experience working with...
Senior
$170k - $240k
...delivering-driven expert in ML Training Infrastructure with a strong ability to... ...development initiatives. As a Senior ML Engineer, you will collaborate... ...support model training at scale. Model training performance... ...distributed computing, GPU computing, and cloud environments...
Senior, Training, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
- ...developing end-to-end ML models for robot... ...expertise: data pipelines, training infrastructure or inference. You'll... ...multimodal data, scaling distributed training,... ...distributed training across GPU clusters with minimal... ...For Strong software engineering and systems...
Training
$96.8k - $251.6k
...Description The Senior Principal AI Agent / ML Software Engineer is a Senior Staff-... ...on Oracle Cloud Infrastructure (OCI). This person... ...used in large-scale, business-critical... ...high throughput, GPU efficiency, reliability... ...GPU inference or training workloads for latency...
Senior, Training, Temporary work, Flexible hours
- ...Network Engineer - AI/HPC Memphis, TN; Palo Alto, CA... ...world to build a 100k GPU cluster on an ethernet... ...can develop at hyper scale while optimizing performance... ...optimize it to our training models and how we... ...seamlessly build-out new GPU infrastructure with little to no...
Training
- ...We are seeking an experienced Machine Learning Infrastructure Engineer to join our team and help scale our ML training platform. In this role, you will be responsible... ...improve training performance across an expanding GPU ecosystem. You will work on cutting‑edge high-performance...
Training, Local area
- ...A leading robotics company in Palo Alto seeks a Staff/Principal ML Systems Engineer to enhance training performance for their innovative humanoid robots. You will optimize distributed training systems and engage closely with researchers to transform model changes into...
Senior, Training
- ...black.ai is looking for a skilled platform engineer in Palo Alto to enhance our AWS infrastructure and support quantum simulations. This role requires strong experience... ...in platform engineering, DevOps practices, and GPU workloads. As a platform engineer, you will improve...
Senior
$153k - $222k
...Decisive Point is looking for infrastructure engineers and ML engineers to join the Data & ML infra group in Mountain View, California. The role focuses on working across the ML lifecycle and solving broad data problems. Ideal candidates will have software engineering...
Senior, Training
- ...Zoox is seeking a Senior Software Engineer, ML Core to optimize ML tooling for autonomous driving. Join a mission-focused team to develop tools that accelerate model training and deployment. Your 6+ years of experience in Python or C++ will help drive innovations in machine...
Senior, Training
$150k - $170k
...advantages for scale: photons don't feel... ...fiber-optic infrastructure. In 2024, PsiQuantum announced... ...Software Engineering Team builds... ...with GPU-accelerated computing... .... GPU/HPC Bridge Work (30... ...infrastructure, ML platforms, or early... ...education and training, competencies,...
Senior, Training, Full time, Shift work
$197.3k - $313.7k
...Responsibilities Build, scale and maintain critical features and... ...with architects, product owners, engineers, user experience designers and... .... Experience with modern AI/ML frameworks, systems, libraries... ...compensation, promotion, benefits, training, assessment of job performance...
Senior, Training, Flexible hours
$162.78k - $221.47k
..., and fastest scales. From particle... ...At SLAC, our infrastructure powers the discovery... ...seeking a Senior Kubernetes Engineer to help... ...guidance and training to help users... ...workloads Optimize GPU and... ...Familiarity with AI/ML infrastructure... ...computing or HPC environment...
Senior, Training, Worldwide, Flexible hours, Night shift
- ...Senior Solution Architect – AI / GPU Cloud Mountain View, California... ...intersection of AI infrastructure, GPU cloud... ..., enabling large-scale AI/ML and HPC workloads. Key... ...Center Ops, and Engineering teams Identify... ...with distributed training/inference, Kubernetes...
Senior, Training
- ...technologies. We are looking for a Senior Machine Learning Engineer to build the AI foundation... ..., from model research and training to deployment on embedded... ...evaluation frameworks that scale across imaging datasets.... ..., with at least one major ML framework (PyTorch,...
Senior, Training
$155.42k - $395.9k
...Description About the Team: The ML Inference Platform is part of the AV ML Infrastructure organization. Our team... ...committed to maximizing GPU utilization across... ...the Role: We are seeking a Senior ML Infrastructure engineer to help build and scale robust platforms for ML Inference...
Senior, Local area, Remote work, Relocation, Relocation package, Flexible hours
$188k - $250k
...build, and productionize large-scale NLP and LLM systems that power... ...that analyze AI Answering engine outputs and public web content... ...customer problems into measurable ML deliverables and ship... ...to production: data pipelines, training and inference, CI/CD for ML, observability...
Senior, Training, Local area
$195k - $298k
...Technical Center – Cole Engineering Center Podium or... ...the Team The ML Compute Platform is... ...organization within Infrastructure Platforms. Our... ...platform supports training and deployment of... ...commit to maximizing GPU utilization across... ...Engineer to build and scale robust compute...
Training, Local area, Relocation package, Flexible hours
$248.71k - $292.6k
...developers the speed and scale they need.... ...Staff Software Engineer - High Performance GPU Inference... ...software-defined infrastructure. Low‑Level GPU Optimization... ...teams across ML compilers,... ...and optimizing ML/HPC workloads on GPU... ...with multi‑GPU training/inference frameworks...
Senior, Training
- ...About the role We're looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.... ...generally support our research Maximize GPU allocation and utilization for both serving...
Training
$317k - $370k
...Senior Engineering Manager, ML Platform Zoox is on a mission to reimagine... ...growing Software Infrastructure engineering... ...work on cutting-edge training and inference optimization... ...experimentation and scale our multi-modal Foundation... ...~ Experience with GPU-accelerated...
Senior, Training

