Tech Lead for Distributed ML Systems & Training Platform
Scale AI
A leading AI technology firm in New York is seeking a talented individual to build and optimize their training and inference frameworks for large language models. The ideal candidate will collaborate with ML teams to accelerate research and development, bringing expertise in multi-node training and large-scale ML systems. This position offers a competitive salary and benefits, making it an exciting opportunity for those passionate about AI innovation.
$264.8k - $331k
...Scale's LLM post-training platform team builds our internal distributed framework for large language model... ...-end solutions for the ML lifecycle. You will work... ...to optimize our ML system. Ideally you'd have:... ...that power the world's leading models, and help enterprises...PlatformTrainingFull time$117.2k - $313.7k
...meets action. Tech meets trust. And... ...at the company leading workforce transformation... .../frameworks in distributed filesystems in... ...of our cloud platform. Build... ...innovations that improve system scalability,... ...with Big-Data/ML and S3 Hands-on... ..., benefits, training, assessment of...PlatformTrainingImmediate startRemote work
- ...real-time. Our vision is AI systems that are flexible, personalized... ...about both. Researchers and ML engineers will hand you workloads... ...Scale: Design and operate distributed inference systems for LLMs,... ...and curate the datasets behind training and evaluation. The...PlatformTrainingFlexible hours
$90k
...Distributed Systems Software Engineer, Python / Go Join to apply... ...clouds and developing AI/ML pipelines for... ...well as imagining and leading new initiatives within... ...datasets Operating data platforms: key-value stores, relational... ...Engineer, HTML - AI Training (Freelance, Remote)...PlatformTrainingFull timeFreelanceInternshipLocal areaRemote workWorldwide
- Staff Software Engineer, ML Infra & Distributed Systems About the Role: As a Staff... ...machine learning inference platforms. These platforms power... ...to explore new frameworks, lead critical cross-functional... ...Understanding of ML model training pipelines and model internals...PlatformTraining
$230k - $385k
...the constraints of physical systems to improve people's lives.... ...As a Software Engineer, Distributed Data Systems, you will design... ...powers large-scale multimodal training and evaluation at OpenAI. You... ...security. Ensure our data platform can scale by orders of magnitude...PlatformTrainingWork at officeRelocation package$245k - $385k
...About the Team The Platform Runtime team builds the low level framework components to power our ML training systems. We work on building robust, scalable, high performance components to support our distributed training workloads. Our priorities are to maximize the...PlatformTrainingWork at officeLocal areaRelocation package
- ...models, from multimodal training data pipelines to... ..., and scalable platform that enables our... ...ingestion/processing, distributed model training,... ...of our distributed systems. We are looking... ...with core ML frameworks such as... .... Demonstrated Tech Lead experience, driving...PlatformTrainingFull time
- ...payments infrastructure platform that helps... ...We are backed by leading investors and processing... ...intelligent systems that optimize... ...level AI Platform Tech Lead to own the full... ...products - from ML model training through... ...continuously - drift, distribution shifts, retraining...PlatformTrainingLocal areaShift work
- ...Experience Team (MLX Tech) is committed to... ...implementing AI/ML across Capital One... ...achieve this by building platforms that enable the... ...learning and AI. Lead a portfolio of... ...deep experience in distributed microservices, and full stack systems to create solutions...PlatformFull timePart timeInternship
$229.9k - $262.4k
Senior Lead Software Engineer, Distributed Systems (Golang + Python on Kubernetes) Do you love building... ...Experience Team (MLX Tech) is committed to... ...responsibly implementing AI/ML across Capital One. We achieve this by building platforms that enable the rapid and...PlatformFull timePart timeInternshipLocal area$229.9k - $262.4k
Senior Lead Software Engineer, Distributed Systems (Golang + Python on Kubernetes) Do you love building... ...Experience Team (MLX Tech) is committed to... ...responsibly implementing AI/ML across Capital One. We achieve this by building platforms that enable the rapid and...PlatformFull timePart timeInternshipLocal area$166k - $225k
...s best data and AI infrastructure platform so our customers can use deep data... ...will be building the next generation distributed data storage and processing systems that can outperform specialized... ...experience, relevant certifications and training, and specific work location. Based...PlatformTrainingLocal areaWorldwide$255k - $405k
...About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the... ...infrastructure that powers large‑scale multimodal training and evaluation at OpenAI. You’ll... ..., and security. Ensure our data platform can scale by orders of magnitude while...PlatformTrainingFull timeWork at officeLocal areaRelocation packageFlexible hours$190k - $250k
...Software Engineer / Tech Lead, ML Infrastructure Heartflow... ...gives you the platform to lead technically.... ...across the stack: data systems focused on curation and... ...environment for both training and inference. We... ...maintain large-scale distributed computing platforms for...PlatformTrainingFull timeWork at officeLocal areaWorldwideRelocation
- ...such as developing, training, deploying, and optimizing... ...machine learning systems Experienced using ML accelerator... ...business goals and platform hardware characteristics... ...critical role, you will lead the development and... ...inspector in highly distributed training/inference setups...PlatformTraining
$240k - $330k
...visionary Principal Level Tech Lead Manager to build... ...Machine Learning (ML) Acceleration team... ...ML model training. The ultimate goal... ...technical expertise in ML systems and performance... ..., large scale distributed training, data loader... ..., ML Training platform, and product teams...PlatformTraining$148.5k - $260.1k
...ambition meets action. Tech meets trust. And... ...career at the company leading workforce transformation... ...CAC/PIV. Distributed Systems Software Engineer - GovCloud... ...systems engineering platform that ships hundreds of... ...promotion, benefits, training, assessment of job performance...PlatformTrainingLocal area$293.6k - $335.1k
COMFORT SYSTEMS is seeking a Distinguished Software Engineer to join our innovative team in San Francisco, CA. You will lead technical contributions and mentor colleagues in a collaborative... ...engineering, particularly in distributed systems and cloud technologies. This...Platform
- ...Join us and help build the platform engineers turn to to ship AI... ...building the global operating system for distributed, heterogeneous AI hardware.... ...for foundational engineers to lead our GPU Networking efforts,... ...) Exposure to a variety of ML startups, offering...PlatformFlexible hours
$146.5k
...the team: The ML Data Engineering team... ...worldwide. Our systems operate at massive... ...data engineering, and distributed systems,... ...truly global scale. Tech Stack: Our backend... ...best practices. Lead the design, implementation... ...education or training; and other business...TrainingFor contractorsLocal areaWorldwideHome officeFlexible hours$160k - $180k
...nearly everyone does on our platform: play video games. Over 90% of... ...most critical services. Those systems are at the core of our text... ...scale, reliable and performant distributed systems. Collaborate with... ...experience, and relevant education or training. Please note that the...PlatformTrainingFull timeRelocationRelocation package$300k - $405k
...interpretable, and steerable AI systems. We want AI to be safe... ...and reliably for training and serving frontier... ...Work with our ML engineers to understand... ...influence hardware and platform features for AI workloads... ...schemes for large-scale distributed training Developing...PlatformTrainingWork at officeVisa sponsorshipFlexible hours$225k - $275k
...Infrastructure Staff Tech Lead Manager, ML Data Services Boston,... ...in machine learning systems, large-scale data processing... ...the ML Data Service platform, ensuring it meets... ...provision of diverse training data sources,... ...-scale data systems, distributed systems, or ML infrastructure...PlatformTrainingWork at officeRemote work2 days per week$146.5k - $228k
...About the team: The ML Data Engineering... ...users worldwide. Our systems operate at massive... ...data engineering, and distributed systems,... ...truly global scale. Tech Stack: Our backend... ...coding best practices. Lead the design, implementation... ...education or training; and other business...TrainingTemporary workLocal areaWorldwideHome officeFlexible hours
- ...AI Systems Engineer - Codex Core Agents... ...of how models are trained and evaluated, making... ...level systems and ML workflows, able to... ...production systems in distributed systems,... ...virtualization, cloud platforms, or ML systems.... ...ownership, and can lead scoped or multi-team...PlatformTraining
- ...Learning Architect to define the ML strategy and build scalable systems. The role involves architecting end-to-end ML systems, leading technical roadmaps, and mentoring... ...leadership is essential. Experience in ML platforms and distributed training is highly valued. Join a forward-...PlatformTraining
$248.4k - $310.5k
...Robotics & Autonomous Systems Scale's Robotics... ...collection, model training pipelines, and... ...parts of our robotics platform, work directly... ...vehicle datasets Build ML training and fine-... ...Understanding of distributed systems, workflow... ...power the world's leading models, and help...PlatformTrainingFull time$147k - $211k
Software Engineer, Agentic AI Systems, Cloud Security Google San... ...Agentic development etc) or ML platform/infrastructure (e.g., model... ...systems. Experience in available distributed systems, cloud services or... ..., and relevant education or training. Your recruiter can share...PlatformTrainingFull timeWorldwide$44k - $185k
...of Cisco's AI-driven platforms and data infrastructure... ...data and intelligent systems. Explore the opportunities... .... Familiarity with distributed data processing... ...on experience with AI/ML. Familiarity with major... ...certifications, and/or training. The full salary range...PlatformTrainingFull timeTemporary workApprenticeshipInternshipLocal areaFlexible hours