Tech Lead for Distributed ML Systems & Training Platform

Scale AI

A leading AI technology firm in New York is seeking a talented individual to build and optimize their training and inference frameworks for large language models. The ideal candidate will collaborate with ML teams to accelerate research and development, bringing expertise in multi-node training and large-scale ML systems. This position offers a competitive salary and benefits, making it an exciting opportunity for those passionate about AI innovation.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Tech Lead for Distributed ML Systems & Training Platform in San Francisco, CA vacancy
  • $264.8k - $331k

     ...Scale's LLM post-training platform team builds our internal distributed framework for large language model...  ...-end solutions for the ML lifecycle. You will work...  ...to optimize our ML system. Ideally you'd have:...  ...that power the world's leading models, and help enterprises... 
    Platform
    Training
    Full time

    Scale AI

    San Francisco, CA
    9 days ago
  • $117.2k - $313.7k

     ...meets action. Tech meets trust. And...  ...at the company leading workforce transformation...  .../frameworks in distributed filesystems in...  ...of our cloud platform. Build...  ...innovations that improve system scalability,...  ...with Big-Data/ML and S3 Hands-on...  ..., benefits, training, assessment of... 
    Platform
    Training
    Immediate start
    Remote work

    Salesforce.Com Inc

    San Francisco, CA
    3 days ago
  •  ...real-time. Our vision is AI systems that are flexible, personalized...  ...about both. Researchers and ML engineers will hand you workloads...  ...Scale: Design and operate distributed inference systems for LLMs,...  ...and curate the datasets behind training and evaluation. The... 
    Platform
    Training
    Flexible hours

    Adaption

    San Francisco, CA
    15 days ago
  • $90k

     ...Distributed Systems Software Engineer, Python / Go Join to apply...  ...clouds and developing AI/ML pipelines for...  ...well as imagining and leading new initiatives within...  ...datasets Operating data platforms: key-value stores, relational...  ...Engineer, HTML - AI Training (Freelance, Remote)... 
    Platform
    Training
    Full time
    Freelance
    Internship
    Local area
    Remote work
    Worldwide

    Canonical

    San Francisco, CA
    15 days ago
  • Staff Software Engineer, ML Infra & Distributed Systems About the Role: As a Staff...  ...machine learning inference platforms. These platforms power...  ...to explore new frameworks, lead critical cross-functional...  ...Understanding of ML model training pipelines and model internals... 
    Platform
    Training

    Tubi Tv

    San Francisco, CA
    1 day ago
  • $230k - $385k

     ...the constraints of physical systems to improve people's lives....  ...As a Software Engineer, Distributed Data Systems, you will design...  ...powers large-scale multimodal training and evaluation at OpenAI. You...  ...security. Ensure our data platform can scale by orders of magnitude... 
    Platform
    Training
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    2 days ago
  • $245k - $385k

     ...About the Team The Platform Runtime team builds the low level framework components to power our ML training systems.  We work on building robust, scalable, high performance components to support our distributed training workloads.  Our priorities are to maximize the... 
    Platform
    Training
    Work at office
    Local area
    Relocation package

    OpenAI

    San Francisco, CA
    more than 2 months ago
  •  ...models—from multimodal training data pipelines to...  ..., and scalable platform that enables our...  ...ingestion/processing, distributed model training,...  ...of our distributed systems. We are looking...  ...with core ML frameworks such as...  .... Demonstrated Tech Lead experience, driving... 
    Platform
    Training
    Full time

    HeyGen

    San Francisco, CA
    4 days ago
  •  ...payments infrastructure platform that helps...  ...We are backed by leading investors and processing...  ...intelligent systems that optimize...  ...level AI Platform Tech Lead to own the full...  ...products - from ML model training through...  ...continuously - drift, distribution shifts, retraining... 
    Platform
    Training
    Local area
    Shift work

    DEUNA

    San Francisco, CA
    8 days ago
  •  ...Experience Team (MLX Tech) is committed to...  ...implementing AI/ML across Capital One...  ...achieve this by building platforms that enable the...  ...learning and AI. Lead a portfolio of...  ...deep experience in distributed microservices, and full stack systems to create solutions... 
    Platform
    Full time
    Part time
    Internship

    Capital One

    San Francisco, CA
    5 days ago
  • $229.9k - $262.4k

    Senior Lead Software Engineer, Distributed Systems (Golang + Python on Kubernetes) Do you love building...  ...Experience Team (MLX Tech) is committed to...  ...responsibly implementing AI/ML across Capital One. We achieve this by building platforms that enable the rapid and... 
    Platform
    Full time
    Part time
    Internship
    Local area

    Capital One National Association

    San Francisco, CA
    4 days ago
  • $229.9k - $262.4k

    Senior Lead Software Engineer, Distributed Systems (Golang + Python on Kubernetes) Do you love building...  ...Experience Team (MLX Tech) is committed to...  ...responsibly implementing AI/ML across Capital One. We achieve this by building platforms that enable the rapid and... 
    Platform
    Full time
    Part time
    Internship
    Local area

    Information Technology Senior Management Forum

    San Francisco, CA
    2 days ago
  • $166k - $225k

     ...s best data and AI infrastructure platform so our customers can use deep data...  ...will be building the next generation distributed data storage and processing systems that can outperform specialized...  ...experience, relevant certifications and training, and specific work location. Based... 
    Platform
    Training
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    4 days ago
  • $255k - $405k

     ...About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the...  ...infrastructure that powers large‑scale multimodal training and evaluation at OpenAI. You’ll...  ..., and security. Ensure our data platform can scale by orders of magnitude while... 
    Platform
    Training
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    5 days ago
  • $190k - $250k

     ...Software Engineer / Tech Lead, ML Infrastructure Heartflow...  ...gives you the platform to lead technically....  ...across the stack: data systems focused on curation and...  ...environment for both training and inference. We...  ...maintain large-scale distributed computing platforms for... 
    Platform
    Training
    Full time
    Work at office
    Local area
    Worldwide
    Relocation

    HeartFlow

    San Francisco, CA
    9 hours ago
  •  ...such as developing, training, deploying, and optimizing...  ...machine learning systems Experienced using ML accelerator...  ...business goals and platform hardware characteristics...  ...critical role, you will lead the development and...  ...inspector in highly distributed training/inference setups... 
    Platform
    Training

    Waymo

    San Francisco, CA
    5 days ago
  • $240k - $330k

     ...visionary Principal Level Tech Lead Manager to build...  ...Machine Learning (ML) Acceleration team...  ...ML model training. The ultimate goal...  ...technical expertise in ML systems and performance...  ..., large scale distributed training, data loader...  ..., ML Training platform, and product teams... 
    Platform
    Training

    Motional

    San Francisco, CA
    22 days ago
  • $148.5k - $260.1k

     ...ambition meets action. Tech meets trust. And...  ...career at the company leading workforce transformation...  ...CAC/PIV. Distributed Systems Software Engineer - GovCloud...  ...systems engineering platform that ships hundreds of...  ...promotion, benefits, training, assessment of job performance... 
    Platform
    Training
    Local area

    Salesforce.Com Inc

    San Francisco, CA
    1 day ago
  • $293.6k - $335.1k

    COMFORT SYSTEMS is seeking a Distinguished Software Engineer to join our innovative team in San Francisco, CA. You will lead technical contributions and mentor colleagues in a collaborative...  ...engineering, particularly in distributed systems and cloud technologies. This... 
    Platform

    COMFORT SYSTEMS

    San Francisco, CA
    5 days ago
  •  ...Join us and help build the platform engineers turn to to ship AI...  ...building the global operating system for distributed, heterogeneous AI hardware....  ...for foundational engineers to lead our GPU Networking efforts,...  ...) Exposure to a variety of ML startups, offering... 
    Platform
    Flexible hours

    Baseten

    San Francisco, CA
    5 days ago
  • $146.5k

     ...the team: The ML Data Engineering team...  ...worldwide. Our systems operate at massive...  ...data engineering, and distributed systems,...  ...truly global scale. Tech Stack: Our backend...  ...best practices. Lead the design, implementation...  ...education or training; and other business... 
    Training
    For contractors
    Local area
    Worldwide
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    5 days ago
  • $160k - $180k

     ...nearly everyone does on our platform: play video games. Over 90% of...  ...most critical services. Those systems are at the core of our text...  ...scale, reliable and performant distributed systems. Collaborate with...  ...experience, and relevant education or training. Please note that the... 
    Platform
    Training
    Full time
    Relocation
    Relocation package

    Discord

    San Francisco, CA
    4 days ago
  • $300k - $405k

     ...interpretable, and steerable AI systems. We want AI to be safe...  ...and reliably for training and serving frontier...  ...Work with our ML engineers to understand...  ...influence hardware and platform features for AI workloads...  ...schemes for large-scale distributed training Developing... 
    Platform
    Training
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    5 days ago
  • $225k - $275k

     ...Infrastructure Staff Tech Lead Manager, ML Data Services Boston,...  ...in machine learning systems, large-scale data processing...  ...the ML Data Service platform, ensuring it meets...  ...provision of diverse training data sources,...  ...-scale data systems, distributed systems, or ML infrastructure... 
    Platform
    Training
    Work at office
    Remote work
    2 days per week

    Motional AD Inc.

    San Francisco, CA
    5 days ago
  • $146.5k - $228k

     ...About the team: The ML Data Engineering...  ...users worldwide. Our systems operate at massive...  ...data engineering, and distributed systems,...  ...truly global scale. Tech Stack: Our backend...  ...coding best practices. Lead the design, implementation...  ...education or training; and other business... 
    Training
    Temporary work
    Local area
    Worldwide
    Home office
    Flexible hours

    Scribd

    San Francisco, CA
    1 day ago
  •  ...AI Systems Engineer - Codex Core Agents...  ...of how models are trained and evaluated, making...  ...level systems and ML workflows, able to...  ...production systems in distributed systems,...  ...virtualization, cloud platforms, or ML systems....  ...ownership, and can lead scoped or multi-team... 
    Platform
    Training

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...Learning Architect to define the ML strategy and build scalable systems. The role involves architecting end-to-end ML systems, leading technical roadmaps, and mentoring...  ...leadership is essential. Experience in ML platforms and distributed training is highly valued. Join a forward-... 
    Platform
    Training

    Sierracorp

    San Francisco, CA
    1 day ago
  • $248.4k - $310.5k

     ...Robotics & Autonomous Systems Scale's Robotics...  ...collection, model training pipelines, and...  ...parts of our robotics platform, work directly...  ...vehicle datasets Build ML training and fine-...  ...Understanding of distributed systems, workflow...  ...power the world's leading models, and help... 
    Platform
    Training
    Full time

    Scale AI

    San Francisco, CA
    4 days ago
  • $147k - $211k

    Software Engineer, Agentic AI Systems, Cloud Security Google San...  ...Agentic development etc) or ML platform/infrastructure (e.g., model...  ...systems. Experience in available distributed systems, cloud services or...  ..., and relevant education or training. Your recruiter can share... 
    Platform
    Training
    Full time
    Worldwide

    Google Inc.

    San Francisco, CA
    5 days ago
  • $44k - $185k

     ...of Cisco's AI-driven platforms and data infrastructure...  ...data and intelligent systems. Explore the opportunities...  .... Familiarity with distributed data processing...  ...on experience with AI/ML. Familiarity with major...  ...certifications, and/or training. The full salary range... 
    Platform
    Training
    Full time
    Temporary work
    Apprenticeship
    Internship
    Local area
    Flexible hours

    Webex Events (formerly Socio)

    San Francisco, CA
    1 day ago
