ML Systems Engineer: Distributed LLM Training & Inference

$200.8k - $251k

Scale AI

A leading AI technology company in San Francisco seeks a team member to build and optimize a machine learning framework for large language models. Candidates should have system optimization experience and solid software engineering skills, particularly with tools such as CUDA and PyTorch. This full-time position offers a salary range of $200,800 - $251,000, along with comprehensive benefits.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the ML Systems Engineer: Distributed LLM Training & Inference in New York, NY vacancy
  • $110 per hour

     ...Responsibilities Guide research and engineering teams to close knowledge...  ...in MLOps , training infrastructure, and ML framework-level topics ....  ...to MLOps and ML systems problems . Evaluate...  ...training pipeline design, distributed systems reasoning, and kernel... 
    Training
    Remote job
    Contract work
    Summer work
    Weekday work

    Mercor

    New York, NY
    2 days ago
  •  ...to build and optimize their training and inference framework as part of their ML platform team. In this role...  ...a strong background in system optimization and experience with distributed ML systems, along with excellent software engineering skills. Scale offers competitive... 
    Training

    Scale AI, Inc.

    New York, NY
    2 days ago
  • $350k

     ...and steerable AI systems. We want AI to...  ...committed researchers, engineers, policy experts,...  ...Anthropic's inference fleet serves...  ...model servers, distributed routing, autoscaling...  ...Experience with ML systems: especially training or inference...  ...infrastructure or general LLM serving stacks.... 
    Training
    Visa sponsorship

    Menlo Ventures

    New York, NY
    1 day ago
  •  ...Learning / Software Engineer Dyania Health...  ...AI, an end-to-end system that combines a medically post-trained LLM with a physician-driven...  .... As a senior ML engineer at Dyania...  ..., deployment, and inference at scale....  ...GPU and multi-node distributed training and inference... 
    Training
    Internship
    Local area
    Remote work
    Flexible hours
    Shift work

    HealthX Ventures

    Jersey City, NJ
    2 days ago
  •  ...Generating high-quality training data for national LLMs Upskilling...  ...supporting end-to-end system reliability, real-time inference observability, sovereign...  ...: Partner with our Engineering and ML teams to ensure the...  ...agentic development, and LLM observability tools. Ownership... 
    Training

    AI Chopping Block, Inc.

    New York, NY
    3 days ago
  • $264.8k - $331k

     ...state of the art post-training algorithms to reach...  ...The Enterprise ML Research Lab works...  ...an ML Sys Research Engineer, you'll work on building...  ...to optimize our ML system. Your customer will...  ...our training and inference framework. Post-...  ...least 1-3 years of LLM training in a... 
    Training
    Full time

    Scale AI

    New York, NY
    6 days ago
  •  ...first and founding ML Operations Engineer at Tennr, you’ll play...  ...Machine Learning and AI systems. You’ll own building machine learning training and inference pipelines that can...  ...evaluation of ML & LLM systems. Candidate...  ...alerting Experience with distributed systems, reliability... 
    Training
    Work at office

    Tennr

    New York, NY
    4 days ago
  • $250k - $350k

     ...state of the art post‑training algorithms to reach...  .... The Enterprise ML Research Lab works...  ...an ML Sys Research Engineer, you’ll work on...  ...to optimize our ML system. Your customer will...  ...optimize our training and inference framework. Post‑...  ...least 1‑3 years of LLM training in a... 
    Training
    Full time

    Scale AI, Inc.

    New York, NY
    2 days ago
  •  ...the Role Mirage is seeking an ML Engineer to push the boundaries of...  ...building and extending agentic systems that understand and operate over...  ...Develop novel approaches for training and adapting the large...  ...understanding of transformers and modern LLM techniques Experience with... 
    Training
    Full time
    Local area
    Night shift

    AI Chopping Block

    New York, NY
    3 days ago
  • $200.2k - $357.5k

     ...Infrastructure Engineer to lead the design...  ...our end-to-end ML platform...  ..., and scale ML systems that improve real...  ...platform spanning training,...  ...batch and online inference, and edge deployment...  ...EcoDriving insights, LLM-based reporting...  ...experience with distributed computing frameworks... 
    Training
    Full time
    Work at office
    Remote work
    Flexible hours

    Samsara

    New York, NY
    2 days ago
  • A nonprofit AI research organization in New York City seeks a full-time ML Systems Engineer. This role involves managing distributed training infrastructure, debugging complex issues, and optimizing cloud resources to enhance operational efficiency. Ideal candidates will... 
    Training
    Full time

    Basis Research Institute

    New York, NY
    2 days ago
  • $227.2k - $417k

    Software Engineer, ML Infra & Distributed Systems (Staff & Principal) About the Role: As a...  ...world‑class machine learning inference platforms. These...  ...that support Deep Learning, LLM, and Search models. This...  ...Feast), ElastiCache, model training orchestration, etc. Understanding... 
    Training
    Full time
    Temporary work
    Local area
    Flexible hours

    Tubi Tv

    New York, NY
    3 days ago
  •  ...organization that puts human values first. About the Role ML Systems Engineers at Basis ensure training and evaluation infrastructure is fast, reliable, and scalable. You will own the full stack from distributed training frameworks through cloud administration, making... 
    Training
    Full time
    Work at office

    Basis Research Institute

    New York, NY
    2 days ago
  •  ...best AI detection systems. We publish...  ...Machine Learning Engineers at all levels...  ...generation, to training models, to deployment...  .... At Pangram, ML engineers are...  ...models Manage distributed infrastructure for multi-GPU LLM training...  ...optimizing training and inference code Deploy... 
    Training
    Work at office

    Pangram Labs

    New York, NY
    5 hours ago
  •  ...world of intelligent systems. Location : New York,...  ...deploy production‑grade ML systems with end‑to‑...  ...drive innovation in LLM and audio ML...  ...preprocessing, model training, deployment, inference, and monitoring in production...  ...professional experience in ML engineering. Strong programming... 
    Training
    Full time

    Catalyst Labs, LLC

    New York, NY
    3 days ago
  • $85 per hour

     ...Larry Summers , and Jack Dorsey . Position: ML Engineer (Coding Agent Experience) Type: Contract Compensation...  ...model-generated implementations involving model training , inference systems , MLOps , and LLM applications . Identify bugs, edge cases,... 
    Training
    Remote job
    Contract work
    Summer work

    Mercor

    New York, NY
    9 days ago
  • $200k

    AI/ML & Algorithm Engineer Full Time Dedicated United States,...  ...of multi-agent AI systems, novel scoring and...  ...alignment. Build LLM‑powered workflows:...  ...context assembly, and inference across disparate operational...  ...that enable model training across distributed, domain‑owned data... 
    Training
    Full time
    Temporary work
    Worldwide

    Salute Mission Critical LLC.

    New York, NY
    5 days ago
  • $112.7k - $169.1k

     ...the machine learning systems that decide which...  ...world's leading game engine. Recommendation and...  ...using causal inference, A/B testing, and offline...  ...full pipeline from training data to deployed...  ...reinforcement learning, LLM post-training or...  ...-scale data and ML systems, whether through... 
    Training
    Internship
    Work at office
    Worldwide
    Relocation package
    Shift work

    Jobr

    New York, NY
    3 days ago
  •  ...Senior Machine Learning Engineer , you will take a...  ...optimising end‑to‑end ML systems that operate at massive...  ...and own large-scale, distributed machine learning systems for training, deployment, inference, and monitoring....  ...Augmented Generation), and LLM application... 
    Training

    Affine Analytics

    New York, NY
    4 days ago
  • $152k - $228k

     ...Job Description Senior ML Engineer About Invoca Invoca...  ...lifecycle at Invoca, from model training and fine-tuning through inference optimization and...  ...Design and Optimize SLM/LLM Deployment: Own the full...  ...foundations to keep the systems powering our models reliable... 
    Training
    Currently hiring
    Remote work
    Flexible hours

    Invoca

    New York, NY
    11 days ago
  • $148.5k - $223.9k

     ...duplicating efforts. Job Category Software Engineering Job Details About Salesforce...  ...the design and delivery of large-scale, distributed systems within Salesforce’s Security ecosystem...  ...assignment, compensation, promotion, benefits, training, assessment of job performance,... 
    Training
    Full time

    Salesforce

    New York, NY
    1 day ago
  • $135k - $170k

     ...and ongoing skills training opportunities Employee...  ...The Senior MLOps Engineer treats ML systems as software systems...  ...scoring and real‑time inference. The Senior Engineer...  ...engineering‑first mindset — distributed systems,...  ...Required) Integrate LLM APIs (Bedrock, Anthropic... 
    Training
    For contractors
    H1b
    Visa sponsorship
    Flexible hours

    SwiftCruit

    New York, NY
    1 day ago
  • $250k - $350k

     ...Applied ML Systems Engineer  - Finance - NEW YORK - UNITED STATES...  ...be deep in GPU kernels trying to shave training time. Other weeks you'll be whiteboarding...  ...Systems," "Training Infrastructure," "Distributed Training," "GPU Optimization," "Model... 
    Training
    Permanent employment
    Full time
    Work experience placement
    Internship
    Immediate start
    Remote work
    Relocation
    Relocation package
    New York, NY
    28 days ago
  •  ...emergency response systems and remote patient...  ...a principal-level engineer: shaping unclear...  ...engineering, model training, model selection,...  ...Develop practical ML models that balance...  ...or near-real-time inference, model versioning,...  ...solutions, including LLM-powered workflows,... 
    Training
    Temporary work
    Remote work

    Medical Guardian

    New York, NY
    1 day ago
  •  ...Senior Systems Engineer Sales New Jersey Full-time ID: VDT27560 Description VAST Data is...  ...for real-time data analysis and AI training and inference. Designed from the ground up to make...  ...matter expertise on storage products, distributed storage architectures, file systems,... 
    Training
    Full time
    Traineeship

    Drive Capital

    New York, NY
    4 days ago
  • $180k - $250k

     ...looking for exceptional AI/ML Engineers to help shape and build it....  ...some of the world’s biggest distributed systems. A core part of this effort...  ...parts of the system – from LLM‑powered classification and detection...  .... Qualifications Experience training, deploying, and monitoring... 
    Training

    Artemis Llc

    New York, NY
    3 days ago
  • $160k - $200k

     ...everywhere in healthcare. Our LLM‑powered platform is solving...  ...use cases. For health systems, our first product dramatically...  ...We’re hiring an exceptional ML Engineer to join our team (Boston or...  ...‑to‑end ML systems (design, training, inference, deployment, and monitoring;... 
    Training
    Work at office

    Verana Health

    New York, NY
    4 days ago
  •  ...AI/ML Engineer 1 year + 4 days onsite...  ...scalable data systems and good communication...  ...downstream NLP models. LLM Integration • Incorporate...  ...tools and scalable inference strategies. • Prior...  ...collaborating on model training and evaluation, aligning... 
    Training
    Local area

    RIT Solutions, Inc.

    New York, NY
    12 hours ago
  • $150k - $198k

     ...Machine Learning Engineer on the Trust &...  ...detection systems that have a direct...  ...end lifecycle of ML projects,...  ...collection to training, deployment, monitoring...  ...to scalable inference and data pipelines...  ...of distributed computing for inference...  ...for an agent or LLM vs. a classical... 
    Training
    Work experience placement

    Match Group

    New York, NY
    2 days ago
  • $135k - $170k

     ...and ongoing skills training opportunities Employee...  ...The Senior MLOps Engineer treats ML systems as software systems...  ...scoring and real‑time inference. They build the platform...  ...) - Integrate LLM APIs into production...  ...Kubernetes or ECS, CI/CD, distributed systems debugging,... 
    Training
    For contractors
    H1b
    Flexible hours

    BetMGM LLC

    New York, NY
    12 hours ago
