ML Systems Engineer: Distributed LLM Training & Inference
$200.8k - $251k
Scale AI
A leading AI technology company in San Francisco seeks a team member to build and optimize a machine learning framework for large language models. Candidates should have system optimization experience and solid software engineering skills, particularly with tools like CUDA and PyTorch. This full-time position offers a competitive salary range of $200,800 - $251,000, along with comprehensive benefits.
- ...reliable, field-ready AI systems that solve the... ...rigorous engineering with learning systems... ...are seeking a Staff ML Systems Engineer to... ...architect and build the distributed infrastructure... ...processing, model training, evaluation, and... ...learning training and inference systems....
Training, Local area
- ...technology company in Seattle is seeking a Machine Learning Engineer for Model Serving Infrastructure. The ideal candidate will... ...programming skills in C/C++/CUDA. You will design and implement distributed inference infrastructure and collaborate with product teams. This...
Suggested
$204k - $259k
...Machine Learning Engineer – VLM/LLM Evaluation Waymo... ..., Bayesian inference, hierarchical learning... ...Waymo's systems, both onboard autonomous... ...life-cycle from pre-training and supervised... ...Experience in ML engineering and applied... ...with large scale distributed system...
Training, Full time, Temporary work, Remote work
$189.6k - $237k
...Scale's ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been... ...and evaluation of LLM's, as well as evaluation... ...optimize our ML system Ideally you... ...Strong software engineering skills,...
Training, Full time
- ...seeking a Senior or Staff Software Engineer for the ML Infrastructure team. The role... ...designing and operating systems for large-scale model training and inference, focusing on reliability and performance... ...extensive experience with distributed systems, Kubernetes, and...
Training
- ...Performance Engineer, Inference Systems San Francisco, CA | New York City,... ...kernels, model servers, distributed routing, autoscaling, capacity... ...Experience with ML systems, especially training or inference infrastructure or general LLM serving stacks. Direct large...
Training, Work at office, Visa sponsorship, Flexible hours
- ...A leading software company in Seattle is seeking a Senior Machine Learning Systems & Efficiency Engineer to enhance inference performance and cost efficiency in image editing applications. The role requires expertise in AI, machine learning systems, and performance optimization...
$164k - $313.3k
...Senior Machine Learning (ML) Systems & Efficiency Engineer to join our R&D team focused... ...-ready improvements in inference performance, latency, and... ...with experience in distributed inference, multimodal model... ...communication efficiency. Explore training or fine-tuning approaches...
Training, Temporary work, Local area, Worldwide
- ...interpretable, and steerable AI systems. We want AI to be... ...researchers, engineers, policy experts, and... ...-edge systems that train AI models like... ...steerable AI. As an ML Systems Engineer on... ...performance, large scale distributed systems Large scale LLM training Python Implementing...
Training, Work at office, Visa sponsorship, Flexible hours
$320k - $405k
...Machine Learning Systems Engineer, Research Tools About Anthropic... ...more efficient and effective training of our AI systems while ensuring... ...systems, data pipelines, or ML infrastructure Are proficient... ...scientific progress Distributed systems and parallel computing...
Training, Full time, Work at office, Visa sponsorship, Flexible hours
- ...Annapurna Labs (U.S.) Inc. in Seattle is seeking a Senior Software Engineer to join the Machine Learning Inference Applications team. This role involves adapting cutting-edge research in LLM optimization to enhance performance on Neuron devices. The position requires extensive...
$233.4k - $339.65k
...highly skilled and experienced Principal ML Systems Engineer to join our Autonomous Vehicles team.... ...Design & develop the next generation distributed ML data platform (Ingestion,... ...model lifecycle (feature engineering, training, validation, deployment, monitoring, etc...
Training, H1b, Local area, Work from home, Relocation package, Flexible hours
$204.8k - $296.6k
...a Senior Machine Learning Systems & Efficiency Engineer in Seattle. This critical... ...will focus on optimizing ML systems for efficiency and... ...relevant field and experience in distributed systems and performance... ...designing high-throughput inference systems, conducting...
- ...Principal AI Agent / ML Software Engineer The Principal... ...-generation AI systems on Oracle Cloud... ...workflows, scalable inference infrastructure,... ...candidate combines deep distributed systems... ...understanding of LLM application patterns... ...GPU inference or training workloads for latency...
Training
$170k - $240k
...the Product and Engineering team at... ...MLE) on the AI & ML (Insights) team... ...architecture, training, deployment, and... ...scalable data systems. You will be expected... ...models that can infer meaning and... ...grade GenAI or LLM‑based systems with... ...pipelines and distributed systems using technologies...
Training, Work at office, Remote work, Visa sponsorship
- ...Machine Learning Engineer (Senior) About AZX... ...specialize in physics-informed ML and enterprise AI... ...climate risk, energy systems, and global economics.... ...~ Generative AI and LLM-related capabilities (... ...Large-scale data and distributed training paradigms (e.g., Spark...
Training, Full time, Remote work, Work visa, Flexible hours, Shift work
$148.2k - $300.96k
...Advanced machine learning systems to detect and prevent... ...- Design prompt engineering and reasoning workflows... ...indicators, and real-time LLM-based decisions. - Knowledge... ...part of a cutting-edge ML + LLM team shaping the... ...with LLM post-training applications , especially...
Training, Temporary work, Local area, Worldwide
$200k - $250k
...Machine Learning Engineering within the Advanced... ...our foundational systems that power our... ...annotation pipelines, ML Infrastructure... ...workflows (LLM-in-the-loop) to reduce... ...lifecycle, including distributed training infrastructure,... ...and low-latency inference services. Ensure...
Training, Temporary work, Work at office, Local area
$171.6k - $302.2k
...Machine Learning Engineer, AI, SIML Seattle... ...The Intelligence System Experience (ISE)... ...Efficient and Scalable ML Infrastructure,... ..., scalable training and inference for Apple's AI-driven... ...understanding of LLM architectures and... ...such as PyTorch Distributed (torch.distributed...
Training, Relocation, Flexible hours
- ...Capital is seeking Software Engineers to join the ML Infrastructure team. In... ...you will design and operate systems to support large scale machine learning model training and inference. Candidates need... ...experience in backend systems and distributed technologies like Kubernetes...
Training
$171.6k - $302.2k
Senior ML Infrastructure Engineer - Training Algorithms, SIML Seattle, Washington, United... ...? We are the Intelligence System Experience (ISE) team within... ...in training / adapting LLM and Diffusion models Advanced... ...projects Experience with distributed training of large models...
Training, Relocation
- ...seeking a Senior or Staff Software Engineer to join their ML Infrastructure team. You will... ...and operating core systems for large scale model training and inference in a fast-paced environment.... ...This role requires expertise in distributed systems and Kubernetes, with...
Training
- ...Systems Engineer About Us We are building next-generation... ...-built for large-scale AI training and inference. As a startup, we operate... ...system performance across distributed environments Troubleshoot... ...Experience in AI/ML infrastructure or HPC environments...
Training, Work at office
$171.6k - $302.2k
...Description As a Senior/Staff Engineer on the Foundation... ...and orchestration systems for large-scale TPU... ...clusters. You will work on distributed systems that manage... ...of large-scale training and inference jobs. This role spans... ...systems for distributed ML workloads running on...
Training, Relocation
- ...VAST Data is looking for a Senior Systems Engineer to join our growing team! This is... ...for real-time data analysis and AI training and inference. Designed from the ground up to make... ...matter expertise on storage products, distributed storage architectures, file systems,...
Training, Traineeship
$176.76k - $232k
...company for yoga, running, training, and other athletic... ...As a Senior AI/ML Engineer, you will lead the delivery... ...tuning to architectures and system design for serving AI/ML inference solutions in production.... ...architecture and engineering of LLM and GenAI systems including...
Training, Permanent employment, Contract work, Part time, Work visa
- ...Anomaly Detection, and LLM fine-tuning —... ...As one of our AI ML Engineer’s, you'll be a... ...performance multi-agent systems that perceive,... ...Build real-time inference pipelines for... ...architecting large-scale distributed systems on cloud... ..., Paternity) Training & Development...
Training, Shift work
- ...Machine Learning Engineer As a Machine Learning... ...of intelligent systems. You will bridge... ...high-performance distributed systems to support... ...large-scale model inference and data processing... ...Implement robust ML pipelines, focusing... ...data ingestion and training to production...
Training
$184.5k
...Senior Machine Learning Engineer to join our high-performing... ...batch and real-time ML systems that power pricing, inventory... ...of machine learning, distributed systems, and MLOps,... ...feature pipelines, model training and validation, scalable inference, monitoring, drift detection...
Training, Local area, Flexible hours
- ...requires production-grade AI/ML systems that meet federal data... ...preprocessing, feature engineering, model selection, training, evaluation, and validation... ...Large Language Model (LLM) capabilities, including... ...backend services for model inference, batch processing, and real...
Training, For contractors, 1 day per week