Staff Engineer, Distributed ML Training Systems
Radical Numerics
An innovative AI research lab in the San Francisco Bay Area is seeking a Member of Technical Staff specializing in Infrastructure and Training Systems. This role involves designing and optimizing distributed training infrastructure for large-scale biological models while collaborating closely with researchers and engineers. Ideal candidates will have a strong background in distributed systems, proficiency in Python and PyTorch, and excellent communication skills. The organization offers competitive compensation and a collaborative culture focused on impactful research.
- Genesis AI in San Francisco is looking for an experienced professional to optimize and build distributed training systems using PyTorch. The ideal candidate has over 8 years of experience in distributed systems, high-performance computing, and extensive expertise in Python...
  Training
- ...services that power our research, training, and production environments. These systems form the foundational platform... ...environments, multi-tenant isolation. Distributed Systems Architecture: Sharding,... ..., service reliability engineering. Ideal candidates have:...
  Training, Relocation package
- ...leading streaming service is seeking a Staff Software Engineer to enhance ML infrastructure. The role involves designing scalable systems, mentoring engineers, and collaborating... ...over 8 years of experience in building distributed systems, strong skills in AWS, and knowledge...
  Suggested
$251k - $310k
...environments safely and efficiently. The system architecture team handles the onboard... ...challenging real-world problems with ML and engineering solutions. Use state of the art... ...exact work location, experience, relevant training and education, and skill level. Your recruiter...
Training, Full time, Contract work, Internship, Remote work
- A leading technology firm in San Francisco is seeking a candidate to build and scale distributed training systems for large model pre-training. You will collaborate with research teams to design and operate training runs and enhance performance across distributed training...
  Training
- ...Difference You Will Make: As a staff software engineer, you will lead two areas... ...collaborate with different AI & ML engineering teams, cross-... ...science teams to develop backend systems and enhance AI prompt... ...upon many factors, such as: training, transferable skills, work experience...
  Training, Work experience placement, Flexible hours
$140k - $225k
Member of Technical Staff, SketchPro.ai. Location: San Francisco... .... What You'll Own Agent engineering across context design,... ...and drafting generation systems for documentation and... ...Role Is NOT For Traditional ML researchers focused on model training only Pure computer vision...
Training, Full time, H1b, Work at office, Visa sponsorship
$141k - $228.08k
...seeking a Machine Learning Engineer to join our pioneering... ...intelligent defense systems. You will be... ...experience building, training, and deploying machine... ...track record of taking ML projects from initial... ...model performance and distributed software systems. ~...
Training, Full time, Work at office
$141k - $249k
...learn more visit: You will... - Collaborate closely with autonomy and algorithm engineers to scale safe self-driving systems using an AI-first approach. - Build distributed training frameworks for research and production, drive our training towards new levels of...
Training, Work at office, Work from home, Flexible hours
$188.24k - $235.3k
...L4, Machine Learning Engineer, Trust Intelligence Platform... ...cloud-native data and ML infrastructure that... ...Build reproducible ML training, evaluation, and... ...operating data or ML systems in production. ~ Proficient... ...of data modeling, distributed computing concepts, and...
Training, Local area, Remote work, Worldwide
- ...collaborate with intelligent systems in one of the largest... ...valuable kind of AI engineering there is. What You'll... ...a strong backend or distributed systems foundation.... ...AI systems. Classical ML experience — supervised... ...engineering, model training and evaluation outside...
  Training, Live in, Remote work
$141.7k - $250.8k
...Francisco As a Sr. Staff Technical Solutions Engineer and tech subject matter... ...advanced Spark/ML/AI runtime capabilities... ..., and AI workflows. Train customer engineering... ...and troubleshooting distributed computing applications... ..., and alerting systems. Customer-Facing Experience...
Training, Work at office, Local area, Worldwide, Night shift
- A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal...
  Training, Remote work
$293k - $385k
...science, research, and engineering within OpenAI's B2... ...on building the systems that help OpenAI... ..., and applied ML/DS-adjacent problems... ...model evaluations, training datasets, and product... ..., including distributed systems, API services... ...Member of Technical Staff. We use Staff /...
Training, Shift work
- ...and motivated Senior Staff Data Engineer to be the technical leader... ...analytics, AI/ML and real-time data needs... ...velocity and system reliability across SDP... ...Contribute to hiring and training efforts to build a skilled... ..., or GCP) and distributed processing frameworks...
  Training, Remote work
$209.7k - $283.8k
...San Francisco, CA, USA Staff Machine Learning Engineer, ML Infrastructure... ...across the company. Our systems operate at scale across batch... ...supports large-scale model training, feature generation, and... ...and enabling efficient, distributed model training at scale....
Training, Work at office, Worldwide, Relocation package
$208k - $282k
...Staff Data Engineer At Komodo Health, our mission is to reduce... ...the U.S. healthcare system — by combining de-... ...technical depth across SQL, distributed data processing,... ...needs. AI & ML Enablement: Experience... ...that supports AI/ML training, inference, experimentation...
Training, Work experience placement, Local area, Flexible hours
$232k
...converting ideas to scalable systems. What the Candidate Will... ..., and fault-tolerant distributed machine learning libraries... ...Uber. Work closely with engineers in the broader Uber ML/AI Platform Team (Michelangelo... ...the industry Large-scale training using data structures and...
Training, Full time
- An AI and Robotics firm in San Francisco seeks a Staff/Principal ML Systems Engineer to enhance training performance for multimodal robotic data. You will lead... ...candidates will have significant experience in distributed training, a strong background in PyTorch, and the...
  Training
$215k - $323k
...accessibility to the legal system. Tackling the most... ...why we’re seeking a Staff Machine Learning Engineer eager to join EvenUp... ...in physics, ML, neuroscience, and more... ...ensure high‑quality training and evaluation datasets... ...and aggregation of distributed facts. Develop...
Training, Full time, Temporary work, Work at office, Local area, Home office, Flexible hours, 3 days per week
- ...looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation models. You will design distributed training systems and optimize GPU utilization... ...over 5 years of experience in ML infrastructure and a strong...
  Training
- ...Proficiency in Python and standard ML frameworks (e.g., JAX,... ...Experience with large-scale distributed training and data processing, Proven... ...for evaluating complex AI systems, (Desirable) Track record of... ...for researchers and software engineers who are passionate about...
  Training
$150k - $350k
Gimlet Labs, Inc. is seeking a Member of Technical Staff to focus on distributed systems in San Francisco, California. This role involves designing... ...APIs. The ideal candidate should have strong software engineering fundamentals and experience with distributed systems. The...
- ...company in San Francisco is seeking a Member of Technical Staff to design and build distributed systems for AI workloads. The role involves developing... ...grade APIs. Ideal candidates should have strong software engineering skills and experience with distributed systems. This...
$170.4 per hour
...business. Founded by engineers, we leap at every opportunity... ...failures and complex system anomalies. Root Cause... ...Deep understanding of distributed system internals.... ...Strong grasp of model training, evaluation, and deployment... ...managing the ML lifecycle, including governance...
Training, Local area, Worldwide
- A tech-first company is seeking a Member of Technical Staff to focus on cutting-edge AI research and development. The role involves building and scaling training and inference infrastructure, designing ML kernels, and optimizing performance. Ideal candidates should have...
  Training
$200.2k - $357.5k
...operations. We’re hiring a Staff / Senior Staff... ...Infrastructure Engineer to lead the design... ...of our end-to-end ML platform powering... ..., and scale ML systems that improve real‑... ...-end ML platform (training, experimentation,... ...Strong experience with distributed computing...
Training, Work at office, Remote work, Flexible hours
$224k - $336k
...possible in identity engineering. What You’ll Do As a Full Stack Staff Engineer on the... ...reliable, scalable system. Responsibilities... ...discriminators, adversarial training). Drive the... .... Partner with ML, product, legal, and... ..., React, and distributed data stores. Experience...
Training, Work at office, Local area, Immediate start, Remote work, Work from home, Relocation, Shift work
- Modal Labs is seeking strong engineers to train production machine learning models and contribute to open-source projects. Candidates should have experience with high-performance code and ML training optimization, working in our NYC or San Francisco offices. Ideal applicants...
  Training
$250k - $334.53k
...Perception team builds the system which learns the... ...of sensors, enabling engineers like you to (1) develop... ...develop models and model training at scale, to (3)... ..., build and implement ML data infra and validate... ...Experience in designing distributed systems processing data...
Training, Full time, Remote work

