Staff Engineer, Distributed ML Training Systems

Radical Numerics

An innovative AI research lab in the San Francisco Bay Area is seeking a Member of Technical Staff specializing in Infrastructure and Training Systems. This role involves designing and optimizing distributed training infrastructure for large-scale biological models while collaborating closely with researchers and engineers. Ideal candidates will have a strong background in distributed systems, proficiency in Python and PyTorch, and excellent communication skills. The organization offers competitive compensation and a collaborative culture focused on impactful research.

Vacancy posted 22 hours ago
Similar jobs that could be interesting for you
Based on the Staff Engineer, Distributed ML Training Systems vacancy in San Francisco, CA
  • Genesis AI in San Francisco is looking for an experienced professional to optimize and build distributed training systems using PyTorch. The ideal candidate has over 8 years of experience in distributed systems, high-performance computing, and extensive expertise in Python... 
    Training

    Genesis AI

    San Francisco, CA
    2 days ago
  •  ...services that power our research, training, and production environments. These systems form the foundational platform...  ...environments, multi-tenant isolation. Distributed Systems Architecture: Sharding,...  ..., service reliability engineering. Ideal candidates have:... 
    Training
    Relocation package

    Reflection AI

    San Francisco, CA
    4 days ago
  •  ...leading streaming service is seeking a Staff Software Engineer to enhance ML infrastructure. The role involves designing scalable systems, mentoring engineers, and collaborating...  ...over 8 years of experience in building distributed systems, strong skills in AWS, and knowledge... 
    Suggested

    Tubi Tv

    San Francisco, CA
    22 hours ago
  • $251k - $310k

     ...environments safely and efficiently. The system architecture team handles the onboard...  ...challenging real-world problems with ML and engineering solutions. Use state of the art...  ...exact work location, experience, relevant training and education, and skill level. Your recruiter... 
    Training
    Full time
    Contract work
    Internship
    Remote work

    Waymo

    San Francisco, CA
    15 hours ago
  • A leading technology firm in San Francisco is seeking a candidate to build and scale distributed training systems for large model pre-training. You will collaborate with research teams to design and operate training runs and enhance performance across distributed training... 
    Training

    Reflection

    San Francisco, CA
    22 hours ago
  •  ...Difference You Will Make: As a staff software engineer, you will lead two areas...  ...collaborate with different AI & ML engineering teams, cross-...  ...science teams to develop backend systems and enhance AI prompt...  ...upon many factors, such as: training, transferable skills, work experience... 
    Training
    Work experience placement
    Flexible hours

    Airbnb, Inc.

    San Francisco, CA
    3 days ago
  • $140k - $225k

    Member of Technical Staff — SketchPro.ai Location: San Francisco...  .... What You'll Own Agent engineering across context design,...  ...and drafting generation systems for documentation and...  ...Role Is NOT For Traditional ML researchers focused on model training only Pure computer vision... 
    Training
    Full time
    H1b
    Work at office
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    22 hours ago
  • $141k - $228.08k

     ...seeking a Machine Learning Engineer to join our pioneering...  ...intelligent defense systems. You will be...  ...experience building, training, and deploying machine...  ...track record of taking ML projects from initial...  ...model performance and distributed software systems. ~... 
    Training
    Full time
    Work at office

    Palo Alto Networks

    San Francisco, CA
    4 days ago
  • $141k - $249k

     ...learn more visit: You will... - Collaborate closely with autonomy and algorithm engineers to scale safe self-driving systems using an AI-first approach. - Build distributed training frameworks for research and production, drive our training towards new levels of... 
    Training
    Work at office
    Work from home
    Flexible hours

    Waabi

    San Francisco, CA
    2 days ago
  • $188.24k - $235.3k

     ...L4, Machine Learning Engineer, Trust Intelligence Platform...  ...cloud-native data and ML infrastructure that...  ...Build reproducible ML training, evaluation, and...  ...operating data or ML systems in production. ~ Proficient...  ...of data modeling, distributed computing concepts, and... 
    Training
    Local area
    Remote work
    Worldwide

    Twilio

    San Francisco, CA
    4 days ago
  •  ...collaborate with intelligent systems in one of the largest...  ...valuable kind of AI engineering there is. What You'll...  ...a strong backend or distributed systems foundation....  ...AI systems. Classical ML experience — supervised...  ...engineering, model training and evaluation outside... 
    Training
    Live in
    Remote work

    Zuma

    San Francisco, CA
    3 days ago
  • $141.7k - $250.8k

     ...Francisco As a Sr. Staff Technical Solutions Engineer and tech subject matter...  ...advanced Spark/ML/AI runtime capabilities...  ..., and AI workflows. Train customer engineering...  ...and troubleshooting distributed computing applications...  ..., and alerting systems. Customer-Facing Experience... 
    Training
    Work at office
    Local area
    Worldwide
    Night shift

    Databricks Inc.

    San Francisco, CA
    4 days ago
  • A leading AI research company in San Francisco seeks Senior/Staff Engineers skilled in distributed systems and large-scale ML training. Responsibilities include designing systems optimized for low-bandwidth conditions and implementing robust training strategies. Ideal... 
    Training
    Remote work

    Pluralis Research

    San Francisco, CA
    2 days ago
  • $293k - $385k

     ...science, research, and engineering within OpenAI's B2...  ...on building the systems that help OpenAI...  ..., and applied ML/DS-adjacent problems...  ...model evaluations, training datasets, and product...  ..., including distributed systems, API services...  ...Member of Technical Staff . We use Staff /... 
    Training
    Shift work

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...and motivated Senior Staff Data Engineer to be the technical leader...  ...analytics, AI/ML and real-time data needs...  ...velocity and system reliability across SDP...  ...Contribute to hiring and training efforts to build a skilled...  ..., or GCP) and distributed processing frameworks... 
    Training
    Remote work

    SoFi

    San Francisco, CA
    1 day ago
  • $209.7k - $283.8k

     ...San Francisco, CA, USA Staff Machine Learning Engineer, ML Infrastructure...  ...across the company. Our systems operate at scale across batch...  ...supports large-scale model training, feature generation, and...  ...and enabling efficient, distributed model training at scale.... 
    Training
    Work at office
    Worldwide
    Relocation package

    Unity Technologies

    San Francisco, CA
    7 days ago
  • $208k - $282k

     ...Staff Data Engineer At Komodo Health, our mission is to reduce...  ...the U.S. healthcare system — by combining de-...  ...technical depth across SQL, distributed data processing,...  ...needs. AI & ML Enablement: Experience...  ...that supports AI/ML training, inference, experimentation... 
    Training
    Work experience placement
    Local area
    Flexible hours

    Komodo Health

    San Francisco, CA
    1 day ago
  • $232k

     ...converting ideas to scalable systems. What the Candidate Will...  ..., and fault-tolerant distributed machine learning libraries...  ...Uber. Work closely with engineers in the broader Uber ML/AI Platform Team (Michelangelo...  ...the industry Large-scale training using data structures and... 
    Training
    Full time

    Uber

    San Francisco, CA
    22 hours ago
  • An AI and Robotics firm in San Francisco seeks a Staff/Principal ML Systems Engineer to enhance training performance for multimodal robotic data. You will lead...  ...candidates will have significant experience in distributed training, a strong background in PyTorch, and the... 
    Training

    Maxwell Bond

    San Francisco, CA
    4 days ago
  • $215k - $323k

     ...accessibility to the legal system. Tackling the most...  ...why we’re seeking a Staff Machine Learning Engineer eager to join EvenUp...  ...in physics, ML, neuroscience, and more...  ...ensure high‑quality training and evaluation datasets...  ...and aggregation of distributed facts. Develop... 
    Training
    Full time
    Temporary work
    Work at office
    Local area
    Home office
    Flexible hours
    3 days per week

    B Capital

    San Francisco, CA
    3 days ago
  •  ...looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation models. You will design distributed training systems and optimize GPU utilization...  ...over 5 years of experience in ML infrastructure and a strong... 
    Training

    Baseten

    San Francisco, CA
    2 days ago
  •  ...Proficiency in Python and standard ML frameworks (e.g., JAX,...  ...Experience with large-scale distributed training and data processing, Proven...  ...for evaluating complex AI systems, (Desirable) Track record of...  ...for researchers and software engineers who are passionate about... 
    Training

    Waymo

    San Francisco, CA
    22 hours ago
  • $150k - $350k

    Gimlet Labs, Inc. is seeking a Member of Technical Staff to focus on distributed systems in San Francisco, California. This role involves designing...  ...APIs. The ideal candidate should have strong software engineering fundamentals and experience with distributed systems. The... 

    Gimlet Labs, Inc.

    San Francisco, CA
    22 hours ago
  •  ...company in San Francisco is seeking a Member of Technical Staff to design and build distributed systems for AI workloads. The role involves developing...  ...grade APIs. Ideal candidates should have strong software engineering skills and experience with distributed systems. This... 

    Gimlet Labs

    San Francisco, CA
    22 hours ago
  • $170.4 per hour

     ...business. Founded by engineers, we leap at every opportunity...  ...failures and complex system anomalies. Root Cause...  ...Deep understanding of distributed system internals....  ...Strong grasp of model training, evaluation, and deployment...  ...managing the ML lifecycle, including governance... 
    Training
    Local area
    Worldwide

    Databricks Inc.

    San Francisco, CA
    4 days ago
  • A tech-first company is seeking a Member of Technical Staff to focus on cutting-edge AI research and development. The role involves building and scaling training and inference infrastructure, designing ML kernels, and optimizing performance. Ideal candidates should have... 
    Training

    Mirendil

    San Francisco, CA
    2 days ago
  • $200.2k - $357.5k

     ...operations. We’re hiring a Staff / Senior Staff...  ...Infrastructure Engineer to lead the design...  ...of our end-to-end ML platform powering...  ..., and scale ML systems that improve real‑...  ...-end ML platform (training, experimentation,...  ...Strong experience with distributed computing... 
    Training
    Work at office
    Remote work
    Flexible hours

    Samsara

    San Francisco, CA
    13 hours ago
  • $224k - $336k

     ...possible in identity engineering. What You’ll Do As a Full Stack Staff Engineer on the...  ...reliable, scalable system. Responsibilities...  ...discriminators, adversarial training). Drive the...  .... Partner with ML, product, legal, and...  ..., React, and distributed data stores. Experience... 
    Training
    Work at office
    Local area
    Immediate start
    Remote work
    Work from home
    Relocation
    Shift work

    Stripe

    San Francisco, CA
    22 hours ago
  • Modal Labs is seeking strong engineers to train production machine learning models and contribute to open-source projects. Candidates should have experience with high-performance code and ML training optimization, working in our NYC or San Francisco offices. Ideal applicants... 
    Training

    Modal Labs

    San Francisco, CA
    22 hours ago
  • $250k - $334.53k

     ...Perception team builds the system which learns the...  ...of sensors, enabling engineers like you to (1) develop...  ...develop models and model training at scale, to (3)...  ..., build and implement ML data infra and validate...  ...Experience in designing distributed systems processing data... 
    Training
    Full time
    Remote work

    Waymo

    San Francisco, CA
    4 days ago
