Research Member of Technical Staff - Training Systems

Rhoda AI

At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design.

We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for a Staff / Principal ML Training Systems Engineer to own training systems performance end-to-end. You will define how our models train at scale, driving efficiency, scalability, and correctness across large-scale multimodal training. This is a core systems role, not infrastructure support. Your work directly determines how efficiently we use compute, how well models scale across thousands of GPUs, and how quickly research can iterate.
What You'll Do

Own training performance end-to-end
  • Diagnose and improve performance of large-scale multimodal training (vision, video, proprioception, actions, language)
  • Build systematic performance attribution: step-time decomposition (compute vs communication vs input pipeline), scaling curves across cluster sizes, and bottleneck identification and prioritization
  • Drive measurable gains in:
    • Distributed efficiency (comm/compute overlap, bucketization, topology-aware mapping, parallelism strategies)
    • Compute efficiency (kernel hotspots, operator fusion, attention optimization, framework/runtime overhead)
    • Memory efficiency (activation checkpointing, sequence packing/bucketing, fragmentation reduction)

Design training systems (not just tune them)
  • Define and evolve parallelism strategies: data / tensor / pipeline / sharding / hybrid approaches
  • Improve execution efficiency through communication scheduling and overlap, graph capture and execution optimization, and runtime-level improvements
  • Contribute to and extend training frameworks where needed

Make performance observable and measurable
  • Establish source-of-truth performance metrics: step-time breakdowns, MFU / throughput / scaling efficiency
  • Build tools to identify bottlenecks quickly, track performance across model families, and compare scaling behavior across configurations
  • Develop regression detection: microbenchmarks, performance baselines, and automated detection of efficiency regressions

Partner deeply with researchers
  • Work side-by-side with research scientists and research engineers, with no silos
  • Translate model innovations into scalable, efficient implementations
  • Advise on training tradeoffs for robotics world models: long-horizon sequences, rollout/evaluation cadence, multimodal and variable-length data

Collaborate on cluster-level efficiency
  • Work with infrastructure/SRE teams to improve utilization across large distributed jobs, impact of network and collective performance on training, and topology-aware job placement and scaling behavior

What We're Looking For
  • Proven track record improving large-scale distributed training performance
  • Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
  • Strong understanding of data / tensor / pipeline parallelism, sharded training (FSDP / ZeRO-style), communication patterns and overlap strategies, and scaling behavior across large GPU clusters
  • Strong systems intuition: ability to reason across compute, communication, and memory bottlenecks
  • Exceptional debugging and measurement ability: turn “training is slow” into clear bottlenecks, experiments, and validated improvements
  • High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)
  • GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
  • Experience with multimodal or video training (variable-length sequences, packing/bucketing)
  • Experience working on large-scale training frameworks or distributed runtimes
  • Familiarity with cluster topology, networking, and large-scale scheduling effects

Why This Role
  • Direct leverage on research velocity: every efficiency gain you make accelerates model iteration across the entire research team
  • Own the scalability and performance of large-scale multimodal training for real-world embodied intelligence, not static benchmarks
  • Improvements you make compound across every training run the company executes: high ownership, high impact, small elite team
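
For readers less familiar with the metrics named under "Make performance observable and measurable" (step-time breakdowns, MFU, scaling efficiency), the sketch below shows one common way they might be computed. It is illustrative only and is not Rhoda AI's tooling: the class and function names are hypothetical, and the FLOP count uses the widely cited approximation of about 6 FLOPs per parameter per token for a dense transformer's forward and backward pass, which may not hold for multimodal or world-model architectures.

```python
from dataclasses import dataclass


@dataclass
class StepTiming:
    """Wall-clock seconds of one training step, split by cause (hypothetical fields)."""
    compute_s: float
    exposed_comm_s: float  # communication NOT hidden behind compute
    input_wait_s: float    # time blocked waiting on the data pipeline

    @property
    def total_s(self) -> float:
        return self.compute_s + self.exposed_comm_s + self.input_wait_s

    def breakdown(self) -> dict:
        """Fraction of the step spent in each category."""
        total = self.total_s
        return {
            "compute": self.compute_s / total,
            "communication": self.exposed_comm_s / total,
            "input_pipeline": self.input_wait_s / total,
        }


def mfu(params: float, tokens_per_step: float, step_time_s: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s over aggregate peak FLOP/s.

    Assumes ~6 * params FLOPs per token (dense transformer, forward + backward).
    """
    achieved_flops_per_s = 6.0 * params * tokens_per_step / step_time_s
    return achieved_flops_per_s / (num_gpus * peak_flops_per_gpu)


def scaling_efficiency(base_throughput: float, base_gpus: int,
                       throughput: float, gpus: int) -> float:
    """Per-GPU throughput at the larger scale relative to the baseline scale."""
    return (throughput / gpus) / (base_throughput / base_gpus)


if __name__ == "__main__":
    timing = StepTiming(compute_s=0.6, exposed_comm_s=0.3, input_wait_s=0.1)
    print(timing.breakdown())
    # All numbers below are made up for illustration.
    print(mfu(params=1e9, tokens_per_step=1e6, step_time_s=1.0,
              num_gpus=8, peak_flops_per_gpu=1e15))
    print(scaling_efficiency(base_throughput=100.0, base_gpus=8,
                             throughput=180.0, gpus=16))
```

Tracking these three numbers per run is one plausible basis for the regression detection the role describes: a drop in MFU or scaling efficiency against a stored baseline flags an efficiency regression, and the step-time breakdown points at which category caused it.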

Vacancy posted 2 days ago