Research Scientist / Engineer - Training Infrastructure
Intellipro Group Inc
Job Description
Job Title: Research Scientist / Engineer – Training Infrastructure
Position Type: Full time
Location: Palo Alto, CA • Remote - US • Remote - International
Salary Range: $220,000 - $300,000 (USD)
Job ID#: 154559
We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-function change will come from vision. So we are training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change. We are looking for engineers with significant experience solving hard problems in PyTorch, CUDA, and distributed systems. You will work alongside the rest of the research team to build and train cutting-edge foundation models, designed from the ground up to scale, on thousands of GPUs.
Responsibilities
- Design, implement, and optimize efficient distributed training systems for models trained on thousands of GPUs
- Research and implement advanced parallelization techniques (FSDP, Tensor Parallel, Pipeline Parallel, Expert Parallel)
- Build monitoring, visualization, and debugging tools for large-scale training runs
- Optimize training stability, convergence, and resource utilization across massive clusters
Qualifications
- Extensive experience with distributed PyTorch training and parallelism techniques in foundation model training
- Deep understanding of GPU clusters, networking, and storage systems
- Familiarity with communication libraries (NCCL, MPI) and distributed system optimization
- (Preferred) Strong Linux systems administration and scripting capabilities
- (Preferred) Experience managing training runs across >100 GPUs
- (Preferred) Experience with containerization, orchestration, and cloud infrastructure
Founded in 2009, IntelliPro is a global leader in talent acquisition and HR solutions. Our commitment to delivering unparalleled service to clients, fostering employee growth, and building enduring partnerships sets us apart. We continue leading global talent solutions with a dynamic presence in over 160 countries, including the USA, China, Canada, Singapore, Japan, Philippines, UK, India, Netherlands, and the EU.
IntelliPro, a global leader connecting individuals with rewarding employment opportunities, is dedicated to understanding your career aspirations. As an Equal Opportunity Employer, IntelliPro values diversity and does not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, disability, or any other legally protected group status. Moreover, our Inclusivity Commitment emphasizes embracing candidates of all abilities and ensures that our hiring and interview processes accommodate the needs of all applicants. Learn more about our commitment to diversity and inclusivity at
Compensation: The pay offered to a successful candidate will be determined by various factors, including education, work experience, location, job responsibilities, certifications, and more. Additionally, IntelliPro provides a comprehensive benefits package, all subject to eligibility.