Senior Research Engineer, Training Data Infrastructure in Foundation Models
Apple Inc.
Cupertino, California, United States - Software and Services

Our team is dedicated to solving the high-quality training data problem at the scale required to train advanced Foundation Models. We believe that advanced model performance (including reasoning, coding, and agentic planning) fundamentally depends on a data-centric approach to Machine Learning. Our objective is to engineer a large-scale system that acquires, processes, and curates the data required to advance the state of the art in Artificial Intelligence.

We are seeking a Senior Research Engineer who possesses a deep understanding of distributed systems and a strong intuition for Machine Learning. You will join a culture that values engineering craftsmanship, privacy, and rigorous scientific inquiry, utilizing advanced cloud technologies to build the data systems that power our most capable models.

Description
This position operates at the convergence of Software Engineering and Machine Learning Research. Unlike traditional backend roles, this position requires you to design systems where the outcome is the statistical distribution and quality of the data itself. You will work alongside Research Scientists to transform theoretical observations into concrete, scalable engineering solutions. Your core focus will be the architecture of our Data Acquisition, Processing, and Repository Management systems for Large Model training. You will lead technical efforts to enable active, quality-driven data curation, including filtering, deduplication, synthetic data generation, and data mixing, ensuring our models are trained on the highest-quality information available.

Responsibilities
- Architect Scalable Ingestion Systems: Design and implement high-throughput distributed systems to ingest petabytes of text and multimodal data from diverse sources, including web crawls and third-party partnerships.
- Repository Optimization: Manage the lifecycle of large-scale datasets across data storage and high-performance file systems. Optimize data formats for efficient random access and sequential scanning during model training.
- Data Governance & Privacy: Engineer robust data governance and privacy solutions for the training data, in collaboration with compliance and legal teams, to ensure adherence to stringent regulatory standards.
- High-Performance Processing Pipelines: Build and maintain distributed data processing workflows using frameworks running on cloud infrastructure (e.g., GCP, AWS).
- Algorithmic Data Curation: Implement sophisticated data filtering and selection logic to remove low-quality content, and develop semantic deduplication at scale to prevent model memorization and improve training efficiency.
- Decontamination: Design automated systems to detect and remove benchmark leakage, ensuring that evaluation datasets remain strictly isolated from training corpora.
- Infrastructure for Scaling Laws: Collaborate with researchers to enable data ablations and scaling experiments. Build tools to support systematic data mixture optimization and empirical data studies.
…
$224k - $356.5k
NVIDIA is searching for a senior or principal engineer who specializes in building cutting‑edge infrastructure for large‑scale foundation model training in the Generalist Embodied Agent Research (GEAR) group. Our team... ...datasets. Implement scalable data loaders and...
Senior, Training, Foundation, Full time
- A leading technology company located in Cupertino, California, is seeking a Senior Research Engineer focused on training data infrastructure for advanced AI models. The ideal candidate will possess strong skills in distributed systems and a deep understanding of Machine...
Senior, Training, Foundation
$180k - $258.75k
...Description At Toyota Research Institute (TRI)... ...developing the engineering infrastructure needed to train, evaluate, and... ...looking for a Senior Research... ...software engineering foundation, deep... ...geometry or physical modeling, and a genuine... ...including efficient data structures,...
Senior, Training, Foundation, Local area, Shift work
- ...TITLE: ML Data Infrastructure Engineer LOCATION: Sunnyvale CA or Remote Duration: 12+ Months... ...composer), Vertex AI, Datapipeline, ML Training Role Overview: We're seeking... ...learning. This role focuses on the data foundation that powers our ML capabilities....
Senior, Training, Foundation, Remote work
$150k
About the Institute of Foundation Models We are a dedicated research lab for building,... ...edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental... ...: Data Infrastructure & Pipelines Design, implement...
Training, Foundation, Visa sponsorship
$124.09k - $210k
...core member of our AI Infrastructure team, you will work... ...Autonomous Driving and Foundation Models. We don't just... ...EB-scale perception data from tens of thousands... ...high-performance Data Engine that powers our next-... ...data versioning. * Training Throughput Optimization...
Senior, Training, Foundation, Full time, Work experience placement
$210k - $267k
...trading energy are the foundation of what we do. We ingest large-scale data-weather, prices, load,... ...volatility. Our deep learning models have proven very... ...We're looking for an engineer to help lead the scaling... ...reliability of our data infrastructure, which is core to the ML...
Senior, Foundation, Work at office, Remote work, Work from home, Home office, Flexible hours, 3 days per week
$224k - $356.5k
NVIDIA Gruppe is seeking a Senior or Principal Engineer for their GEAR group, focusing on large-scale foundation model training for humanoid robots. You'll design distributed training... ...systems and collaborate with a top-tier research team to impact their projects...
Senior, Training, Foundation
$203.45k - $344.3k
...Senior Staff AI Data Infrastructure/Pipeline Engineer Santa Clara, CA XPENG is a leading smart technology company... ...→ dataset production → model training / simulation input. In autonomous... ...Java. Solid software engineering foundation, good coding standards, and a strong...
Senior, Training, Foundation, Full time, Overseas
$181.1k - $318.4k
...Sr. Machine Learning Research Engineer, Siri Speech We are a group... ...to build cutting-edge infrastructure, datasets, and models that empower Siri with... ...to push the frontiers of foundation models and conversational... ...scale machine learning training/evaluation On-device...
Senior, Training, Foundation, Relocation
$184k - $287.5k
...recruiting top research engineers in the Autonomous... ...and generative modeling. You must have strong... ...track record of training deep learning... ...mathematical foundation to analyze new AI... ...Implement scalable data loaders and... ...optimize simulation infrastructure (based on GPU-...
Senior, Training, Foundation, Full time
$150k - $200k
...growing teams. As a Research Engineer, you will deliver... ...build optimal and data‑driven controls to... ...‑learning vehicle models and learning‑based... ...Develop tools and infrastructure for dataset generation, training, and evaluation to... ...Strong foundation in motion control...
Senior, Training, Foundation
$150k
A leading research lab in AI located in Sunnyvale, California, is seeking an individual to join their AllWorld Team. The role focuses on developing scalable data pipelines and optimizing foundation model training. Candidates should hold at least a Master's or PhD in Machine...
Training, Foundation
- ...robot systems to the infrastructure and state-of-the-art foundation world models that control our... ...our cutting edge research and end-to-end system... ...or Research Engineer to own the strategy... ...quality robot learning data. This role sits at... ...our models train on. What You'll Do...
Training, Foundation
- Ipro Networks Pte. Ltd. is seeking a Research Scientist / Engineer in Palo Alto, CA to develop and optimize distributed training infrastructure for multimodal foundation models. This role involves significant experience with PyTorch and managing large-scale GPU clusters...
Training, Foundation, Remote job
$153.2k - $234.1k
...Embodied AI Infra Foundation team at General... ...build the critical infrastructure that powers every... ...machine learning engineer working on our... ...Autonomous Driving models. From foundational... .... As a Senior ML Infra Engineer... ...machine learning model training and evaluation workflows...
Senior, Training, Foundation, Work at office, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$181.1k - $318.4k
...something! Description As a Senior/Staff Engineer on the Foundation Model Compute Infrastructure team, you will lead the design... ...efficient execution of large‑scale training and inference jobs. This role... ...skills across engineering and research teams Bachelor’s degree in...
Senior, Training, Foundation, Relocation
- ...robot systems to the infrastructure and state-of-the-art foundation world models that control our robots... ...possibly by our cutting edge research and end-to-end system... ...hardware Develop training strategies that... ...or equivalent research/engineering experience Publication...
Training, Foundation
- ...The Catalog Data Science team... ...Learning, and Engineering. We tackle... ...visualization, and model serving. We... .... As a Senior Data... ...reproducible training, robust evaluation... ...) Strong foundation in classical... ...and GPU infrastructure Working knowledge... ...AI/ML research and translating...
Senior, Training, Foundation
$281k - $356k
...The Perception Data team at Waymo... ...data used to train and evaluate the... ...-vocabulary modeling. By unifying... ...development of foundation models and... ...Machine Learning, Infrastructure, and... ...a Director of Engineering You will... ...to Staff and Senior engineers across...
Senior, Training, Foundation, Full time, Remote work
- ...and cutting-edge models, products and... ...next generation of data infrastructure at Mistral AI.... ...access for MLOps and research. You will take... ...for critical training jobs. What will... ...growth. Platform Engineering: Contribute to... ...interest in supporting foundational compute and...
Training, Work at office, Visa sponsorship
$147.4k - $272.1k
...Machine Learning Engineer, Data and ML... ...the revolution in Foundation Models? Contribute to model... ...to improve model training and evaluation efficiency... .... As a Senior Machine Learning... ...of modeling, infrastructure, and product, helping... ...closely with research, infrastructure,...
Senior, Training, Foundation, Relocation
$220k - $300k
Job Title: Research Scientist / Engineer - Training Infrastructure Position Type: Full time Location: Palo Alto, CA •... ...intelligence. To go beyond language models and build more aware, capable... ...training and scaling up multimodal foundation models for systems that can see...
Training, Foundation, Full time, Work experience placement, Remote work
$237.6k - $318.24k
...Senior Staff Software Engineer For Ai Model Lifecycle Team Crusoe is on a mission... ...integrated AI infrastructure company built from... ...energy, manufacturing, data center... ...systems for large foundation models (SFT, PEFT,... ...maintain end-to-end training pipelines for large...
Senior, Training, Foundation, Temporary work
$196k - $230k
...the rewards. The Data Engineering team builds and maintains the foundational datasets that... ...ensure accurate, well-modeled data is... ...products. As a Senior Data Engineer, you... ...data stack (Data Infrastructure, Analytics and Visualization... ...education, training, experience, location...
Senior, Training, Foundation, Work at office, Flexible hours, Shift work, 3 days per week
$193.93k - $352.29k
...looking for a Senior/Staff Software Engineer to serve as... ...Nuro’s ML Data engine. You... ...Learning, and Infrastructure, acting as an... ...autonomy AI models. In this... ...high-value training signals for... ...for autonomy researchers, develop queries... ...for foundation model training...
Senior, Training, Foundation, Shift work
$213k - $263k
...team, builds tools and infrastructure to realize the ML... ...partners closely with the modeling team to realize solutions... ...contribute to Waymo's data infrastructure... ...the field of software engineering ~ Experience programming... ...experience, relevant training and education, and skill...
Senior, Training, Full time, Remote work
$256k - $356k
Principal Engineer, Infrastructure and Data Center Operations Google Sunnyvale, CA, USA Director+ Master'... ...the world. Our data centers are the foundation of all Google services and infrastructure... ..., and relevant education or training. Your recruiter can share more about...
Training, Foundation, Permanent employment, Full time, Flexible hours
$150k - $230k
...founded by Stanford researchers and veteran systems engineers who share a... ...the foundations of distributed... ..., traditional infrastructure struggles to meet... ...distributed GPU training. You'll work at... ...concurrency, memory models, and failure... ...stack. Senior Expectations...
Senior, Training, Foundation
$197k - $291k
Staff AI Research Engineer, Large User Models Google Mountain View, CA, USA... ...direction related to Foundation Models, Large... ...of experience with data structures/algorithms... ...Recommender Model pre‑training. You will own the... ...collective roadmaps, ML infrastructure leads to define...
Training, Foundation, Full time, Worldwide
- research engineer Cupertino, CA
- deep learning research engineer Cupertino, CA
- research programmer Cupertino, CA
- data visualization developer Cupertino, CA
- data science developer Cupertino, CA
- senior data center engineer Cupertino, CA
- sr information security engineer Cupertino, CA
- junior big data engineer Cupertino, CA
- entry level big data engineer Cupertino, CA
- aws data engineer Cupertino, CA


