Pre-Training Data Engineer for Large-Scale Multimodal AI
$350k
Thinking Machines Lab
Thinking Machines Lab in San Francisco is seeking a pre-training researcher, responsible for curating and analyzing large-scale datasets that support AI model development. The ideal candidate will demonstrate proficiency in Python and a strong academic background in relevant fields. This role blends research and engineering, requiring both theoretical knowledge and practical skills. Compensation ranges from $350,000 to $475,000 based on experience, and the company offers generous benefits including unlimited PTO and health insurance.
$250k - $380k
...time Department Scaling Compensation... ...Accounts Pre-tax accounts... ...OpenAI’s LLM training and inference... ...looking for an engineer to design and... ...closely with the multimodal researchers,... ...(MM) data that cannot fit... ...bottlenecks across large fleets of... ...OpenAI is an AI research and... [Training, Full time, Work at office, Local area, Relocation package, Flexible hours]
- ...-site Department Engineering Our Mission Reflection... .... Our team of AI researchers and... .... About the Role Data is playing an increasingly... ...the data used to train our models meets a... ...on our pre‑training teams, you... ...measurable standards that scale across large data campaigns. We... [Training, Full time, Relocation package]
- ...California. The Role: As a Data Engineer - Multimodal Systems, you will be a core... ...will be involved in collecting large-scale datasets and implementing... ...Experience contributing to large pre-existing codebases and... ...what we do and love discussing AI Benefits and Perks:... [Suggested, Work at office, Relocation package]
- ...Solutions is hiring a Senior Data Engineer (Apache Spark) in San Francisco, USA. Lead the design of large-scale distributed data processing systems... ...deliver feature stores and training data sets at scale Drive... ...technology company building AI-powered enterprise products.... [Training, Flexible hours]
- A pioneering AI firm based in San Francisco is seeking a Research Engineer, Distributed Data Systems. In this role, you will design and maintain infrastructure for large-scale multimodal training, ensuring scalability and reliability of data systems. Candidates should... [Training, Work at office, Relocation package]
- ...breakthrough Physical AI system —... ...generation models — is trained on petabytes of... ..., and sensor data. But today's... ...analytics, not the multimodal corpora that power... ...Our open‑source engine, Daft, is the... ...that make corpus‑scale annotation tractable... ...and large‑scale data processing... [Training, Hourly pay, Work at office, Flexible hours, Night shift, 1 day per week]
- ...We are seeking a Data Infrastructure Engineer to build and operate... ...product usage scale. What You'll Do... ...perception model training and evaluation... ...search, indexing, and large-scale querying... ...Exposure to perception, multimodal, or geospatial... ..., data, and AI systems with real... [Training, Permanent employment, Full time]
- ...Bio is developing a data-driven healthcare... ...and advanced AI to transform the management... .... We integrate multimodal data (clinical... ...generation discovery engine. This role is... ...central to building and training novel algorithms... ...models to analyze large-scale, multimodal datasets... [Training]
$179k - $218k
...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture... ...integrated AI infrastructure company... ...believe in the scale of our ambition... ...maintenance, identifying "pre-failure" patterns... ...impact customer training runs.... ...Experience using large datasets or basic... [Training, Temporary work]
- ...turning physical AI into reality, helping... ...partners and scaling fast. The founding... ...embodied AI, and large-scale machine... ...looking for a Robotics Data Infrastructure Engineer to own and build... ...robot data to training and evaluation workflows... ...Wrangle massive multimodal datasets:... [Training, Full time, Work experience placement, Immediate start]
- ...recommendation models and methodologies closely related to LLM pre-training and post-training at a large scale. Candidates should possess a fresh PhD or MS/PhD with... ...and a strong passion for RecSys and applied AI. Apply now for this exciting opportunity!... [Training]
- ...-site Department Engineering Our Mission Reflection... .... Our team of AI researchers and... .... About the Role Data is playing an increasingly... ...web and other large-scale data sources into... ...corpora for training frontier models. You... ...delivers data to our pre-training pipelines... [Training, Full time, Relocation package]
- ...is to architect AI that learns from... ...new primitive for training efficient, large-scale foundation... ...innovation and systems engineering paired with a... ...About the Role Data is the lifeblood... ...infrastructure that feed our pre‑training and... ...audio and other multimodal data. Your work... [Training, Work at office, Visa sponsorship, Flexible hours]
$370k
...and steerable AI systems. We want... ...committed researchers, engineers, policy experts... ...an Analytics Data Engineering... ...that can scale with our company... ...of education, training, and/or experience... ...on just a few large-scale research... ...Interpretability, Multimodal Neurons, Scaling... [Training, Work at office, Visa sponsorship, Flexible hours]
- ...Machine Learning Engineer, Data Quality Rime builds voice AI for enterprises running... ...experiences at scale. Our text-to-speech... ...data. So before we trained a single model, we... ...quality assurance, pre-processing, cataloging... ...QA workflows: A large share of incoming data... [Training, Remote work, Visa sponsorship]
$146.4k - $235.38k
...business-critical data that is trapped inside... ...The Data and AI Platform Engineer will design, build... ...ML capabilities at scale. Reporting to the... ...data preparation, training, evaluation, deployment... ...) supporting large-scale data and AI... ...their achievement of pre-established sales... [Training, Contract work, Work at office, Local area, Remote work, 2 days per week]
- ...Synthetic Data Engineer (AI Data/Training) San Francisco Bay Area, USA We are seeking a talented and innovative Synthetic Data Engineer.... ...Qualifications: Proven experience building large-scale data pipelines (Airflow, Spark, Ray). Deep knowledge... [Training]
$167.2k - $209k
...Senior Forward Deployed Data Scientist/Engineer San Francisco, CA; New York, NY At Scale AI, we help leading enterprises turn AI... ...enabled products Experience with large-scale data processing and... ..., and relevant education or training. Scale employees in eligible roles... [Training, Full time]
$275k - $370k
...and steerable AI systems. We want... ...committed researchers, engineers, policy experts... ...of our growing Data Science and... ...we think about scaling engineering... ...experience analyzing large-scale system... ...of education, training, and/or... ...Interpretability, Multimodal Neurons, Scaling... [Training, Work at office, Visa sponsorship, Flexible hours]
$138.9k - $186.2k
...Sr Data Engineer Disney Entertainment and ESPN Product &... ...the power of data and AI. We design and build innovative... ...to design, build, and scale the data foundations... ..., Experience/Skills/Training: ~5+ years of data... ...streaming data pipelines in large-scale, distributed... [Training]
$255k - $405k
...Savings Accounts Pre‑tax accounts... ...is pioneering multimodal capabilities for... ...functionalities into our AI products,... ...As a Software Engineer, Distributed Data Systems, you will design and scale the... ...infrastructure that powers large‑scale multimodal training and evaluation... [Training, Full time, Work at office, Local area, Relocation package, Flexible hours]
$25 - $30 per hour
...affordable energy at scale to support America'... ...energy and data center infrastructure... ...projects, including our AI-based digital... ...undergraduate Data Engineering & AI Enablement Intern... ...Transform and structure large datasets for... ...programs and workforce training. When you join SB Energy... [Training, Hourly pay, Internship, Summer internship, Work at office, Local area]
$139.44k - $174.31k
...Senior Scientific Data Engineer Berkeley Lab's Joint Genome... ...for an emerging era of AI-enabled scientific... ...will rely on to meet the scale, complexity, and urgency... ...orchestration, data access, and large-scale scientific... ...equivalent knowledge/training) in Computer Science or... [Training, Full time, Work at office, Remote work, Relocation package]
- A leading AI research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate will have a strong grasp of optimizing training workloads and experience with distributed... [Training]
$140k - $180k
...Data Infrastructure Engineer Alljoined is creating a future where humans are... ...deep learning research to large scale EEG datasets to decode multimedia... ...that process massive multimodal datasets (video, audio, text... ...clusters we use to train on it. You will be powering... [Training, Local area, Visa sponsorship]
- ...experienced and motivated Senior Staff Data Engineer to be the technical leader of... ...architecture of our next gen AI powered SoFi Data Platform(... ...architecture and delivery of large-scale, high-performance data... ...: Contribute to hiring and training efforts to build a skilled and... [Training, Remote work]
- Requirements AI & ML Enablement: Experience designing data workflows, feature pipelines, or infrastructure... ...supports AI/ML training, inference, experimentation... ...monitoring, Data Product Engineering: Proven experience building large-scale, production-grade data products... [Training]
$208k - $282k
...Staff Data Engineer At Komodo Health, our mission is to... ...data usable at enterprise scale. Komodo Health... ...serving patterns across large-scale healthcare datasets... ...Rust, C++, and emerging AI-enabled engineering workflows... ...that supports AI/ML training, inference,... [Training, Work experience placement, Local area, Flexible hours]
$197.3k - $313.7k
...Job Category Data Job Details About... ...Salesforce is the #1 AI CRM, where humans with... ...workloads, including feature engineering for ML models and real... ...designing and implementing large-scale Enterprise Data... ...feature freshness, model training pipelines, and real-time... [Training, Work at office]
$207k - $276k
...practices into the engineering organization. Essential... ...of expertise on Data Engineering, Data... ...(Cloudera / S3), AI/ML Model development... ...The base pay scale for this position in... ...education, experience, training, and specialized skills... ...Account (HSA) or pre-tax savings through... [Training, Hourly pay, Work at office, Immediate start, Visa sponsorship, Work visa, Flexible hours]

