Pre-Training Data Engineer for Large-Scale Multimodal AI

$350k

Thinking Machines Lab

Thinking Machines Lab in San Francisco is seeking a pre-training researcher, responsible for curating and analyzing large-scale datasets that support AI model development. The ideal candidate will demonstrate proficiency in Python and a strong academic background in relevant fields. This role blends research and engineering, requiring both theoretical knowledge and practical skills. Compensation ranges from $350,000 to $475,000 based on experience, and the company offers generous benefits including unlimited PTO and health insurance.

Vacancy posted 14 hours ago
Similar jobs that could be interesting for you
Based on the Pre-Training Data Engineer for Large-Scale Multimodal AI in San Francisco, CA vacancy
  • $250k - $380k

     ...time Department Scaling Compensation...  ...Accounts Pre-tax accounts...  ...OpenAI’s LLM training and inference...  ...looking for an engineer to design and...  ...closely with the multimodal researchers,...  ...(MM) data that cannot fit...  ...bottlenecks across large fleets of...  ...OpenAI is an AI research and... 
    Training
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    1 day ago
  •  ...-site Department Engineering Our Mission Reflection...  .... Our team of AI researchers and...  .... About the Role Data is playing an increasingly...  ...the data used to train our models meets a...  ...on our pre‑training teams, you...  ...measurable standards that scale across large data campaigns. We... 
    Training
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    3 days ago
  •  ...California. The Role: As a Data Engineer - Multimodal Systems, you will be a core...  ...will be involved in collecting large-scale datasets and implementing...  ...Experience contributing to large pre-existing codebases and...  ...what we do and love discussing AI Benefits and Perks:... 
    Suggested
    Work at office
    Relocation package

    Zyphra

    San Francisco, CA
    1 day ago
  •  ...Solutions is hiring a Senior Data Engineer (Apache Spark) in San Francisco, USA. Lead the design of large-scale distributed data processing systems...  ...deliver feature stores and training data sets at scale Drive...  ...technology company building AI-powered enterprise products.... 
    Training
    Flexible hours

    Appit LLC

    San Francisco, CA
    1 day ago
  • A pioneering AI firm based in San Francisco is seeking a Research Engineer, Distributed Data Systems. In this role, you will design and maintain infrastructure for large-scale multimodal training, ensuring scalability and reliability of data systems. Candidates should... 
    Training
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...breakthrough Physical AI system —...  ...generation models — is trained on petabytes of...  ..., and sensor data. But today's...  ...analytics, not the multimodal corpora that power...  ...Our open‑source engine, Daft, is the...  ...that make corpus‑scale annotation tractable...  ...and large‑scale data processing... 
    Training
    Hourly pay
    Work at office
    Flexible hours
    Night shift
    1 day per week

    Eventual

    San Francisco, CA
    4 days ago
  •  ...We are seeking a Data Infrastructure Engineer to build and operate...  ...product usage scale. What You'll Do...  ...perception model training and evaluation...  ...search, indexing, and large-scale querying...  ...Exposure to perception, multimodal, or geospatial...  ..., data, and AI systems with real... 
    Training
    Permanent employment
    Full time

    Matter Intelligence

    San Francisco, CA
    4 days ago
  •  ...Bio is developing a data-driven healthcare...  ...and advanced AI to transform the management...  .... We integrate multimodal data (clinical...  ...generation discovery engine. This role is...  ...central to building and training novel algorithms...  ...models to analyze large-scale, multimodal datasets... 
    Training

    ChronicleBio

    San Francisco, CA
    1 day ago
  • $179k - $218k

     ...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture...  ...integrated AI infrastructure company...  ...believe in the scale of our ambition...  ...maintenance, identifying "pre-failure" patterns...  ...impact customer training runs....  ...Experience using large datasets or basic... 
    Training
    Temporary work

    Crusoe

    San Francisco, CA
    1 day ago
  •  ...turning physical AI into reality, helping...  ...partners and scaling fast. The founding...  ...embodied AI, and large-scale machine...  ...looking for a Robotics Data Infrastructure Engineer to own and build...  ...robot data to training and evaluation workflows...  ...Wrangle massive multimodal datasets:... 
    Training
    Full time
    Work experience placement
    Immediate start

    Verne Robotics

    San Francisco, CA
    3 days ago
  •  ...recommendation models and methodologies closely related to LLM pre-training and post-training at a large scale. Candidates should possess a fresh PhD or MS/PhD with...  ...and a strong passion for RecSys and applied AI. Apply now for this exciting opportunity!... 
    Training

    Qishicpc

    San Francisco, CA
    14 hours ago
  •  ...-site Department Engineering Our Mission Reflection...  .... Our team of AI researchers and...  .... About the Role Data is playing an increasingly...  ...web and other large-scale data sources into...  ...corpora for training frontier models. You...  ...delivers data to our pre-training pipelines... 
    Training
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    4 days ago
  •  ...is to architect AI that learns from...  ...new primitive for training efficient, large-scale foundation...  ...innovation and systems engineering paired with a...  ...About the Role Data is the lifeblood...  ...infrastructure that feed our pre‑training and...  ...audio and other multimodal data. Your work... 
    Training
    Work at office
    Visa sponsorship
    Flexible hours

    Cartesia

    San Francisco, CA
    2 days ago
  • $370k

     ...and steerable AI systems. We want...  ...committed researchers, engineers, policy experts...  ...an Analytics Data Engineering...  ...that can scale with our company...  ...of education, training, and/or experience...  ...on just a few large-scale research...  ...Interpretability, Multimodal Neurons, Scaling... 
    Training
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    18 days ago
  •  ...Machine Learning Engineer, Data Quality Rime builds voice AI for enterprises running...  ...experiences at scale. Our text-to-speech...  ...data. So before we trained a single model, we...  ...quality assurance, pre-processing, cataloging...  ...QA workflows : A large share of incoming data... 
    Training
    Remote work
    Visa sponsorship

    Rime Labs

    San Francisco, CA
    4 days ago
  • $146.4k - $235.38k

     ...business-critical data that is trapped inside...  ...The Data and AI Platform Engineer will design, build...  ...ML capabilities at scale. Reporting to the...  ...data preparation, training, evaluation, deployment...  ...) supporting large-scale data and AI...  ...their achievement of pre-established sales... 
    Training
    Contract work
    Work at office
    Local area
    Remote work
    2 days per week

    DocuSign

    San Francisco, CA
    4 days ago
  •  ...Synthetic Data Engineer (AI Data/Training) San Francisco Bay Area, USA We are seeking a talented and innovative Synthetic Data Engineer....  ...Qualifications: Proven experience building large-scale data pipelines (Airflow, Spark, Ray). Deep knowledge... 
    Training

    Hyphen Connect

    San Francisco, CA
    5 days ago
  • $167.2k - $209k

     ...Senior Forward Deployed Data Scientist/Engineer San Francisco, CA; New York, NY At Scale AI, we help leading enterprises turn AI...  ...enabled products Experience with large-scale data processing and...  ..., and relevant education or training. Scale employees in eligible roles... 
    Training
    Full time

    Scale AI

    San Francisco, CA
    2 days ago
  • $275k - $370k

     ...and steerable AI systems. We want...  ...committed researchers, engineers, policy experts...  ...of our growing Data Science and...  ...we think about scaling engineering...  ...experience analyzing large-scale system...  ...of education, training, and/or...  ...Interpretability, Multimodal Neurons, Scaling... 
    Training
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    4 days ago
  • $138.9k - $186.2k

     ...Sr Data Engineer Disney Entertainment and ESPN Product &...  ...the power of data and AI. We design and build innovative...  ...to design, build, and scale the data foundations...  ..., Experience/Skills/Training: ~5+ years of data...  ...streaming data pipelines in large-scale, distributed... 
    Training

    Disney France

    San Francisco, CA
    5 days ago
  • $255k - $405k

     ...Savings Accounts Pre‑tax accounts...  ...is pioneering multimodal capabilities for...  ...functionalities into our AI products,...  ...As a Software Engineer, Distributed Data Systems, you will design and scale the...  ...infrastructure that powers large‑scale multimodal training and evaluation... 
    Training
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    3 days ago
  • $25 - $30 per hour

     ...affordable energy at scale to support America'...  ...energy and data center infrastructure...  ...projects, including our AI-based digital...  ...undergraduate Data Engineering & AI Enablement Intern...  ...Transform and structure large datasets for...  ...programs and workforce training. When you join SB Energy... 
    Training
    Hourly pay
    Internship
    Summer internship
    Work at office
    Local area

    SB Energy

    San Francisco, CA
    1 day ago
  • $139.44k - $174.31k

     ...Senior Scientific Data Engineer Berkeley Lab's Joint Genome...  ...for an emerging era of AI-enabled scientific...  ...will rely on to meet the scale, complexity, and urgency...  ...orchestration, data access, and large-scale scientific...  ...equivalent knowledge/training) in Computer Science or... 
    Training
    Full time
    Work at office
    Remote work
    Relocation package

    Berkeley Lab

    San Francisco, CA
    2 days ago
  • A leading AI research organization in San Francisco seeks an Infrastructure Engineer to design and maintain large distributed ML training and inference clusters. The ideal candidate will have a strong grasp of optimizing training workloads and experience with distributed... 
    Training

    Causal Labs

    San Francisco, CA
    2 days ago
  • $140k - $180k

     ...Data Infrastructure Engineer Alljoined is creating a future where humans are...  ...deep learning research to large scale EEG datasets to decode multimedia...  ...that process massive multimodal datasets (video, audio, text...  ...clusters we use to train on it. You will be powering... 
    Training
    Local area
    Visa sponsorship

    Alljoined

    San Francisco, CA
    2 days ago
  •  ...experienced and motivated Senior Staff Data Engineer to be the technical leader of...  ...architecture of our next gen AI powered SoFi Data Platform(...  ...architecture and delivery of large-scale, high-performance data...  ...: Contribute to hiring and training efforts to build a skilled and... 
    Training
    Remote work

    SoFi

    San Francisco, CA
    2 days ago
  • Requirements AI & ML Enablement: Experience designing data workflows, feature pipelines, or infrastructure...  ...supports AI/ML training, inference, experimentation...  ...monitoring , Data Product Engineering: Proven experience building large-scale, production-grade data products... 
    Training

    Komodo Health

    San Francisco, CA
    1 day ago
  • $208k - $282k

     ...Staff Data Engineer At Komodo Health, our mission is to...  ...data usable at enterprise scale. Komodo Health...  ...serving patterns across large-scale healthcare datasets...  ...Rust, C++, and emerging AI-enabled engineering workflows...  ...that supports AI/ML training, inference,... 
    Training
    Work experience placement
    Local area
    Flexible hours

    Komodo Health

    San Francisco, CA
    2 days ago
  • $197.3k - $313.7k

     ...Job Category Data Job Details About...  ...Salesforce is the #1 AI CRM, where humans with...  ...workloads, including feature engineering for ML models and real...  ...designing and implementing large-scale Enterprise Data...  ...feature freshness, model training pipelines, and real-time... 
    Training
    Work at office

    Salesforce.Com Inc

    San Francisco, CA
    3 days ago
  • $207k - $276k

     ...practices into the engineering organization. Essential...  ...of expertise on Data Engineering, Data...  ...(Cloudera / S3), AI/ML Model development...  ...The base pay scale for this position in...  ...education, experience, training, and specialized skills...  ...Account (HSA) or pre-tax savings through... 
    Training
    Hourly pay
    Work at office
    Immediate start
    Visa sponsorship
    Work visa
    Flexible hours

    Early Warning Services, LLC

    San Francisco, CA
    1 day ago
