Member of Technical Staff, Pre-Training Data

Cohere

Machine Learning Engineer Specializing in Pretraining Data

Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what's best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

As a Machine Learning Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere's advanced language models. In this role, you will conduct data ablations to evaluate data quality and construct pre-training data mixtures to enhance model performance. By combining research and engineering, you will bridge the gap between raw data and cutting-edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization.

Your work will be essential to Cohere's mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.

For this role, you can be located anywhere in the time zones between EST and the EU.

As a Member of Technical Staff, Pre-Training Data, you will:

  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency.
  • Research and implement innovative data curation methods, leveraging Cohere's infrastructure to drive advancements in natural language processing.
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models.

You may be a good fit if you have:

  • Strong software engineering skills, with proficiency in Python and experience building data pipelines.
  • Familiarity with curriculum learning, data mixing, and data attribution.
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora.
  • Knowledge of data quality assessment techniques and experimentation with data mixtures.
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training.

Bonus: papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

If some of the above doesn't line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Member of Technical Staff, Pre-Training Data vacancy in the United States
  • $180k

     ...Member Of Technical Staff - Pre-Training Palo Alto, CA About xAI xAI's mission is to create AI systems that can accurately understand the universe...  .... xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.... 
    Data
    Training
    Temporary work

    Xai

    Palo Alto, CA
    2 days ago
  •  ...Data Team Engineer Data is playing an increasingly...  ...better data. As a member of the Data Team, your...  ...that the data used to train our models meets a high...  ...class researchers on our pre-training teams, you'll...  ...clearly articulate complex technical concepts across teams... 
    Data
    Training
    Relocation package

    Reflection AI

    New York, NY
    2 days ago
  • $225k

     ...reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time...  ...On Scale distributed training across large GPU clusters (data, tensor, pipeline parallelism) Optimize communication... 
    Data
    Training
    Relocation
    Visa sponsorship

    Magic Inc

    San Francisco, CA
    2 days ago
  •  ...About the Role Build and scale distributed training systems that power frontier model pre-training. Work closely with research teams to design...  ...Familiarity with large-scale model parallelism strategies (data, tensor, pipeline, or expert parallelism).... 
    Data
    Training
    Relocation package

    Reflection AI, Inc

    New York, NY
    5 days ago
  • $119.8k - $234.7k

     ...Microsoft AI, we are on a mission to train the world's most capable AI...  ...and product deployment. The Pre-Training team at Microsoft AI...  ...track record and significant technical leadership in high-impact projects...  ...detail, and a commitment to data-driven decision-making Have... 
    Data
    Training
    Ongoing contract
    Work at office
    Local area

    Microsoft Corporation

    Mountain View, CA
    4 days ago
  • $180k

     ...About the Role We are looking for a Member of Technical Staff - Mid-Training to lead the development of training strategies that bridge pre-training and post-training, shaping how...  ...model capability development-defining how data, algorithms, and systems interact to... 
    Data
    Training
    Full time

    Hark

    San Jose, CA
    5 days ago
  • $200k

     ...Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context...  ...pre-training, post-training, data, inference, and product, and sits on...  ...'s most important decisions. As a Member of Technical Staff on Evals, you will build both the platform... 
    Data
    Training
    Visa sponsorship
    Relocation package

    Magic AI Corp.

    San Francisco, CA
    18 hours ago
  • $180k

     ...Member Of Technical Staff - RL Infrastructure Palo Alto, CA xAI's mission is to...  ...software engineers to create robust data pipelines, comprehensive...  ...collected that require complex pre-processing to prepare it for large-scale RL training. How do we standardize our preprocessing... 
    Data
    Training
    Temporary work

    Xai

    San Francisco, CA
    2 days ago
  •  ...is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises...  ...our mission and shape the future! As a Senior Member of Technical Staff specializing in web data for pre-training, you will play a pivotal role in... 
    Data
    Training
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    New York, NY
    18 hours ago
  •  ...Research and build solutions across algorithms, scaling laws, data processing, optimizers, and model architecture. Design and run...  ...independently while collaborating on larger initiatives Optimize the training infrastructure for efficient scaling. Contribute across the... 
    Data
    Training
    Relocation package

    Reflection AI, Inc

    San Francisco, CA
    7 days ago
  • $180k

     ...Member of Technical Staff - Multimodal Understanding Palo Alto, CA About xAI xAI's mission is to create AI...  ...video, audio, and text—spanning the full stack: data curation/acquisition, tokenizer training, large-scale pre-training, post-training/alignment,... 
    Data
    Training
    Temporary work

    Xai

    Palo Alto, CA
    5 days ago
  •  ...Member Of Technical Staff Inception creates the world's fastest, most efficient AI models. Our Mercury model is the world's fastest...  ...experienced scientists and engineers with deep expertise in pre- and mid-training large language models. You will advance our diffusion-... 
    Training
    Immediate start
    Flexible hours

    Inception LLC

    Palo Alto, CA
    2 days ago
  •  ...intelligence to serve humanity. We're training and deploying frontier...  .... This includes developing data-generation techniques for...  ...remote or hybrid works! As a Member of Technical Staff for Agents Modeling you will...  ...(Reasoning, Post-training, Pre-training, etc.) to improve... 
    Data
    Training
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    New York, NY
    4 days ago
  • $119.8k - $234.7k

     ...Overview We’re looking for data scientists to help build the next generation of post-training methods for frontier models at Microsoft...  ...team that works closely with pre-training, product, and...  ...advance model capabilities. Each team member owns meaningful parts of the post... 
    Data
    Training
    Ongoing contract
    Local area
    Worldwide

    Microsoft Corporation

    Mountain View, CA
    2 days ago
  • $134.64k - $176k

     ...S., Inc. Position Title: Member of Technical Staff Quality Engineer Salary:...  ...Compile and review audit output data to identify improvement...  ...the successful completion of pre-employment conditions, as applicable...  ...recruitment, selection, training, utilization, promotion,... 
    Data
    Training
    Local area
    Monday to Friday

    GLOBALFOUNDRIES

    Ballston Spa, NY
    2 days ago
  • $180k - $300k

     ...Member Of Technical Staff - Large Scale Data Infrastructure Freiburg (Germany), San Francisco (USA) About Black Forest Labs We're the team behind...  .... You'll build the data systems behind the largest training runs on thousands of GPUs, where fixing one bottleneck... 
    Data
    Training
    Remote work
    Worldwide
    2 days per week

    Black Forest Labs

    Wildorado, TX
    2 days ago
  •  ...teams to understand exactly what data is needed, then turn ambiguous...  ...squeezing performance through pre/post-processing, parallelism,...  ...ambiguous needs into concrete technical systems Strong Python...  ...end-to-end outcomes, not just training models or writing research code... 
    Data
    Training

    Sieve, Inc.

    San Francisco, CA
    5 days ago
  •  ...Member Of Technical Staff - Post Training Freiburg (Germany) About Black Forest Labs We're the team behind Latent Diffusion, Stable Diffusion,...  ...training pipeline for our multimodal models end to end — from data strategy and reward modeling to preference optimization... 
    Data
    Training
    Remote work
    Worldwide
    2 days per week

    Black Forest Labs

    United States
    2 days ago
  • $225k

     ...reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time...  ...kernel optimization challenges around memory utilization, data movement, and sustained throughput. What you'll work on... 
    Data
    Training
    Remote work
    Relocation
    Visa sponsorship

    Magic AI Corp.

    United States
    1 day ago
  • $180k

     ...Member Of Technical Staff - Mid-training Palo Alto, CA xAI's mission is to create AI systems that can accurately understand the universe and aid humanity...  .... Responsibilities: Scale synthetic coding data to trillions of tokens with large-scale docker... 
    Data
    Training
    Temporary work

    Xai

    Palo Alto, CA
    4 days ago
  •  ...systems and processes that create tight feedback loops between data, evals, and model behavior Develop generalizable evaluation...  ...reasoning, alignment, and usefulness. Collaborate closely with pre-training, post-training, and applied teams to translate insights into... 
    Data
    Training
    Relocation package

    Reflection AI

    San Francisco, CA
    2 days ago
  • $200k

     ...reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time...  ...systems. Long-running distributed jobs, high-throughput data movement, and strict availability requirements demand infrastructure... 
    Data
    Training
    Relocation
    Visa sponsorship

    Magic Inc

    San Francisco, CA
    2 days ago
  •  ...Member Of Technical Staff - Pretraining Freiburg (Germany) About Black Forest Labs We're the team behind Latent Diffusion...  ...the center of our research effort. You'll shape training objectives, architectures, data strategies, and systems behind our joint image,... 
    Data
    Training
    Remote work
    Worldwide
    2 days per week

    Black Forest Labs

    United States
    2 days ago
  •  ...Member Of Technical Staff, Search We are looking for talented individuals to help us develop state...  ...working on a range of tasks including training our embedding and reranker models. You...  ...quality retrieval datasets and optimize data pipelines for model training and... 
    Data
    Training
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    United States
    2 days ago
  •  ...Member Of Technical Staff Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI...  ...organizational needs. We have all the compute, data, and talent available for you to do your... 
    Data
    Training
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    United States
    2 days ago
  •  ...About the Role RadixArk is seeking a Member of Technical Staff - Training to build and scale the systems that train frontier AI models. You...  ...Strong experience with large-scale distributed training (data, tensor, and pipeline parallelism) ~ Deep understanding... 
    Data
    Training
    Flexible hours

    RadixArk

    Palo Alto, CA
    4 days ago
  •  ...What You'll Do As a Founding Member of the Technical Staff (ML infra) at Architect, you'll be responsible...  ...that our researchers depend on to train models. Your work will directly enable...  ...building end-to-end ML pipelines, including data curation, preparation, and large-scale... 
    Data
    Training

    Architect Labs

    Palo Alto, CA
    2 days ago
  • $175k - $220k

     ...Member of Technical Staff, Performance Optimization San Mateo, CA About Us At Fireworks, we'...  ...for high-throughput AI workloads across training and inference Analyze and improve...  ...skills, interview performance, market data, and work location. The listed salary... 
    Data
    Training

    Fireworks AI

    San Mateo, CA
    5 days ago
  • $256k - $276k

     ...our vision at Postman. The Opportunity As a Member of Technical Staff on AI Infrastructure, you will build and maintain the...  ...distributed infrastructure that power AI model post training, inference, and data pipelines. You will collaborate with engineering and... 
    Data
    Training
    Work at office
    Flexible hours
    3 days per week

    Postman

    San Francisco, CA
    4 days ago
  • $180k

     ...Member of Technical Staff - Imagine Safety Palo Alto, CA About xAI xAI's mission is to create AI systems that can accurately understand...  ...loops between user interactions, model outputs, and training data to continuously improve safety while maintaining high performance... 
    Data
    Training
    Temporary work
    Worldwide

    Xai

    Palo Alto, CA
    1 day ago
