
Research Scientist - Data

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you'll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Research Scientist in the Data team, your primary responsibility is to curate high-quality, web-scale data to fuel the development of next-generation foundation models. You will explore and consolidate data sources and collaborate with cross-functional teams to conduct in-depth data research, contributing to MBZUAI's mission of driving impactful AI discoveries and positioning the institution as a leader in the global AI research community. Your expertise will be key in enhancing the performance of large-scale machine learning models, while supporting the development of transformative AI tools that can influence industries worldwide.

Key Responsibilities
  • Pioneer web-scale data collection and curation methodologies for LLMs and multi-modal foundation models.
  • Design and implement novel data synthesis pipelines for code, mathematics, and agentic reasoning datasets.
  • Trace the impact of data from pre-training to final model capabilities and create automated quality assessment frameworks for massive datasets.
  • Design data recipes that maximize model capabilities across diverse domains.
  • Optimize data-model co-design for improved training dynamics.
  • Contribute to research papers and represent MBZUAI at industry conferences and events, showcasing the institution's AI research and innovation.

Academic Qualifications
  • Minimum: Master's in Computer Science, Data Science, or a related technical field, or equivalent practical experience.
  • Preferred: PhD or equivalent research experience in Machine Learning, NLP, or Data Science with a focus on LLMs and data.

Professional Experience
  • Experience working with large language models, including evaluation, fine-tuning, and prompt engineering.
  • Strong Python development skills with a focus on research-grade code and scalable data pipelines.
  • Familiarity with collecting and processing large-scale datasets from open-source and web resources.
  • Demonstrated ability to work with ML infrastructure (e.g., model evaluation, optimization, debugging).
  • Proactive mindset with the ability to identify impactful research questions and execute on them with minimal supervision.
  • Effective communication and collaboration skills for working in cross-functional teams.

Preferred
  • Prior research experience in areas such as web data curation and mixing, synthesizing complex datasets for training, LLM evaluation, post-training data, efficient inference, LLM-as-a-judge, and tokenization.
  • Strong publication record in leading AI conferences (e.g., NeurIPS, ICLR, ICML, EMNLP) and/or prior contributions to open-source AI research or data tools.
  • Hands-on experience training language/multi-modal models from scratch.

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include
  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability

Vacancy posted 1 day ago
