Research Scientist - Vision Language Model

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

Position Summary

As a Research Scientist in the Vision Language Model (VLM) team, your role will be central to advancing state-of-the-art multimodal foundation models that integrate visual understanding, reasoning, and agentic capabilities. You will work on the research and development of large-scale VLM systems, spanning model architectures, data recipes for pre-training and post-training, and evaluation benchmarks. The role combines cutting-edge research with practical engineering, emphasizing large-scale data processing, filtering, and weighting pipelines, distributed training systems, and reinforcement learning algorithms and systems for multimodal reasoning and agent development.

Key Responsibilities

  • Research and development of next-generation Vision Language Models across pre-training, instruction tuning, reasoning, and agents.
  • Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use capabilities.
  • Research efficient multimodal learning techniques, including data-efficient training, long-context modeling, model modularity, and inference optimization.
  • Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks for VLM capabilities.
  • Investigate multimodal reasoning, agentic behavior, OCR, grounding, document understanding, chart understanding, and visual question answering capabilities.
  • Contribute to technical reports, research publications, and open-source software.
  • Represent MBZUAI at research conferences and industry events, showcasing advancements in multimodal foundation models and large-scale AI systems.
  • Mentor junior researchers and collaborate across teams to drive impactful research initiatives.

Academic Qualifications

  • PhD or equivalent research experience in Machine Learning, Computer Vision, Natural Language Processing, or Multimodal AI.

$150,000 - $450,000 a year

Professional Experience

Minimum

  • Experience working with large language models and/or vision-language models, including pre-training, fine-tuning, evaluation, or inference.
  • Strong Python and PyTorch development skills for large-scale machine learning research.
  • Experience with distributed training systems and large-scale model optimization.
  • Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
  • Understanding of modern deep learning architectures, including Transformers, attention mechanisms, and multimodal fusion techniques.
  • Experience with ML infrastructure, including model evaluation, debugging, optimization, and large-scale experimentation.
  • Problem-solving and research skills with the ability to independently drive research/engineering projects.
  • Effective communication and collaboration skills for working across research and engineering teams.

Preferred Skills

  • Hands-on experience training or fine-tuning large Vision Language Models or multimodal foundation models at scale.
  • Experience with distributed learning frameworks and infrastructure such as PyTorch Distributed, Megatron, Triton, or CUDA.
  • Research experience in multimodal reasoning, agentic systems, tool use, OCR, grounding, document understanding, or multimodal retrieval.
  • Experience with synthetic data generation, multimodal data curation, or automated evaluation frameworks for VLMs.
  • Familiarity with efficient training and inference techniques such as FlashAttention, quantization, tensor parallelism, pipeline parallelism, or memory optimization.
  • Experience contributing to open-source ML software and large-scale research codebases.
  • Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, or related venues.
  • Experience collaborating across research, infrastructure, and product-oriented teams to deliver state-of-the-art multimodal systems.

Vacancy posted 13 hours ago
Similar jobs that could be interesting for you
Based on the Research Scientist - Vision Language Model in Sunnyvale, CA vacancy
  • $165k - $185k

     ...Senior AI Research Scientist- World Model Full-time The Bosch Research and Technology Center North America with offices in Sunnyvale...  ...Data Visual Analytics, Explainable AI (XAI), Natural Language Processing, Computer Vision & Mixed Reality, Cloud Robotics, Data Science, AI... 
    Language
    Full time
    Work experience placement
    Worldwide

    Robert Bosch Group

    Sunnyvale, CA
    12 hours ago
  • $165k - $185k

     ...Robert Bosch Group seeks a motivated Research Scientist specializing in Vision-Language-Action Models in Sunnyvale, California. The role emphasizes cutting-edge research in AI, focusing on autonomous systems and collaboration across global teams. The successful candidate... 
    Language

    Robert Bosch Group

    Sunnyvale, CA
    12 hours ago
  •  ...The Institute of Foundation Models in Sunnyvale, California is seeking a Research Scientist for their Vision Language Model team. This role involves advancing state-of-the-art multimodal foundation models and developing large-scale systems combining visual understanding... 
    Language

    Institute of Foundation Models

    Sunnyvale, CA
    12 hours ago
  • $150k

     ...the Institute of Foundation Models We are a dedicated research lab for building,...  ...world-class researchers, data scientists, and engineers, tackling the...  ...specializing in Computer Vision your role will be crucial...  ...AI‑related concepts (e.g., language modeling, computer vision)... 
    Language
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    13 hours ago
  • $165k - $185k

     ...Company Description The Bosch Research and Technology Center...  ...focuses on Foundation Models, Big Data Visual Analytics...  ...Explainable AI (XAI), Natural Language Processing, Computer Vision & Mixed Reality, Cloud...  ...As a Research Scientist- Vision- Language- Action... 
    Language
    Full time
    Work experience placement
    Local area
    Worldwide

    Robert Bosch Group

    Sunnyvale, CA
    12 hours ago
  • $184k - $287.5k

     ...new AI-powered application is built. We are seeking a senior vision language model engineer to design and build agentic data and training...  ...Physical AI. What you'll be doing: Partner with our researchers to develop and evaluate prototypes of our latest models, such... 
    Language

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...General Motors is seeking a Staff Research Scientist specializing in Vision-Language Models to redefine mobility. You will lead advancements in AI for autonomous driving at the Mountain View Technical Center. This remote position requires a Ph.D. and 5+ years of experience... 
    Language
    Remote work

    General Motors

    Mountain View, CA
    13 hours ago
  • $126k - $423k

     ...looking for multiple passionate Research Scientists to join the Research Group at...  ...pretraining world-action foundation model with various world modalities including vision and physics associated with ego...  ..., human data incorporation, language modality, and spatial reasoning... 
    Language
    Full time
    For contractors
    For subcontractor
    Casual work
    Work at office
    Immediate start
    Remote work
    Day shift

    Applied Intuition

    Sunnyvale, CA
    14 days ago
  •  ...as our ability to measure it. At Sanas, model quality spans dimensions that automated...  ...-world disfluency. We’re looking for a Research Scientist who can define what "better" actually...  ...Noise Cancellation, Speech Enhancement, Language Translation, and more — ensuring each captures... 
    Language

    Sanas

    Palo Alto, CA
    13 hours ago
  •  ...Research Scientist, Vision-Language-Action Models Job type: Full Time · Department: Manufacturing Engineering · Work type: On-Site Menlo Park, California, United States About Matter Matter is building the AI-native autonomy stack for physical... 
    Language
    Full time
    Contract work
    Immediate start

    Neara

    Menlo Park, CA
    13 hours ago
  • $190k - $250k

     ...developing large-scale generative world models that learn to predict realistic,...  ...autonomous trucks. We are looking for a research scientist to lead the design and development of world...  ...bonuses Excellent Medical, Dental, and Vision plans through Kaiser Permanente, Cigna,... 
    Temporary work
    Work at office
    Visa sponsorship
    Flexible hours

    Kodiak

    Mountain View, CA
    18 days ago
  •  ...Job Title: CW Research on Large Vehicle Data Model - Summer Intern (99W210) About Kyyba: Founded...  ...and post-training, leveraging language supervision, and enhancing multimodal...  ...technical field. Prior experience with Vision Language Models (VLMs), Large... 
    Language
    Summer internship
    Visa sponsorship
    Work visa

    Kyyba

    Mountain View, CA
    2 days ago
  • $184k - $287.5k

     ...deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for...  ...particularly in the realm of large language models (LLMs) and vision language models (VLMs, VLAs), we are...  ..., interfacing directly with NVIDIA Researchers, GPU Architects, and other teams... 
    Language

    NVIDIA

    Santa Clara, CA
    12 hours ago
  • $50 per hour

     ...to lead day-to-day execution of Chinese (zh-CN) multimedia and language data labeling and review work (e.g., video, images, and related...  ...experience in data annotation, multimodal data labeling, computer vision labeling, content QA, or a closely related field, including... 
    Language
    Full time

    Welocalize

    Cupertino, CA
    1 day ago
  •  ...stack of a unified multimodal foundation model, from pretraining to deployment on real...  ...robotic hardware. This is foundational research with direct physical impact. No hand-...  ...large-scale multimodal architectures where vision, language, and kinematics share a unified... 
    Language

    Prime Recruitment Partners

    Santa Clara, CA
    11 hours ago
  • $224k - $356.5k

     ...never been done before takes vision, innovation, and the world’s...  ...Principal Deep Learning Engineer — Model Evaluation & AI Systems, you...  ...‑on experience with large language models and NLP, including...  ...communicate effectively across research, engineering, and product teams... 
    Language

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  • $192k - $304.75k

     ...Responsibilities Conduct original research in the space of generative AI Implement and train large-scale generative AI models for various content creation applications...  ...and practice of deep learning, computer vision, natural language processing, or computer graphics Track... 
    Language

    University of Georgia- FACS

    Santa Clara, CA
    1 day ago
  • $184k - $299k

     ...Senior Research Scientist, Efficient Deep Learning NVIDIA is searching for...  ...about methods for post-training model optimization (pruning,...  ...the top venues in computer vision and machine learning. Our existing...  .... Experience with large language models and large vision‑language... 
    Language

    NVIDIA

    Santa Clara, CA
    13 hours ago
  • $192k - $304.75k

     ...We’re now looking for a Senior Research Scientist, Multi-Modal Language Models! NVIDIA is seeking a Senior Research Scientist passionate about multi modal...  ...or related areas 4+ years of experience in computer vision, especially multi‑modal LLMs Proficiency in Python with... 
    Language

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $192k - $304.75k

     ...We are now looking for a Senior Research Scientist focused on Multimodal Foundation Models and Robotics! NVIDIA is searching for an outstanding research scientist...  ...least one of the following topics: LLMs; Large vision‑language models; Video generative models and diffusion... 
    Language

    University of Georgia- FACS

    Santa Clara, CA
    12 hours ago
  • $300k

     ...the Institute of Foundation Models We are a dedicated research lab for building,...  ...world-class researchers, data scientists, and engineers, tackling the...  ...shape the future of large language models. Why You’ll Love This...  ...medical, dental, and vision benefits Bonus 401K Plan Generous... 
    Language
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    13 hours ago
  • $150k - $300k

     ...scale. The Silicon Valley Research Lab focuses on developing...  ...RL) , etc. As a Research Scientist in the team, you will...  ...and evaluate algorithms, models and prototypes of AI systems...  ...machine learning, natural language processing, computer vision, reinforcement learning,... 
    Language
    Full time
    H1b
    Work at office
    Visa sponsorship
    3 days per week

    Horizon Robotics Inc

    Cupertino, CA
    3 days ago
  • $160.36k - $240.54k

     ...Machine Learning Research Scientist: Generative Modeling for Planning Mountain View, California (HQ) Nuro...  ...foundation models. Leverage large language models and world foundation models...  ...autonomous driving. Experiences in vision-language-action models, reinforcement... 
    Language

    Nuro

    Mountain View, CA
    2 days ago
  •  ...What You’ll Do Lead hands‑on research at the intersection of classical...  ...image processing, computer vision, graphics, and content...  ...signal processing, spectral/3D modeling, geometry, and calibration. Deep...  ...segmentation, synthesis, captioning, language models). Experience with GPU... 
    Language
    Local area
    Worldwide
    Flexible hours

    Via Licensing Corporation

    Sunnyvale, CA
    4 days ago
  • $150k

     ...the Institute of Foundation Models We are a dedicated research lab for building,...  ...world‑class researchers, data scientists, and engineers, tackling the...  ...Experience working with large language models, including...  ...Comprehensive medical, dental, and vision benefits Bonus 401K Plan... 
    Language
    Worldwide
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    12 hours ago
  • $181.1k - $318.4k

     ...bring smile to people’s face”. Foundation Model Services team, within Machine Learning...  ...on optimizing billions of parameter language and vision and speech models using state of the art...  ...time. Work alongside Foundation Model Research team to prototype and develop inference... 
    Language
    Relocation

    Apple

    Santa Clara, CA
    13 hours ago
  • $150k

     ...the Institute of Foundation Models We are a dedicated research lab for building,...  ...world‑class researchers, data scientists, and engineers, tackling the...  ...focus on data‑centric large language model (LLM) development,...  ...Comprehensive medical, dental, and vision benefits Bonus 401K Plan... 
    Language
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    4 days ago
  • $158k - $304k

     ...role We are looking for a passionate Research Scientist to join the Research Team at Applied Intuition...  ...and tools to develop cutting‑edge models at scale. In addition to your research...  ...robotic foundation model and vision‑language‑action model, reinforcement learning and... 
    Language
    Full time
    For contractors
    For subcontractor
    Casual work
    Work at office
    Immediate start
    Remote work
    Flexible hours

    Decisive Point

    Mountain View, CA
    12 hours ago
  • $156k - $387.6k

     ...digital image processing, computer vision, and system‑level...  ...psychophysics experiments, early vision modeling, and developing perceptual...  ...inclusive of graduate school research experience) in related fields...  ...analysis, or prototyping languages such as MATLAB, Python, C++.... 
    Language
    Temporary work
    Local area

    ByteDance

    San Jose, CA
    13 hours ago
  •  ...to lead day-to-day execution of Chinese (zh-CN) multimedia and language data labeling and review work (e.g., video, images, and related...  ...experience in data annotation, multimodal data labeling, computer vision labeling, content QA, or a closely related field, including... 
    Language
    Full time

    Welo Global

    Sunnyvale, CA
    27 days ago
