Research Scientist - Vision Language Model
Institute of Foundation Models
Job Description
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
Position Summary
As a Research Scientist in the Vision Language Model (VLM) team, your role will be central to advancing state-of-the-art multimodal foundation models that integrate visual understanding, reasoning, and agentic capabilities. You will work on the research and development of large-scale VLM systems, spanning model architectures, data recipes for pre-training and post-training, and evaluation benchmarks. The role combines cutting-edge research with practical engineering, emphasizing large-scale data processing, filtering, and weighting pipelines, distributed training systems, and reinforcement learning algorithms and systems for multimodal reasoning and agent development.
Key Responsibilities
Research and develop next-generation Vision Language Models across pre-training, instruction tuning, reasoning, and agents.
Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use capabilities.
Research efficient multimodal learning techniques, including data-efficient training, long-context modeling, model modularity, and inference optimization.
Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks for VLM capabilities.
Investigate multimodal reasoning, agentic behavior, OCR, grounding, document understanding, chart understanding, and visual question answering capabilities.
Contribute to technical reports, research publications, and open-source software.
Represent MBZUAI at research conferences and industry events, showcasing advancements in multimodal foundation models and large-scale AI systems.
Mentor junior researchers and collaborate across teams to drive impactful research initiatives.
Academic Qualifications
PhD or equivalent research experience in Machine Learning, Computer Vision, Natural Language Processing, or Multimodal AI.
Salary Range
The posted salary range represents the company’s good faith estimate of the compensation for this position upon hire. The actual compensation offered may vary within this range depending on individual qualifications, including but not limited to relevant skills, experience, education, certifications, geographic location, and specific business needs.
Professional Experience Minimum
Experience working with large language models and/or vision-language models, including pre-training, fine-tuning, evaluation, or inference.
Strong Python and PyTorch development skills for large-scale machine learning research.
Experience with distributed training systems and large-scale model optimization.
Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
Understanding of modern deep learning architectures, including Transformers, attention mechanisms, and multimodal fusion techniques.
Experience with ML infrastructure, including model evaluation, debugging, optimization, and large-scale experimentation.
Problem-solving and research skills with the ability to independently drive research/engineering projects.
Effective communication and collaboration skills for working across research and engineering teams.
Professional Experience Preferred
Hands-on experience training or fine-tuning large Vision Language Models or multimodal foundation models at scale.
Experience with distributed learning frameworks and infrastructure such as PyTorch Distributed, Megatron, Triton, or CUDA.
Research experience in multimodal reasoning, agentic systems, tool use, OCR, grounding, document understanding, or multimodal retrieval.
Experience with synthetic data generation, multimodal data curation, or automated evaluation frameworks for VLMs.
Familiarity with efficient training and inference techniques such as FlashAttention, quantization, tensor parallelism, pipeline parallelism, or memory optimization.
Experience contributing to open-source ML software and large-scale research codebases.
Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, or related venues.
Experience collaborating across research, infrastructure, and product-oriented teams to deliver state-of-the-art multimodal systems.
$165k - $185k
The Institute of Foundation Models in Sunnyvale, California is seeking a Research Scientist for their Vision Language Model team. This role involves advancing state-of-the-art multimodal foundation models and developing large-scale systems combining visual understanding...


