Research Scientist - Vision Language Model

$150k

Institute of Foundation Models

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy. As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

Position Summary

As a Research Scientist in the Vision Language Model (VLM) team, your role will be central to advancing state-of-the-art multimodal foundation models that integrate visual understanding, reasoning, and agentic capabilities. You will work on the research and development of large-scale VLM systems, spanning model architectures, data recipes for pre-training and post-training, and evaluation benchmarks. The role combines cutting-edge research with practical engineering, emphasizing large-scale data processing, filtering, and weighting pipelines, distributed training systems, and reinforcement learning algorithms and systems for multimodal reasoning and agent development.

Key Responsibilities

- Research and development of next-generation Vision Language Models across pre-training, instruction tuning, reasoning, and agents.
- Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use capabilities.
- Research efficient multimodal learning techniques, including data-efficient training, long-context modeling, model modularity, and inference optimization.
- Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks for VLM capabilities.
- Investigate multimodal reasoning, agentic behavior, OCR, grounding, document understanding, chart understanding, and visual question answering capabilities.
- Contribute to technical reports, research publications, and open-source software.
- Represent MBZUAI at research conferences and industry events, showcasing advancements in multimodal foundation models and large-scale AI systems.
- Mentor junior researchers and collaborate across teams to drive impactful research initiatives.

Academic Qualifications

- PhD or equivalent research experience in Machine Learning, Computer Vision, Natural Language Processing, or Multimodal AI.

Salary: $150,000 - $450,000 a year

Professional Experience (Minimum)

- Experience working with large language models and/or vision-language models, including pre-training, fine-tuning, evaluation, or inference.
- Strong Python and PyTorch development skills for large-scale machine learning research.
- Experience with distributed training systems and large-scale model optimization.
- Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
- Understanding of modern deep learning architectures, including Transformers, attention mechanisms, and multimodal fusion techniques.
- Experience with ML infrastructure, including model evaluation, debugging, optimization, and large-scale experimentation.
- Problem-solving and research skills, with the ability to independently drive research/engineering projects.
- Effective communication and collaboration skills for working across research and engineering teams.

Preferred Skills

- Hands-on experience training or fine-tuning large Vision Language Models or multimodal foundation models at scale.
- Experience with distributed learning frameworks and infrastructure such as PyTorch Distributed, Megatron, Triton, or CUDA.
- Research experience in multimodal reasoning, agentic systems, tool use, OCR, grounding, document understanding, or multimodal retrieval.
- Experience with synthetic data generation, multimodal data curation, or automated evaluation frameworks for VLMs.
- Familiarity with efficient training and inference techniques such as FlashAttention, quantization, tensor parallelism, pipeline parallelism, or memory optimization.
- Experience contributing to open-source ML software and large-scale research codebases.
- Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, or related venues.
- Experience collaborating across research, infrastructure, and product-oriented teams to deliver state-of-the-art multimodal systems.

Vacancy posted 13 hours ago

Similar jobs that could be interesting for you (based on the Research Scientist - Vision Language Model vacancy in Sunnyvale, CA)

Senior AI Research Scientist- World Model
$165k - $185k
...Senior AI Research Scientist- World Model Full-time The Bosch Research and Technology Center North America with offices in Sunnyvale... ...Data Visual Analytics, Explainable AI (XAI), Natural Language Processing, Computer Vision & Mixed Reality, Cloud Robotics, Data Science, AI...
Language
Full time
Work experience placement
Worldwide
Robert Bosch Group
Sunnyvale, CA
12 hours ago
Research Scientist: Vision-Language-Action for Embodied AI
$165k - $185k
...Robert Bosch Group seeks a motivated Research Scientist specializing in Vision-Language-Action Models in Sunnyvale, California. The role emphasizes cutting-edge research in AI, focusing on autonomous systems and collaboration across global teams. The successful candidate...
Language
Robert Bosch Group
Sunnyvale, CA
12 hours ago
Research Scientist, Vision-Language Multimodal AI
...The Institute of Foundation Models in Sunnyvale, California is seeking a Research Scientist for their Vision Language Model team. This role involves advancing state-of-the-art multimodal foundation models and developing large-scale systems combining visual understanding...
Language
Institute of Foundation Models
Sunnyvale, CA
12 hours ago
Research Scientist - Computer Vision
$150k
...the Institute of Foundation Models We are a dedicated research lab for building,... ...world-class researchers, data scientists, and engineers, tackling the... ...specializing in Computer Vision your role will be crucial... ...AI‑related concepts (e.g., language modeling, computer vision)...
Language
Visa sponsorship
Institute of Foundation Models
Sunnyvale, CA
13 hours ago
Research Scientist Vision Language Action VLA Models
$165k - $185k
...Company Description The Bosch Research and Technology Center... ...focuses on Foundation Models, Big Data Visual Analytics... ...Explainable AI (XAI), Natural Language Processing, Computer Vision & Mixed Reality, Cloud... ...As a Research Scientist- Vision- Language- Action...
Language
Full time
Work experience placement
Local area
Worldwide
Robert Bosch Group
Sunnyvale, CA
12 hours ago
Senior Vision Language Model Engineer
$184k - $287.5k
...new AI-powered application is built. We are seeking a senior vision language model engineer to design and build agentic data and training... ...Physical AI. What you'll be doing: Partner with our researchers to develop and evaluate prototypes of our latest models, such...
Language
NVIDIA
Santa Clara, CA
1 day ago
Staff Research Scientist: Vision-Language AI for Autonomy
...General Motors is seeking a Staff Research Scientist specializing in Vision-Language Models to redefine mobility. You will lead advancements in AI for autonomous driving at the Mountain View Technical Center. This remote position requires a Ph.D. and 5+ years of experience...
Language
Remote work
General Motors
Mountain View, CA
13 hours ago
Research Scientist - World Action Foundation Model, Robotics
$126k - $423k
...looking for multiple passionate Research Scientists to join the Research Group at... ...pretraining world-action foundation model with various world modalities including vision and physics associated with ego... ..., human data incorporation, language modality, and spatial reasoning...
Language
Full time
For contractors
For subcontractor
Casual work
Work at office
Immediate start
Remote work
Day shift
Applied Intuition
Sunnyvale, CA
14 days ago
Research Scientist (Model Evaluation)
...as our ability to measure it. At Sanas, model quality spans dimensions that automated... ...-world disfluency. We’re looking for a Research Scientist who can define what "better" actually... ...Noise Cancellation, Speech Enhancement, Language Translation, and more — ensuring each captures...
Language
Sanas
Palo Alto, CA
13 hours ago
Research Scientist, Vision-Language-Action Models
...Overview Research Scientist, Vision-Language-Action Models Job type: Full Time · Department: Manufacturing Engineering · Work type: On-Site Menlo Park, California, United States About Matter Matter is building the AI-native autonomy stack for physical...
Language
Full time
Contract work
Immediate start
Neara
Menlo Park, CA
13 hours ago
World Model Research Scientist- Physical AI
$190k - $250k
...developing large-scale generative world models that learn to predict realistic,... ...autonomous trucks. We are looking for a research scientist to lead the design and development of world... ...bonuses Excellent Medical, Dental, and Vision plans through Kaiser Permanente, Cigna,...
Temporary work
Work at office
Visa sponsorship
Flexible hours
Kodiak
Mountain View, CA
18 days ago
CW Research on Large Vehicle Data Model - Summer Intern
...Job Title: CW Research on Large Vehicle Data Model - Summer Intern (99W210) About Kyyba: Founded... ...and post-training, leveraging language supervision, and enhancing multimodal... ...technical field. Prior experience with Vision Language Models (VLMs), Large...
Language
Summer internship
Visa sponsorship
Work visa
Kyyba
Mountain View, CA
2 days ago
Manager, Large Language Model Inference
$184k - $287.5k
...deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for... ...particularly in the realm of large language models (LLMs) and vision language models (VLMs, VLAs), we are... ..., interfacing directly with NVIDIA Researchers, GPU Architects, and other teams...
Language
NVIDIA
Santa Clara, CA
12 hours ago
Lead Generative AI Analyst (Chinese zh-CN) - Onsite | San Jose, CA
$50 per hour
...to lead day-to-day execution of Chinese (zh-CN) multimedia and language data labeling and review work (e.g., video, images, and related... ...experience in data annotation, multimodal data labeling, computer vision labeling, content QA, or a closely related field, including...
Language
Full time
Welocalize
Cupertino, CA
1 day ago
Research Scientist
...stack of a unified multimodal foundation model, from pretraining to deployment on real... ...robotic hardware. This is foundational research with direct physical impact. No hand-... ...large-scale multimodal architectures where vision, language, and kinematics share a unified...
Language
Prime Recruitment Partners
Santa Clara, CA
11 hours ago
Senior Deep Learning Engineer - Model Evaluation & AI Systems
$224k - $356.5k
...never been done before takes vision, innovation, and the world’s... ...Principal Deep Learning Engineer — Model Evaluation & AI Systems, you... ...‑on experience with large language models and NLP, including... ...communicate effectively across research, engineering, and product teams...
Language
NVIDIA Gruppe
Santa Clara, CA
12 hours ago
Senior Research Scientist for Generative AI
$192k - $304.75k
...Responsibilities Conduct original research in the space of generative AI Implement and train large-scale generative AI models for various content creation applications... ...and practice of deep learning, computer vision, natural language processing, or computer graphics Track...
Language
University of Georgia- FACS
Santa Clara, CA
1 day ago
Senior Research Scientist, Efficient Deep Learning
$184k - $299k
...Senior Research Scientist, Efficient Deep Learning NVIDIA is searching for... ...about methods for post-training model optimization (pruning,... ...the top venues in computer vision and machine learning. Our existing... .... Experience with large language models and large vision‑language...
Language
NVIDIA
Santa Clara, CA
13 hours ago
Senior Research Scientist, Multi-Modal Language Models
$192k - $304.75k
...We’re now looking for a Senior Research Scientist, Multi-Modal Language Models! NVIDIA is seeking a Senior Research Scientist passionate about multi modal... ...or related areas 4+ years of experience in computer vision, especially multi‑modal LLMs Proficiency in Python with...
Language
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior Research Scientist, Multimodal Foundation Models and Robotics
$192k - $304.75k
...We are now looking for a Senior Research Scientist focused on Multimodal Foundation Models and Robotics! NVIDIA is searching for an outstanding research scientist... ...least one of the following topics: LLMs; Large vision‑language models; Video generative models and diffusion...
Language
University of Georgia- FACS
Santa Clara, CA
12 hours ago
Research Scientist - Distributed Machine Learning
$300k
...the Institute of Foundation Models We are a dedicated research lab for building,... ...world-class researchers, data scientists, and engineers, tackling the... ...shape the future of large language models. Why You’ll Love This... ...medical, dental, and vision benefits Bonus 401K Plan Generous...
Language
Visa sponsorship
Institute of Foundation Models
Sunnyvale, CA
13 hours ago
Research Scientist
$150k - $300k
...scale. The Silicon Valley Research Lab focuses on developing... ...RL) , etc. As a Research Scientist in the team, you will... ...and evaluate algorithms, models and prototypes of AI systems... ...machine learning, natural language processing, computer vision, reinforcement learning,...
Language
Full time
H1b
Work at office
Visa sponsorship
3 days per week
Horizon Robotics Inc
Cupertino, CA
3 days ago
Machine Learning Research Scientist: Generative Modeling for Planning
$160.36k - $240.54k
...Machine Learning Research Scientist: Generative Modeling for Planning Mountain View, California (HQ) Nuro... ...foundation models. Leverage large language models and world foundation models... ...autonomous driving. Experiences in vision-language-action models, reinforcement...
Language
Nuro
Mountain View, CA
2 days ago
Principal Computer Vision Researcher - Advanced Technology Group
...What You’ll Do Lead hands‑on research at the intersection of classical... ...image processing, computer vision, graphics, and content... ...signal processing, spectral/3D modeling, geometry, and calibration. Deep... ...segmentation, synthesis, captioning, language models). Experience with GPU...
Language
Local area
Worldwide
Flexible hours
Via Licensing Corporation
Sunnyvale, CA
4 days ago
Research Scientist - Data
$150k
...the Institute of Foundation Models We are a dedicated research lab for building,... ...world‑class researchers, data scientists, and engineers, tackling the... ...Experience working with large language models, including... ...Comprehensive medical, dental, and vision benefits Bonus 401K Plan...
Language
Worldwide
Visa sponsorship
Institute of Foundation Models
Sunnyvale, CA
12 hours ago
Machine Learning Engineer, Foundation Model Services
$181.1k - $318.4k
...bring smile to people’s face”. Foundation Model Services team, within Machine Learning... ...on optimizing billions of parameter language and vision and speech models using state of the art... ...time. Work alongside Foundation Model Research team to prototype and develop inference...
Language
Relocation
Apple
Santa Clara, CA
13 hours ago
Research Scientist
$150k
...the Institute of Foundation Models We are a dedicated research lab for building,... ...world‑class researchers, data scientists, and engineers, tackling the... ...focus on data‑centric large language model (LLM) development,... ...Comprehensive medical, dental, and vision benefits Bonus 401K Plan...
Language
Visa sponsorship
Institute of Foundation Models
Sunnyvale, CA
4 days ago
Research Scientist - Robotics
$158k - $304k
...role We are looking for a passionate Research Scientist to join the Research Team at Applied Intuition... ...and tools to develop cutting‑edge models at scale. In addition to your research... ...robotic foundation model and vision‑language‑action model, reinforcement learning and...
Language
Full time
For contractors
For subcontractor
Casual work
Work at office
Immediate start
Remote work
Flexible hours
Decisive Point
Mountain View, CA
12 hours ago
Vision Scientist - PICO Lab - San Jose
$156k - $387.6k
...digital image processing, computer vision, and system‑level... ...psychophysics experiments, early vision modeling, and developing perceptual... ...inclusive of graduate school research experience) in related fields... ...analysis, or prototyping languages such as MATLAB, Python, C++....
Language
Temporary work
Local area
ByteDance
San Jose, CA
13 hours ago
Lead Generative AI Analyst (Chinese zh-CN) - Onsite | San Jose, CA
...to lead day-to-day execution of Chinese (zh-CN) multimedia and language data labeling and review work (e.g., video, images, and related... ...experience in data annotation, multimodal data labeling, computer vision labeling, content QA, or a closely related field, including...
Language
Full time
Welo Global
Sunnyvale, CA
27 days ago