Research Scientist - Vision Language Model

Institute of Foundation Models

Job Description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Your strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

Position Summary

As a Research Scientist in the Vision Language Model (VLM) team, your role will be central to advancing state-of-the-art multimodal foundation models that integrate visual understanding, reasoning, and agentic capabilities. You will work on the research and development of large-scale VLM systems, spanning model architectures, data recipes for pre-training and post-training, and evaluation benchmarks. The role combines cutting-edge research with practical engineering, emphasizing large-scale data processing, filtering, and weighting pipelines, distributed training systems, and reinforcement learning algorithms and systems for multimodal reasoning and agent development.

Key Responsibilities

Research and develop next-generation Vision Language Models across pre-training, instruction tuning, reasoning, and agents.
Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use capabilities.
Research efficient multimodal learning techniques, including data-efficient training, long-context modeling, model modularity, and inference optimization.
Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks for VLM capabilities.
Investigate multimodal reasoning, agentic behavior, OCR, grounding, document understanding, chart understanding, and visual question answering capabilities.
Contribute to technical reports, research publications, and open-source software.
Represent MBZUAI at research conferences and industry events, showcasing advancements in multimodal foundation models and large-scale AI systems.
Mentor junior researchers and collaborate across teams to drive impactful research initiatives.

Academic Qualifications

PhD or equivalent research experience in Machine Learning, Computer Vision, Natural Language Processing, or Multimodal AI.

Salary Range

The posted salary range represents the company’s good faith estimate of the compensation for this position upon hire. The actual compensation offered may vary within this range depending on individual qualifications, including but not limited to relevant skills, experience, education, certifications, geographic location, and specific business needs.

Minimum Professional Experience

Experience working with large language models and/or vision-language models, including pre-training, fine-tuning, evaluation, or inference.
Strong Python and PyTorch development skills for large-scale machine learning research.
Experience with distributed training systems and large-scale model optimization.
Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
Understanding of modern deep learning architectures, including Transformers, attention mechanisms, and multimodal fusion techniques.
Experience with ML infrastructure, including model evaluation, debugging, optimization, and large-scale experimentation.
Problem-solving and research skills with the ability to independently drive research/engineering projects.
Effective communication and collaboration skills for working across research and engineering teams.

Preferred Skills

Hands-on experience training or fine-tuning large Vision Language Models or multimodal foundation models at scale.
Experience with distributed learning frameworks and infrastructure such as PyTorch Distributed, Megatron, Triton, or CUDA.
Research experience in multimodal reasoning, agentic systems, tool use, OCR, grounding, document understanding, or multimodal retrieval.
Experience with synthetic data generation, multimodal data curation, or automated evaluation frameworks for VLMs.
Familiarity with efficient training and inference techniques such as FlashAttention, quantization, tensor parallelism, pipeline parallelism, or memory optimization.
Experience contributing to open-source ML software and large-scale research codebases.
Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, or related venues.
Experience collaborating across research, infrastructure, and product-oriented teams to deliver state-of-the-art multimodal systems.

Vacancy posted 17 days ago
