WSN23901 Multimodal Learning

Hiroshima University Syllabus


Academic Year: 2026

School/Graduate School: Graduate School of Advanced Science and Engineering (Master's Course), Division of Advanced Science and Engineering, Informatics and Data Science Program

Lecture Code: WSN23901

Subject Classification: Specialized Education

Subject Name: Multimodal Learning

Subject Name (Katakana): マルチモーダルラーニング

Subject Name in English: Multimodal Learning

Instructor: YU YI

Instructor (Katakana): ユ　イ

Campus: Higashi-Hiroshima

Semester/Term: 1st-Year, Second Semester, Term 4

Days, Periods, and Classrooms: (4T) Mon5-8

Lesson Style: Lecture/Seminar

Lesson Style (More Details): Online (on-demand)

Lectures are delivered as on-demand video streams. Moodle is used for mini-tests and reports.

Credits: 2.0

Class Hours/Week: 4

Language of Instruction: E : English

Course Level: 5 : Graduate Basic

Course Area (Area): 25 : Science and Technology

Course Area (Discipline): 02 : Information Science

Eligible Students

Keywords

Special Subject for Teacher Education

Special Subject

Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students)

Criterion-referenced Evaluation (Applicable only to targeted subjects for undergraduate students)

Class Objectives / Class Outline: This course systematically covers multimodal learning from its theoretical foundations to state-of-the-art large-scale models. The central theme is how to bridge the semantic gaps and distributional differences that exist across heterogeneous modalities such as images, audio, and text, and how to build unified representation learning frameworks and generative models.
In the first half of the course, we address fundamental theories including multimodal representation learning, cross-modal alignment, attention mechanisms, and the relationship between correlation and causality, thereby deepening structural understanding across heterogeneous data sources. In the second half, we discuss large-scale models based on Transformers and foundation models, autoregressive generative models, diffusion models, alignment and reliability evaluation of generative models, and multimodal large language models (MLLMs). The course is conducted through a combination of theoretical lectures, exercises, paper discussions, and implementation assignments, aiming to cultivate students’ research capabilities in the field of multimodal AI.

Class Schedule:
Lesson 1: Introduction to Multimodal Learning
Lesson 2: Modalities & Data Representations
Lesson 3: Multimodal Representation Learning
Lesson 4: Cross-Modal Alignment & Contrastive Learning
Lesson 5: Attention Mechanisms in Multimodal Learning
Lesson 6: Correlation in Multimodal Learning
Lesson 7: Causality in Multimodal Learning
Lesson 8: Objective Functions in Multimodal Learning
Lesson 9: Multimodal Transformers
Lesson 10: Multimodal Foundation Models
Lesson 11: Autoregressive Multimodal Generation
Lesson 12: Diffusion & Flow-based Multimodal Models
Lesson 13: Multimodal Generative Alignment
Lesson 14: Evaluation, Benchmarks, Ethics
Lesson 15: Multimodal Large Language Models

Each lecture includes a short quiz or a project report submission.

Text/Reference Books, etc.: No specific textbook.

PC or AV used in Class, etc.: Text, Audio Materials, Visual Materials, Microsoft Teams, Moodle

(More Details)

Learning techniques to be incorporated: Discussions, Quizzes/Quiz format, PBL (Problem-based Learning)/TBL (Team-based Learning), Post-class Report

Suggestions on Preparation and Review: Look up unfamiliar terms and explore topics of interest on your own.

Requirements

Grading Method: Mini-tests: 40%; final exam or report: 60%
Final examination and some reports

Practical Experience

Summary of Practical Experience and Class Contents based on it

Message

Other

Please fill in the class improvement questionnaire, which is conducted for all classes.
Instructors will reflect on your feedback and use it to improve their teaching.

