| Academic Year |
2026 |
School/Graduate School |
Graduate School of Advanced Science and Engineering (Master's Course) Division of Advanced Science and Engineering Informatics and Data Science Program |
| Lecture Code |
WSN23901 |
Subject Classification |
Specialized Education |
| Subject Name |
Multimodal Learning |
Subject Name (Katakana) |
マルチモーダルラーニング |
Subject Name in English |
Multimodal Learning |
| Instructor |
YU YI |
Instructor (Katakana) |
ユ イ |
| Campus |
Higashi-Hiroshima |
Semester/Term |
1st-Year, Second Semester, 4th Term |
| Days, Periods, and Classrooms |
(4T) Mon5-8 |
| Lesson Style |
Lecture/Seminar |
Lesson Style (More Details) |
Online (on-demand) |
| Lectures are delivered as on-demand video streams. Moodle is used for mini-tests and reports. |
| Credits |
2.0 |
Class Hours/Week |
4 |
Language of Instruction |
E: English |
| Course Level |
5: Graduate Basic |
| Course Area(Area) |
25: Science and Technology |
| Course Area(Discipline) |
02: Information Science |
| Eligible Students |
|
| Keywords |
|
| Special Subject for Teacher Education |
|
Special Subject |
|
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students) | |
|---|
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students) | |
Class Objectives /Class Outline |
This course systematically covers multimodal learning from its theoretical foundations to state-of-the-art large-scale models. The central theme is how to bridge the semantic gaps and distributional differences that exist across heterogeneous modalities such as images, audio, and text, and how to build unified representation learning frameworks and generative models. In the first half of the course, we address fundamental theories including multimodal representation learning, cross-modal alignment, attention mechanisms, and the relationship between correlation and causality, thereby deepening structural understanding across heterogeneous data sources. In the second half, we discuss large-scale models based on Transformers and foundation models, autoregressive generative models, diffusion models, alignment and reliability evaluation of generative models, and multimodal large language models (MLLMs). The course is conducted through a combination of theoretical lectures, exercises, paper discussions, and implementation assignments, aiming to cultivate students’ research capabilities in the field of multimodal AI. |
| Class Schedule |
Lesson 1: Introduction to Multimodal Learning
Lesson 2: Modalities & Data Representations
Lesson 3: Multimodal Representation Learning
Lesson 4: Cross-Modal Alignment & Contrastive Learning
Lesson 5: Attention Mechanisms in Multimodal Learning
Lesson 6: Correlation in Multimodal Learning
Lesson 7: Causality in Multimodal Learning
Lesson 8: Objective Functions in Multimodal Learning
Lesson 9: Multimodal Transformers
Lesson 10: Multimodal Foundation Models
Lesson 11: Autoregressive Multimodal Generation
Lesson 12: Diffusion & Flow-based Multimodal Models
Lesson 13: Multimodal Generative Alignment
Lesson 14: Evaluation, Benchmarks, Ethics
Lesson 15: Multimodal Large Language Models
Each lecture includes a short quiz or a project report submission. |
Text/Reference Books, etc. |
No specific textbook. |
PC or AV used in Class, etc. |
Text, Audio Materials, Visual Materials, Microsoft Teams, Moodle |
| (More Details) |
|
| Learning techniques to be incorporated |
Discussions, Quizzes/Quiz format, PBL (Problem-based Learning)/TBL (Team-based Learning), Post-class Report
Suggestions on Preparation and Review |
Look up unfamiliar terms and explore topics of interest on your own. |
| Requirements |
|
| Grading Method |
Mini-tests: 40%; final examination or report: 60%. Assessment includes a final examination and some reports.
| Practical Experience |
|
| Summary of Practical Experience and Class Contents based on it |
|
| Message |
|
| Other |
|
Please fill in the class improvement questionnaire, which is conducted for all classes. Instructors will reflect on your feedback and use it to improve their teaching. |