Hiroshima University Syllabus

Academic Year 2026
School/Graduate School Graduate School of Advanced Science and Engineering (Master's Course) Division of Advanced Science and Engineering Informatics and Data Science Program
Lecture Code WSN23901
Subject Classification Specialized Education
Subject Name Multimodal Learning
Subject Name (Katakana) マルチモーダルラーニング
Subject Name in English Multimodal Learning
Instructor YU YI
Instructor (Katakana) ユ イ
Campus Higashi-Hiroshima
Semester/Term 1st-Year, Second Semester, 4th Term
Days, Periods, and Classrooms (4T) Mon5-8
Lesson Style Lecture/Seminar
Lesson Style (More Details) Online (on-demand)
Lectures are delivered as on-demand video streams. Moodle is used for mini-tests and reports.
Credits 2.0 Class Hours/Week 4 Language of Instruction E : English
Course Level 5 : Graduate Basic
Course Area(Area) 25 : Science and Technology
Course Area(Discipline) 02 : Information Science
Eligible Students
Keywords  
Special Subject for Teacher Education   Special Subject  
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students)
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students)
Class Objectives/Class Outline
This course systematically covers multimodal learning from its theoretical foundations to state-of-the-art large-scale models. The central theme is how to bridge the semantic gaps and distributional differences that exist across heterogeneous modalities such as images, audio, and text, and how to build unified representation learning frameworks and generative models.
In the first half of the course, we address fundamental theories including multimodal representation learning, cross-modal alignment, attention mechanisms, and the relationship between correlation and causality, thereby deepening structural understanding across heterogeneous data sources. In the second half, we discuss large-scale models based on Transformers and foundation models, autoregressive generative models, diffusion models, alignment and reliability evaluation of generative models, and multimodal large language models (MLLMs). The course is conducted through a combination of theoretical lectures, exercises, paper discussions, and implementation assignments, aiming to cultivate students’ research capabilities in the field of multimodal AI. 
Class Schedule
Lesson 1: Introduction to Multimodal Learning
Lesson 2: Modalities & Data Representations
Lesson 3: Multimodal Representation Learning
Lesson 4: Cross-Modal Alignment & Contrastive Learning
Lesson 5: Attention Mechanisms in Multimodal Learning
Lesson 6: Correlation in Multimodal Learning
Lesson 7: Causality in Multimodal Learning
Lesson 8: Objective Functions in Multimodal Learning
Lesson 9: Multimodal Transformers
Lesson 10: Multimodal Foundation Models
Lesson 11: Autoregressive Multimodal Generation
Lesson 12: Diffusion & Flow-based Multimodal Models
Lesson 13: Multimodal Generative Alignment
Lesson 14: Evaluation, Benchmarks, Ethics
Lesson 15: Multimodal Large Language Models

Each lecture includes a short quiz or a project report submission. 
Text/Reference Books, etc.
No specific textbook.
PC or AV used in Class, etc.
Text, Audio Materials, Visual Materials, Microsoft Teams, Moodle
(More Details)  
Learning techniques to be incorporated Discussions, Quizzes/Quiz format, PBL (Problem-based Learning)/TBL (Team-based Learning), Post-class Report
Suggestions on Preparation and Review
Look up unfamiliar terms and explore topics of interest on your own.
Requirements  
Grading Method Mini-tests: 40%, final exam or report: 60%
Final examination and some reports
 
Practical Experience  
Summary of Practical Experience and Class Contents based on it  
Message  
Other   
Please fill in the class improvement questionnaire, which is conducted for all classes.
Instructors will reflect on your feedback and use it to improve their teaching.