WMK00400 Basics of Machine Learning

Hiroshima University Syllabus


Academic Year: 2026
School/Graduate School: Graduate School of Humanities and Social Sciences (Master's Course) Division of Humanities and Social Sciences Social Data Science Program
Lecture Code: WMK00400
Subject Classification: Specialized Education
Subject Name: 機械学習の基礎
Subject Name (Katakana): キカイガクシュウノキソ
Subject Name in English: Basics of Machine Learning
Instructor: WAKUDA YUKI
Instructor (Katakana): ワクダ　ユウキ
Campus: Higashi-Senda
Semester/Term: 1st-Year, First Semester, Term 2
Days, Periods, and Classrooms: (2T) Tues 13-14, Weds 13-14: Online
Lesson Style: Lecture
Lesson Style (More Details): Online (simultaneous interactive)
Credits: 2.0
Class Hours/Week: 4
Language of Instruction: J : Japanese
Course Level: 5 : Graduate Basic
Course Area (Area): 24 : Social Sciences
Course Area (Discipline): 07 : Education
Eligible Students: Social Data Science Program
Keywords: Machine Learning, Data science, AI, Social Sciences, Education
Special Subject for Teacher Education:
Special Subject:
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students):
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students):

Class Objectives/Class Outline
Course Overview:
This course approaches machine learning as "a technology that supports decision-making by enabling prediction, classification, and structure discovery from data." Students will learn the key concepts and appropriate applications of representative methods. Through Python-based exercises covering regression, classification, and clustering, students will develop practical skills in data preprocessing, model training, validation (hold-out / cross-validation), selection of evaluation metrics, and handling overfitting. Generative AI-assisted coding will be actively utilized throughout the exercises, allowing students to engage in practical data processing and analysis in collaboration with AI. The course also addresses practically important perspectives such as model performance evaluation and interpretability, cultivating the foundational ability to select appropriate methods and clearly explain results for real-world data.

Learning Objectives:
- Understand the representative methods of supervised learning (classification and regression) and unsupervised learning, and be able to select and apply appropriate analytical methods for given data and tasks.
- Execute the complete machine learning workflow, from data preprocessing through model building to evaluation, using Python with the assistance of generative AI.
- Evaluate model performance using appropriate metrics, interpret analytical results, and explain findings clearly in research and policy contexts.

Class Schedule
Session 1: Introduction to Machine Learning
- Overview of Machine Learning (ML) and AI
- The ML lifecycle: CRISP-DM process
- Understanding ML applications through comparison with human learning
- ELSI (Ethical, Legal, and Social Issues) in ML/AI utilization
- [Exercise] Pre-course survey on machine learning
Session 2: Python Fundamentals for ML (I)
- Software execution environment: Google Colaboratory (Colab)
- Markdown notation for documentation
- Utilization of Open Data
- Introduction to generative AI-assisted coding (Gemini on Colab, prompt design)
- [Exercise] Creating and editing a Notebook in Google Colab
Session 3: Python Fundamentals for ML (II)
- Basics of data processing (data types, Pandas)
- Execution fundamentals (control structures, functions)
- Process design using flowcharts (input / process / output thinking)
- Data visualization basics (Matplotlib, japanize-matplotlib)
- [Exercise] Loading an Excel file and visualizing data
Session 4: Python Fundamentals for ML (III)
- Basics of data manipulation (groupby, join, fillna)
- Best practices for AI-assisted coding (modularization, configuration at the top, code review)
- [Exercise] Data aggregation, integration, and operation verification
Session 5: Data Pre-processing (I)
- Data quality
- Data cleansing and variable selection
- [Exercise] Data quality assessment, cleansing, record removal, and variable selection
Session 6: Data Pre-processing (II)
- Data pre-processing for ML (Train-test split)
- Categorical variable encoding (get_dummies)
- [Exercise] Completing ML-ready input data (organizing target and explanatory variables / completing quantification)
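The two pre-processing steps named in Session 6 can be sketched as follows. This is a minimal, dependency-free illustration and not the course's own material: the session itself names pandas' get_dummies for encoding, while the functions `one_hot` and `hold_out_split` and the toy records below are hypothetical stand-ins written with the standard library only.

```python
import random


def one_hot(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def hold_out_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records (assumption, not course data).
records = [
    {"plan": "basic", "months": 3, "churned": 1},
    {"plan": "premium", "months": 24, "churned": 0},
    {"plan": "basic", "months": 12, "churned": 0},
    {"plan": "family", "months": 1, "churned": 1},
]

encoded = one_hot(records, "plan")
train, test = hold_out_split(encoded, test_ratio=0.25, seed=0)
```

After encoding, every record carries one indicator column per category, and exactly one of them is 1. The split keeps the test rows out of the training set so they can later be used for evaluation.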
Session 7: Supervised Learning (I): Classification
- Automating classification with ML (introduction to classification and decision trees)
- Building classification models (decision trees)
- Selecting and switching analytical methods
- Introduction to model performance evaluation (Precision, Recall, F-score)
- [Exercise] Building a model to identify customers likely to churn
Session 8: Supervised Learning (II): Classification
- Performance evaluation of classification models (F-score, confusion matrix, P-R curve, AUC)
- Separating training and evaluation data (Hold-out method)
- Generalization performance and overfitting
- [Exercise] Evaluating classification model performance (confusion matrix / P-R curve)
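The relationship between the confusion matrix and Precision, Recall, and F-score can be illustrated with a short calculation. This is a sketch under stated assumptions: the labels are hypothetical toy values, the F-score shown is the F1 variant (harmonic mean of precision and recall), and the function names are illustrative rather than taken from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = churned, 0 = stayed (assumption, not course data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)         # 3 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```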
Session 9: Supervised Learning (III): Classification
- Model interpretation using Feature Importance (FI)
- Model interpretation using SHAP
- Using ML as a hypothesis extraction tool
- [Exercise] Identifying contributing variables and interpreting the model (FI/SHAP)
Session 10: Supervised Learning (IV): Classification
- Cross-validation (k-fold CV)
- Handling imbalanced data (under/oversampling, SMOTE)
- [Exercise] Addressing imbalanced data and confirming performance changes
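The idea behind k-fold cross-validation can be sketched as an index-splitting routine. This is a simplified illustration, assuming contiguous folds without shuffling or stratification; the function name `k_fold_indices` is hypothetical, and in practice a library implementation would typically be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

Each sample is used for validation exactly once and for training in the other k-1 folds, which is why the averaged score is less dependent on a single lucky or unlucky split than the hold-out method. With imbalanced data, a stratified variant that preserves class ratios in each fold is usually preferable.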
Session 11: Supervised Learning (I): Regression
- Estimation using machine learning (general theory of regression)
- Pre-processing for regression tasks (constructing panel data)
- Building regression models (panel data analysis, multiple linear regression)
- [Exercise] Panel data analysis (analysis and prediction of land price trends)
Session 12: Supervised Learning (II): Regression
- Performance evaluation of regression models (R², RMSE, MAE)
- Algorithm selection and performance comparison (LightGBM)
- Applying regression to time-series data
- [Exercise] Time-series analysis and regression performance evaluation (analysis and prediction of economic indicator trends)
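The three regression metrics listed for Session 12 follow directly from their definitions. The sketch below computes them by hand on hypothetical toy values; the function name `regression_metrics` is illustrative and not part of the course materials.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, and MAE from observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"R2": r2, "RMSE": rmse, "MAE": mae}


# Hypothetical observed and predicted values (assumption, not course data).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE and MAE are in the same units as the target variable, with RMSE penalizing large errors more heavily. R² is unitless and compares the model against always predicting the mean.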
Session 13: Unsupervised Learning: PCA and Cluster Analysis
- Analyzing data components: Principal Component Analysis (PCA)
- Grouping data into clusters: cluster analysis (k-means / hierarchical clustering)
- Principal component cluster analysis
- Factor analysis (concepts and positioning)
- [Exercise] Customer segmentation through data analysis (PCA cluster analysis)
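The k-means part of Session 13 can be illustrated with a small hand-written version of the standard algorithm (alternating assignment and centroid-update steps). This is a sketch only: it assumes 2-D points and caller-supplied initial centroids, the data are hypothetical, and the function names are not from any library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points, centroids, n_iter=10):
    """Cluster 2-D points with k-means, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical customer features, e.g. (visits, spend) on arbitrary scales.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means relies on distances, features on very different scales are usually standardized first. Running it on principal component scores instead of raw variables is one way to read the "principal component cluster analysis" item above.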
Session 14: Wrap-up and Discussion (I): ML Overview, Prediction, and Causality
- Reviewing the overall ML landscape and guidelines for algorithm selection
- Strengths and limitations of ML / common pitfalls and failure cases
- Data science in social and economic fields
- Causal relationships vs. correlations
- Natural experiments and Difference-in-Differences (DID)
- Introduction to causal analysis approaches using ML
- [Exercise] Post-course survey on machine learning
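The basic Difference-in-Differences estimate mentioned in Session 14 is a difference of two before/after changes. The sketch below shows only that arithmetic on hypothetical group outcomes; it assumes the parallel-trends condition holds and omits the regression formulation and standard errors that a real analysis would need.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-Differences: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes before and after a policy (assumption, not real data).
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # 3.0
```

The treated group rose by 5 and the control group by 2, so 3 is attributed to the policy. The simple before/after change of 5 would overstate the effect by including the trend shared by both groups.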
Session 15: Wrap-up and Discussion (II): Designing ML Applications for Your Own Research
- Reflection on the entire course
- Exploring the application of ML to your own research
- [Exercise] Final assignment: Design an ML application scenario for a topic of personal or research interest

Students are required to submit reports.

Text/Reference Books, etc.: Textbooks will be assigned in class.

PC or AV used in Class, etc.: Visual Materials, Microsoft Teams, Zoom, Moodle

(More Details)

Learning techniques to be incorporated: Discussions, PBL (Problem-based Learning)/TBL (Team-based Learning), Post-class Report

Suggestions on Preparation and Review
Preparation (Before Class):
- Review the topics for the next session in advance, and come to class with your own thoughts and questions organized.
- If there are any remaining tasks from the previous session or assigned homework, make progress on them before the next class.
Review (After Class):
- Continue working on any tasks or materials that you were unable to complete during class time.
- If there are points you did not understand, do not leave them unclear; take notes and bring them to the next class.
- Use the post-class report to organize your understanding of the day’s content in your own words.

Requirements

Grading Method: Your final grade will be based on class participation and assignment reports.
Class Participation (Attendance and Active Involvement in Lectures/Workshops): 40%
Assignment Reports: 60%

Practical Experience: Experienced

Summary of Practical Experience and Class Contents based on it: Drawing on the instructor's professional experience in the commercialization of machine learning and in collaborative research with private companies, this course focuses on machine learning methodologies that are effective and applicable in real-world business environments.

Message: Machine learning has become remarkably accessible today, thanks to the availability of open-source software libraries, modern development environments, and the assistance of generative AI.
In this course, we will focus on practical programming as much as possible. We will cover foundational techniques with a clear eye toward the application of machine learning in the Humanities and Social Sciences.
I encourage you to build a solid foundation that enables you to apply machine learning and practice data science in various contexts, whether in your own research or in your future professional career. My hope is that you will use these skills to tackle the specific problems you encounter within your own areas of expertise.

Other

Please fill in the class improvement questionnaire, which is carried out for all classes.
Instructors will reflect on your feedback and use it to improve their teaching.
