WNF03000 Basics of Machine Learning

Hiroshima University Syllabus

Academic Year 2026
School/Graduate School Graduate School of Humanities and Social Sciences (Master's Course) Division of Educational Sciences Education Data Science Program

Lecture Code WNF03000
Subject Classification Specialized Education

Subject Name 機械学習の基礎

Subject Name
（Katakana）

Subject Name in English Basics of Machine Learning

Instructor TANAKA HIDEYUKI,HARADA YUSUKE

Instructor (Katakana) タナカ　ヒデユキ,ハラダ　ユウスケ

Campus Higashi-Hiroshima
Semester/Term 1st-Year, First Semester, Term 2

Days, Periods, and Classrooms (2T) Weds 5-8: EDU K115

Lesson Style Seminar
Lesson Style (More Details) Face-to-face

Credits 2.0
Class Hours/Week 4
Language of Instruction J : Japanese

Course Level 5 : Graduate Basic

Course Area（Area） 24 : Social Sciences

Course Area（Discipline） 07 : Education

Eligible Students Social Data Science Program

Keywords Machine Learning, Data science, AI, Social Sciences, Education

Special Subject for Teacher Education
Special Subject

Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students)
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students)

Class Objectives/Class Outline
Course Overview
In this course, we define Machine Learning (ML) as a "technology for making predictions, classifications, and discovering structures from data to support decision-making," and students will learn the concepts and practical applications of representative methodologies. Using regression, classification, and clustering as subject matter, students will acquire skills in data pre-processing, training, validation (hold-out and cross-validation), selecting appropriate evaluation metrics, and addressing overfitting through Python-based hands-on exercises. Furthermore, the course covers critical professional perspectives such as model interpretability, reproducibility, ethics, and privacy (ELSI). The goal is to develop the foundational ability to select appropriate methods for real-world data and provide clear, logical explanations of the results.

Learning Goals
・Understand representative methods of supervised learning (classification/regression) and unsupervised learning, and select and apply appropriate analytical techniques for given data and challenges.

・Execute the entire machine learning workflow—from data pre-processing to model construction and evaluation—using Python.

・Evaluate the performance of constructed models using appropriate metrics, interpret the analytical results, and explain them clearly to a non-expert audience.

Class Schedule
Session 1: Introduction to Machine Learning
Overview of Machine Learning (ML) and AI
The ML Lifecycle: CRISP-DM process
Understanding ML applications through comparison with human learning
ELSI (Ethical, Legal, and Social Issues) in ML/AI utilization
Session 2: Python Fundamentals for ML (I)
Software execution environment: Google Colaboratory (Colab)
Markdown notation for documentation
Utilizing Open Data
Session 3: Python Fundamentals for ML (II)
Basics of data processing (Data types, Pandas)
Execution fundamentals (Control structures, Functions)
Data visualization basics (Matplotlib, japanize-matplotlib)
Session 4: Python Fundamentals for ML (III)
Introduction to Generative AI-Assisted Coding (Code generation, execution, and review)
Data ingestion (Reading CSV and Excel files)
Basics of data manipulation (groupby, merge/join, fillna)
Session 5: Data Pre-processing (I)
Data Cleansing (Handling missing values, feature selection, and standardization)
Session 6: Data Pre-processing (II)
Data pre-processing for ML (Introduction to scikit-learn, data integration, and aggregation)
Train-test split methodology
Session 7: Supervised Learning (I): Classification
Automating classification with ML (Introduction to Classification and Decision Trees)
Pre-processing for classification tasks
Building classification models (Decision Trees)
Session 8: Supervised Learning (I): Classification (Continued)
Performance evaluation of classification models (Accuracy, Recall, F-measure, Confusion Matrix, and P-R Curve)
Generalization performance and Overfitting
Algorithm selection, performance comparison, and Hyperparameter Tuning (Random Forest, XGBoost)
Session 9: Supervised Learning (I): Classification (Continued)
Cross-validation (k-fold CV)
Handling Imbalanced Data
Model Interpretation (Feature Importance (FI), SHAP)
Session 10: Supervised Learning (II): Regression
Forecasting with ML (General theory of Regression)
Pre-processing for regression tasks
Building regression models (Panel Data Analysis, Multiple Linear Regression)
Session 11: Supervised Learning (II): Regression (Continued)
Performance evaluation of regression models (RMSE, MAE)
Algorithm selection and performance comparison (Multiple Linear Regression vs. LightGBM)
Session 12: Unsupervised Learning (I)
Analyzing data components: Principal Component Analysis (PCA)
Grouping data into clusters: Cluster Analysis (k-means / Hierarchical Clustering)
Session 13: Unsupervised Learning (II)
General theory of Anomaly Detection
Anomaly detection using unsupervised ML (Isolation Forest)
Session 14: Wrap-up and Discussion (I): Overview of ML
Reviewing the overall ML landscape and guidelines for algorithm selection
Strengths and limitations of Machine Learning
Precautions and common pitfalls/failure cases in ML application
Session 15: Wrap-up and Discussion (II): Reflections on ML Application
Retrospective: Reflecting on the entire course
Exploring the application of ML to your own research
Discussing ML utilization in Social Implementation and business development

Students are required to submit reports.

Text/Reference Books, etc. Textbooks will be assigned in class.

PC or AV used in Class, etc. Visual materials, Moodle

(More Details)

Learning techniques to be incorporated Discussions, Post-class Report

Suggestions on Preparation and Review
Sessions 1–4: Introduction and Utilizing Python/Generative AI
Preparation: Log in to Google Colab, ask a Generative AI (e.g., ChatGPT) to "Show me the Python code to read a CSV file using Pandas," and observe the execution results.

Review: Referring to the Markdown materials distributed in class, try adding explanatory comments to your own code (in Japanese or your preferred language) to clarify "what the code is doing."

Sessions 5–6: Data Pre-processing
Preparation: Think about your own criteria for deciding whether to "drop" or "fill" missing values (blanks) when you encounter them in a dataset.
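As a starting point for that question, the following is a minimal sketch of the two options in plain Python. The scores are made up for illustration, and plain lists stand in for the Pandas tables used in class, so this is not the course's own workflow.

```python
from statistics import mean

# Hypothetical test scores; None marks a missing value (a blank cell).
scores = [72, None, 85, 90, None, 63]

# Option 1: "drop" - keep only the observed values (fewer rows remain).
dropped = [s for s in scores if s is not None]

# Option 2: "fill" - replace each blank with the mean of the observed values.
fill_value = mean(dropped)
filled = [fill_value if s is None else s for s in scores]

print("dropped:", dropped)
print("filled: ", filled)
```

Dropping keeps only real observations but shrinks the dataset from six rows to four; filling keeps all six rows but inserts values that were never actually measured. Which trade-off is acceptable depends on the data and the research question.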

Review: Articulate why splitting data into training and test sets (Train-test split) using scikit-learn is absolutely essential for evaluating model accuracy.
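One way to approach this review question is the minimal sketch below, written in plain Python rather than scikit-learn. The data are randomly generated toy values, and the "model" is a hypothetical lookup table that simply memorizes the training rows; both are assumptions made for illustration only.

```python
import random

# Hypothetical toy data: x is an integer feature; the label is 1 when x >= 50,
# with some labels flipped at random to imitate noise.
random.seed(0)
data = []
for x in range(100):
    label = 1 if x >= 50 else 0
    if random.random() < 0.2:
        label = 1 - label
    data.append((x, label))

# Hold-out split: shuffle, then keep 80% for training and 20% for testing.
random.shuffle(data)
train, test = data[:80], data[80:]

# A "memorizing" model: a lookup table built from the training rows.
memory = dict(train)

def predict(x):
    # Return the memorized label, or a fixed guess of 0 for unseen inputs.
    return memory.get(x, 0)

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

train_accuracy = accuracy(train)  # 1.0 by construction: every row is memorized
test_accuracy = accuracy(test)    # measured on rows the model has never seen

print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
```

The training score is perfect only because the model has already seen every training row, so it says nothing about new data. Only the score on the held-out test rows estimates how the model would behave in practice.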

Sessions 7–9: Supervised Learning (Classification)
Preparation: Keeping in mind that in some cases (e.g., medical diagnosis) missing even 1% of critical cases is unacceptable despite 99% "Accuracy," research the different types of evaluation metrics available.
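The point about accuracy can be illustrated with a minimal sketch in plain Python. The screening scenario and all numbers are hypothetical: 1,000 people, 10 of whom are truly ill, and a model that predicts "healthy" for everyone.

```python
# Hypothetical labels: 1 = ill, 0 = healthy.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # the model never predicts "ill"

# Cells of the confusion matrix.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / len(y_true)
# Recall: share of truly ill people that the model detects.
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.2f}")  # 0.99
print(f"recall   = {recall:.2f}")    # 0.00
```

The model is correct 99% of the time yet detects none of the ill patients, which is why recall and related metrics are examined alongside accuracy when the classes are imbalanced.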

Review: Check which variables were most important (FI and SHAP) based on the execution results of Random Forest or XGBoost, and discuss whether this aligns with your human intuition.

Sessions 10–11: Supervised Learning (Regression/Prediction)
Preparation: Clarify the difference between "Classification (Yes/No)" and "Regression (Numerical Prediction)," and consider which category familiar data (e.g., stock prices, temperatures, sales) falls into.

Review: Compare the prediction accuracy of Multiple Linear Regression and Gradient Boosting (LightGBM), and identify the predictive "tendencies" or characteristics specific to each method.
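When comparing two models, it helps to look at more than one error metric. The following minimal sketch computes MAE and RMSE in plain Python; the true values and both sets of predictions are made up for illustration and are not outputs of Multiple Linear Regression or LightGBM.

```python
import math

# Hypothetical true values and predictions from two imaginary models.
y_true = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 18.0, 32.0, 38.0]  # always off by 2
pred_b = [10.0, 20.0, 30.0, 48.0]  # exact three times, off by 8 once

def mae(y, p):
    # Mean Absolute Error: the average size of the errors.
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    # Root Mean Squared Error: squaring penalizes large errors more heavily.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

print("model A: MAE =", mae(y_true, pred_a), "RMSE =", rmse(y_true, pred_a))
print("model B: MAE =", mae(y_true, pred_b), "RMSE =", rmse(y_true, pred_b))
```

Both imaginary models have the same MAE (2.0), but model B has twice the RMSE (4.0 versus 2.0) because of its single large miss. Differences of this kind are one way to describe the "tendencies" of a method.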

Sessions 12–13: Unsupervised Learning (PCA, Clustering, Anomaly Detection)
Preparation: Try to conceptualize the mechanism (e.g., the concept of mathematical distance) used to find "similar items" in a dataset that has no "correct answer" labels.
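As one possible way to picture this, the minimal sketch below measures Euclidean distance between records in plain Python. The three students and their two scores are hypothetical, and Euclidean distance is only one of several distance measures that could be used.

```python
import math

# Hypothetical records: (math score, language score) for three students.
students = {
    "A": (80.0, 75.0),
    "B": (82.0, 78.0),
    "C": (40.0, 95.0),
}

def euclidean(p, q):
    # Straight-line distance between two points with the same number of features.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Find the student most similar to A, i.e. the one at the smallest distance.
others = [name for name in students if name != "A"]
nearest = min(others, key=lambda name: euclidean(students["A"], students[name]))

for name in others:
    print(f"distance A-{name}: {euclidean(students['A'], students[name]):.2f}")
print("most similar to A:", nearest)
```

No "correct answer" label is needed: B is judged similar to A purely because their scores lie close together. Because distance depends on the scale of each feature, features are typically standardized before distance-based methods are applied.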

Review: Based on the results of the Cluster Analysis, try assigning descriptive names to the divided groups (e.g., "The [Characteristic] Group") to practice interpreting the analytical outcomes.

Sessions 14–15: Wrap-up and Social Implementation
Preparation: Look for real-world "failure cases" (e.g., in the news) where machine learning was implemented, and consider the reasons behind the failure (e.g., data quality, ELSI, operational issues).

Review: Select the one method learned in class that seems most "applicable" to your own research or future career, and write a brief proposal (about half an A4 page) outlining how you would utilize it.

Requirements

Grading Method Your final grade will be based on class participation and assignment reports.

Class Participation (Attendance and Active Involvement in Lectures/Workshops): 40%

Assignment Reports: 60%

Practical Experience Experienced

Summary of Practical Experience and Class Contents based on it Leveraging professional experience in the commercialization of machine learning and collaborative research with private companies, this course focuses on machine learning methodologies that are highly effective and applicable in real-world business environments.

Message Machine learning has become remarkably accessible today, thanks to the availability of open-source software libraries, modern development environments, and the assistance of Generative AI.

In this course, we will focus on practical programming as much as possible. We will cover foundational techniques with a clear eye toward the application of machine learning in the Humanities and Social Sciences.

I encourage you to build a solid foundation that enables you to apply machine learning and practice data science in various contexts—whether in your own research or your future professional careers. My hope is that you will use these skills to tackle the specific problems you encounter within your own areas of expertise.

Other

Please fill in the class improvement questionnaire, which is conducted for all classes.
Instructors will reflect on your feedback and utilize the information for improving their teaching.

