| Academic Year |
2026 |
School/Graduate School |
Graduate School of Humanities and Social Sciences (Master's Course) Division of Educational Sciences Education Data Science Program |
| Lecture Code |
WNF03000 |
Subject Classification |
Specialized Education |
| Subject Name |
機械学習の基礎 |
Subject Name (Katakana) |
|
Subject Name in English |
Basics of Machine Learning |
| Instructor |
TANAKA HIDEYUKI,HARADA YUSUKE |
Instructor (Katakana) |
タナカ ヒデユキ,ハラダ ユウスケ |
| Campus |
Higashi-Hiroshima |
Semester/Term |
1st-Year, First Semester, 2nd Term |
| Days, Periods, and Classrooms |
(2T) Wed., Periods 5-8: EDU K115 |
| Lesson Style |
Seminar |
Lesson Style (More Details) |
Face-to-face |
| |
| Credits |
2.0 |
Class Hours/Week |
4 |
Language of Instruction |
J: Japanese |
| Course Level |
5: Graduate Basic |
| Course Area(Area) |
24: Social Sciences |
| Course Area(Discipline) |
07: Education |
| Eligible Students |
Social Data Science Program |
| Keywords |
Machine Learning, Data science, AI, Social Sciences, Education |
| Special Subject for Teacher Education |
|
Special Subject |
|
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students) | |
|---|
Criterion-referenced Evaluation (Applicable only to targeted subjects for undergraduate students) | |
Class Objectives /Class Outline |
Course Overview
In this course, we define Machine Learning (ML) as a "technology for making predictions, classifications, and discovering structures from data to support decision-making," and students will learn the concepts and practical applications of representative methodologies. Using regression, classification, and clustering as subject matter, students will acquire skills in data pre-processing, training, validation (hold-out and cross-validation), selecting appropriate evaluation metrics, and addressing overfitting through Python-based hands-on exercises. Furthermore, the course covers critical professional perspectives such as model interpretability, reproducibility, ethics, and privacy (ELSI). The goal is to develop the foundational ability to select appropriate methods for real-world data and provide clear, logical explanations of the results.
Learning Goals
・Understand representative methods of supervised learning (classification/regression) and unsupervised learning, and select and apply appropriate analytical techniques for given data and challenges.
・Execute the entire machine learning workflow—from data pre-processing to model construction and evaluation—using Python.
・Evaluate the performance of constructed models using appropriate metrics, interpret the analytical results, and explain them clearly to a non-expert audience. |
| Class Schedule |
Session 1: Introduction to Machine Learning
・Overview of Machine Learning (ML) and AI
・The ML Lifecycle: CRISP-DM process
・Understanding ML applications through comparison with human learning
・ELSI (Ethical, Legal, and Social Issues) in ML/AI utilization
Session 2: Python Fundamentals for ML (I)
・Software execution environment: Google Colaboratory (Colab)
・Markdown notation for documentation
・Utilizing Open Data
Session 3: Python Fundamentals for ML (II)
・Basics of data processing (Data types, Pandas)
・Execution fundamentals (Control structures, Functions)
・Data visualization basics (Matplotlib, japanize-matplotlib)
Session 4: Python Fundamentals for ML (III)
・Introduction to Generative AI-Assisted Coding (Code generation, execution, and review)
・Data ingestion (Reading CSV and Excel files)
・Basics of data manipulation (groupby, merge/join, fillna)
Session 5: Data Pre-processing (I)
・Data Cleansing (Handling missing values, feature selection, and standardization)
Session 6: Data Pre-processing (II)
・Data pre-processing for ML (Introduction to scikit-learn, data integration, and aggregation)
・Train-test split methodology
Session 7: Supervised Learning (I): Classification
・Automating classification with ML (Introduction to Classification and Decision Trees)
・Pre-processing for classification tasks
・Building classification models (Decision Trees)
Session 8: Supervised Learning (I): Classification (Continued)
・Performance evaluation of classification models (Accuracy, Recall, F-measure, Confusion Matrix, and P-R Curve)
・Generalization performance and Overfitting
・Algorithm selection, performance comparison, and Hyperparameter Tuning (Random Forest, XGBoost)
Session 9: Supervised Learning (I): Classification (Continued)
・Cross-validation (k-fold CV)
・Handling Imbalanced Data
・Model Interpretation (Feature Importance (FI), SHAP)
Session 10: Supervised Learning (II): Regression
・Forecasting with ML (General theory of Regression)
・Pre-processing for regression tasks
・Building regression models (Panel Data Analysis, Multiple Linear Regression)
Session 11: Supervised Learning (II): Regression (Continued)
・Performance evaluation of regression models (RMSE, MAE)
・Algorithm selection and performance comparison (Multiple Linear Regression vs. LightGBM)
Session 12: Unsupervised Learning (I)
・Analyzing data components: Principal Component Analysis (PCA)
・Grouping data into clusters: Cluster Analysis (k-means / Hierarchical Clustering)
Session 13: Unsupervised Learning (II)
・General theory of Anomaly Detection
・Anomaly detection using unsupervised ML (Isolation Forest)
Session 14: Wrap-up and Discussion (I): Overview of ML
・Reviewing the overall ML landscape and guidelines for algorithm selection
・Strengths and limitations of Machine Learning
・Precautions and common pitfalls/failure cases in ML application
Session 15: Wrap-up and Discussion (II): Reflections on ML Application
・Retrospective: Reflecting on the entire course
・Exploring the application of ML to your own research
・Discussing ML utilization in Social Implementation and business development
Students are required to submit reports. |
Text/Reference Books,etc. |
The textbooks will be assigned in the class. |
PC or AV used in Class,etc. |
Visual Materials, moodle |
| (More Details) |
|
| Learning techniques to be incorporated |
Discussions, Post-class Report |
Suggestions on Preparation and Review |
Sessions 1–4: Introduction and Utilizing Python/Generative AI
Preparation: Log in to Google Colab, ask a Generative AI (e.g., ChatGPT) to "Show me the Python code to read a CSV file using Pandas," and observe the execution results.
Review: Referring to the Markdown materials distributed in class, try adding explanatory comments to your own code (in Japanese or your preferred language) to clarify "what the code is doing."
Sessions 5–6: Data Pre-processing
Preparation: Think about your own criteria for deciding whether to "drop" or "fill" missing values (blanks) when you encounter them in a dataset.
Review: Articulate why splitting data into training and test sets (Train-test split) using scikit-learn is absolutely essential for evaluating model accuracy.
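One way to see why the split matters is a deliberately bad model that only memorizes its training rows. The sketch below uses only the Python standard library so the mechanics stay visible. The helper names (`train_test_split`, `MemorizingModel`, `accuracy`) and the synthetic data are assumptions made for illustration; the local `train_test_split` is a simplified stand-in, not scikit-learn's function of the same name.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


class MemorizingModel:
    """Hypothetical model: stores every training row, learns no general rule."""

    def fit(self, rows):
        self.table = {x: y for x, y in rows}
        labels = [y for _, y in rows]
        # Fall back to the most frequent training label for unseen inputs.
        self.default = max(set(labels), key=labels.count)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


# Synthetic data (assumption): unique IDs as inputs, labels are pure noise.
rng = random.Random(42)
data = [(i, rng.choice([0, 1])) for i in range(200)]

train, test = train_test_split(data)
model = MemorizingModel().fit(train)

train_acc = accuracy(model, train)  # scored on rows the model has already seen
test_acc = accuracy(model, test)    # scored on held-out rows
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Scored on its own training rows the model looks perfect, while the held-out rows show that it has learned nothing that generalizes. The exact test score depends on the random seed.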
Sessions 7–9: Supervised Learning (Classification)
Preparation: Keeping in mind that there are cases (e.g., medical diagnoses) where a 1% critical oversight is unacceptable even with 99% "Accuracy," research the different types of evaluation metrics available.
Review: Using the execution results of Random Forest or XGBoost, check which variables were most important (FI and SHAP), and discuss whether this matches your own intuition.
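The 99% accuracy scenario from the preparation task can be reproduced with a few lines of plain Python. The following is a minimal sketch with hypothetical numbers (1,000 cases, 10 of them positive); the function names and both sets of predictions are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-measure from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced data: 10 positive cases out of 1,000 (1%).
y_true = [1] * 10 + [0] * 990

# Classifier A always answers "negative": high accuracy, every positive missed.
baseline = metrics(y_true, [0] * 1000)

# Classifier B finds 8 of the 10 positives at the cost of 30 false alarms.
screen = metrics(y_true, [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960)

print("always negative:", baseline)
print("screening model:", screen)
```

Classifier A reaches 99% accuracy with a recall of 0, whereas classifier B has lower accuracy but a recall of 0.8. Which trade-off is acceptable depends on the cost of a missed positive in the application.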
Sessions 10–11: Supervised Learning (Regression/Prediction)
Preparation: Clarify the difference between "Classification (Yes/No)" and "Regression (Numerical Prediction)," and consider which category familiar data (e.g., stock prices, temperatures, sales) falls into.
Review: Compare the prediction accuracy of Multiple Linear Regression and Gradient Boosting (LightGBM), and identify the predictive "tendencies" or characteristics specific to each method.
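When comparing two regression models, RMSE and MAE can disagree, because RMSE weights large errors more heavily. The sketch below illustrates this with hypothetical predictions; the values are invented for the example and are not outputs of Multiple Linear Regression or LightGBM.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring makes large errors count more."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical target values and predictions from two imaginary models.
y_true = [10, 20, 30, 40]
pred_a = [11, 21, 31, 41]  # model A: off by 1 on every case
pred_b = [10, 20, 30, 44]  # model B: exact three times, off by 4 once

mae_a, mae_b = mae(y_true, pred_a), mae(y_true, pred_b)
rmse_a, rmse_b = rmse(y_true, pred_a), rmse(y_true, pred_b)

print(f"model A: MAE={mae_a}, RMSE={rmse_a}")
print(f"model B: MAE={mae_b}, RMSE={rmse_b}")
```

Both models have the same MAE, but model B's single large miss doubles its RMSE. Looking at both metrics together is one way to describe the error "tendencies" of each method.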
Sessions 12–13: Unsupervised Learning (PCA, Clustering, Anomaly Detection)
Preparation: Try to conceptualize the mechanism (e.g., the concept of mathematical distance) used to find "similar items" in a dataset that has no "correct answer" labels.
Review: Based on the results of the Cluster Analysis, try assigning descriptive names to the divided groups (e.g., "The [Characteristic] Group") to practice interpreting the analytical outcomes.
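The idea of grouping by distance can be made concrete with a small k-means loop. This is a simplified sketch in plain Python, assuming Euclidean distance, two clusters, and hand-picked starting centroids; the data points are hypothetical, and library implementations such as scikit-learn's add better initialization and other safeguards.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points and moving centroids to the mean."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        new_centroids = [
            # Mean of each coordinate; keep the old centroid if a cluster is empty.
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D data with no "correct answer" labels.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: start from one point in each visually obvious group.
centroids, labels = kmeans(points, [points[0], points[3]])
print(labels)
print(centroids)
```

The algorithm only returns group numbers; giving each group a descriptive name, as the review task asks, is the analyst's interpretation of what the members have in common.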
Sessions 14–15: Wrap-up and Social Implementation
Preparation: Look for real-world "failure cases" (e.g., in the news) where machine learning was implemented, and consider the reasons behind the failure (e.g., data quality, ELSI, operational issues).
Review: Select the one method learned in class that seems most "applicable" to your own research or future career, and write a brief proposal (about half an A4 page) outlining how you would utilize it. |
| Requirements |
|
| Grading Method |
Your final grade will be evaluated based on your class participation and your assignment reports.
Class Participation (Attendance and Active Involvement in Lectures/Workshops): 40%
Assignment Reports: 60% |
| Practical Experience |
Experienced
|
| Summary of Practical Experience and Class Contents based on it |
Leveraging professional experience in the commercialization of machine learning and collaborative research with private companies, this course focuses on machine learning methodologies that are highly effective and applicable in real-world business environments. |
| Message |
Machine learning has become remarkably accessible today, thanks to the availability of open-source software libraries, modern development environments, and the assistance of Generative AI.
In this course, we will focus on practical programming as much as possible. We will cover foundational techniques with a clear eye toward the application of machine learning in the Humanities and Social Sciences.
I encourage you to build a solid foundation that enables you to apply machine learning and practice data science in various contexts—whether in your own research or your future professional careers. My hope is that you will use these skills to tackle the specific problems you encounter within your own areas of expertise. |
| Other |
|
Please fill in the class improvement questionnaire, which is conducted for all classes. Instructors will reflect on your feedback and use it to improve their teaching. |