| Academic Year |
2026 |
School/Graduate School |
Graduate School of Humanities and Social Sciences (Master's Course) Division of Humanities and Social Sciences Social Data Science Program |
| Lecture Code |
WMK00400 |
Subject Classification |
Specialized Education |
| Subject Name |
機械学習の基礎 |
Subject Name (Katakana) |
キカイガクシュウノキソ |
Subject Name in English |
Basics of Machine Learning |
| Instructor |
WAKUDA YUKI, KAJIKAWA HIROAKI, HARADA YUSUKE |
Instructor (Katakana) |
ワクダ ユウキ,カジカワ ヒロアキ,ハラダ ユウスケ |
| Campus |
Higashi-Senda |
Semester/Term |
1st Year, First Semester, 2nd Term |
| Days, Periods, and Classrooms |
(2T) Tues 13-14, Weds 13-14: Online, Higashi-Senda Seminar Rm 3 |
| Lesson Style |
Lecture |
Lesson Style (More Details) |
Online (simultaneous interactive) |
| |
| Credits |
2.0 |
Class Hours/Week |
4 |
Language of Instruction |
J: Japanese |
| Course Level |
5: Graduate Basic |
| Course Area (Area) |
24: Social Sciences |
| Course Area (Discipline) |
07: Education |
| Eligible Students |
Social Data Science Program |
| Keywords |
Machine Learning, Data science, AI, Social Sciences, Education |
| Special Subject for Teacher Education |
|
Special Subject |
|
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students) | |
|---|
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students) | |
Class Objectives /Class Outline |
Course Overview: This course approaches machine learning as "a technology that supports decision-making by enabling prediction, classification, and structure discovery from data." Students will learn the key concepts and appropriate applications of representative methods. Through Python-based exercises covering regression, classification, and clustering, students will develop practical skills in data preprocessing, model training, validation (hold-out / cross-validation), selection of evaluation metrics, and handling overfitting. Generative AI-assisted coding will be actively utilized throughout the exercises, allowing students to engage in practical data processing and analysis in collaboration with AI. The course also addresses practically important perspectives such as model performance evaluation and interpretability, cultivating the foundational ability to select appropriate methods and clearly explain results for real-world data.
Learning Objectives:
- Understand the representative methods of supervised learning (classification and regression) and unsupervised learning, and be able to select and apply appropriate analytical methods for given data and tasks.
- Execute the complete machine learning workflow, from data preprocessing through model building to evaluation, using Python with the assistance of generative AI.
- Evaluate model performance using appropriate metrics, interpret analytical results, and explain findings clearly in research and policy contexts. |
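The two validation schemes named in the overview (hold-out and k-fold cross-validation) can be illustrated with a minimal sketch in plain Python. The function names `holdout_split` and `kfold_splits`, the 20% test ratio, the fold count, and the fixed seed are assumptions made for this illustration and are not taken from the course materials; the exercises may well rely on library routines instead.

```python
import random


def holdout_split(n_rows, test_ratio=0.2, seed=0):
    """Shuffle row indices once and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_ratio))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_rows, k=5, seed=0):
    """Return k (train, validation) index pairs.

    Every row appears in exactly one validation fold, so each row is used
    for evaluation once and for training k - 1 times.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# Hypothetical dataset of 10 rows, identified only by row index.
train_idx, test_idx = holdout_split(10, test_ratio=0.2)
splits = kfold_splits(10, k=5)

print("hold-out sizes:", len(train_idx), len(test_idx))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print("fold", fold_number, "validation rows:", sorted(validation))
```

Hold-out evaluates on a single reserved subset, while k-fold repeats the split so that the performance estimate depends less on one particular partition.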
| Class Schedule |
Session 1: Introduction to Machine Learning
- Overview of Machine Learning (ML) and AI
- The ML lifecycle: CRISP-DM process
- Understanding ML applications through comparison with human learning
- ELSI (Ethical, Legal, and Social Issues) in ML/AI utilization
- [Exercise] Pre-course survey on machine learning

Session 2: Python Fundamentals for ML (I)
- Software execution environment: Google Colaboratory (Colab)
- Markdown notation for documentation
- Utilization of Open Data
- Introduction to generative AI-assisted coding (Gemini on Colab, prompt design)
- [Exercise] Creating and editing a Notebook in Google Colab

Session 3: Python Fundamentals for ML (II)
- Basics of data processing (data types, Pandas)
- Execution fundamentals (control structures, functions)
- Process design using flowcharts (input / process / output thinking)
- Data visualization basics (Matplotlib, japanize-matplotlib)
- [Exercise] Loading an Excel file and visualizing data

Session 4: Python Fundamentals for ML (III)
- Basics of data manipulation (groupby, join, fillna)
- Best practices for AI-assisted coding (modularization, configuration at the top, code review)
- [Exercise] Data aggregation, integration, and operation verification

Session 5: Data Pre-processing (I)
- Data quality
- Data cleansing and variable selection
- [Exercise] Data quality assessment, cleansing, record removal, and variable selection

Session 6: Data Pre-processing (II)
- Data pre-processing for ML (Train-test split)
- Categorical variable encoding (get_dummies)
- [Exercise] Completing ML-ready input data (organizing target and explanatory variables / completing quantification)

Session 7: Supervised Learning (I): Classification
- Automating classification with ML (introduction to classification and decision trees)
- Building classification models (decision trees)
- Selecting and switching analytical methods
- Introduction to model performance evaluation (Precision, Recall, F-score)
- [Exercise] Building a model to identify customers likely to churn

Session 8: Supervised Learning (II): Classification
- Performance evaluation of classification models (F-score, confusion matrix, P-R curve, AUC)
- Separating training and evaluation data (Hold-out method)
- Generalization performance and overfitting
- [Exercise] Evaluating classification model performance (confusion matrix / P-R curve)

Session 9: Supervised Learning (III): Classification
- Model interpretation using Feature Importance (FI)
- Model interpretation using SHAP
- Using ML as a hypothesis extraction tool
- [Exercise] Identifying contributing variables and interpreting the model (FI/SHAP)

Session 10: Supervised Learning (IV): Classification
- Cross-validation (k-fold CV)
- Handling imbalanced data (under/oversampling, SMOTE)
- [Exercise] Addressing imbalanced data and confirming performance changes

Session 11: Supervised Learning (I): Regression
- Estimation using machine learning (general theory of regression)
- Pre-processing for regression tasks (constructing panel data)
- Building regression models (panel data analysis, multiple linear regression)
- [Exercise] Panel data analysis (analysis and prediction of land price trends)

Session 12: Supervised Learning (II): Regression
- Performance evaluation of regression models (R2, RMSE, MAE)
- Algorithm selection and performance comparison (LightGBM)
- Applying regression to time-series data
- [Exercise] Time-series analysis and regression performance evaluation (analysis and prediction of economic indicator trends)

Session 13: Unsupervised Learning: PCA and Cluster Analysis
- Analyzing data components: Principal Component Analysis (PCA)
- Grouping data into clusters: cluster analysis (k-means / hierarchical clustering)
- Principal component cluster analysis
- Factor analysis (concepts and positioning)
- [Exercise] Customer segmentation through data analysis (PCA cluster analysis)

Session 14: Wrap-up and Discussion (I): ML Overview, Prediction, and Causality
- Reviewing the overall ML landscape and guidelines for algorithm selection
- Strengths and limitations of ML / common pitfalls and failure cases
- Data science in social and economic fields
- Causal relationships vs. correlations
- Natural experiments and Difference-in-Differences (DID)
- Introduction to causal analysis approaches using ML
- [Exercise] Post-course survey on machine learning

Session 15: Wrap-up and Discussion (II): Designing ML Applications for Your Own Research
- Reflection on the entire course
- Exploring the application of ML to your own research
- [Exercise] Final assignment: Design an ML application scenario for a topic of personal or research interest
Students are required to submit reports. |
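For readers unfamiliar with the classification metrics listed for Sessions 7 and 8 (confusion matrix, Precision, Recall, F-score), the following is a minimal sketch in plain Python. The helper names `confusion_counts` and `precision_recall_f1` and the label lists are hypothetical values chosen for illustration; they are not course data, and the F-score shown is the F1 variant (the harmonic mean of precision and recall).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class.

    A metric whose denominator is zero is reported as 0.0 (a convention
    assumed here for simplicity).
    """
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print("confusion counts (tp, fp, fn, tn):", confusion_counts(y_true, y_pred))
print("precision:", precision, "recall:", recall, "F1:", f1)
```

Precision answers "of the customers flagged as likely to churn, how many actually did?", while recall answers "of the customers who churned, how many were flagged?"; which one matters more depends on the cost of each kind of error.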
Text/Reference Books, etc. |
Textbooks will be assigned in class. |
PC or AV used in Class, etc. |
Visual Materials, Microsoft Teams, Zoom, Moodle |
| (More Details) |
|
| Learning techniques to be incorporated |
Discussions, PBL (Problem-based Learning) / TBL (Team-based Learning), Post-class Report |
Suggestions on Preparation and Review |
Preparation (Before Class):
- Review the topics for the next session in advance, and come to class with your thoughts and questions organized.
- If tasks from the previous session or assigned homework remain, make progress on them before the next class.
Review (After Class):
- Continue working on any tasks or materials that you could not complete during class.
- If there are points you did not understand, do not leave them unresolved; take notes and bring them to the next class.
- Use the post-class report to organize your understanding of the day's content in your own words. |
| Requirements |
|
| Grading Method |
Your final grade is based on class participation and assignment reports.
- Class Participation (attendance and active involvement in lectures/workshops): 40%
- Assignment Reports: 60% |
| Practical Experience |
Experienced |
| Summary of Practical Experience and Class Contents based on it |
Leveraging professional experience in the commercialization of machine learning and collaborative research with private companies, this course focuses on machine learning methodologies that are highly effective and applicable in real-world business environments. |
| Message |
Machine learning has become remarkably accessible today, thanks to open-source software libraries, modern development environments, and the assistance of generative AI. In this course, we will focus on practical programming as much as possible and cover foundational techniques with a clear eye toward applying machine learning in the Humanities and Social Sciences. I encourage you to build a solid foundation that enables you to apply machine learning and practice data science in various contexts, whether in your own research or in your future professional career. My hope is that you will use these skills to tackle the specific problems you encounter within your own area of expertise. |
| Other |
|
Please complete the class improvement questionnaire, which is conducted for all classes. Instructors will review your feedback and use it to improve their teaching. |