| Academic Year |
2026 |
School/Graduate School |
Graduate School of Humanities and Social Sciences (Master's Course) Division of Humanities and Social Sciences Social Data Science Program |
| Lecture Code |
WMK00600 |
Subject Classification |
Specialized Education |
| Subject Name |
データ収集・活用・公開 |
Subject Name (Katakana) |
データシュウシュウ・カツヨウ・コウカイ |
Subject Name in English |
Data Collection, Utilization, and Publication |
| Instructor |
WAKUDA YUKI, HARADA YUSUKE |
Instructor (Katakana) |
ワクダ ユウキ,ハラダ ユウスケ |
| Campus |
Higashi-Senda |
Semester/Term |
1st-Year, Second Semester |
| Days, Periods, and Classrooms |
(2nd) Weds 13-14 |
| Lesson Style |
Lecture |
Lesson Style (More Details) |
Online (simultaneous interactive), Online (on-demand) |
| |
| Credits |
2.0 |
Class Hours/Week |
2 |
Language of Instruction |
J: Japanese |
| Course Level |
5: Graduate Basic |
| Course Area (Area) |
24: Social Sciences |
| Course Area (Discipline) |
05: Sociology |
| Eligible Students |
Social Sciences Data Science Program |
| Keywords |
Computational Social Science, Open Data, OSS, EBPM, Data Ethics, License |
| Special Subject for Teacher Education |
|
Special Subject |
|
Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students) | |
|---|
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students) | |
Class Objectives /Class Outline |
Course Overview: This course provides a systematic, example-based introduction to the collection, utilization, and publication of research data in computational social science. Topics span the entire data lifecycle, from data acquisition and anonymization to ethical considerations, license selection, and publication methods.
Course Objectives:
- To acquire the knowledge and judgment necessary at each stage of data collection, utilization, and publication
- To understand the ethical and legal foundations required for the appropriate handling of research data
- To develop practical skills for publishing and sharing data in line with the open science movement |
| Class Schedule |
Session 1: Guidance: What is Data Collection, Utilization, and Publication?
This session introduces the significance of data collection, utilization, and publication in computational social science.
- Definitions and relationships among data collection, utilization, and publication; types of research data
- Evidence quality issues and the impact of data quality on research and policy
Session 2: [Part 1: Collection] Design Philosophy of Data Collection and Sampling
Students consider what to collect, from whom, and how much. The session emphasizes the risks of collecting data without a clear purpose.
- Designing data volume and quality according to research objectives
- Sampling methods and the concept of selection bias
- The distinction between exploratory and confirmatory analysis
Session 3: [Part 1: Collection] Experimental Design and Natural Experiments
Students learn the conceptual framework for establishing causal relationships, focusing on methods used in economics and social science.
- Principles of RCT design and practical constraints
- Quasi-experimental approaches (difference-in-differences, regression discontinuity)
- Data quality issues in experimentally collected data
Session 4: [Part 1: Collection] Survey Design, Data Acquisition Methods, and SNS Data Collection
Students understand the diversity of data sources and learn the characteristics and limitations of each.
- Principles and common pitfalls of survey design
- Data acquisition methods (purchase, joint research, disclosure requests, open data)
- SNS and web data collection and legal boundaries
- Case study: SNS and web data collection
Session 5: [Part 1: Collection] Data Quality and Evidence Quality
Students develop the ability to assess the quality of collected data, and understand that not all evidence is equal from an EBPM perspective.
- Handling missing values, outliers, and sensitivity analysis
- Evidence quality and levels in EBPM
- The importance of metadata and judging the limits of data
Session 6: [Part 2: Utilization] Personal Information and Data Anonymization
Students understand what is and is not permitted when data is linked to individuals, and learn anonymization techniques and their limitations.
- Tiered structure of Japan's Act on the Protection of Personal Information
- Anonymization approaches and re-identification risks
- Alternatives when original data cannot be used (synthetic data)
Session 7: [Part 2: Utilization] SNS Data Utilization: Possibilities and Limitations
Students examine both the analytical potential and the inherent limitations of SNS data, a core resource in computational social science.
- Inherent limitations of SNS data (lack of representativeness, algorithmic bias)
- Ethical gray zones in SNS data utilization
- Case study: SNS data utilization
Session 8: [Part 2: Utilization] Generative AI and Data Utilization
Students develop a multifaceted understanding of the new possibilities and ethical/legal issues arising from the widespread use of LLMs.
- Risks and judgment criteria for inputting data into LLMs
- Utilization of synthetic and dummy data generated by AI
- Copyright issues and reliability of AI-generated content
- Case study: Issues surrounding AI-generated content
Session 9: [Part 2: Utilization] Research Ethics, Institutional Review Boards (IRB), and Research Misconduct
Students learn the ethical judgment required when utilizing data for research, from procedural understanding to reasoning in gray zones.
- IRB structure and procedures
- Informed consent design and the challenges of consent in computational social science
- Case study: Research ethics violations and misconduct
Session 10: [Part 3: Publication] Motivations, Objectives, and Strategies for Data Publication
Students understand that data publication is a strategic decision and consider the reasons for and against publication from multiple perspectives.
- Publication motivations from the perspectives of researchers, companies, and government
- Types of publication (open, registered, approved, contractual)
- Risks, costs, and the decision not to publish
- Case study: Government open data publication
Session 11: [Part 3: Publication] Understanding and Selecting Licenses
Students distinguish between two types of publication and understand how to select appropriate licenses for each.
- Differences in purpose and method between data publication and code publication (OSS)
- Data licenses (CC family) vs. code licenses (MIT, Apache, etc.)
- New licensing issues in the age of AI
Session 12: [Part 3: Publication] Procedures for Safe Publication
Students learn practical approaches to pre-publication checks and metadata preparation.
- Pre-publication re-identification risk checks
- Metadata and documentation preparation (FAIR principles)
- Machine readability of data formats and persistent identifiers (DOI)
Session 13: [Part 3: Publication] Methods and Platforms for Publication
Students organize publication methods along two axes: those obtaining data vs. those publishing data, and researchers vs. government/local authorities.
- Features and selection of researcher-oriented repositories (Zenodo, GitHub, discipline-specific archives)
- Role of government and local authority platforms (CKAN, data.go.jp, etc.)
- Differences between code publication and data publication methods
Session 14: [Part 3: Publication] Publishing Data and Code Together: Ensuring Reproducibility
Students understand the concept of reproducibility packages and what it means to make data, code, and documentation publication-ready as a set.
- Components of a reproducibility package (data, code, README, environment information)
- Background to top journal requirements for publication (the reproducibility crisis)
- Case study: Research data publication
Session 15: Summary: Creating a Data Management Plan
Students review the entire course and consolidate the key decision points at each phase of the data lifecycle.
- Review of the entire data lifecycle
- Creating a Data Management Plan (collection design, utilization methods, publication plan, ethical considerations)
Students are required to submit reports. |
Text/Reference Books, etc. |
Textbooks will be specified in class. |
PC or AV used in Class, etc. |
Text, Handouts, Visual Materials, Zoom, Moodle |
| (More Details) |
|
| Learning techniques to be incorporated |
Discussions, Post-class Report |
Suggestions on Preparation and Review |
Preparation (Before Class):
- Review the topics for the next session in advance, and come to class with your own thoughts and questions organized.
- If there are any remaining tasks from the previous session or assigned homework, make progress on them before the next class.
Review (After Class):
- Continue working on any tasks or materials that you were unable to complete during class time.
- If there are points you did not understand, do not leave them unclear; take notes and bring them to the next class.
- Use the post-class report to organize your understanding of the day’s content in your own words. |
| Requirements |
|
| Grading Method |
Comprehensive evaluation based on class participation and reports |
| Practical Experience |
|
| Summary of Practical Experience and Class Contents based on it |
|
| Message |
|
| Other |
|
Please fill in the class improvement questionnaire, which is conducted for all classes. Instructors will reflect on your feedback and use it to improve their teaching. |