WMK00600 Data Collection, Utilization, and Publication

Hiroshima University Syllabus

Academic Year 2026 School/Graduate School Graduate School of Humanities and Social Sciences (Master's Course) Division of Humanities and Social Sciences Social Data Science Program

Lecture Code WMK00600 Subject Classification Specialized Education

Subject Name データ収集・活用・公開

Subject Name (Katakana) データシュウシュウ・カツヨウ・コウカイ

Subject Name in English Data Collection, Utilization, and Publication

Instructor WAKUDA YUKI, HARADA YUSUKE

Instructor (Katakana) ワクダ　ユウキ, ハラダ　ユウスケ

Campus Higashi-Senda Semester/Term 1st-Year, Second Semester, Second Semester

Days, Periods, and Classrooms (2nd) Weds 13-14

Lesson Style Lecture
Lesson Style (More Details) Online (simultaneous interactive), Online (on-demand)

Credits 2.0 Class Hours/Week 2 Language of Instruction J : Japanese

Course Level 5 : Graduate Basic

Course Area（Area） 24 : Social Sciences

Course Area（Discipline） 05 : Sociology

Eligible Students Social Sciences Data Science Program

Keywords Computational Social Science, Open Data, OSS, EBPM, Data Ethics, License

Special Subject for Teacher Education 　 Special Subject 　

Class Status within Educational Program (Applicable only to targeted subjects for undergraduate students)
Criterion referenced Evaluation (Applicable only to targeted subjects for undergraduate students)

Class Objectives/Class Outline
Course Overview:
This course provides a systematic, example-based introduction to the collection, utilization, and publication of research data in computational social science. Topics span the entire data lifecycle, from data acquisition and anonymization to ethical considerations, license selection, and publication methods.

Course Objectives:
- To acquire the knowledge and judgment necessary at each stage of data collection, utilization, and publication
- To understand the ethical and legal foundations required for the appropriate handling of research data
- To develop practical skills for publishing and sharing data in line with the open science movement

Class Schedule
Session 1: Guidance: What is Data Collection, Utilization, and Publication?
This session introduces the significance of data collection, utilization, and publication in computational social science.
    - Definitions and relationships among data collection, utilization, and publication; types of research data
    - Evidence quality issues and the impact of data quality on research and policy

Session 2: [Part 1: Collection] Design Philosophy of Data Collection and Sampling
Students consider what to collect, from whom, and how much. The session emphasizes the risks of collecting data without a clear purpose.
    - Designing data volume and quality according to research objectives
    - Sampling methods and the concept of selection bias
    - The distinction between exploratory and confirmatory analysis

Session 3: [Part 1: Collection] Experimental Design and Natural Experiments
Students learn the conceptual framework for establishing causal relationships, focusing on methods used in economics and social science.
    - Principles of RCT design and practical constraints
    - Quasi-experimental approaches (difference-in-differences, regression discontinuity)
    - Data quality issues in experimentally collected data

Session 4: [Part 1: Collection] Survey Design, Data Acquisition Methods, and SNS Data Collection
Students understand the diversity of data sources and learn the characteristics and limitations of each.
    - Principles and common pitfalls of survey design
    - Data acquisition methods (purchase, joint research, disclosure requests, open data)
    - SNS and web data collection and legal boundaries
    - Case study: SNS and web data collection

Session 5: [Part 1: Collection] Data Quality and Evidence Quality
Students develop the ability to assess the quality of collected data, and understand that not all evidence is equal from an EBPM perspective.
    - Handling missing values, outliers, and sensitivity analysis
    - Evidence quality and levels in EBPM
    - The importance of metadata and judging the limits of data

Session 6: [Part 2: Utilization] Personal Information and Data Anonymization
Students understand what is and is not permitted when data is linked to individuals, and learn anonymization techniques and their limitations.
    - Tiered structure of Japan's Act on the Protection of Personal Information
    - Anonymization approaches and re-identification risks
    - Alternatives when original data cannot be used (synthetic data)

Session 7: [Part 2: Utilization] SNS Data Utilization: Possibilities and Limitations
Students examine both the analytical potential and the inherent limitations of SNS data, a core resource in computational social science.
    - Inherent limitations of SNS data (lack of representativeness, algorithmic bias)
    - Ethical gray zones in SNS data utilization
    - Case study: SNS data utilization

Session 8: [Part 2: Utilization] Generative AI and Data Utilization
Students develop a multifaceted understanding of the new possibilities and ethical/legal issues arising from the widespread use of LLMs.
    - Risks and judgment criteria for inputting data into LLMs
    - Utilization of synthetic and dummy data generated by AI
    - Copyright issues and reliability of AI-generated content
    - Case study: Issues surrounding AI-generated content

Session 9: [Part 2: Utilization] Research Ethics, Institutional Review Boards (IRB), and Research Misconduct
Students learn the ethical judgment required when utilizing data for research, from procedural understanding to reasoning in gray zones.
    - IRB structure and procedures
    - Informed consent design and the challenges of consent in computational social science
    - Case study: Research ethics violations and misconduct

Session 10: [Part 3: Publication] Motivations, Objectives, and Strategies for Data Publication
Students understand that data publication is a strategic decision and consider the reasons for and against publication from multiple perspectives.
    - Publication motivations from the perspectives of researchers, companies, and government
    - Types of publication (open, registered, approved, contractual)
    - Risks, costs, and the decision not to publish
    - Case study: Government open data publication

Session 11: [Part 3: Publication] Understanding and Selecting Licenses
Students distinguish between two types of publication and understand how to select appropriate licenses for each.
    - Differences in purpose and method between data publication and code publication (OSS)
    - Data licenses (CC family) vs. code licenses (MIT, Apache, etc.)
    - New licensing issues in the age of AI

Session 12: [Part 3: Publication] Procedures for Safe Publication
Students learn practical approaches to pre-publication checks and metadata preparation.
    - Pre-publication re-identification risk checks
    - Metadata and documentation preparation (FAIR principles)
    - Machine readability of data formats and persistent identifiers (DOI)

Session 13: [Part 3: Publication] Methods and Platforms for Publication
Students organize publication methods along two axes: those obtaining data vs. those publishing data, and researchers vs. government/local authorities.
    - Features and selection of researcher-oriented repositories (Zenodo, GitHub, discipline-specific archives)
    - Role of government and local authority platforms (CKAN, data.go.jp, etc.)
    - Differences between code publication and data publication methods

Session 14: [Part 3: Publication] Publishing Data and Code Together: Ensuring Reproducibility
Students understand the concept of reproducibility packages and what it means to make data, code, and documentation publication-ready as a set.
    - Components of a reproducibility package (data, code, README, environment information)
    - Background to top journals' requirements for publishing data and code (the reproducibility crisis)
    - Case study: Research data publication

Session 15: Summary: Creating a Data Management Plan
Students review the entire course and consolidate the key decision points at each phase of the data lifecycle.
    - Review of the entire data lifecycle
    - Creating a Data Management Plan (collection design, utilization methods, publication plan, ethical considerations)

Students are required to submit reports.

Text/Reference Books, etc. Textbooks will be assigned in class.

PC or AV used in Class, etc. Text, Handouts, Visual Materials, Zoom, Moodle

(More Details)

Learning techniques to be incorporated Discussions, Post-class Report

Suggestions on Preparation and Review
Preparation (Before Class):
- Review the topics for the next session in advance, and come to class with your own thoughts and questions organized.
- If there are any remaining tasks from the previous session or assigned homework, make progress on them before the next class.
Review (After Class):
- Continue working on any tasks or materials that you were unable to complete during class time.
- If there are points you did not understand, do not leave them unclear; take notes and bring them to the next class.
- Use the post-class report to organize your understanding of the day’s content in your own words.

Requirements

Grading Method Comprehensive evaluation of attitude in class and reports

Practical Experience

Summary of Practical Experience and Class Contents based on it

Message

Other

Please fill in the class improvement questionnaire, which is carried out for all classes.
Instructors will reflect on your feedback and use it to improve their teaching.
