Data Access and Regulation

A.Y. 2021/2022
Max ECTS: 12
Overall hours: 80
SSD: INF/01, IUS/09
Language: English
Learning objectives
The goal of the course is to provide students with a multidisciplinary overview of data processing. In line with this objective, each module focuses on a specific aspect.

First module
The first module is devoted to the key elements of data protection law, and will thus explore principles, rights and duties set by the EU General Data Protection Regulation (GDPR) also in light of the relevant case law of European courts.

Second module
Students are increasingly keen to develop empirical projects that involve big data applications, such as analyses of social media data or large databases. However, even when they have sufficient knowledge of a general-purpose programming language, such as R or Python, they usually lack practical experience of working with unstructured and big data. Furthermore, the frontier is moving fast, and big data mining and modeling tools have quickly become essential in the social scientist's toolkit.
This second module of Data Access and Regulation guides students through their first steps in data mining, offering real case studies and exercises so that they learn (by doing) how to handle big data in their future work. The seminar is designed to introduce students to the various skills needed to access the (big) data ocean confidently and, importantly, to teach themselves new skills in the future. All activities rely on the R programming language; students are therefore expected to have a minimal working knowledge of R (basic notions will be refreshed in class).

Third module
The third module introduces students to strategies and technologies used to handle, store and access large volumes of research data, with a focus on approaches that enable seamless collaboration and ensure that data remain accessible to other researchers in the long term.
Expected learning outcomes
Students actively participating in the first module are expected to develop expertise in the legal status and significance of the right to privacy and data protection, and to become familiar with the current legal framework in Europe and the challenges that digital technologies have posed to this right over the last decades.

Students actively participating in the second module are expected:
1. [Competence in data analysis] To become proficient in data analysis, learning advanced topics such as data wrangling, code efficiency, and workflow reproducibility.
2. [Data literacy] To understand and critically assess data-related issues arising in applied research problems with big data.
3. [Data mining] To gather and handle structured and unstructured data, and subsequently to carry out the data-tidying process.
4. [Coding skills] To develop and debug complex code throughout the data analysis cycle (mining, tidying, analyzing, reporting).
5. [Research and analytical skills] To develop original ideas and elaborate feasible big data designs to test their validity.

The third module will give students an overview of database and other data storage technologies, showing them the strengths and weaknesses of each approach and equipping them to make informed decisions about which technologies and platforms to use in their own future research projects. Students will also be introduced to data management planning documents, which are increasingly required as part of research funding proposals, and shown what considerations should be included when creating such a plan.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Lesson period: Second trimester
The lessons will be held synchronously through the Microsoft Teams platform.
Course syllabus
First module (Marco Bassini)
The first module of the course will consist of three clusters: the first will be devoted to exploring the origins of the right to privacy, originally meant as the "right to be let alone", and its protection in the case law of US and European courts; the second cluster will delve into the European legal framework (most notably the GDPR), capturing the evolution of the right to privacy in light of the large-scale processing of data; the third cluster will cover recent challenges in data protection that have emerged in the digital age.

Second module (Andrea De Angelis)
The second module is structured in three blocks:
1) An introductory block covering the essential knowledge for working with big data (notions of R programming, developing reproducible code, reporting in automated notebooks, version control using Git and GitHub, and a review of existing data for social science research).
2) A data access block focusing on web scraping and related functional topics (introduction to regular expressions, the HTML language, and XML and JSON data structures).
3) A third block introducing more advanced data access concepts, such as API interaction, together with more advanced live-coding practice sessions.

Third module (Robert Fahey)
The third module will give students an overview of database and other data storage technologies, showing them the strengths and weaknesses of each approach and equipping them to make informed decisions about which technologies and platforms to use in their own future research projects. Students will also be introduced to data management planning documents, which are increasingly required as part of research funding proposals, and shown what considerations should be included when creating such a plan.
Prerequisites for admission
As far as the first module (Marco Bassini) of the course is concerned, students are expected to have a genuine interest in the relationship between the right to privacy and digital technologies. Intellectual curiosity will be a key value for students to explore the different phases through which the right to privacy has evolved in reaction to the technological changes and developments.

The main requirement for the second module (Andrea De Angelis) is an intrinsic motivation to learn data mining and statistical programming techniques. The course will assume a basic understanding of R and basic computer skills (e.g., finding, copying and pasting folders). Introductory notions of descriptive and inferential statistics would facilitate the understanding of some applications and examples. Students lacking these required notions are allowed to attend the seminar, but they are expected to make some additional effort to keep up with the pace of the class. This course adopts a platform called Perusall, where students are expected to complete the readings and to support each other in the learning process. Students are additionally required to have a functioning laptop with R and RStudio already installed. Cloud versions of R can be made available upon request to support students with old hardware or without admin privileges on their machine. Students should also install Git and create a GitHub account.

An elementary knowledge of R is required for the third module (Robert Fahey).
Teaching methods
The first module (Marco Bassini) will include both traditional lectures and sessions devoted to the discussion of case studies.

The second module (Andrea De Angelis) adopts a participatory and collaborative learning style. A typical session is divided into two parts. The first part resembles an interactive lecture: students are expected to have already familiarized themselves with the topic of the day by reading the assigned texts on Perusall. The second part is usually hands-on, and the format shifts towards a workshop or a guided class exercise.

In principle, the third module (Robert Fahey) will be taught in person as a combination of lectures/demonstrations and lab classes. Lab sessions are essential to this course, as hands-on experience of the various technologies is indispensable; students will be given a short list of free software to install on their laptops prior to the start of the module, and are expected to troubleshoot any technical problems with the course instructor ahead of time.
Teaching Resources
First module (Marco Bassini)
Mandatory readings will be provided by the instructor for each session. These readings may include essays, reports and/or court judgments. A list of mandatory readings will be circulated at the beginning of the course.

Second module (Andrea De Angelis)
A mandatory reading is provided for each session of the second module. Mandatory readings usually consist of an introductory text with basic notions regarding the topic of the day. Supplementary texts are not graded and usually consist of scientific contributions applying the topic covered in that session, or more advanced texts digging deeper into issues of particular interest.
We will rely on the following textbooks:
- Imai, K. (2017). Quantitative Social Science: An Introduction. Princeton: Princeton University Press.
- Wickham, H., and G. Grolemund (2014). R for Data Science. O'Reilly Media. The book is also freely available online: https://r4ds.had.co.nz/.
- Munzert et al. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. London: Wiley & Sons.

Third module (Robert Fahey)
- Foster I, Ghani R, Jarmin RS, Kreuter F, Lane J (eds.) (2016), "Big Data and Social Science", CRC Press, http://www.bigdatasocialscience.com/
- Teorey T, Lightstone S, Nadeau T, Jagadish HV (2011), "Database Modeling & Design (5th Edition)", Elsevier, https://doi.org/10.1016/C2009-0-62960-3
- Van den Eynden V, Corti L, Woollard M, Bishop L, Horton L (2011), "Managing and Sharing Data: Best Practice for Researchers", UK Data Archive, https://ukdataservice.ac.uk/media/622417/managingsharing.pdf
- Jones S (2011), "How to Develop a Data Management and Sharing Plan", Digital Curation Centre, http://www.dcc.ac.uk/sites/default/files/documents/publications/reports/guides/How%20to%20Develop.pdf
Assessment methods and Criteria
To pass the first module (Marco Bassini), students must pass an oral exam consisting of an interview on the topics covered throughout the course. Positive class participation is also taken into account in the final evaluation.

To pass the second module (Andrea De Angelis), students must participate in all class activities, including: attending all the sessions and reading the assigned texts (25% of the overall final grade), solving the assigned exercises (25%), and completing a capstone project in small groups (2-3 students) involving an application of their choice (50%). Students are allowed to fail their project without the grade being affected, as long as their effort and progress are demonstrated and documented (on GitHub).

For enrolled students, course grades in the third module (Robert Fahey) will be based on a combination of lab and home assignments (60%), and two larger assignments given at the end of the first and second week of the course (15% and 25% respectively).
INF/01 - INFORMATICS - University credits: 6
IUS/09 - PUBLIC LAW - University credits: 6
Lessons: 80 hours
Professors: Bassini Marco, De Angelis Andrea, Fahey Robert Andrew