Data access and regulation

A.Y. 2019/2020
Max ECTS: 9
Overall hours: 60
SSD: INF/01, IUS/09
Language: English
Learning objectives
The objective of the course is to give students a multidisciplinary approach to data processing.
In line with this objective, each module focuses on a specific aspect.
The aim of the first module is to provide students with the essential elements of data protection law, making them familiar with the principles, rights and duties set by the General Data Protection Regulation (GDPR).
The aim of the second module is to provide students with the essential elements of data access, making them familiar with the tools and techniques that enable gathering social science data from the Internet. In this setting, students will develop their data literacy and analysis skills, and reinforce their ability to develop code to produce applications or to achieve analytical goals.
The aim of the third module is to give students a clear understanding of the data storage, access and sharing options which are available to them - ranging from simple flat file structures to commercial managed cloud databases - so that they will be able to effectively choose the correct data approach for their research projects. A well-chosen data methodology allows your data management tools to do a lot of laborious work for you, enables smooth collaboration with your colleagues and ensures that you will be able to share your research data when the time comes to publish your work.
Expected learning outcomes
At the end of the course students should have acquired: knowledge and understanding of the fundamental legal concepts of data protection; the ability to read the new European regulatory standards autonomously; the ability to understand the essential elements of data access and data storage; the capacity to choose a correct data approach. The final exam aims to ascertain the results achieved in terms of learning and of the acquisition of specialized language and techniques, as well as the ability to put the rules and principles of data access into practice in small practical cases.
Course syllabus and organization

Single session

Responsible
Lesson period
Second trimester
Course syllabus
Programme for attending students

I (first module): Introduction. The European concept of privacy between the EU and the ECHR; The relevant data subjects; Territorial and material scope; Principles and conditions relating to the processing of personal data; Rights of the data subject; The Member States' Independent Supervisory Authorities and the European Data Protection Board; Competence, tasks and powers; Remedies and penalties; Transfers of personal data to third countries (non-EU countries); AI and data protection.

II (second module): Introduction (how to get help when using R; data wrangling with dplyr; automated and reproducible reporting using rmarkdown; tidy data principles; project-oriented workflow; version control, Git, GitHub). Web scraping (HTML tags, robots.txt; web scraping, the rvest package, the Rcrawler package; XPath, HTTP protocol and URLs (libcurl, RCurl); XML and JSON; regular expressions). Dealing with APIs (introduction to web-service and RESTful APIs; basic API flow; social media APIs; cloud computing in AWS).
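
As a rough illustration of two of the scraping topics listed above (respecting robots.txt and using regular expressions to pull links out of HTML), here is a minimal sketch in Python rather than in the rvest/R tooling the module itself uses. The robots.txt rules, the HTML fragment, the base URL `https://example.org` and the function name `allowed_links` are all hypothetical, and the data is inlined so that nothing is fetched from the network.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical HTML fragment standing in for a downloaded page.
HTML = (
    '<ul>'
    '<li><a href="/data/2019.csv">2019</a></li>'
    '<li><a href="/private/raw.csv">raw</a></li>'
    '</ul>'
)


def allowed_links(html, robots_txt, base="https://example.org", agent="course-bot"):
    """Return the href values in `html` that robots.txt lets `agent` fetch."""
    parser = RobotFileParser()
    # parse() accepts the lines of a robots.txt file directly.
    parser.parse(robots_txt.splitlines())
    # A simple regular expression is enough for this tidy fragment;
    # real pages are usually better handled with an HTML parser or XPath.
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h for h in hrefs if parser.can_fetch(agent, base + h)]


print(allowed_links(HTML, ROBOTS_TXT))
```

Only the link outside the disallowed `/private/` path is kept. Checking robots.txt before requesting a page is a courtesy convention, not a technical barrier, which is one reason it sits alongside the legal topics of the first module.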


III (third module): The module will begin with a discussion of the challenges presented to social science researchers by Big Data and the increasingly rigorous demands for transparency and open sharing of research data, and outline the considerations which researchers need to keep in mind when designing a data management plan for their project. Next we will introduce a variety of different data management tools, including flat files, local databases, and large-scale cloud storage systems. For each of these we will discuss strengths and weaknesses, and the kinds of data or analysis for which they might be suitable, and students will have an opportunity to try working with them (adding, manipulating and exporting data) in lab sessions, where they will learn the basics of the data specification and query syntaxes (SQL and NoSQL) that are commonly used for these systems. Finally, we will discuss best practices for creating a data management plan that will address considerations such as collaborating with other researchers, ensuring that your valuable data is securely backed up, and allowing you to easily share it publicly when your research is published. The module will conclude with a project in which students will design a data management plan for a hypothetical research project and write a short report justifying the choices they made.
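
To make the contrast between a flat file and a local database concrete, the following minimal sketch loads a small CSV into an in-memory SQLite database and runs one SQL query on it. The survey data, the table name `survey` and the function name `mean_age_by_country` are hypothetical, and SQLite is used here only as one example of a local database, not as a statement of what the lab sessions use.

```python
import csv
import io
import sqlite3

# Hypothetical survey data as a flat CSV file (inlined to keep the example self-contained).
CSV_TEXT = """respondent,country,age
1,IT,34
2,IT,51
3,FR,29
"""


def mean_age_by_country(csv_text):
    """Load CSV rows into an in-memory SQLite table and aggregate with SQL."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE survey (respondent INTEGER PRIMARY KEY, country TEXT, age INTEGER)"
    )
    # Named placeholders are filled from each row dictionary; the INTEGER
    # column type converts the CSV strings to numbers on insert.
    conn.executemany(
        "INSERT INTO survey VALUES (:respondent, :country, :age)", rows
    )
    result = conn.execute(
        "SELECT country, AVG(age) FROM survey GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return result


print(mean_age_by_country(CSV_TEXT))
```

The flat file is easy to share and inspect, while the database adds typed columns and a declarative query language; weighing trade-offs of this kind is what a data management plan has to justify.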

Programme for non-attending students
I (first module): Introduction. The European concept of privacy between the EU and the ECHR; The relevant data subjects; Territorial and material scope; Principles and conditions relating to the processing of personal data; Rights of the data subject; The Member States' Independent Supervisory Authorities and the European Data Protection Board; Competence, tasks and powers; Remedies and penalties; Transfers of personal data to third countries (non-EU countries); AI and data protection.

II (second module): Introduction (how to get help when using R; data wrangling with dplyr; automated and reproducible reporting using rmarkdown; tidy data principles; project-oriented workflow; version control, Git, GitHub). Web scraping (HTML tags, robots.txt; web scraping, the rvest package, the Rcrawler package; XPath, HTTP protocol and URLs (libcurl, RCurl); XML and JSON; regular expressions). Dealing with APIs (introduction to web-service and RESTful APIs; basic API flow; social media APIs; cloud computing in AWS).

III (third module): The module will begin with a discussion of the challenges presented to social science researchers by Big Data and the increasingly rigorous demands for transparency and open sharing of research data, and outline the considerations which researchers need to keep in mind when designing a data management plan for their project. Next we will introduce a variety of different data management tools, including flat files, local databases, and large-scale cloud storage systems. For each of these we will discuss strengths and weaknesses, and the kinds of data or analysis for which they might be suitable, and students will have an opportunity to try working with them (adding, manipulating and exporting data) in lab sessions, where they will learn the basics of the data specification and query syntaxes (SQL and NoSQL) that are commonly used for these systems. Finally, we will discuss best practices for creating a data management plan that will address considerations such as collaborating with other researchers, ensuring that your valuable data is securely backed up, and allowing you to easily share it publicly when your research is published. The module will conclude with a project in which students will design a data management plan for a hypothetical research project and write a short report justifying the choices they made.
Prerequisites for admission
The course assumes a basic understanding of the R programming language. Importantly, students are required to be familiar with basic computer operations (e.g., finding, copying and pasting folders, moving files). Finally, some basic notions of statistics would help students understand and develop applications.
Some code examples used in the course - especially those related to using cloud services - will be in Python, so you may find it beneficial to take a short online tutorial to help you understand the basics of that language.
Assessment for this portion of the course will be a series of short assignments and one final slightly longer assignment (designing and justifying a data management scheme for a hypothetical research project).
Teaching methods
A mixture of lectures and hands-on lab sessions.
Lectures are accompanied by slides and cases.
The teaching method is consistent with the "particular -> general -> particular" principle, moving from applications to general concepts and back. The method also encourages students' independence in shaping the learning process. Applications and class exercises are inspired by the idea that one can only learn data science by doing, not by reading.
Teaching Resources
The texts and the exam readings for the first and third modules will be indicated on the Ariel platform.

For the second module:

- Wickham, H., and G. Grolemund (2017). R for Data Science. O'Reilly Media.
- Munzert, S., et al. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley & Sons.
Additional material consists of various documents (such as packages' documentation and official manuals). All the reading material will be available to the students before the start of the course (through Perusall.com).
Assessment methods and Criteria
Attending students can take the exam on the basis of intermediate verifications, which will be held at the end of each module. Class participation and reading assignments are part of the evaluation.
The first verification will consist of multiple-choice questions and an open-ended question. The objective is to verify argumentative and reasoning skills.
The second verification will be a series of short assignments and one final slightly longer assignment (designing and justifying a data management scheme for a hypothetical research project).
The third verification is based on: 1) three short exercises in which the whole class will cooperate to solve a data science problem; 3) a capstone project to be handed in on GitHub one month after the end of the course, accompanied by a report of about 2,000 words.

Non-attending students can take the exam on the basis of three verifications. Please contact the instructors for details.
INF/01 - INFORMATICS - University credits: 3
IUS/09 - PUBLIC LAW - University credits: 6
Lessons: 60 hours
Professor(s)
Reception: by appointment, to be booked via email; online only.