Data Analysis

A.Y. 2021/2022
Max ECTS: 12
Overall hours: 80
SSD: SPS/07
Language: English
Learning objectives
The objective of the course is for students to acquire a solid foundation in applied statistical methodology for the social sciences. By the end of the course, students will master the basic toolkit of quantitative research from both a theoretical and a practical/applied standpoint.
Expected learning outcomes
Reach proficiency in various types of univariate and bivariate analyses. Understand what it means to make inferences in the social sciences and how to do so in different circumstances. Become competent in hypothesis testing with different types of variables. Be able to produce basic statistical analyses of quantitative data independently using Stata. Gain a working knowledge of the software R for statistical data analysis and reporting.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Lesson period
First trimester
More specific information on the delivery modes of training activities for academic year 2021/2022 will be provided over the coming months, based on the evolution of the public health situation.
Course syllabus
The "Data Analysis" module aims to provide students with a solid foundation in applied statistical methodology. Students who attend and successfully complete the course will master the basic toolkit of quantitative research (i.e. cases, types of variables, datasets, hypothesis testing); will understand why sampling is used, what the different sampling methods are, and how to make predictions (inference) in the social sciences; and will be proficient with the main tools for univariate and bivariate analyses. Students will also receive basic training in the use of the statistical software Stata and, by the end of the course, will be able to produce basic statistical analyses of quantitative data independently.
The topics covered are: Introduction, variables and samples; Descriptive statistics; Introduction to Stata, setting up the workspace, descriptive statistics; Probabilities and distributions; Generating and modifying variables in Stata; Inference and estimation; Significance tests; Point and interval estimation with Stata; Comparing two groups and associations between categorical variables; Cross-tabulation in Stata; Linear regression and correlation; ANOVA; Linear regression and ANOVA in Stata; Introduction to logistic regression and to multivariate relationships; Setting up and executing a quantitative research analysis in Stata.

The goal of the "Introduction to R" module is to provide students with a working knowledge of the software R for statistical data analysis and reporting. Students taking the module will be shown how to use R for a variety of operations, including data exploration and management, basic and advanced statistical operations (frequencies, hypothesis testing, linear regression), data visualization, and development of data products such as reports and slides.
Prerequisites for admission
No previous background in statistics is required to take this course.
Teaching methods
The course includes both lectures and lab sessions. Students are given in-class and take-home assignments and are asked to work in groups and/or individually. Lab sessions include individual exercises with the software packages Stata and R.
Teaching Resources
For "Data Analysis":
Alan Agresti and Barbara Finlay (2014), Statistical Methods for the Social Sciences. Pearson, 4th Edition
Chapters: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15.

For Stata: syntax will be provided by the professor on ARIEL
Useful (not mandatory) textbooks for learning how to use Stata on your own:
Ulrich Kohler & Frauke Kreuter (2012). Data Analysis Using Stata. Stata Press, 3rd Edition
Alan Acock (2014). A Gentle Introduction to Stata. Stata Press. 4th Edition

For the "Introduction to R" module:
The course is not based on a textbook. Materials, mostly in the form of HTML tutorials, will be provided by the instructor. However, in preparing the class the instructor will draw from the following two books, which students may consult in case they want to delve more deeply into the topics discussed:
Fox, J., and Weisberg, S. An R Companion to Applied Regression (3rd Edition). Sage, 2019
Grolemund, G., and Wickham, H. R for Data Science. O'Reilly, 2017 (available online and for free at https://r4ds.had.co.nz/)
Assessment methods and Criteria
For the "Data Analysis" module (60 hours), attending students are asked to attend at least 80% of the classes. They will be evaluated on their participation in class and on completing and handing in homework as instructed in class. The final exam for attendees includes multiple-choice questions and exercises (similar to those assigned for the homework). Attendees will also write a short assignment, in groups or individually, based on the analysis of micro-level data using the software Stata. Non-attendees will take a comprehensive final exam on all the material assigned in the textbook.

For the "Introduction to R" module (20 hours), students will be assessed on the basis of 3 take-home assignments (2 smaller midterm assignments and 1 larger final assignment). In order to pass the course, students will have to complete all 3 assignments, regardless of whether they attend the class or not.
Teaching unit 1
SPS/07 - GENERAL SOCIOLOGY - University credits: 9
Lessons: 60 hours
Teaching unit 2 - Introduction to R
SPS/07 - GENERAL SOCIOLOGY - University credits: 3
Lessons: 20 hours
Professor: Vegetti Federico
Professor(s)
Reception: By appointment, on Microsoft Teams