Data Analysis

A.Y. 2020/2021
Max ECTS: 12
Overall hours: 80
SSD: SPS/07
Language: English
Learning objectives
The objective of the course is to give students a solid foundation in applied statistical methodology for the social sciences. By the end of the course students will master the basic toolkit of quantitative research from both a theoretical and a practical standpoint.
Expected learning outcomes
Reach proficiency in various types of univariate and bivariate analyses. Understand what it means to make inferences in the social sciences and how to do so in different circumstances. Become competent in hypothesis testing with different types of variables. Be able to produce basic statistical analyses of quantitative data independently using Stata. Achieve basic competence in understanding and producing time series analyses. Acquire a working knowledge of the software R for statistical data analysis and reporting.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Lesson period
First trimester
During the health emergency, the course undergoes the following changes:
Teaching methods:
The course will take place mostly (about two thirds) in the classroom (booking a seat through the app is compulsory). Students who do not book a seat will be able to follow the class remotely. The remaining third of the course will be held remotely, with asynchronous classes in which students will develop the same abilities they would have acquired in person.
The course calendar and all details of the activities will be published in the online course before the beginning of the classes. All updates will be published on the website of the course. Students are required to check their institutional e-mail account often (@studenti.unimi.it).
The procedures and criteria for attending classes in person, which must be booked using the app, will be published in advance in the online course.
Students who do not take part in the in-person classes are still considered attending students, provided that they follow the classes via streaming and submit the exercises and the take-home lab sessions.
Course material:
Students who attend the course must refer to the mandatory textbook and to the materials, such as exercises and lab sessions, that are published in the online course.
Course syllabus
The "Data Analysis" module aims to provide students with a solid foundation in applied statistical methodology. Students who attend and successfully complete the course will master the basic toolkit of quantitative research (i.e. cases, types of variables, datasets, hypothesis testing); they will understand why sampling is used, how different sampling methods work and how to make predictions (inference) in the social sciences; and they will be proficient with the main tools for univariate and bivariate analyses. Students will also receive basic training in the use of the statistical software Stata and, by the end of the course, they will be able to produce basic statistical analyses of quantitative data independently.
The topics covered are: Introduction, variables and samples; Descriptive statistics; Introduction to Stata, setting up the workspace and descriptive statistics; Probabilities and distributions; Generating and modifying variables in Stata; Inference and estimation; Significance tests; Point and interval estimation with Stata; Comparing two groups and associations between categorical variables; Cross-tabulation in Stata; Linear regression and correlation; ANOVA; Linear regression and ANOVA in Stata; Introduction to logistic regression and to multivariate relationships; Setting up and executing a quantitative research analysis in Stata.
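
As an illustration of what "point and interval estimation" involves, here is a minimal sketch. It is not part of the course material: the course itself uses Stata, and Python is used here only to keep the example self-contained. The function name, the sample data and the choice of a large-sample (z) interval are all assumptions made for illustration; with a sample this small, a t-based interval would normally be preferred and would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Uses a large-sample z interval: estimate +/- z * standard error.
    """
    n = len(values)
    estimate = mean(values)                  # point estimate of the mean
    standard_error = stdev(values) / sqrt(n) # sample sd divided by sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return estimate, estimate - margin, estimate + margin


# Hypothetical sample: weekly study hours reported by ten students.
hours = [12, 15, 9, 20, 14, 11, 17, 13, 16, 13]
estimate, lower, upper = mean_confidence_interval(hours)
print(f"mean = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The point estimate is the sample mean; the interval adds and subtracts a margin of error that shrinks as the sample grows.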

The introductory module on "Time Series Analysis" will provide the methodological basis for time series analysis and technical tools for the descriptive analysis, decomposition and forecasting of time series using Excel and Stata.
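
To give a sense of what descriptive analysis and forecasting of a time series can look like, here is a minimal sketch. It is not taken from the course material: the module uses Excel and Stata, and Python is used here only for illustration. The series, the function name and the forecasting rule (reusing the last smoothed value) are assumptions chosen to keep the example short; a full decomposition would also estimate seasonal and irregular components.

```python
def moving_average(series, window):
    """Return the moving averages over every run of `window` consecutive values.

    The result has len(series) - window + 1 elements.
    """
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical quarterly series covering two years.
sales = [10, 14, 8, 12, 12, 16, 10, 14]

# A window of 4 spans one full year, so quarterly seasonality averages out
# and what remains approximates the trend.
trend = moving_average(sales, 4)

# Naive forecast (an assumption for illustration): carry the last
# smoothed value forward one period.
forecast = trend[-1]

print("trend:", trend)
print("next-period forecast:", forecast)
```

Choosing the window length equal to the number of periods per year is what removes the seasonal pattern from the smoothed series.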

The goal of the "Introduction to R" module is to provide students with a working knowledge of the software R for statistical data analysis and reporting. Students who complete the module will have learned how to use R for a variety of operations, including data exploration and management, basic and advanced statistical operations (frequencies, hypothesis testing, linear regression), data visualization, and the development of data products such as reports and slides.
Prerequisites for admission
No previous background in statistics is required to take this course.
Teaching methods
All three modules include lectures and lab sessions. Students are given in-class and take-home assignments and are asked to work individually and/or in groups. Lab sessions include individual exercises with Stata, Excel and R.
Teaching Resources
For "Data Analysis":
Alan Agresti and Barbara Finlay (2014), Statistical Methods for the Social Sciences. Pearson, 4th Edition
Chapters: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15.

For Stata: syntax will be provided by the professor on Ariel.
Useful (not mandatory) textbooks for learning how to use Stata on your own:
Ulrich Kohler & Frauke Kreuter (2012). Data Analysis Using Stata. Stata Press, 3rd Edition
Alan Acock (2014). A Gentle Introduction to Stata. Stata Press, 4th Edition

For the introductory module on "Time Series Analysis":
Michael Barrow (2017), Statistics for Economics, Accounting and Business Studies. Pearson, 7th Edition
Chapters: 1, 10 and 11.
Further materials will be provided by the professor on Ariel.

For the "Introduction to R" module:
The module is not based on a textbook. Materials, mostly in the form of HTML tutorials, will be provided by the instructor. However, in preparing the classes the instructor will draw on the following two books, which students may consult if they want to delve more deeply into the topics discussed:
Fox, J., and Weisberg, S. An R Companion to Applied Regression (3rd Edition). Sage, 2019
Grolemund, G., and Wickham, H. R for Data Science. O'Reilly, 2017 (available online and for free at https://r4ds.had.co.nz/)
Assessment methods and Criteria
For the "Data Analysis" module (40 hours), attendance is mandatory and will be checked using attendance sheets. Students are expected to participate in at least 80% of the classes. They will be evaluated on their participation in class and on completing and uploading homework to the course website (Ariel). The final exam for attendees includes multiple-choice questions and written exercises (similar to those assigned as homework). Attendees will also take a test on the use of the software Stata. Non-attendees will take a comprehensive final exam on all the material assigned in the textbook. The exam will be held remotely or in person, in accordance with the University's instructions.

For the introductory module on "Time Series Analysis" (20 hours), attendees will have to answer two open-ended questions. Attendees will also have a practical session with exercises to be done in Excel and Stata. Non-attendees will take a comprehensive final exam on all the material assigned in the textbook and provided on Ariel. They will have to complete an exercise using Excel and answer two open-ended questions. The exam will be held remotely or in person, in accordance with the University's instructions.

For the "Introduction to R" module (20 hours), students will be assessed on the basis of three take-home assignments (two smaller midterm assignments and one larger final assignment). In order to pass the course, students will have to complete all three assignments, regardless of whether they attend the class or not.
Teaching unit 1
SPS/07 - GENERAL SOCIOLOGY - University credits: 9
Lessons: 60 hours
Professor: Bozzon Rossella
Teaching unit 2 - Introduction to R
SPS/07 - GENERAL SOCIOLOGY - University credits: 3
Lessons: 20 hours
Professor: Vegetti Federico