# Statistics

A.Y. 2021/2022

Learning objectives

The goal of the course is to provide basic notions of descriptive statistics, probability theory and inferential statistics. More specifically, the aim is to develop the proper technical language and to enable students to attend more specialized courses such as "Statistica per le decisioni aziendali", "Statistica per i Big Data Economico/Aziendali" and "Econometria".

Expected learning outcomes

By the end of the course, students will have acquired the appropriate terminology and learned the main tools of descriptive statistics (constructing and interpreting indices, tables and graphs) and of inferential statistics (point estimation, confidence intervals and hypothesis testing).

In particular, students will be able to apply the appropriate statistical technique to analyse data and solve common real-life problems. They will be able to construct and read frequency tables and interpret the most common statistical indices; to calculate and interpret point estimates and confidence intervals; and to test the most common statistical hypotheses, such as equality of means and independence between variables. Finally, they will be able to perform a simple linear regression using statistical software and interpret the output.

**Lesson period:**
First trimester

**Assessment methods:** Exam

**Assessment result:** grade recorded on a 30-point scale

Course syllabus and organization

### Single session

More specific information on the delivery modes of training activities for academic year 2021/22 will be provided over the coming months, based on the evolution of the public health situation.

**Course syllabus**

Descriptive statistics

1) Classification of statistical phenomena (types of variables and scales of measurement) and frequency distributions (absolute, relative and cumulative frequencies).

2) Graphical representations: bar graph, stick graph, histogram.

3) Calculation of the mode, the median and the sample mean when the data are grouped in a frequency table. Theorems and properties of the mean.

4) Some indices of variability and dispersion: range, interquartile range, variance and standard deviation. The coefficient of variation.

5) Contingency tables and bivariate analysis: definition of joint (absolute and relative), marginal and conditional frequency distributions; the Pearson index for independence; dependence in mean; covariance and the linear correlation coefficient.
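
As an illustration of points 3 and 4, the following is a minimal Python sketch that computes the main indices from a frequency table. The data and the function name `describe` are hypothetical, the variance uses the divisor n (a common descriptive convention, assumed here), and the median is taken as the smallest value whose cumulative frequency reaches half of the observations, which is only one of several conventions in use.

```python
from math import sqrt


def describe(freq):
    """Summarise a frequency table given as {value: absolute frequency}."""
    n = sum(freq.values())
    values = sorted(freq)

    # Mode: the value with the highest absolute frequency (first one if tied).
    mode = max(values, key=lambda x: freq[x])

    # Median: smallest value whose cumulative frequency reaches n / 2.
    running = 0
    median = None
    for x in values:
        running += freq[x]
        if 2 * running >= n:
            median = x
            break

    mean = sum(x * f for x, f in freq.items()) / n

    # Variance with divisor n (descriptive convention; an assumption here).
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
    std = sqrt(variance)

    return {
        "n": n,
        "mode": mode,
        "median": median,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation (requires mean != 0)
    }


# Hypothetical frequency table: value -> number of observations
table = {0: 4, 1: 7, 2: 5, 3: 3, 4: 1}
summary = describe(table)
print(summary)
```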

Probability and random variables

1) Introduction to probability theory: classical, frequentist, subjective and axiomatic definitions of probability; elementary, compound and disjoint events; stochastic independence; Bayes' theorem; law of total probability; types of sampling (draws with and without replacement).

2) Definition of discrete and continuous random variables: probability distribution, probability density, distribution function; expected value (or mean), mode, median, variance of a random variable. Definition of independence between random variables.

3) Central limit theorem and law of large numbers.

4) Bernoulli random variable, Normal random variable and Binomial random variable; Normal approximation to Binomial distribution.
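
The Normal approximation to the Binomial distribution in point 4 can be checked numerically. The sketch below compares the exact Binomial distribution function with its Normal approximation; the values of `n`, `p` and `k` are hypothetical, and the continuity correction (evaluating the Normal at `k + 0.5`) is a common refinement assumed here, not something the syllabus specifies.

```python
from math import comb, erf, sqrt


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of a Normal(mu, sigma^2) random variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def binom_cdf_normal_approx(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)


# Hypothetical example values
n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)
approx = binom_cdf_normal_approx(k, n, p)
print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```

The two values should agree closely when n is large and p is not too close to 0 or 1.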

Inferential statistics

1) Point estimation: definition of unbiased estimator; the standard error as a measure of the accuracy of an estimator. The sample mean and variance; the sample proportion.

2) Confidence intervals for a mean (with Normal observations and known or unknown variance). Confidence intervals for a proportion.

3) General definition of statistical hypothesis testing: null and alternative hypotheses; type I and type II errors; rejection region; p-value. Hypothesis testing for a mean, with Normal observations and known or unknown variance; the t-test for the comparison between two means; the ANOVA test for comparison among multiple means.

4) Hypothesis testing for a proportion. Chi-square test for comparison among multiple proportions and to verify the independence between two variables.
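
For the simplest case in points 2 and 3 (Normal observations with known variance), the confidence interval and the two-sided test for a mean can be sketched as follows. The data, `sigma` and `mu0` are hypothetical, and the function names are illustrative; the unknown-variance case would use Student's t distribution instead, which is not covered by this sketch.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean when the standard deviation is known."""
    se = sigma / sqrt(len(data))                   # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    xbar = mean(data)
    return xbar - z * se, xbar + z * se


def z_test_two_sided(data, sigma, mu0):
    """Test H0: mu = mu0 against H1: mu != mu0 with known sigma."""
    se = sigma / sqrt(len(data))
    z = (mean(data) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample and known standard deviation
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.2, 4.8, 5.4, 5.0]
lower, upper = z_interval(sample, sigma=0.3)
z_stat, p_value = z_test_two_sided(sample, sigma=0.3, mu0=5.0)
print(f"95% CI: ({lower:.3f}, {upper:.3f}); z = {z_stat:.3f}, p = {p_value:.3f}")
```

The null hypothesis is rejected at level alpha when the p-value falls below alpha, which for a two-sided test is equivalent to `mu0` lying outside the corresponding confidence interval.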

Simple linear regression

1) Presentation of the statistical package R: how to install it; basic commands.

2) Definition of the linear regression model; estimation of the parameters (slope and intercept coefficients) with the least squares method; goodness of fit and coefficient of determination; confidence intervals for the coefficients of the linear regression model; hypothesis testing on the intercept and slope coefficients.

3) Use of R for the statistical analyses described in point 2. Interpretation of the output.
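
The least squares estimates and the coefficient of determination in point 2 follow from a few sums, as the sketch below shows. The course itself uses R, where a fit of this kind is usually obtained with `lm(y ~ x)` and inspected with `summary()`; Python is used here only for illustration, and the data and the function name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n

    # Sums of squares and cross-products around the means
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Coefficient of determination: squared linear correlation coefficient
    r2 = sxy**2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = fit_line(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.4f}")
```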

**Prerequisites for admission**

Mathematics

**Teaching methods**

For the theoretical part, the teacher explains at the blackboard, preferably without slides: this makes the lesson more interactive and lets it adapt to the needs of the class. Students who cannot attend can find all the material in the reference resources (textbook and lecture notes on ARIEL).

After each new concept is introduced, several numerical examples are presented so that students can fully understand its meaning and practise the calculations.

In addition to the theoretical lessons, classroom exercises are carried out. These exercises are made available on the course web page (ARIEL) for the benefit of non-attending students.

For the use of R in linear regression, the teacher presents slides with the commands to type for each kind of analysis. During the lecture some regression analyses are carried out as examples, and students are invited to bring a laptop (if they have one) to practise along with the teacher. In any case, all the commands presented in the lecture are made available on ARIEL so that students can reproduce them at home on their own computer.

Comments and requests for clarification from students during lessons and exercises are always welcome, because they make the lessons livelier and more useful for everyone.

**Teaching Resources**

I) Descriptive statistics: two sets of lecture notes available on the ARIEL page of the course: http://ctommasis.ariel.ctu.unimi.it/v5 (under the headings: lezioni-statistica descrittiva)

II) Probability and random variables: Introduction to the statistical inference. Authors: Ferrari, Nicolini and Tommasi, Giappichelli Editore - Turin (2009) - CHAPTERS: 1-2.

III) Inferential statistics: Introduction to the statistical inference. Authors: Ferrari, Nicolini and Tommasi, Giappichelli Editore - Turin (2009) - CHAPTERS: 3-4-5

and the following supplementary notes:

1) "la stima puntuale"

2) "confronto tra due o più medie (ANOVA)"

3) "Il test del chi-quadrato per l'indipendenza e per il confronto tra più proporzioni. Il test Z per il confronto tra due proporzioni"

which are available on the ARIEL page of the course: http://ctommasis.ariel.ctu.unimi.it/v5 (under the headings: contenuti - lezioni).

IV) Simple linear regression: Introduction to the statistical inference. Authors: Ferrari, Nicolini and Tommasi, Giappichelli Editore - Turin (2009) - CHAPTER 6.

The material about the use of the R software is available on the course web-page on ARIEL: https://ctommasis.ariel.ctu.unimi.it/v5

**Assessment methods and Criteria**

The exam consists of a written test lasting approximately one hour and a half.

It consists of 3 exercises and 6 multiple-choice questions (graded from 0 to 30 points), plus an additional, more theoretical exercise (graded from 0 to 2 points). Questions and exercises cover all the topics listed in the syllabus.

The exam is passed with a score of at least 18.

Students must bring a calculator to the written test.

The structure of the exam (numerical exercises plus multiple choice questions) allows the teacher to check whether the student is able to carry out simple statistical analyses and interpret the results.

Professor(s)

Reception:

Monday 10-13. Office hours of January 17 are postponed to Wednesday 19, 10-11.30 and 2-3.30

Room 33, 3rd floor DEMM