Statistics and Data Analysis
A.Y. 2021/2022
Learning objectives
The course aims to introduce the fundamentals of descriptive statistics, probability, and parametric inferential statistics.
Expected learning outcomes
Students will be able to carry out basic exploratory analyses and inferences on datasets. They will know the main probability distributions and be able to understand statistical analyses conducted by others. They will also know simple methods for binary classification and be able to evaluate their performance. Finally, students will acquire the fundamental skills needed to study more sophisticated techniques for data analysis and data modeling.
Lesson period: Second semester
Assessment methods: Exam
Assessment result: mark recorded on a scale of 30
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
More specific information on the delivery modes of training activities for academic year 2021/22 will be provided over the coming months, based on the evolution of the public health situation.
Course syllabus
Introduction to Python.
Descriptive statistics:
- Frequencies and cumulative frequencies. Joint and marginal frequencies.
- Indices of centrality, dispersion, correlation, heterogeneity, and concentration.
- Graphical methods: frequency and cumulative frequency plots, scatter plots, and QQ plots.
- Classifiers and ROC curves.
Probability:
- Combinatorics. Basics of set theory.
- Probability axioms.
- Conditional probability and related theorems.
- Discrete and continuous random variables. Centrality and dispersion indices for random variables and their properties.
- Multivariate random variables. Covariance and correlation indices for random variables.
- Independent events and independent random variables.
- Markov and Chebyshev inequalities.
- Bernoulli, binomial, geometric, Poisson, discrete uniform, and hypergeometric models.
- Continuous uniform, exponential, and Gaussian models.
- Poisson process.
Parametric inferential statistics:
- Population, random sample and point estimates.
- Sample mean. Central limit theorem.
- Sample variance.
- Unbiasedness and consistency in mean square.
- Law of large numbers.
- Computation of the sample size.
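As an illustration of the last item, the sketch below computes how many observations are needed for the sample mean to fall within a tolerance of the population mean with a given confidence. It is only a minimal example under stated assumptions, not necessarily the method presented in class: the function names are hypothetical, the population standard deviation `sigma` is assumed known, and `eps` (tolerance) and `delta` (allowed failure probability) are symbols introduced here. The first function relies on the Chebyshev inequality, the second on the normal approximation given by the central limit theorem.

```python
import math
from statistics import NormalDist


def sample_size_chebyshev(sigma: float, eps: float, delta: float) -> int:
    """Smallest n such that sigma^2 / (n * eps^2) <= delta.

    By the Chebyshev inequality, this guarantees
    P(|sample mean - mu| >= eps) <= delta for any distribution
    with finite variance sigma^2 (sigma assumed known).
    """
    return math.ceil(sigma**2 / (delta * eps**2))


def sample_size_clt(sigma: float, eps: float, delta: float) -> int:
    """Smallest n such that z * sigma / sqrt(n) <= eps.

    Here z is the (1 - delta/2) quantile of the standard normal,
    so the guarantee is only approximate (central limit theorem).
    """
    z = NormalDist().inv_cdf(1 - delta / 2)
    return math.ceil((z * sigma / eps) ** 2)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(sample_size_chebyshev(sigma=2.0, eps=0.5, delta=0.05))
    print(sample_size_clt(sigma=2.0, eps=0.5, delta=0.05))
```

The Chebyshev bound holds for every distribution with finite variance, so it never yields a smaller sample size than the normal approximation, which is tighter but valid only approximately.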
Prerequisites for admission
Students must have passed the "Matematica del continuo" exam. The course also requires familiarity with the main topics of computer programming, and passing the "Matematica del discreto" exam beforehand is strongly recommended.
Teaching methods
Lectures and exercise sessions
Teaching Resources
Suggested textbooks:
- S. Ross, Introduzione alla statistica, Apogeo education, 2014, ISBN 9788838786020
- S. Ross, Probabilità e statistica per l'ingegneria e le scienze, terza edizione, Apogeo education, 2015, ISBN 8891609946
Lecture notes (for topics not covered in the suggested textbooks) and sample code are available on the course web pages:
- https://labonline.ctu.unimi.it/course/view.php?id=228
- http://malchiodi.di.unimi.it/teaching/data-analytics/
Assessment methods and Criteria
The exam consists of a written test and an oral test, both covering the topics of the course. The written test takes place in a computer lab and lasts two and a half hours. It is based on open-ended questions and on the analysis of a dataset through the appropriate application of the statistical techniques described in class. It is graded pass/fail, taking into account the level of mastery of the topics and the correct use of mathematical formalism.
The oral test, which students may take only after passing the written test, consists of a discussion of the written test answers and of questions on topics covered in the course. It is graded on a scale from 0 to 30, taking into account the level of mastery of the topics, clarity of exposition, language skills, and the correct use of technical terminology.
INF/01 - INFORMATICS - University credits: 6
Practicals: 36 hours
Lessons: 24 hours
Professor:
Malchiodi Dario