Mathematical Statistics
A.Y. 2019/2020
Learning objectives
The main aim of the course is to introduce the modern concepts of multivariate and computational Mathematical Statistics, from both a theoretical and an applied point of view, with particular reference to techniques for Big Data analysis. During the lab activities, students will be trained to carry out data analyses with advanced software tools (R and R Spark).
Expected learning outcomes
Knowledge of the basic notions and theorems of multivariate mathematical and computational statistics.
The student will then be able to apply and broaden his/her knowledge of these subjects in different areas of interest, in both theoretical and applied contexts, and to perform statistical data analyses in both the multivariate and the big data setting.
Lesson period: Second semester
Assessment methods: Exam
Assessment result: mark recorded on a 30-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Course syllabus
The chapters listed below indicate the planned content. The teachers may cover only a selection of them if time is short.
1. Random vectors
2. The Multivariate Normal Distribution
2.1. Definition and properties of the multivariate normal distribution
2.2. Test for the normality of a random vector
2.3. Detection of outliers
3. Main multivariate distributions originating from the Normal
3.1. Wishart distribution
3.2. Hotelling's T² distribution
3.3. Wilks' Lambda distribution
4. Multivariate Hypothesis Tests
4.1. Test on one or two mean vectors
4.2. Multivariate Analysis of Variance (MANOVA)
4.3. Test on covariance matrices
Statistical Methods for the analysis of Big Data
5. Locality Sensitive Hashing (LSH)
6. Finding Similar Items
7. Frequent Itemsets
8. Cluster analysis
9. Techniques for dimensionality reduction
10. Analysis of data streams
11. Analysis of social networks
12. Computer Lab
Multivariate and big data analysis with statistical software (R and R Spark)
Prerequisites for admission
Students should have taken an introductory course in Mathematical Statistics, covering in particular statistical hypothesis testing and linear regression.
Teaching methods
Lectures, exercise sessions and computer lab
Teaching Resources
A.C. Rencher, Multivariate Statistical Inference and Applications, Wiley, 1998
K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press, 1979
Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, Cambridge University Press, 2014. Online version: http://www.mmds.org/
Lecture notes of the teachers
Assessment methods and Criteria
The final examination consists of two parts: a written exam and a lab exam.
- During the written exam, the student must solve some exercises in the form of open-ended and/or short-answer questions, with the aim of assessing the student's ability to solve problems of Multivariate Statistics. The duration of the written exam is proportional to the number of exercises assigned and takes into account their nature and complexity; in any case, it will not exceed three hours. The outcomes of these tests will be available in the SIFA service through the UNIMIA portal and on the ARIEL website of the course.
- The lab exam consists of short reports and programs on problems or exercises assigned by the professors during the lectures. The reports are evaluated during the course, so regular attendance at the lectures is required. The lab part of the examination assesses the student's ability to put a multivariate and/or big data problem into context, find a solution, and report on the results obtained.
The final examination is passed only if both parts (written and lab) are passed. Final marks are given on a scale of 0 to 30 and are computed as the weighted mean of the grades of the two parts (weights: 6 CFU for the written part, 3 CFU for the lab part). The final mark is communicated immediately after the written examination has been marked.
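As an illustration of the weighting described above, the following minimal sketch computes the final mark as a credit-weighted mean of the two part grades. The function name and the sample grades are hypothetical, and no rounding rule or pass threshold is applied, because the course description does not state one.

```python
# Credit weights taken from the course description: 6 CFU written, 3 CFU lab.
WRITTEN_CFU = 6
LAB_CFU = 3


def weighted_final_mark(written: float, lab: float) -> float:
    """Return the credit-weighted mean of the two part grades (0-30 scale).

    Hypothetical helper: no rounding and no pass threshold are applied,
    since neither is specified in the course description.
    """
    for grade in (written, lab):
        if not 0 <= grade <= 30:
            raise ValueError("grades must lie in the range 0-30")
    total_cfu = WRITTEN_CFU + LAB_CFU
    return (WRITTEN_CFU * written + LAB_CFU * lab) / total_cfu


if __name__ == "__main__":
    # Made-up grades, for illustration only.
    print(weighted_final_mark(27, 30))
```

With the made-up grades 27 (written) and 30 (lab), the weighted mean is (6 × 27 + 3 × 30) / 9 = 28.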
MAT/06 - PROBABILITY AND STATISTICS - University credits: 9
Laboratories: 36 hours
Lessons: 42 hours
Professors:
Aletti Giacomo, Micheletti Alessandra
Reception:
by appointment
office 2099