Data Mining and Computational Statistics

A.Y. 2019/2020
Lesson for
Max ECTS: 9
Overall hours: 60
SSD: SECS-S/01
Language: English
Learning objectives
This course introduces the basic techniques of Data Mining and Computational Statistics and their applications in finance and economics, within the more general framework of data science. Students will develop programming skills using R in the Data Mining part and OpenBUGS for Bayesian Markov chain Monte Carlo (MCMC) random variable generation. They will learn to study Data Mining and Computational Statistics topics independently and will be able to solve practical problems in economic and financial data analysis.
At the end of the course students will be able to apply machine learning techniques and algorithms to economic and financial problems. Specifically, students will be familiar with supervised and unsupervised models. In the supervised framework, students will be able to fit advanced regression models such as ridge and lasso regression and to apply classification techniques such as the Bayes classifier, the K-NN classifier and the logistic model. In the unsupervised framework, students will become familiar with dimensionality reduction techniques and cluster analysis. More sophisticated techniques, such as decision tree-based classification, will also be presented. In Computational Statistics, students will acquire knowledge of resampling techniques, random number and random variable generation, and numerical integration.

Course structure and Syllabus

Active edition
Yes
Responsible
SECS-S/01 - STATISTICS - University credits: 9
Lessons: 60 hours
Professor: Manzi Giancarlo
ATTENDING STUDENTS
Syllabus
Part I
(i) Revision of likelihood inference.
(ii) Introduction to data mining and statistical learning.
(iii) Exploratory data analysis and visualization.
(iv) Supervised vs. unsupervised methods: introduction.
(v) Parametric vs. nonparametric methods: introduction.
(vi) Multiple linear regression.
(vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier.
(viii) Resampling methods: cross validation and the bootstrap.
(ix) Shrinkage methods: Ridge regression, the Lasso and other shrinkage methods.
(x) Regression splines and local regression.
(xi) Tree-based methods: random forest, bagging and boosting.
(xii) Support vector machines.
(xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression.
(xiv) Introduction to Bayesian methods in data mining.
(xv) Basic text mining.
(xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview.
(ii) Pseudo-random number and variable generation.
(iii) Monte Carlo methods for numerical integration.
(iv) Simulation-based inference.
(v) MCMC methods: overview.
(vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
NON-ATTENDING STUDENTS
Syllabus
Part I
(i) Revision of likelihood inference.
(ii) Introduction to data mining and statistical learning.
(iii) Exploratory data analysis and visualization.
(iv) Supervised vs. unsupervised methods: introduction.
(v) Parametric vs. nonparametric methods: introduction.
(vi) Multiple linear regression.
(vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier.
(viii) Resampling methods: cross validation and the bootstrap.
(ix) Shrinkage methods: Ridge regression, the Lasso and other shrinkage methods.
(x) Regression splines and local regression.
(xi) Tree-based methods: random forest, bagging and boosting.
(xii) Support vector machines.
(xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression.
(xiv) Introduction to Bayesian methods in data mining.
(xv) Basic text mining.
(xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview.
(ii) Pseudo-random number and variable generation.
(iii) Monte Carlo methods for numerical integration.
(iv) Simulation-based inference.
(v) MCMC methods: overview.
(vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
Lesson period
Third trimester
Assessment methods
Exam
Assessment result
Grade recorded on a 30-point scale
Professor(s)
Reception:
Wed 4.30PM-7.30PM.
Room 37, 3rd Floor.