Data Mining and Computational Statistics

A.Y. 2015/2016
Overall hours
Learning objectives
Course objectives are:
· To introduce students to the expanding world of big data analysis.
· To introduce students to the basic concepts, techniques and applications of computational statistics and data mining used in finance and economics.
· To develop skills in using the R software to solve practical problems.
· To build the skills needed for independent study and research.
Course syllabus and organization

Single session

Lesson period
Second trimester
Course syllabus
Main topics:
(i) Introduction to data mining and statistical learning. (ii) Exploratory data analysis and visualization. (iii) Supervised vs. unsupervised methods: introduction. (iv) Quick review of maximum likelihood methods. (v) Multiple linear regression. (vi) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. (vii) Resampling methods: cross-validation and the bootstrap. (viii) Shrinkage methods: ridge regression and the lasso. Principal component regression. (ix) Regression splines and local regression. (x) Tree-based methods: random forests, bagging and boosting. (xi) Support vector machines. (xii) Unsupervised learning: PCA and clustering methods. (xiii) Introduction to Bayesian methods in data mining.
Further topics:
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
SECS-S/01 - STATISTICS - University credits: 9
Practicals: 40 hours
Lessons: 40 hours
Professors: Andreis Federico, Manzi Giancarlo
Wednesday, 2:30 PM-5:30 PM (appointment recommended).
Room 37, 3rd Floor (in-person office hours are suspended due to the health emergency; office hours are held via Teams).