Data Mining and Computational Statistics

A.Y. 2016/2017
Lesson for
Max ECTS: 9
Overall hours: 60
Language: English
Learning objectives
The course objectives are:
· To introduce students to the expanding world of big data analysis.
· To introduce students to the basic concepts, techniques and applications of computational statistics and data mining used in finance and economics.
· To develop skills in using the R software to solve practical problems.
· To develop skills for independent study and research.

Course structure and Syllabus

Active edition
Yes
Responsible
Lessons: 60 hours
Professor: Manzi Giancarlo
ATTENDING STUDENTS
Syllabus
Main topics:
(i) Introduction to data mining and statistical learning.
(ii) Exploratory data analysis and visualization.
(iii) Supervised vs. unsupervised methods: introduction.
(iv) Quick review of maximum likelihood methods.
(v) Multiple linear regression.
(vi) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method.
(vii) Resampling methods: cross-validation and the bootstrap.
(viii) Shrinkage methods: ridge regression and the lasso. Principal component regression.
(ix) Regression splines and local regression.
(x) Tree-based methods: random forest, bagging and boosting.
(xi) Support vector machines.
(xii) Unsupervised learning: PCA, clustering and multidimensional scaling methods.
(xiii) Introduction to Bayesian methods in data mining.
(xiv) Elementary text mining.
(xv) Data mining in finance.
Further topics:
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
A lab on correspondence analysis will also be presented.
NON-ATTENDING STUDENTS
Syllabus
Main topics:
(i) Introduction to data mining and statistical learning.
(ii) Exploratory data analysis and visualization.
(iii) Supervised vs. unsupervised methods: introduction.
(iv) Quick review of maximum likelihood methods.
(v) Multiple linear regression.
(vi) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method.
(vii) Resampling methods: cross-validation and the bootstrap.
(viii) Shrinkage methods: ridge regression and the lasso. Principal component regression.
(ix) Regression splines and local regression.
(x) Tree-based methods: random forest, bagging and boosting.
(xi) Support vector machines.
(xii) Unsupervised learning: PCA, clustering and multidimensional scaling methods.
(xiii) Introduction to Bayesian methods in data mining.
(xiv) Elementary text mining.
(xv) Data mining in finance.
Further topics:
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
A lab on correspondence analysis will also be presented.
Lesson period
Second trimester
Assessment methods
Exam
Assessment result
Mark recorded on a 30-point scale
Professor(s)
Reception:
Wed 4.30PM-7.30PM.
Room 37, 3rd Floor.