Data Mining and Computational Statistics

A.Y. 2018/2019
Max ECTS
9
Overall hours
60
SSD
SECS-S/01
Language
English
Learning objectives
The course objectives are:
· To introduce students to the expanding field of big data analysis.
· To introduce students to the basic concepts, techniques and applications of computational statistics and data mining used in finance and economics.
· To develop skills in using the R software to solve practical problems.
· To develop skills for independent study and research.

Course structure and Syllabus

Active edition
Yes
Responsible
SECS-S/01 - STATISTICS - University credits: 9
Lessons: 60 hours
Professor: Manzi Giancarlo
ATTENDING STUDENTS
Syllabus
Part I
(i) Introduction to data mining and statistical learning.
(ii) Exploratory data analysis and visualization.
(iii) Supervised vs. unsupervised methods: introduction.
(iv) Parametric vs. nonparametric methods: introduction.
(v) Quick review of maximum likelihood methods.
(vi) Multiple linear regression.
(vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier.
(viii) Resampling methods: cross-validation and the bootstrap.
(ix) Shrinkage methods: ridge regression, the lasso and other shrinkage methods.
(x) Regression splines and local regression.
(xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks.
(xii) Support vector machines.
(xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression.
(xiv) Introduction to Bayesian methods in data mining.
(xv) Elementary text mining.
(xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview.
(ii) Pseudo-random number and variable generation.
(iii) Monte Carlo methods for numerical integration.
(iv) Simulation-based inference.
(v) MCMC methods: overview.
(vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
NON-ATTENDING STUDENTS
Syllabus
Part I
(i) Introduction to data mining and statistical learning.
(ii) Exploratory data analysis and visualization.
(iii) Supervised vs. unsupervised methods: introduction.
(iv) Parametric vs. nonparametric methods: introduction.
(v) Quick review of maximum likelihood methods.
(vi) Multiple linear regression.
(vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier.
(viii) Resampling methods: cross-validation and the bootstrap.
(ix) Shrinkage methods: ridge regression, the lasso and other shrinkage methods.
(x) Regression splines and local regression.
(xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks.
(xii) Support vector machines.
(xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression.
(xiv) Introduction to Bayesian methods in data mining.
(xv) Elementary text mining.
(xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview.
(ii) Pseudo-random number and variable generation.
(iii) Monte Carlo methods for numerical integration.
(iv) Simulation-based inference.
(v) MCMC methods: overview.
(vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
Lesson period
Third trimester
Assessment methods
Exam
Assessment result
Grade recorded on a 30-point scale
Professor(s)
Reception:
Wed 4.30PM-7.30PM.
Room 37, 3rd Floor.