Data mining and computational statistics

Academic year 2018/2019
Maximum credits
9
Total hours
60
SSD
SECS-S/01
Language
English
Learning objectives
Course objectives are:
· To introduce students to the expanding world of big data analysis.
· To introduce students to the basic concepts, techniques and applications of computational statistics and data mining used in finance and economics.
· To develop skills in using the R software to solve practical problems.
· To build the skills needed for independent study and research.
Expected learning outcomes
Not defined
Single course

This course cannot be taken as a single course. You can find the available courses in the single-course catalogue.

Syllabus and teaching organization

Single edition

Period
Third trimester

ATTENDING STUDENTS
Syllabus
Part I
(i) Introduction to data mining and statistical learning. (ii) Exploratory data analysis and visualization. (iii) Supervised vs. unsupervised methods: introduction. (iv) Parametric vs. nonparametric methods: introduction. (v) Quick review of maximum likelihood methods. (vi) Multiple linear regression. (vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier. (viii) Resampling methods: cross-validation and the bootstrap. (ix) Shrinkage methods: ridge regression, the lasso and other shrinkage methods. (x) Regression splines and local regression. (xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks. (xii) Support vector machines. (xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression. (xiv) Introduction to Bayesian methods in data mining. (xv) Elementary text mining. (xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
Preparatory courses
There are no mandatory preparatory courses, but a good knowledge of basic statistics and mathematics is welcome.
Prerequisites
Basic knowledge of statistics and probability is required. Familiarity with regression methods helps speed up the first part of the course.
Matrix algebra and multivariate calculus will be beneficial but are not strictly required.
Basic knowledge of R and some programming skills are also useful but not required.
Evaluation is by oral examination on both theoretical topics and possible applications. Homework and assignments will be given during the course.
Teaching methods
75% lecture-style lessons
25% classroom teaching activities focused on examples and applications in R
Reference material
Main textbooks:
(i) An Introduction to Statistical Learning, with applications in R (2013) by G. James, D. Witten, T. Hastie, R. Tibshirani, Springer.
(ii) Introducing Monte Carlo Statistical Methods with R (2010) by C.P. Robert, G. Casella, Springer.
Suggested reading for deeper insight into some topics covered in the main textbooks:
(i) The Elements of Statistical Learning, 2nd edition (2009) by T. Hastie, R. Tibshirani, J. Friedman, Springer.
(ii) Machine Learning: A Probabilistic Perspective (2012) by K.P. Murphy, The MIT Press.
(iii) Monte Carlo Statistical Methods (2004) by C.P. Robert, G. Casella, Springer.

Further reading will be suggested during the course.
NON-ATTENDING STUDENTS
Syllabus
Part I
(i) Introduction to data mining and statistical learning. (ii) Exploratory data analysis and visualization. (iii) Supervised vs. unsupervised methods: introduction. (iv) Parametric vs. nonparametric methods: introduction. (v) Quick review of maximum likelihood methods. (vi) Multiple linear regression. (vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier. (viii) Resampling methods: cross-validation and the bootstrap. (ix) Shrinkage methods: ridge regression, the lasso and other shrinkage methods. (x) Regression splines and local regression. (xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks. (xii) Support vector machines. (xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression. (xiv) Introduction to Bayesian methods in data mining. (xv) Elementary text mining. (xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
Prerequisites
Basic knowledge of statistics and probability is required. Familiarity with regression methods helps speed up the first part of the course.
Matrix algebra and multivariate calculus will be beneficial but are not strictly required.
Basic knowledge of R and some programming skills are also useful but not required.
Evaluation is by oral examination on both theoretical topics and possible applications. Homework and assignments will be given during the course.
Reference material
Main textbooks:
(i) An Introduction to Statistical Learning, with applications in R (2013) by G. James, D. Witten, T. Hastie, R. Tibshirani, Springer.
(ii) Introducing Monte Carlo Statistical Methods with R (2010) by C.P. Robert, G. Casella, Springer.
Suggested reading for deeper insight into some topics covered in the main textbooks:
(i) The Elements of Statistical Learning, 2nd edition (2009) by T. Hastie, R. Tibshirani, J. Friedman, Springer.
(ii) Machine Learning: A Probabilistic Perspective (2012) by K.P. Murphy, The MIT Press.
(iii) Monte Carlo Statistical Methods (2004) by C.P. Robert, G. Casella, Springer.

Further reading will be suggested during the course.
SECS-S/01 - STATISTICS - CFU: 9
Lectures: 60 hours
Lecturer: Manzi Giancarlo