Data Mining and Computational Statistics
A.Y. 2018/2019
Learning objectives
Course objectives are:
· To introduce students to the expanding world of big data analysis.
· To introduce students to the basic concepts, techniques and applications of computational statistics and data mining for use in finance and economics.
· To develop skills in using the R software to solve practical problems.
· To build the skills needed for independent study and research.
Expected learning outcomes
Undefined
Lesson period: Third trimester
Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Lesson period
Third trimester
ATTENDING STUDENTS
Course syllabus
Part I
(i) Introduction to data mining and statistical learning. (ii) Exploratory data analysis and visualization. (iii) Supervised vs. unsupervised methods: introduction. (iv) Parametric vs. nonparametric methods: introduction. (v) Quick review of maximum likelihood methods. (vi) Multiple linear regression. (vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier. (viii) Resampling methods: cross-validation and the bootstrap. (ix) Shrinkage methods: Ridge regression, the Lasso and other shrinkage methods. (x) Regression splines and local regression. (xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks. (xii) Support vector machines. (xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression. (xiv) Introduction to Bayesian methods in data mining. (xv) Elementary text mining. (xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
NON-ATTENDING STUDENTS
Course syllabus
Part I
(i) Introduction to data mining and statistical learning. (ii) Exploratory data analysis and visualization. (iii) Supervised vs. unsupervised methods: introduction. (iv) Parametric vs. nonparametric methods: introduction. (v) Quick review of maximum likelihood methods. (vi) Multiple linear regression. (vii) Classification methods: logistic regression, linear discriminant analysis and the K-nearest neighbors method. The Bayes classifier. (viii) Resampling methods: cross-validation and the bootstrap. (ix) Shrinkage methods: Ridge regression, the Lasso and other shrinkage methods. (x) Regression splines and local regression. (xi) Tree-based methods: random forests, bagging and boosting. Introduction to Bayesian networks. (xii) Support vector machines. (xiii) Unsupervised learning: PCA, clustering and multidimensional scaling methods; correspondence analysis. Principal component regression. (xiv) Introduction to Bayesian methods in data mining. (xv) Elementary text mining. (xvi) Data mining in finance.
Part II
(i) Computer-intensive statistical methods: overview. (ii) Pseudo-random number and variable generation. (iii) Monte Carlo methods for numerical integration. (iv) Simulation-based inference. (v) MCMC methods: overview. (vi) MCMC methods: Metropolis-Hastings and Gibbs sampling.
SECS-S/01 - STATISTICS - University credits: 9
Lessons: 60 hours
Professor: Manzi Giancarlo