Data Mining and Computational Statistics

A.Y. 2025/2026
Max ECTS: 9
Overall hours: 59.4
SSD: SECS-S/01
Language: English
Learning objectives
This is an introductory course on the basic techniques of Data Mining and Computational Statistics and their applications in finance and economics, also within the more general framework of data science. Students will develop programming skills using the R software. By the end of the course, students will be equipped to apply these methods to practical problems in the analysis of economic and financial data.
Expected learning outcomes
At the end of the course, students will be able to apply machine learning techniques and algorithms to economic and financial problems. Specifically, students will be familiar with both supervised and unsupervised models. In the supervised framework, students will be able to fit advanced regression models such as ridge and lasso regression, and to apply classification techniques such as the Bayes classifier, the K-NN classifier and the logistic model. In the unsupervised framework, students will become familiar with dimensionality reduction techniques and cluster analysis. More sophisticated techniques, such as tree-based classification, will also be presented. In computational statistics, students will acquire knowledge of resampling techniques, random number and random variable generation, and numerical integration.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Responsible
Lesson period: Third trimester
Course syllabus
Program


- Introduction to R software.
- Supervised vs. unsupervised methods: introductory overview.
- Parametric vs. non-parametric methods: analysis of the trade-off between bias and variance in statistical learning.

Supervised methods:
· Overview of linear regression; ridge regression, lasso, and other regularization techniques.
· Classification methods: logistic regression, discriminant analysis (both linear and quadratic), and the Bayes classifier.
· Resampling methods: techniques such as cross-validation and the bootstrap.
· Decision trees: regression and classification models, with a focus on pruning techniques.
· Tree-based methods: bagging and random forests.

Unsupervised methods:
· Principal component analysis (PCA).
· Clustering.
Prerequisites for admission
A sound knowledge of the topics covered in the basic mathematics and statistics courses is required. Some programming skills are appreciated, but not essential.
Teaching methods
The course will be conducted through interactive lectures, during which theoretical issues will be discussed and practical cases presented. The aim is to work interactively with students, encouraging their participation and organizing sessions for discussion and peer interaction. In addition to the lectures, 20 hours of lab exercises are scheduled, in which the concepts presented in class will be applied using the R software.
Teaching Resources
Textbook

An Introduction to Statistical Learning, with applications in R (2013) by G. James, D. Witten, T. Hastie, R. Tibshirani, Springer.

Notes written by the instructor, available on myAriel
Assessment methods and criteria
There are two options for the exam:

Option A: Team work and written exam

Team Work:

Students will form groups (maximum 5 or 6 people per group) to collect and analyze data on relevant recent problems.
Each group must prepare a presentation to be delivered in class in front of their peers.
A detailed report, outlining the individual contributions of group members, is mandatory.
Scheduled classroom sessions will monitor and assess the progress of each group's work.

Written exam (30 minutes): The test will feature a general question related to the course material.
Students are allowed to bring a one-sided A4 formula sheet and a non-programmable calculator.

Final scoring will comprise three components:
Assessment of team work (10 points)
Evaluation of the report (10 points)
Result from the written test (10 points)

Honors will be granted to students who not only achieve the highest scores but also display active and substantial engagement in the assigned activities.

Option B: Individual work and written exam


Individual work

Students are required to carry out a detailed analysis of a case study of their choice. They may use data from the textbook, published articles, or public databases.
A short report, outlining the main results, is mandatory.
A classroom presentation is also required, to be delivered in front of peers where possible.

Written exam (30 minutes): The test will feature a general question related to the course material.
Students are allowed to bring a one-sided A4 formula sheet and a non-programmable calculator.
Final scoring will comprise three components:
Assessment of oral presentation (10 points)
Evaluation of the report (10 points)
Result from the written test (10 points)

Honors will be granted to students who not only achieve the highest scores but also display active and substantial engagement in the assigned activities.
SECS-S/01 - STATISTICS - University credits: 9
Lessons: 59.4 hours
Professor: Tarantola Claudia
Professor(s)
Office hours:
Wednesday 9:30 a.m. to 12:30 p.m. (by appointment)
Office no. 16, Via Conservatorio 7, or via Teams (by appointment)