Data Mining and Computational Statistics

A.Y. 2026/2027
Max ECTS: 9
Overall hours: 60
SSD: STAT-01/A
Language: English
Learning objectives
This course introduces basic techniques of Data Mining and Computational Statistics and their applications in finance and economics, set within the more general framework of data science. Students will develop programming skills using the R software. By the end of the course, students will be equipped to apply these methods to solve practical problems in the analysis of economic and financial data.
Expected learning outcomes
At the end of the course students will be able to apply machine learning techniques and algorithms in economic and financial applications. Specifically, students will be familiar with supervised and unsupervised models. In the supervised framework, students will be able to fit advanced regression models such as ridge and lasso regression, and to use classification techniques such as the Bayes classifier, the K-NN classifier and the logistic model. In the unsupervised framework, students will become familiar with dimensionality reduction techniques and cluster analysis. More sophisticated techniques, such as decision tree-based classification, will also be presented. In computational statistics, students will learn resampling techniques, random number and random variable generation, and numerical integration.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Lesson period: Third trimester
Course syllabus
Program


- Introduction to R software.
- Supervised vs. unsupervised methods: introductory overview.
- Parametric vs. non-parametric methods: analysis of the trade-off between bias and variance in statistical learning.

Supervised methods:
· Overview of linear regression; ridge regression, lasso, and other regularization techniques.
· Classification methods: logistic regression, discriminant analysis (both linear and quadratic), and the Bayes classifier.
· Resampling methods: techniques such as cross-validation and bootstrap.
· Decision trees: regression and classification models, with a focus on pruning techniques.
· Tree-based methods: bagging and random forest.

Unsupervised methods:
· Principal component analysis (PCA).
· Clustering.
Prerequisites for admission
A sound knowledge of the topics covered in the basic mathematics and statistics courses is required. Some programming experience is helpful but not essential.
Teaching methods
The course will be held in person. Attendance is not mandatory, although it is strongly recommended in order to facilitate learning.

The course will be conducted through interactive lectures, during which theoretical issues will be discussed and practical cases presented. The aim is to work interactively with students, encouraging their participation and organizing moments of discussion and peer interaction. In addition to the lectures, 20 hours of lab exercises are scheduled, where the concepts presented in class will be applied using R software.
Teaching Resources
Textbook

G. James, D. Witten, T. Hastie, R. Tibshirani (2013). An Introduction to Statistical Learning: with Applications in R. Springer.

Notes written by the teacher, available on myAriel.
Assessment methods and Criteria
There are two options for the exam:

Option A: Team work and written exam

Team Work:

Students will form groups (maximum 5/6 people per group) to collect and analyse data relating to relevant recent problems.
Each group must prepare a presentation, to be delivered in class in front of peers.
A detailed report, outlining the individual contributions of group members, is mandatory.
The progress of each group's work will be monitored and assessed in scheduled classroom sessions.

Written exam

The test will consist of two parts:

Part 1 - Open Question:
The test will feature a general question related to the course material.
Students must write their answer on a single A4 sheet (double-sided).

Maximum score 5 points

Part 2 - Multiple Choice Questions:
A set of 5 multiple choice questions covering key concepts of the course.

Scoring for multiple choice questions:
+1 point for each correct answer
−0.25 points for each incorrect answer
0 points for unanswered questions
Students are allowed to bring a non-programmable calculator.
Duration: 60 minutes


Final scoring will comprise three components:
Assessment of teamwork (10 points)
Evaluation of the report (10 points)
Result from the written test (10 points)
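
As an illustration of the scoring arithmetic described above, the following Python sketch computes the multiple-choice score, the written-test result, and the final score. The function names are hypothetical and not part of the course material. The sketch assumes that the three components are simply summed to a total out of 30 and that no floor is applied to a negative multiple-choice total; the syllabus does not state either point explicitly.

```python
# Hypothetical helpers illustrating the scoring rules; not official course code.

N_QUESTIONS = 5          # multiple choice questions in Part 2
MAX_OPEN_QUESTION = 5.0  # maximum score for Part 1


def multiple_choice_score(correct: int, incorrect: int, unanswered: int = 0) -> float:
    """Part 2: +1 per correct answer, -0.25 per incorrect answer, 0 if unanswered."""
    if min(correct, incorrect, unanswered) < 0:
        raise ValueError("answer counts must be non-negative")
    if correct + incorrect + unanswered != N_QUESTIONS:
        raise ValueError(f"answer counts must sum to {N_QUESTIONS}")
    # Assumption: a negative total is not floored at zero.
    return correct * 1.0 - incorrect * 0.25


def written_test_score(open_question: float, correct: int, incorrect: int,
                       unanswered: int = 0) -> float:
    """Written test: Part 1 (0 to 5 points) plus Part 2 (multiple choice)."""
    if not 0 <= open_question <= MAX_OPEN_QUESTION:
        raise ValueError("open question score must be between 0 and 5")
    return open_question + multiple_choice_score(correct, incorrect, unanswered)


def final_score(first_component: float, report: float, written_test: float) -> float:
    """Assumption: the three 10-point components are summed (maximum 30)."""
    return first_component + report + written_test


if __name__ == "__main__":
    # Example: 4 points on the open question, 3 correct, 1 incorrect, 1 unanswered.
    written = written_test_score(4, correct=3, incorrect=1, unanswered=1)
    print(written)                     # 6.75
    print(final_score(8, 9, written))  # 23.75
```

Under these assumptions, a single incorrect answer cancels a quarter of a correct one, so leaving a question blank is preferable to a random guess only when the chance of guessing correctly is below one in five.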

Honours will be granted to students who not only achieve the highest scores but also display active and substantial engagement in the assigned activities.

Option B: Individual work and written exam


Individual work

Students are required to submit a report providing a detailed analysis of a case study of their choice. They may use data from the textbook, published articles, or public databases.
A short report, outlining the main results, is mandatory.
Students must also prepare a presentation, to be delivered in class in front of their classmates if possible, or otherwise to the instructor.

Written exam

The test will consist of two parts:
Part 1 - Open Question:
The test will feature a general question related to the course material.
Students must write their answer on a single A4 sheet (double-sided).
Maximum score 5 points

Part 2 - Multiple Choice Questions:

A set of 5 multiple choice questions covering key concepts of the course.

Scoring for multiple choice questions:
+1 point for each correct answer
−0.25 points for each incorrect answer
0 points for unanswered questions
Students are allowed to bring a non-programmable calculator.
Duration: 60 minutes

Final scoring will comprise three components:
Assessment of oral presentation (10 points)
Evaluation of the report (10 points)
Result from the written test (10 points)

Honours will be awarded only if the maximum score is achieved and if the work developed and/or the presentation show elements of originality and demonstrate a high level of commitment.
STAT-01/A - Statistics - University credits: 9
Lessons: 60 hours
Professor: Tarantola Claudia
Reception:
Wednesday 9:30 a.m. to 12:30 p.m. (by appointment) or via Teams (by appointment)
Via Conservatorio 7, office 34 (by appointment)