Statistics for Big Data for Economics and Business
A.Y. 2025/2026
Learning objectives
This course introduces and illustrates statistical, IT and machine learning methodologies for the analysis of Big Data in economic, business and financial applications. The course focuses mainly on the Python programming language, which is by far the most widely used in Big Data applications, but some parts are devoted to R and to more classical languages such as Java. On the statistical side, the course covers supervised and unsupervised statistical learning, with some reference to Bayesian statistics.
Expected learning outcomes
By the end of the course, students will have acquired the statistical and programming skills needed to master the tools for analysing Big Data and to extract information of interest in the economic, business and financial fields.
Lesson period: Third trimester
Assessment methods: Exam
Assessment result: mark recorded on a thirty-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Responsible
Course syllabus
The topics are:
- Matrices, data matrices and their descriptive statistics
- Cluster analysis (hierarchical and non-hierarchical methods) with interpretation in R
- Principal Component Analysis with interpretation in R
- Decision Trees (CART) with interpretation in R
- Bootstrap with interpretation in R
- Random forest with interpretation in R
Prerequisites for admission
To follow the course content adequately, students must have a basic knowledge of statistics and mathematics.
Teaching methods
The course is organized into theoretical lectures and practical sessions in which R output is interpreted.
Teaching Resources
- Slides
- Witten et al. (2020). Introduzione all'apprendimento statistico, Piccin (https://www.piccin.it/it/statistica
introduzione-all-apprendimento-statistico-con-applicazioni-in-r-9788829930944.html).
- Crawley (2012). The R Book, Wiley.
- Computing part (Python): Bowles (2015). Machine Learning in Python, Wiley.
- Computing part (Cloud computing): Sosinsky (2010). Cloud Computing Bible, Wiley.
- Computing part (Relational databases): Atzeni et al. (2018). Basi di dati, McGraw-Hill.
Assessment methods and Criteria
Three optional homework assignments
Written exam with open questions and multiple choice questions on theory, R and Python
Professor(s)
Office hours:
Every Wednesday, 10:00-12:00
DEMM, room 31, 3rd floor (by appointment; please send an email)