Data Analysis and Predictive Modeling
A.Y. 2025/2026
Learning objectives
The main goal of the Data Analysis and Predictive Modeling course is to illustrate the fundamental concepts of basic statistics as applied to the biomedical sciences. The course integrates descriptive and inferential statistics to equip students with essential skills for data collection, analysis, interpretation, and presentation, with a particular focus on the data they collect for their Master's thesis. Specifically, the course covers:
1. Structure of a Scientific Paper and the Role of Data Analysis and Data Collection:
· Familiarize students with the structure of a scientific paper, section by section, emphasizing the specific role of data analysis.
· Teach students the differences between various types of data (qualitative vs. quantitative, discrete vs. continuous).
· Introduce methods for data collection and sampling from a population for conducting a sample study.
2. Descriptive Statistics:
· Highlight the importance of effective data summarization and presentation using tables and graphs (dot plots, histograms, bar charts, and boxplots).
· Teach students how to calculate and interpret key numerical indicators, including:
◦ Central tendency (mode, mean, and median)
◦ Position (minimum, maximum, percentiles)
◦ Dispersion (range, variance, standard deviation)
◦ Shape (skewness, kurtosis)
3. Inferential Statistics:
· Provide students with tools for making predictions and inferences about a population based on sample data.
· Explain and explore the implications of the Central Limit Theorem, which underpins inferential statistics, enabling confidence interval calculations and hypothesis testing.
· Introduce key statistical tests (e.g., Z-test, Student's t-test for paired and unpaired data, ANOVA, chi-square test). A complete list of statistical tests is detailed in the course syllabus.
4. Predictive Modeling:
· Introduce predictive modeling, which involves building models to make forecasts. This branch of statistics is increasingly important due to the rapid development of artificial intelligence, particularly machine learning.
· Explain how one or more variables (independent variables or predictors) are used to predict another variable (dependent or predicted variable).
· Cover covariance and correlation between variable pairs and predictive models based on:
◦ Simple and multiple linear regression
◦ Polynomial regression
◦ Nonlinear regression, focusing on four types of functions commonly used in biomedical sciences: sinusoidal/cosinusoidal function, exponential function, sigmoidal function, Michaelis-Menten saturation function.
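As a minimal sketch of the numerical indicators listed under Descriptive Statistics, the following Python example computes them with the standard library. The data set and the function name `describe` are hypothetical and serve only as an illustration. Skewness and kurtosis are computed here from central moments with an n denominator, which is one of several conventions in use.

```python
import statistics


def describe(values):
    """Compute the descriptive indicators named in the objectives for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Quartiles as examples of percentiles; the "inclusive" method
    # interpolates between the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Central moments with an n denominator (no small-sample correction).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mode": statistics.mode(values),
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "quartiles": (q1, q2, q3),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


# Hypothetical sample, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Statistical packages may apply bias corrections to skewness and kurtosis, so their output can differ slightly from these moment-based values on small samples.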
Expected learning outcomes
By the end of the course, students will:
1. Enhance Analytical Thinking - Gain familiarity with statistical data analysis and learn to identify patterns, trends, and outliers.
2. Develop Competence in Statistical Techniques - Acquire practical skills in descriptive and inferential statistics, enabling them to analyze datasets, conduct statistical tests, and draw rational conclusions.
3. Make Data-Driven Decisions - Learn how to make informed decisions based on statistical analysis, a critical skill in today's data-driven world.
4. Improve Problem-Solving Skills - Apply statistical methods to tackle complex problems and derive meaningful conclusions from data.
5. Prepare for Academic Careers - Build a solid foundation for students interested in pursuing a Ph.D.
6. Develop Data Presentation Skills - Learn how to present data effectively using graphs, tables, and reports that convey complex information clearly and efficiently.
Lesson period: First semester
Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Course syllabus
· Introduction to the course. Structure of a scientific work. Distinction between population and sample. Distinction between descriptive statistics and inferential statistics.
· Data representation through tables and graphs (dot plot, histogram, boxplot). Presentation of descriptive statistical indices. Measures of central tendency, position, variability, and shape.
· The Gaussian distribution and the standardized Gaussian distribution (Z). Use of the Z table.
· Inferential statistics and the Central Limit Theorem. Accuracy vs. precision. Calculation of the confidence interval based on Z. The Student's t distribution with applications to the calculation of the confidence interval.
· Hypothesis testing theory. Z test and Student's t test for a single sample. Type I and Type II statistical errors. Student's t test for two samples: paired data test and unpaired data test. ANOVA for independent groups and repeated measures ANOVA.
· Assessment of the normality of a dataset and logarithmic transformation. Non-parametric tests.
· Bivariate and predictive statistics: correlation and simple linear regression. Multiple linear regression. Polynomial regression. Nonlinear regression with reference to the exponential function and the Michaelis-Menten function.
· Categorical variables. Chi-square test for independence applied to contingency tables. Fisher's exact test.
· Case study: introduction to chronobiology and actigraphic study of chronotype through questionnaire (MEQ) and actigraphy interpreted with the Cosinor model.
· Case study: the glucose-insulin regulation system and the concept of insulin resistance. Determination of two crucial indices for the functioning of the glucose-insulin system: insulin sensitivity and beta-cell sensitivity.
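To illustrate the nonlinear regression topic in the syllabus, the sketch below fits the Michaelis-Menten function v = Vmax·s / (Km + s) by least squares using plain Gauss-Newton iterations in Python. It is an assumption-laden illustration, not the method taught in the course: the function names, the starting-value heuristic, and the synthetic noise-free data are all hypothetical, and the undamped iteration may fail to converge on noisy or poorly designed data.

```python
def michaelis_menten(s, vmax, km):
    """Michaelis-Menten saturation function."""
    return vmax * s / (km + s)


def fit_michaelis_menten(s_values, v_values, iterations=50, tol=1e-12):
    """Least-squares estimate of (Vmax, Km) by undamped Gauss-Newton iterations."""
    # Crude starting values (an assumption of this sketch): the largest observed
    # response for Vmax, and the concentration whose response is closest to
    # half of that value for Km.
    vmax = max(v_values)
    km = min(zip(s_values, v_values), key=lambda p: abs(p[1] - vmax / 2))[0]
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(s_values, v_values):
            r = v - michaelis_menten(s, vmax, km)   # residual
            j1 = s / (km + s)                       # d(model)/d(Vmax)
            j2 = -vmax * s / (km + s) ** 2          # d(model)/d(Km)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * r
            b2 += j2 * r
        # Solve the 2x2 normal equations (J^T J) delta = J^T r by Cramer's rule.
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        d_vmax = (b1 * a22 - a12 * b2) / det
        d_km = (a11 * b2 - a12 * b1) / det
        vmax += d_vmax
        km += d_km
        if abs(d_vmax) + abs(d_km) < tol:
            break
    return vmax, km


# Synthetic, noise-free data generated from assumed "true" parameters.
true_vmax, true_km = 10.0, 2.0
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rate = [michaelis_menten(s, true_vmax, true_km) for s in substrate]

vmax_hat, km_hat = fit_michaelis_menten(substrate, rate)
print(f"Vmax estimate: {vmax_hat:.4f}, Km estimate: {km_hat:.4f}")
```

In practice a library routine with step control (for example a Levenberg-Marquardt implementation) would usually be preferred over this hand-written loop.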
Prerequisites for admission
Students should have passed an Introductory Mathematics exam during their Bachelor's degree. However, high school mathematics knowledge is sufficient for following the course.
Teaching methods
Lectures
Teaching Resources
REFERENCE TEXTBOOKS
The course does not require any specific textbook, as the myAriel materials are sufficient for exam preparation. However, recommended references include:
1. Marc Triola and Mario Triola - Statistica per le discipline biosanitarie, Pearson Editore (544 pages).
2. S. A. Glantz - Statistica per le Discipline Biomediche, McGraw-Hill Editore (490 pages).
3. David S. Moore - Statistica di base, Apogeo Editore (645 pages).
MEETINGS WITH STUDENTS ARE ARRANGED VIA E-MAIL
Assessment methods and Criteria
EXAM FORMAT
· Written exam lasting 1 hour, consisting of 31 multiple-choice questions.
· Scoring:
◦ 31 correct answers: 30L (30 cum laude)
◦ 30 correct answers: 30
◦ No penalty for incorrect answers.
· Required items: Z and t tables, a calculator (not a mobile phone calculator).
ING-INF/06 - ELECTRONIC AND INFORMATICS BIOENGINEERING - University credits: 2
MAT/06 - PROBABILITY AND STATISTICS - University credits: 4
Lessons: 48 hours
Professor:
Caumo Andrea
Professor(s)
Reception:
To be arranged via e-mail
Via Colombo 71, 20133 Milano