Databases and exposure scenarios

Academic year 2021/2022
Maximum credits
Total hours
INF/01 SECS-S/01
Learning objectives
The course is organized into two parts, namely "Informatics and databases" and "Statistics applied to epidemiology".
The "Informatics and databases" part of the course aims to provide the basic concepts of databases and database management systems, with a focus on relational data modeling and the SQL query language. To develop a deeper understanding of how relational data is organized in real contexts, example relational schemas of biological databases, and SQL queries that extract data from them, are presented and discussed.
The "Statistics applied to epidemiology" part aims to provide the fundamental concepts of descriptive and inferential statistics and of epidemiological study design. It also provides concrete tools for applying the main statistical techniques to real cases. At the end of the course the student should demonstrate knowledge and understanding of the main statistical techniques for describing and analyzing the phenomena being studied and of the basic principles for setting up an epidemiological study; should be able to apply the knowledge acquired and to interpret the results of statistical analyses; and should have developed the skills needed to continue studying statistical analysis and epidemiology independently.
Expected learning outcomes
Regarding informatics and databases, students are expected to be able to understand relational database schemas and languages and to describe the meaning, properties, relationships, and constraints that characterize the data stored in a database. Students will be able to apply the concepts, models, and languages introduced in the course to formulate SQL queries over a database schema, with appropriate conditions to filter and retrieve the target data that satisfy specific user needs, including on real biological databases.
Regarding statistics, students are expected to have assimilated the concepts presented in the course and to be able to critically compare the use of different statistical tests and study designs. In addition, students will develop the basic skills needed to design an epidemiological study and carry out its statistical analysis.
Syllabus and course organization

Single edition

First semester
More specific information on the delivery modes of training activities for the academic year 2021/22 will be provided over the coming months, based on the evolution of the public health situation.
"Informatics and Databases"
Introduction to databases. Information systems, information, and data. Database and Database Management System (DBMS). Data models. Schemas and instances. Abstraction levels in DBMSs. Database languages and users.
Relational databases. The relational model. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Definitions and properties of keys. Primary key and foreign key constraints.
Query languages for relational databases: SQL. Basic SQL query format. Selection and projection queries. Join queries (inner join, natural join, outer joins). Aggregate queries. Group by queries. Set (union, intersection, difference) queries. Nested queries. Correlated nested queries.
Conceptual data modeling with the Entity-Relationship model.
Introduction to biological databases. Direct access to relational biological databases. Understanding a sample biological database and its schema; formulation of SQL queries over the sample database using a DBMS.
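As an illustration of the query types listed above, the following is a minimal sketch that uses Python's built-in sqlite3 module as the DBMS. The two-table schema (organism, gene), the gene symbols G1 to G3, and all values are hypothetical toy data made up for this example; they are not taken from the course material or from any real biological database.

```python
import sqlite3

# Hypothetical toy schema and made-up values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    length_bp   INTEGER,  -- NULL models incomplete information
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
INSERT INTO organism VALUES (1, 'Homo sapiens'), (2, 'Mus musculus');
INSERT INTO gene VALUES
    (10, 'G1', 1200, 1),
    (11, 'G2', 3400, 1),
    (12, 'G3', NULL, 2);
""")

# Selection (WHERE) and projection (SELECT list).
# Rows whose length_bp is NULL do not satisfy the comparison.
long_genes = conn.execute(
    "SELECT symbol FROM gene WHERE length_bp > 2000 ORDER BY symbol"
).fetchall()

# Inner join along the foreign key gene.organism_id -> organism.organism_id.
gene_organism = conn.execute("""
    SELECT g.symbol, o.name
    FROM gene AS g
    JOIN organism AS o ON g.organism_id = o.organism_id
    ORDER BY g.symbol
""").fetchall()

# Aggregate query with GROUP BY.
# COUNT(*) counts rows; COUNT(column) skips NULL values.
genes_per_organism = conn.execute("""
    SELECT o.name, COUNT(*), COUNT(g.length_bp)
    FROM organism AS o
    JOIN gene AS g ON g.organism_id = o.organism_id
    GROUP BY o.name
    ORDER BY o.name
""").fetchall()

# Nested query: organisms with no gene of unknown length.
fully_measured = conn.execute("""
    SELECT name
    FROM organism
    WHERE organism_id NOT IN (
        SELECT organism_id FROM gene WHERE length_bp IS NULL
    )
""").fetchall()

print(long_genes)
print(gene_organism)
print(genes_per_organism)
print(fully_measured)
```

SQLite is used here only because it needs no installation; the same SELECT statements should run with little or no change on other relational DBMSs.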

"Statistics applied to epidemiology"
Introduction to the course (objectives, final evaluation criteria). All of the following theoretical topics are accompanied by examples.
Main criteria for evaluation of scientific studies with special reference to statistical studies applied to epidemiology.
Collecting data sets: populations and samples. Describing data sets: frequency tables, line graphs, bar graphs, frequency graphs; pie charts; grouped data and histograms. Summarizing data sets: mean, median, mode. Outliers. Variability: variance, standard deviation. Accuracy and precision.
Describing data sets: sets of paired data, scatter diagram, least squares regression line, slope (linear regression coefficient), intercept.
Qualitative and quantitative evaluation of linear regression. The correlation coefficient. Interpreting correlation. 'Correlation does not imply causation'. Spurious relationships.
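The regression and correlation quantities named above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming paired data with at least two distinct x values and two distinct y values; the function name least_squares and the sample numbers are hypothetical and are not taken from the course material.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # linear regression coefficient
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)              # Pearson correlation coefficient
    return slope, intercept, r


# Made-up paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = least_squares(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear association in the sample; by itself it says nothing about causation.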
Probability. Definitions: experiment, outcome, sample space, event. Union, intersection, Venn diagrams. Rules of probability and independent events. Law of total probability. Conditional probability, Bayes' theorem. Probability distribution, expected value. Binomial distribution. Normal continuous random variables, normal probability density function (Gaussian distribution). Standardizing normally distributed random variables. Population and sample. Population mean, population variance. Sample mean, expected value of the sample mean, variance of the sample mean, standard deviation of the sample mean. Sample size.
Confidence intervals. Statistical inference. Null hypothesis. Statistical significance. Student's t test. Chi-squared test.
Main sources of data errors: gross errors, systematic errors, random errors. Their relation with accuracy and precision; effects on distribution. True positives, true negatives, false positives, false negatives. Graphical representation. Sensitivity, specificity.
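As a small illustration of how sensitivity and specificity follow from the four outcome counts, here is a minimal sketch in Python. The function names and the counts are hypothetical values chosen for this example, not results from any study.

```python
def sensitivity(tp, fn):
    """Proportion of truly positive subjects that the test classifies as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of truly negative subjects that the test classifies as negative."""
    return tn / (tn + fp)


# Made-up counts for a hypothetical diagnostic test.
tp, fn = 90, 10    # 100 subjects with the condition
tn, fp = 160, 40   # 200 subjects without the condition

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```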
Types of epidemiological studies. Observational studies: cross-sectional, case-control, cohort. Experimental studies: randomized controlled trials, field trials, community trials. Potential errors in epidemiological studies. Review of the whole program with exercises on the board.
Students must have knowledge of basic mathematics studied during the three-year degree course.
Teaching methods
"Informatics and Databases." The teaching consists of lectures, supported by slides and blackboard exercises. The slides, which follow the contents of the lectures, are made available on the ARIEL website.

"Statistics applied to epidemiology." The teaching consists of lectures, supported by slides and blackboard exercises. The slides, which follow the contents of the lectures, are made available on ARIEL. During the course, printed statistical tables (also available on the ARIEL website) are distributed so that students can directly follow the analyses presented during the lessons.
Reference material
"Informatics and Databases"
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures - McGraw-Hill, available online at
Chapters: 1 (whole), 2 (whole), 3 (up to and including §3.1.6), 4 (only §4.2 and its subsections), 5 (only §5.2 and its subsections)
- Lecture slides downloadable from the ARIEL course website (

"Statistics applied to epidemiology"
The teaching material consists of the slides uploaded on ARIEL and of the following books:
- Barbara Illowsky, Susan Dean (2013), Introductory Statistics by OpenStax. 1st Edition, XanEdu Publishing Inc.
- Beaglehole, Robert, Bonita, Ruth, Kjellström, Tord & World Health Organization (1993). Basic epidemiology. Updated reprint, World Health Organization.…
- Darrell Huff (1991), How to Lie with Statistics. Penguin.…
Assessment methods and evaluation criteria
The course exam consists of two separate exams, one for the "Informatics and databases" part of the course and one for the "Statistics applied to epidemiology" part. Each partial exam is graded on a scale of thirty. The final grade, also on a scale of thirty, is the average of the two partial grades.

"Informatics and Databases".
The exam consists of a single written test of approximately 1 hour and 30 minutes; there are no midterm tests. It covers all the topics presented during lectures and consists of multiple-choice questions and exercises. The exam aims to verify that the course objectives have been achieved, namely, that students have learned the basic concepts of the relational data model and are able to solve query exercises on relational databases.
The same assessment methods and criteria apply to attending and non-attending students.

"Statistics applied to epidemiology."
The exam consists of a single written test lasting 2 hours; there are no midterm tests. Students are assigned a paper from an internationally indexed journal that contains a study evaluated with statistical methods presented in class, and they must answer open questions on their understanding of the statistical methods used in the paper. To pass the exam, the student must demonstrate that they:
- understand the concepts of epidemiological studies and basic statistics;
- know how to apply the knowledge acquired to real situations;
- know how to interpret the results obtained from the analyses carried out.
The same assessment methods and criteria apply to attending and non-attending students.
Modules or teaching units
Informatics and Databases
Lectures: 24 hours
Instructor: Castano Silvana

Statistics applied to Epidemiology
Lectures: 24 hours

Office hours by appointment via email
check via email
Office at the Dipartimento di Informatica, via Celoria 18, Milano