Databases and Exposure scenarios

A.Y. 2020/2021
6
Max ECTS
48
Overall hours
SSD
INF/01 SECS-S/01
Language
English
Learning objectives
The course is organized into two parts, namely "Informatics and databases" and "Statistics applied to epidemiology".
The "Informatics and databases" part of the course aims to provide the basic concepts of databases and database management systems, with a focus on relational data modeling and the SQL query language. To develop a deeper understanding of relational data organization in real contexts, example relational schemas of biological databases, and SQL queries that extract data from them, are presented and discussed.
As regards statistics, the course aims to provide the fundamental concepts of descriptive and inferential statistics and of epidemiological study design, together with concrete tools for applying the main statistical techniques to real cases. At the end of the course, students should demonstrate knowledge and understanding of the main statistical techniques for describing and analysing the phenomena under study, and of the basic principles for setting up an epidemiological study. They should be able to apply the knowledge acquired and to interpret the results of statistical analyses, and should have developed the skills needed to continue studying statistical analysis and epidemiology independently.
Expected learning outcomes
Regarding informatics and databases, students are expected to understand relational database schemas and languages and to describe the meaning, properties, relationships, and constraints that characterize the data stored in a database. Students will be able to apply the concepts, models, and languages introduced in the course to formulate SQL queries over a database schema, with appropriate conditions to filter and retrieve the data that satisfy specific user needs, including on real biological databases.
As regards statistics, students are expected to have assimilated the concepts presented in the course and to be able to critically compare the use of different statistical tests and study designs. In addition, students will develop the basic skills needed to design an epidemiological study and carry out its statistical analysis.
Course syllabus and organization

Single session

Responsible
Lesson period
First semester
Teaching methods.
"Informatics and databases" and "Statistics applied to epidemiology" lectures will be given on the Zoom platform and can be followed both synchronously (even in class if the emergency situation makes this possible), based on the timetable of the first semester, and asynchronously, as lectures will be recorded and uploaded on the Ariel website of the course.

Course syllabus and Teaching Resources.
No changes.

Assessment methods and Criteria.
Assessment methods and criteria are unchanged: exams will be held either in class or online, depending on the rules in force at that time.
Course syllabus
"Informatics and Databases"
Introduction to databases. Information systems, information and data. Database and Database Management System (DBMS). Data models. Schemas and instances. Abstraction levels in DBMSs. Database languages and users.
Relational databases. The relational model. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Definitions and properties of keys. Primary key and foreign key constraints.
Query languages for relational databases: SQL. Basic SQL query format. Selection and projection queries. Join queries (inner join, natural join, outer joins). Aggregate queries. Group by queries. Set (union, intersection, difference) queries. Nested queries. Correlated nested queries.
Conceptual data modeling with the Entity-Relationship model.
Introduction to biological databases. Direct access to relational biological databases. The Ensembl database and its schema; sample queries over the Ensembl database.
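
As a rough illustration of the query types listed above, the following minimal sketch uses Python's built-in sqlite3 module. The gene and transcript tables, their columns, and all the values are hypothetical and made up for this example; they are a toy schema and not the actual Ensembl schema.

```python
import sqlite3

# Toy in-memory database. Table names, columns, and rows are
# hypothetical illustrations, NOT the real Ensembl schema or data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id    INTEGER PRIMARY KEY,          -- primary key constraint
    name       TEXT NOT NULL,
    chromosome TEXT NOT NULL
);
CREATE TABLE transcript (
    transcript_id INTEGER PRIMARY KEY,
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),  -- foreign key
    length        INTEGER NOT NULL
);
INSERT INTO gene VALUES (1, 'GENE_A', '13'), (2, 'GENE_B', '17'), (3, 'GENE_C', '17');
INSERT INTO transcript VALUES (10, 1, 11000), (11, 1, 9000), (12, 2, 2500);
""")

# Selection (WHERE) and projection (column list).
selection = conn.execute(
    "SELECT name FROM gene WHERE chromosome = '17' ORDER BY name"
).fetchall()

# Inner join: only genes that have at least one transcript appear.
inner = conn.execute("""
    SELECT g.name, t.transcript_id
    FROM gene AS g JOIN transcript AS t ON t.gene_id = g.gene_id
    ORDER BY t.transcript_id
""").fetchall()

# Left outer join with GROUP BY and an aggregate:
# genes without transcripts are kept with a count of 0.
counts = conn.execute("""
    SELECT g.name, COUNT(t.transcript_id) AS n_transcripts
    FROM gene AS g LEFT JOIN transcript AS t ON t.gene_id = g.gene_id
    GROUP BY g.gene_id, g.name
    ORDER BY g.name
""").fetchall()

# Correlated nested query: the subquery refers to the outer row g.
no_transcripts = conn.execute("""
    SELECT g.name
    FROM gene AS g
    WHERE NOT EXISTS (
        SELECT 1 FROM transcript AS t WHERE t.gene_id = g.gene_id
    )
""").fetchall()

print(selection)
print(inner)
print(counts)
print(no_transcripts)
```

The outer join and the correlated query answer complementary questions here: the first reports every gene with its transcript count, while the second returns only the genes whose count would be zero.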

"Statistics applied to epidemiology"
Introduction to the course (objectives, final evaluation criteria). All of the following theoretical topics are accompanied by examples.
Main criteria for evaluation of scientific studies with special reference to statistical studies applied to epidemiology.
Collecting data sets: populations and samples. Describing data sets: frequency tables, line graphs, bar graphs, frequency graphs; pie charts; grouped data and histograms. Summarizing data sets: mean, median, mode. Outliers. Variability: variance, standard deviation. Accuracy and precision.
Describing data sets: sets of paired data, scatter diagram, least squares regression line, slope (linear regression coefficient), intercept.
Qualitative and quantitative evaluation of linear regression. The correlation coefficient. Interpreting correlation. 'Correlation does not imply causation'. Spurious relationships.
Probability. Definitions: experiment, outcome, sample space, event. Union, intersection, Venn diagrams. Rules of probability and independent events. Law of total probability. Conditional probability, Bayes' theorem. Probability distribution, expected value. Binomial distribution. Normal continuous random variables, normal probability density function (Gaussian distribution). Standardizing normally distributed random variables. Population and sample. Population mean, population variance. Sample mean, expected value of the sample mean, variance of the sample mean, standard deviation of the sample mean. Sample size.
Confidence intervals. Statistical inference. Null hypothesis. Statistical significance. Student's t test. Chi-squared test.
Main sources of data errors: gross errors, systematic errors, random errors. Their relation with accuracy and precision; effects on distribution. True positives, true negatives, false positives, false negatives. Graphical representation. Sensitivity, specificity.
Types of epidemiological studies. Observational studies: cross-sectional, case-control, cohort. Experimental studies: randomized controlled trials, field trials, community trials. Potential errors in epidemiological studies. Review of the whole program with exercises on the board.
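
As a rough illustration of how some of the topics above fit together, the following minimal sketch computes sensitivity and specificity from a 2x2 table of test results and then applies Bayes' theorem to obtain the probability of disease given a positive test. The function names and all the counts are hypothetical, made up for this example only.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # P(test positive | diseased)
    specificity = tn / (tn + fp)  # P(test negative | healthy)
    return sensitivity, specificity


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(diseased | test positive), obtained with Bayes' theorem.

    The denominator is the law of total probability for a positive test:
    P(+) = P(+ | D) P(D) + P(+ | not D) P(not D).
    """
    p_pos_and_diseased = sensitivity * prevalence
    p_pos_and_healthy = (1 - specificity) * (1 - prevalence)
    return p_pos_and_diseased / (p_pos_and_diseased + p_pos_and_healthy)


# Made-up counts for illustration: true positives, false positives,
# false negatives, true negatives.
tp, fp, fn, tn = 90, 50, 10, 850

sens, spec = sensitivity_specificity(tp, fp, fn, tn)
prevalence = (tp + fn) / (tp + fp + fn + tn)  # share of diseased subjects
ppv = positive_predictive_value(sens, spec, prevalence)

print(f"sensitivity = {sens:.3f}")
print(f"specificity = {spec:.3f}")
print(f"prevalence  = {prevalence:.3f}")
print(f"P(diseased | positive) = {ppv:.3f}")
```

With these counts the result of Bayes' theorem agrees with the direct ratio tp / (tp + fp) read off the table, which is a useful check when working such exercises by hand.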
Prerequisites for admission
Students must have knowledge of basic mathematics studied during the three-year degree course.
Teaching methods
"Informatics and Databases." The teaching consists of lectures, supported by the use of slides. Slides, which follow the contents of the lectures, are made available on the ARIEL web site https://scastanodes.ariel.ctu.unimi.it/.

"Statistics applied to epidemiology." The teaching consists of lectures, supported by the use of slides and blackboard exercises. Slides, which follow the contents of the lectures, are made available on the ARIEL web site. During the course, paper statistical tables (also available on the ARIEL web site https://rpizzisae.ariel.ctu.unimi.it/) are distributed so that students can directly follow the analyses presented during the lessons.
Teaching Resources
"Informatics and Databases"
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures - McGraw-Hill, available online at http://dbbook.dia.uniroma3.it/
Chapters: 1 (whole), 2 (whole), 3 (up to and including §3.1.6), 4 (only §4.2 and its subsections), 5 (only §5.2 and its subsections)
- Lecture slides downloadable from the course web site (https://scastanodes.ariel.ctu.unimi.it/).

"Statistics applied to epidemiology"
The teaching material consists of the slides uploaded on ARIEL and of the following books:
- Barbara Illowsky, Susan Dean (2013), Introductory Statistics by OpenStax. 1st Edition, XanEdu Publishing Inc.
https://openstax.org/details/books/introductory-statistics
- Beaglehole, Robert, Bonita, Ruth, Kjellström, Tord & World Health Organization (1993). Basic epidemiology. Updated reprint, World Health Organization. https://apps.who.int/iris/bitstream/handle/10665/36838/9241544465.pdf?s…
- Darrell Huff (1991), How to Lie with Statistics. Penguin. https://archive.org/details/HowToLieWithStatistics
Assessment methods and Criteria
The course exam consists of two separate exams, one for the "Informatics and databases" part of the course and one for the "Statistics applied to epidemiology" part. Each part exam is graded on a scale of thirty. The final grade, also on a scale of thirty, is the average of the two part-exam grades.

"Informatics and Databases".
The exam consists of a single written test (approximately 1 hour and 30 minutes); there are no intermediate tests. It covers all the topics presented during lectures and consists of multiple-choice questions and exercises. The exam aims to verify that the course objectives have been achieved, namely that students have learned the basic concepts of the relational data model and are able to solve query exercises on relational databases, including biological ones.
The same assessment methods and criteria apply to attending and non-attending students.

"Statistics applied to epidemiology."
The exam consists of a single written test (2 hours); there are no intermediate tests. A paper from an international indexed journal will be assigned, containing a study evaluated with statistical methods presented in class. Students will answer open questions on their understanding of the statistical methods used in the paper. To pass the exam, students must demonstrate that they:
- understand the concepts of epidemiological study and basic statistics;
- know how to apply the knowledge acquired to real situations;
- know how to interpret the results obtained from the analyses carried out.
The same assessment methods and criteria apply to attending and non-attending students.
Informatics and Databases
INF/01 - INFORMATICS - University credits: 0
SECS-S/01 - STATISTICS - University credits: 0
Lectures: 24 hours
Professor: Castano Silvana
Statistics applied to Epidemiology
INF/01 - INFORMATICS - University credits: 0
SECS-S/01 - STATISTICS - University credits: 0
Lectures: 24 hours
Professor(s)
Reception:
Upon request by email
Office P109 - via Comelico 39