Databases
A.Y. 2018/2019
Learning objectives
Discuss the main concepts and foundations of databases, in particular of relational DBMSs, database design and organization, transaction management, distributed database architectures, and evolution of database technologies such as semi-structured data, active databases, and database for decision support.
Expected learning outcomes
Undefined
Lesson period: Second semester
Assessment methods: Esame
Assessment result: voto verbalizzato in trentesimi
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Responsible
Lesson period
Second semester
Course syllabus
1. Introduction to databases. Information systems, organizational systems, and informatics systems. Information and data. Introduction to databases and DBMSs, data models, schemas, and instances. Logical and physical data independence, database languages, people involved in the database management.
2. The relational model. Logical data models. The relational data model: relations vs. tables; relations with attributes; notations; incomplete information and null values. Integrity constraints: tuple constraints; keys and null values; referential constraints.
3. Relational algebra. Basic operators (union, intersection, difference, selection, and projection) and derived operators (natural join, theta join, semi-join). Queries in relational algebra and equivalence of algebraic expressions. Query idioms.
4. SQL. Data Definition Language: elementary domains, schema definition, table definition, and user defined domains. Intra-relational and inter-relational constraints. SQL queries: simple queries, aggregate queries, GROUP BY queries, set and nested queries. Data modification in SQL: insertions, deletions, and updates. Definition of integrity constraints, assertions, and views. Access control.
5. Database design. The life cycle of information systems. Requirements collection and analysis. Methodologies for database design. Phases of the design methodology. The Entity-Relationship model: the basic constructs (entity, relationship, attribute, cardinality, identifiers, generalizations); documentation of E-R schemas; rules. Design strategies: top-down, bottom-up, inside-out, and mixed. Quality of a conceptual schema. Logical design: restructuring of E-R schemas (removing generalizations; selection of primary identifiers; partitioning/merging of entities and relationships); translation into the relational model; documentation of logical schemas. Mention of physical design.
6. Physical database organization. Access manager. Main memory, secondary memory, and buffer. Buffer manager and its primitives. File organization: sequential structures (entry-sequenced, array, sequentially ordered), hash-based structures, tree structures. B- and B+- trees. Organization of tuples within pages. Physical database design and definition of indexes.
7. Transactions management. Definition of transactions. ACID properties of transactions. Transactions and system modules. Reliable control system. Stable memory. Log: organization, record, and management. Failure management: warm restart and cold restart. Concurrency control. Anomalies of concurrent transactions. Serial and serializable schedules. View-equivalence and conflict-equivalence. Two-phase locking and its variations. Timestamp (mono-version and multi-version). Lock management. Locking and isolation levels in SQL. Deadlock management. Livelock and starvation.
8. Distributed architectures. Distributed data paradigms. Types of architectures. Distributed system properties. Client-server architecture. Distributed databases. Data fragmentation and allocation. Transparency levels. Distributed transactions: classification and ACID properties. Distributed query optimization. Lamport method. Distributed deadlock: definition and detection. Two-phase commit protocol: basic protocol; recovery protocols; protocol optimization; other commit protocols.
9. Semi-structured data. XML. Semi-structured data in XML. XML queries: XQuery and XPath; FLOWR expressions.
10. Active databases. E-C-A paradigm. Triggers. Levels of granularity and evaluation behaviour. Advanced features of active rules. Properties of active rules: termination, confluence, identical observable behaviour. Termination analysis. Applications of active rules.
11. Data analysis. OLTP vs. OLAP. Data warehouse: characteristics and architecture. Multi-dimensional data model. Operations on multi-dimensional data: slice-and-dice, roll-up, drill-down. Development of the data warehouse: ROLAP and MOLAP. ROLAP schemas: star schema and snowflake schema. ROLAP operations. SQL aggregations. Data mining: association and classification rules.
2. The relational model. Logical data models. The relational data model: relations vs. tables; relations with attributes; notations; incomplete information and null values. Integrity constraints: tuple constraints; keys and null values; referential constraints.
3. Relational algebra. Basic operators (union, intersection, difference, selection, and projection) and derived operators (natural join, theta join, semi-join). Queries in relational algebra and equivalence of algebraic expressions. Query idioms.
4. SQL. Data Definition Language: elementary domains, schema definition, table definition, and user defined domains. Intra-relational and inter-relational constraints. SQL queries: simple queries, aggregate queries, GROUP BY queries, set and nested queries. Data modification in SQL: insertions, deletions, and updates. Definition of integrity constraints, assertions, and views. Access control.
5. Database design. The life cycle of information systems. Requirements collection and analysis. Methodologies for database design. Phases of the design methodology. The Entity-Relationship model: the basic constructs (entity, relationship, attribute, cardinality, identifiers, generalizations); documentation of E-R schemas; rules. Design strategies: top-down, bottom-up, inside-out, and mixed. Quality of a conceptual schema. Logical design: restructuring of E-R schemas (removing generalizations; selection of primary identifiers; partitioning/merging of entities and relationships); translation into the relational model; documentation of logical schemas. Mention of physical design.
6. Physical database organization. Access manager. Main memory, secondary memory, and buffer. Buffer manager and its primitives. File organization: sequential structures (entry-sequenced, array, sequentially ordered), hash-based structures, tree structures. B- and B+- trees. Organization of tuples within pages. Physical database design and definition of indexes.
7. Transactions management. Definition of transactions. ACID properties of transactions. Transactions and system modules. Reliable control system. Stable memory. Log: organization, record, and management. Failure management: warm restart and cold restart. Concurrency control. Anomalies of concurrent transactions. Serial and serializable schedules. View-equivalence and conflict-equivalence. Two-phase locking and its variations. Timestamp (mono-version and multi-version). Lock management. Locking and isolation levels in SQL. Deadlock management. Livelock and starvation.
8. Distributed architectures. Distributed data paradigms. Types of architectures. Distributed system properties. Client-server architecture. Distributed databases. Data fragmentation and allocation. Transparency levels. Distributed transactions: classification and ACID properties. Distributed query optimization. Lamport method. Distributed deadlock: definition and detection. Two-phase commit protocol: basic protocol; recovery protocols; protocol optimization; other commit protocols.
9. Semi-structured data. XML. Semi-structured data in XML. XML queries: XQuery and XPath; FLOWR expressions.
10. Active databases. E-C-A paradigm. Triggers. Levels of granularity and evaluation behaviour. Advanced features of active rules. Properties of active rules: termination, confluence, identical observable behaviour. Termination analysis. Applications of active rules.
11. Data analysis. OLTP vs. OLAP. Data warehouse: characteristics and architecture. Multi-dimensional data model. Operations on multi-dimensional data: slice-and-dice, roll-up, drill-down. Development of the data warehouse: ROLAP and MOLAP. ROLAP schemas: star schema and snowflake schema. ROLAP operations. SQL aggregations. Data mining: association and classification rules.
INF/01 - INFORMATICS - University credits: 12
Laboratories: 48 hours
Lessons: 72 hours
Lessons: 72 hours
Professors:
Foresti Sara, Samarati Pierangela
Professor(s)