1. Introduction to databases. Information systems, organizational systems, and informatics systems. Information and data. Introduction to databases and DBMSs, data models, schemas, and instances. Logical and physical data independence, database languages.
2. The relational model. Logical data models. The relational data model: relations vs. tables; relations with attributes; notations; incomplete information and null values. Integrity constraints: tuple constraints; keys and null values; referential constraints.
3. Relational algebra. Basic operators (union, intersection, difference, selection, and projection) and derived operators (natural join, theta join, semi-join). Queries in relational algebra and equivalence of algebraic expressions. Query idioms.
4. SQL. Data Definition Language: elementary domains, schema definition, table definition, and user-defined domains. Intra-relational and inter-relational constraints. SQL queries: simple queries, aggregate queries, GROUP BY queries, set and nested queries. Data modification in SQL: insertions, deletions, and updates. Definition of integrity constraints, assertions, and views. Access control.
5. Database design. The life cycle of information systems. Requirements collection and analysis. Methodologies for database design. Phases of the design methodology. The Entity-Relationship model: basic constructs (entity, relationship, attribute, cardinality, identifiers, generalizations); documentation of E-R schemas; rules. Design strategies: top-down, bottom-up, inside-out, and mixed. Quality of a conceptual schema. Logical design: restructuring of E-R schemas (removing generalizations; selection of primary identifiers; partitioning/merging of entities and relationships); translation into the relational model; documentation of logical schemas. Overview of physical design.
6. Physical database organization. Access manager. Main memory, secondary memory, and buffer. Buffer manager and its primitives. File organization: sequential structures (entry-sequenced, array, sequentially ordered), hash-based structures, tree structures. B-trees and B+-trees. Organization of tuples within pages. Physical database design and definition of indexes.
7. Transaction management. Definition of transactions. ACID properties of transactions. Transactions and system modules. Reliability control system. Stable memory. Log: organization, records, and management. Failure management: warm restart and cold restart. Concurrency control. Anomalies of concurrent transactions. Serial and serializable schedules. View-equivalence and conflict-equivalence. Two-phase locking and its variations. Timestamp-based methods (single-version and multi-version). Lock management. Locking and isolation levels in SQL. Deadlock management. Livelock and starvation.
8. Distributed architectures. Distributed data paradigms. Types of architectures. Distributed system properties. Client-server architecture. Distributed databases. Data fragmentation and allocation. Transparency levels. Distributed transactions: classification and ACID properties. Distributed query optimization. Lamport method. Distributed deadlock: definition and detection. Two-phase commit protocol: basic protocol; recovery protocols; protocol optimization; other commit protocols.
9. Semi-structured data. XML. Semi-structured data in XML. XML queries: XQuery and XPath; FLWOR expressions.
10. Active databases. E-C-A paradigm. Triggers. Levels of granularity and evaluation behavior. Advanced features of active rules. Properties of active rules: termination, confluence, identical observable behavior. Termination analysis. Applications of active rules.
11. Data analysis. OLTP vs. OLAP. Data warehouse: characteristics and architecture. Multi-dimensional data model. Operations on multi-dimensional data: slice-and-dice, roll-up, drill-down. Development of the data warehouse: ROLAP and MOLAP. ROLAP schemas: star schema and snowflake schema. ROLAP operations. SQL aggregations. Data mining: association and classification rules.
Prerequisites for admission
Knowledge of basic concepts of computer science.
As required by the Academic Programs Committee, students must have passed the Computer Programming exam before taking this exam.
Assessment methods and criteria
The assessment is a written exam that verifies the student's knowledge and understanding of the subject. It includes theory questions and exercises and lasts 2 hours and 30 minutes. The mark is expressed in thirtieths, and grading considers the correctness, completeness, and clarity of the answers to the questions and exercises. The exam is closed book.