Data Management

A.Y. 2023/2024
Max ECTS: 6
Overall hours: 56
SSD: INF/01
Language: English
Learning objectives
The course provides the basic concepts of data management, with a focus on structured databases and on unstructured (big) data. It introduces the basic concepts of relational database systems and of the SQL query language. To develop a deeper understanding of relational data management in real contexts, examples of relational data schemas and of SQL query formulation for selective data extraction are presented and discussed. Recent and innovative NoSQL solutions for unstructured data management are also illustrated, with a special focus on the MongoDB document-oriented system and on the basic concepts of the Python scripting language, which is used to interact with a MongoDB database for data extraction and manipulation.
The contents of this course give students the data management background that they will apply first in subsequent classes (in particular in the laboratories), and later in their professional careers when monitoring, analysing and addressing natural resource management issues.
Expected learning outcomes
· Knowledge and understanding.
Students are expected to understand relational database schemas and languages.
Students are expected to understand the principles of data organization in NoSQL systems and to acquire basic notions of scripting in the Python language.

· Applying knowledge and understanding.
Students will be able to describe the meaning, the properties, the relationships, and the constraints that characterize the data stored in a database.
Students will be able to apply concepts, models, and languages introduced in the course to formulate SQL queries over a database schema, with appropriate conditions to filter and retrieve target data satisfying specific user needs, also referring to real databases in environmental contexts.
Students will be able to apply NoSQL concepts and Python programming principles illustrated in the course for data extraction, aggregation, and manipulation over a MongoDB database.
Single course

This course can be attended as a single course.

Course syllabus and organization

Single session

Responsible
Lesson period
First semester
Course syllabus
Part I (3 CFU)
Introduction to relational databases. Databases and database management systems (DBMS). Data definition languages and data manipulation languages for databases.
The relational model. Queries in the SQL language. Simple queries and group queries with aggregate operators. Queries with set operators. Nested queries.
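
As an illustration of the four query types listed in Part I, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The `station` and `measurement` tables and their contents are hypothetical and invented for this example only; they are not taken from the course material.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE measurement (
    station_id INTEGER REFERENCES station(id),
    pollutant  TEXT,
    value      REAL
);
INSERT INTO station VALUES
    (1, 'Alpha', 'North'), (2, 'Beta', 'North'), (3, 'Gamma', 'South');
INSERT INTO measurement VALUES
    (1, 'NO2', 40.0), (1, 'PM10', 30.0), (2, 'NO2', 20.0), (3, 'PM10', 50.0);
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Simple query: selection with a filter condition.
simple = run("SELECT name FROM station WHERE region = 'North' ORDER BY name")

# Group query: aggregate operators applied per pollutant.
grouped = run("""
    SELECT pollutant, COUNT(*), AVG(value)
    FROM measurement
    GROUP BY pollutant
    ORDER BY pollutant
""")

# Set operator: stations that measured NO2 but never PM10.
set_op = run("""
    SELECT station_id FROM measurement WHERE pollutant = 'NO2'
    EXCEPT
    SELECT station_id FROM measurement WHERE pollutant = 'PM10'
""")

# Nested query: stations with at least one value above the overall average.
nested = run("""
    SELECT name FROM station
    WHERE id IN (SELECT station_id FROM measurement
                 WHERE value > (SELECT AVG(value) FROM measurement))
    ORDER BY name
""")

print(simple)   # [('Alpha',), ('Beta',)]
print(grouped)  # [('NO2', 2, 30.0), ('PM10', 2, 40.0)]
print(set_op)   # [(2,)]
print(nested)   # [('Alpha',), ('Gamma',)]
```

SQLite is used here only because it ships with Python and needs no server; the queries themselves are standard SQL and should carry over to other relational DBMSs with little or no change.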

Part II (3 CFU)
Introduction to NoSQL databases. Data models for NoSQL. Types of NoSQL systems. Comparison with the relational model.
The "document-oriented" data model. The MongoDB system. Collections in MongoDB. Collection queries in MongoDB. The aggregation pipeline in MongoDB.
The Python language. Principles of programming and introduction to the language. Data structures and data types.
The Pandas library for data manipulation.
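
The Part II topics can be tied together with a small sketch of a MongoDB aggregation pipeline built from plain Python data structures. The `observations` collection, its fields (`species`, `site`, `count`), and the function names are hypothetical, chosen only for illustration. The function accepts any object that exposes an `aggregate()` method, as pymongo collections do, so the pipeline can be inspected without a running server.

```python
# Hypothetical documents in an "observations" collection might look like:
#   {"species": "oak", "site": "A", "count": 12}


def build_pipeline(site):
    """Return an aggregation pipeline as a list of stage dictionaries.

    Stages: keep documents for one site, sum counts per species,
    then sort the species by total in descending order.
    """
    return [
        {"$match": {"site": site}},
        {"$group": {"_id": "$species", "total": {"$sum": "$count"}}},
        {"$sort": {"total": -1}},
    ]


def totals_by_species(collection, site):
    """Run the pipeline on a collection-like object and return a list of dicts."""
    return list(collection.aggregate(build_pipeline(site)))


# Possible usage with pymongo (assumes a MongoDB server on localhost and
# hypothetical database/collection names):
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   coll = client["course_db"]["observations"]
#   rows = totals_by_species(coll, "A")
#
# The resulting list of dictionaries can then be loaded into Pandas:
#
#   import pandas as pd
#   df = pd.DataFrame(rows)
```

Keeping the pipeline in its own function makes each stage easy to print and check before it is sent to the database.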
Prerequisites for admission
None.
Teaching methods
For both attending and non-attending students: slides and handouts are published progressively on the Ariel website https://scastanodm.ariel.ctu.unimi.it/
Teaching Resources
Relational databases and SQL:
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures - McGraw-Hill, available online at http://dbbook.dia.uniroma3.it/ (Chapters 1, 2, 4).

Python / NumPy / Pandas:
- Allen B. Downey, Think Python, 2nd Edition - O'Reilly Media, available online at https://greenteapress.com/wp/think-python-2e/

- J. VanderPlas, Python Data Science Handbook - O'Reilly Media, available online at https://jakevdp.github.io/PythonDataScienceHandbook/ (Chapters 2, 3, 4).

MongoDB:
https://www.mongodb.com/docs/manual/tutorial/getting-started/

Online resources and handouts provided throughout the lectures are available on the Ariel website https://scastanodm.ariel.ctu.unimi.it/
Assessment methods and Criteria
The exam is a 90-minute written test with quizzes, questions, and exercises covering the course syllabus. The result is expressed as a mark out of 30. The exam aims to verify that the course objectives have been achieved, namely that students have learned the basic concepts of relational and NoSQL data organization, that they are able to interpret requests and write correct queries to extract and organize the appropriate data from NoSQL and relational databases for a given target, and that they have learned the fundamentals of the Python language for data management.
For attending students only: two intermediate exams are offered, the first covering relational databases and SQL, and the second (at the end of the course) covering NoSQL databases and Python.
INF/01 - INFORMATICS - University credits: 6
Computer room practicals: 16 hours
Lessons: 40 hours
Educational website(s)
Professor(s)
Reception:
Upon request by email
Office P109 - via Comelico 39