Coding for data science and data management

Academic year 2019/2020
Maximum credits: 12
Total hours: 80
SSD (scientific-disciplinary sector): INF/01 SECS-S/01
Language: English
Learning objectives
Upon completion of the course, students will be able to:
- manage data using R and RStudio;
- solve coding challenges using R libraries and functions;
- perform statistical inference and produce graphics using R;
- write code using the apply family of functions in R;
- understand the Python data model and the flow control statements;
- use the built-in Python data structures;
- perform basic linear algebra operations using Numpy;
- perform basic data set manipulations using Pandas;
- perform simple machine learning experiments using Scikit-learn;
- understand and apply the core notions of data modeling in relational databases;
- use the SQL language for creating and querying relational database structures;
- understand and apply the principles of data organization in NoSQL systems;
- use MongoDB for data retrieval and aggregation in a document-oriented NoSQL system.

Course structure and syllabus

Active edition
Course coordinator
Modules or teaching units
Module Coding for Data Science
INF/01 - COMPUTER SCIENCE - Credits (CFU): 6
Lectures: 40 hours

Module Data Management
SECS-S/01 - STATISTICS - Credits (CFU): 6
Lectures: 40 hours

ATTENDING STUDENTS
Prerequisites and exam format
No prerequisites are required. The exam is written and aims to verify that the objectives of the two units have been achieved.
The written exam consists of:
- exercises on R scripts;
- quizzes on Python code fragments;
- open-ended questions on topics from data analysis;
- quizzes on i) relational data modeling and ii) NoSQL systems;
- exercises on the SQL language and find/aggregation queries in MongoDB.
Teaching methods
Exam organization: written exam on the whole syllabus.
Attendance: highly recommended.
Class organization: lectures and classroom exercises.
Module Coding for Data Science
Syllabus
PYTHON unit
Introduction to the Language
Data structures
Control flow
File I/O
Numpy
Pandas and Matplotlib
Scikit-learn
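
As an illustration of the first topics in this unit (built-in data structures, control flow, and file I/O), here is a minimal sketch that uses only the standard library. The sample records, the pass mark of 18, and the file name are hypothetical and are not taken from the course material.

```python
import os
import tempfile

# Hypothetical sample data: (name, score) pairs, not taken from the course material.
records = [("ada", 28), ("bob", 17), ("eve", 30), ("ada", 25)]

# Dictionary mapping each name to the list of its scores.
scores_by_name = {}
for name, score in records:  # for loop with tuple unpacking
    scores_by_name.setdefault(name, []).append(score)

# Set comprehension: names whose best score reaches a hypothetical pass mark of 18.
passed = {name for name, scores in scores_by_name.items() if max(scores) >= 18}

# File I/O: write one "name,best_score" line per name, then read the lines back.
path = os.path.join(tempfile.mkdtemp(), "best_scores.csv")
with open(path, "w", encoding="utf-8") as handle:
    for name in sorted(scores_by_name):
        handle.write(f"{name},{max(scores_by_name[name])}\n")

with open(path, encoding="utf-8") as handle:
    lines = [line.strip() for line in handle]

print(sorted(passed))
print(lines)
```

The `with` statement closes each file even if an error occurs inside the block, which is why it is usually preferred over calling `close()` by hand.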

DATA MANAGEMENT unit
Introduction to relational databases:
- information and data;
- database and database systems (DBMS);
- the relational model;
- integrity constraints;
- key definition and primary key constraints;
- foreign key constraints.
Database languages:
- data definition languages;
- data manipulation languages;
- queries with the SQL language;
- simple queries, group queries with aggregate operators;
- queries with set operators;
- nested queries.
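
The topics listed above can be tried out without installing a database server. The following is a minimal sketch that assumes SQLite, accessed through Python's standard `sqlite3` module, as a stand-in DBMS; the syllabus does not say which system is used in class. The `department` and `employee` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data definition: two hypothetical tables with primary keys and one foreign key.
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(id)
);
""")

# Data manipulation: insert a few rows.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Sales"), (2, "Research")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Ada", 3000, 1), (2, "Bob", 2000, 1), (3, "Eve", 4000, 2)])

# Group query with aggregate operators: head count and average salary per department.
group_rows = conn.execute("""
    SELECT d.name, COUNT(*), AVG(e.salary)
    FROM employee AS e JOIN department AS d ON e.dept_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# Set operator: departments with both a salary >= 3000 and a salary < 3000.
set_rows = conn.execute("""
    SELECT dept_id FROM employee WHERE salary >= 3000
    INTERSECT
    SELECT dept_id FROM employee WHERE salary < 3000
""").fetchall()

# Nested query: employees earning more than the overall average salary.
nested_rows = conn.execute("""
    SELECT name FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee)
""").fetchall()

# Integrity constraint: a row pointing at a missing department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dan', 1000, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(group_rows, set_rows, nested_rows, fk_rejected)
```

SQL dialects differ between systems in details such as data types and how constraints are enforced, so the statements may need small changes on another DBMS.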
Teaching methods
Exam organization: written exam.
Attendance: highly recommended.
Class organization: lectures and classroom exercises.
Teaching material and bibliography
Python: online resources and handouts.

Data management: choose one of the following books:
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures, McGraw-Hill, available online at http://dbbook.dia.uniroma3.it/.
- R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 7th edition, Pearson, 2015.
Module Data Management
Syllabus
R unit
Introduction to the R framework and RStudio
Data structures: vectors, arrays, data frames, lists, environments
Statistical inference with R
Making plots with base graphics in R
Data manipulation with R
Brief notes on parallel computing with R

DATA MANAGEMENT unit
Introduction to NoSQL databases:
- data models for NoSQL;
- types of NoSQL;
- comparison against the relational model.
NoSQL database systems:
- the "document-oriented" data model;
- the MongoDB system;
- collections in MongoDB;
- collection queries in MongoDB;
- aggregation pipeline in MongoDB.
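
To give a feel for the document-oriented model and the aggregation pipeline named above, the sketch below stores documents as Python dictionaries and runs a pipeline through a toy interpreter written for this illustration. It is not MongoDB itself: `run_pipeline` is a hypothetical helper that handles only equality `$match`, `$group` with `$sum`, and single-key `$sort`, and the `orders` documents are invented. With a running server and the `pymongo` driver, the same `pipeline` list would typically be passed to `collection.aggregate(pipeline)`.

```python
from collections import defaultdict

# Hypothetical documents, shaped like entries of a MongoDB collection.
orders = [
    {"_id": 1, "customer": "ada", "status": "paid", "amount": 30},
    {"_id": 2, "customer": "bob", "status": "paid", "amount": 20},
    {"_id": 3, "customer": "ada", "status": "open", "amount": 50},
    {"_id": 4, "customer": "ada", "status": "paid", "amount": 10},
]

# Pipeline: filter paid orders, total the amounts per customer, sort by total.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def run_pipeline(documents, stages):
    """Toy interpreter (an assumption of this sketch, not a MongoDB API)."""
    docs = list(documents)
    for stage in stages:
        (operator, spec), = stage.items()  # each stage holds exactly one operator
        if operator == "$match":
            # Equality conditions only: every listed field must match its value.
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif operator == "$group":
            key_field = spec["_id"].lstrip("$")
            sums = {name: acc["$sum"].lstrip("$")
                    for name, acc in spec.items() if name != "_id"}
            groups = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out_name, field in sums.items():
                    groups[d[key_field]][out_name] += d[field]
            docs = [{"_id": key, **totals} for key, totals in groups.items()]
        elif operator == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise ValueError(f"unsupported stage: {operator}")
    return docs


result = run_pipeline(orders, pipeline)
print(result)
```

Each stage receives the documents produced by the previous one, which is the core idea of the pipeline: the order of the stages matters, and filtering early keeps later stages small.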
Teaching methods
Exam organization: written exam.
Attendance: highly recommended.
Class organization: lectures and classroom exercises.
Teaching material and bibliography
Online resources and handouts.
NON-ATTENDING STUDENTS
Prerequisites and exam format
No prerequisites are required. The exam is written and aims to verify that the objectives of the two units have been achieved.
The written exam consists of:
- exercises on R scripts;
- quizzes on Python code fragments;
- open-ended questions on topics from data analysis;
- quizzes on i) relational data modeling and ii) NoSQL systems;
- exercises on the SQL language and find/aggregation queries in MongoDB.
Module Coding for Data Science
Syllabus
PYTHON unit
Introduction to the Language
Data structures
Control flow
File I/O
Numpy
Pandas and Matplotlib
Scikit-learn

DATA MANAGEMENT unit
Introduction to relational databases:
- information and data;
- database and database systems (DBMS);
- the relational model;
- integrity constraints;
- key definition and primary key constraints;
- foreign key constraints.
Database languages:
- data definition languages;
- data manipulation languages;
- queries with the SQL language;
- simple queries, group queries with aggregate operators;
- queries with set operators;
- nested queries.
Teaching material and bibliography
Python: online resources and handouts.

Data management: choose one of the following books:
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures, McGraw-Hill, available online at http://dbbook.dia.uniroma3.it/.
- R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 7th edition, Pearson, 2015.
Module Data Management
Syllabus
R unit
Introduction to the R framework and RStudio
Data structures: vectors, arrays, data frames, lists, environments
Statistical inference with R
Making plots with base graphics in R
Data manipulation with R
Brief notes on parallel computing with R

DATA MANAGEMENT unit
Introduction to NoSQL databases:
- data models for NoSQL;
- types of NoSQL;
- comparison against the relational model.
NoSQL database systems:
- the "document-oriented" data model;
- the MongoDB system;
- collections in MongoDB;
- collection queries in MongoDB;
- aggregation pipeline in MongoDB.
Teaching material and bibliography
Online resources and handouts.
Period
Second trimester
Assessment method
Exam
Grading
Mark recorded on a 30-point scale
Instructor(s)
Office hours:
Wednesday 9:30-12:30
via Celoria 18, room 7007
Office hours:
Thursday, 11-12 (check the specified page for any schedule changes)
Room S209, Department of Computer Science, Via Comelico 39 - 20135 Milano
Office hours:
September, scheduled office hours: 09/09 13.30-15.30, 17/09 9.30-11.00, 25/09 10.30-12.00. From October, office hours will be held regularly on Wednesdays 10.30-12.00 and Fridays 10.30-12.00.
DEMM, room 31, 3rd floor