Coding for data science and data management

Academic year 2018/2019
Maximum credits: 12
Total hours: 80
SSD (scientific-disciplinary sector): INF/01 SECS-S/01
Language: English
Learning objectives
The course is organized into two modules. The Coding for Data Science module provides technical skills in data analysis, with a focus on coding in the Python programming language and the R framework. The Data Management module deals with information and data management issues in modern information systems, with a focus on relational databases and emerging NoSQL systems.

Course structure and syllabus

Active edition
Course coordinator
Modules or teaching units
Module Coding for Data Science
INF/01 - COMPUTER SCIENCE - Credits (CFU): 6
Lectures: 40 hours

Module Data Management
SECS-S/01 - STATISTICS - Credits (CFU): 6
Lectures: 40 hours

ATTENDING STUDENTS
Prerequisites and exam format
The exam verifies that the objectives of both modules have been achieved.
Coding for Data Science: no prerequisites. The exam consists of a written exam followed by an oral exam on the topics of the module syllabus.
Data Management: no prerequisites. The exam consists of a written exam on the topics of the module syllabus.
Teaching methods
Exam organization: written and oral exam on the syllabus of Coding for Data Science; written exam on the syllabus of Data Management.
Attendance: highly recommended.
Class organization: lectures.
Module Coding for Data Science
Syllabus
PYTHON
Introduction to the language, environment, set up
Data structures 1: numeric, string.
Control flow structures
Data structures 2: list, set, tuple, dict
Data structures 3: collections, lambda, json
File I/O
Objects: introduction and basics
NumPy and linear algebra
Pandas 1: indexing and series
Pandas 2: dataframe
Pandas 3: graphics
R
Introduction and first steps
Data structures: vectors, arrays, data.frames, lists, environments
Loops and flow control
Functions and objects
Vectorized calculus and function vectorization
S3 and S4 object-oriented programming
Parallelization and optimization
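To give a flavor of the Python topics listed above (built-in data structures, collections, lambda, json), the following is a minimal sketch using only the standard library. The sample records and all variable names are hypothetical and invented for illustration; they are not taken from the course materials.

```python
import json
from collections import Counter, defaultdict

# Hypothetical sample records, invented for illustration only.
records = [
    {"name": "ada", "language": "python", "score": 28},
    {"name": "bob", "language": "r", "score": 24},
    {"name": "eve", "language": "python", "score": 30},
]

# Counter (collections): number of records per language.
language_counts = Counter(r["language"] for r in records)

# defaultdict (collections): group scores by language.
scores_by_language = defaultdict(list)
for r in records:
    scores_by_language[r["language"]].append(r["score"])

# lambda used as a sort key: order records by descending score.
ranked = sorted(records, key=lambda r: r["score"], reverse=True)

# json: serialize the list to a string and parse it back.
payload = json.dumps(records)
restored = json.loads(payload)

print(language_counts["python"])       # 2
print(dict(scores_by_language))        # {'python': [28, 30], 'r': [24]}
print([r["name"] for r in ranked])     # ['eve', 'ada', 'bob']
print(restored == records)             # True
```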
Teaching methods
Exam organization: written + oral exam.
Attendance: highly recommended.
Class organization: lectures.
Teaching materials and bibliography
Online resources and handouts
Module Data Management
Syllabus
Introduction to relational databases. Information and data; databases and database management systems (DBMS); the relational model, integrity constraints, key definition, primary key constraints; foreign key constraints.
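As an illustration of primary key and foreign key constraints, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the `department` and `employee` tables, and the `try_insert` helper are assumptions made for this example; the course may use a different DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each employee must reference an existing department.
conn.execute("""
    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES department(id)
    )
""")

conn.execute("INSERT INTO department (id, name) VALUES (1, 'Statistics')")
conn.execute("INSERT INTO employee (id, name, department_id) VALUES (1, 'Ada', 1)")


def try_insert(sql):
    """Run an INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Violates the primary key constraint: id 1 already exists.
duplicate_key = try_insert(
    "INSERT INTO department (id, name) VALUES (1, 'Informatics')"
)
# Violates the foreign key constraint: department 99 does not exist.
dangling_ref = try_insert(
    "INSERT INTO employee (id, name, department_id) VALUES (2, 'Bob', 99)"
)
# Satisfies both constraints.
valid_row = try_insert(
    "INSERT INTO employee (id, name, department_id) VALUES (3, 'Eve', 1)"
)

print(duplicate_key, dangling_ref, valid_row)
```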
Database languages. Data definition languages; data manipulation languages; queries with the SQL language; simple queries, group queries with aggregate operators; queries with set operators; nested queries.
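The four kinds of SQL query named above (simple, grouped with an aggregate operator, set operator, nested) can be sketched as follows, again with Python's built-in sqlite3 module. The `exam` table and its rows are hypothetical sample data invented for this example, and SQLite is assumed only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, invented for illustration only.
conn.executescript("""
    CREATE TABLE exam (student TEXT, module TEXT, grade INTEGER);
    INSERT INTO exam VALUES
        ('ada', 'coding', 30),
        ('ada', 'data_management', 27),
        ('bob', 'coding', 24),
        ('eve', 'data_management', 29);
""")

# Simple query: filter rows and project one column.
simple = conn.execute(
    "SELECT student FROM exam WHERE grade >= 28 ORDER BY student"
).fetchall()

# Group query with an aggregate operator: average grade per module.
grouped = conn.execute(
    "SELECT module, AVG(grade) FROM exam GROUP BY module ORDER BY module"
).fetchall()

# Set operator: students who appear in both modules.
both = conn.execute("""
    SELECT student FROM exam WHERE module = 'coding'
    INTERSECT
    SELECT student FROM exam WHERE module = 'data_management'
""").fetchall()

# Nested query: rows whose grade exceeds the overall average (27.5 here).
above_avg = conn.execute("""
    SELECT student, module FROM exam
    WHERE grade > (SELECT AVG(grade) FROM exam)
    ORDER BY student, module
""").fetchall()

print(simple)     # [('ada',), ('eve',)]
print(grouped)    # [('coding', 27.0), ('data_management', 28.0)]
print(both)       # [('ada',)]
print(above_avg)  # [('ada', 'coding'), ('eve', 'data_management')]
```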
Introduction to NoSQL databases. Data models for NoSQL; CAP theorem (consistency, availability, partition tolerance); types of NoSQL; comparison against the relational model.
NoSQL database systems. The "document-oriented" data model; the MongoDB system; collections in MongoDB; collection queries in MongoDB; aggregation pipeline in MongoDB; the "column-family" data model; the Cassandra system; keyspaces in Cassandra; keyspace queries in Cassandra; the CQL language.
Teaching methods
Exam organization: written exam.
Attendance: highly recommended.
Class organization: lectures.
Teaching materials and bibliography
Choose one of the following books:
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures, McGraw-Hill. Available online at http://dbbook.dia.uniroma3.it/.
- R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 7th edition, Pearson, 2015.

Online resources and lecture materials available for download from the course website
NON-ATTENDING STUDENTS
Module Coding for Data Science
Syllabus
PYTHON
Introduction to the language, environment, set up
Data structures 1: numeric, string.
Control flow structures
Data structures 2: list, set, tuple, dict
Data structures 3: collections, lambda, json
File I/O
Objects: introduction and basics
NumPy and linear algebra
Pandas 1: indexing and series
Pandas 2: dataframe
Pandas 3: graphics
R
Introduction and first steps
Data structures: vectors, matrices, arrays, data.frames, lists, environments
Loops and flow control
Functions and objects
Vectorized calculus and function vectorization
S3 and S4 object-oriented programming
Parallelization and optimization
Teaching materials and bibliography
Online resources and handouts
Period
Third trimester
Assessment method
Exam
Grading
Grade recorded on a scale of thirty
Instructor(s)
Office hours:
Wednesday, 9:30-12:30
Via Celoria 18, room 7007
Office hours:
Thursday, 11:00-12:00 (check the specified page for any rescheduling)
Room 7015, Dipartimento di Informatica "Giovanni degli Antoni", Via Celoria 18 - 20133 Milano