Coding for Data Science and Data Management

A.Y. 2018/2019
Overall hours
INF/01 SECS-S/01
Learning objectives
The course is organized into two modules. The Coding for Data Science module aims to provide technical skills in data analysis, with a focus on coding in the Python programming language and the R framework. The Data Management module deals with information and data management in modern information systems, with a focus on relational databases and emerging NoSQL systems.
Expected learning outcomes
Course syllabus and organization

Single session

Lesson period
Third trimester
Module Coding for Data Science
Course syllabus
Introduction to the Python language, environment, and setup
Data structures 1: numeric, string
Control flow structures
Data structures 2: list, set, tuple, dict
Data structures 3: collections, lambda, json
File I/O
Objects: introduction and basics
Numpy and linear algebra
Pandas 1: indexing and series
Pandas 2: dataframe
Pandas 3: graphics
Introduction to R and first steps
Data structures: vectors, arrays, data frames, lists, environments
Loops and flow control
Functions and objects
Vectorized calculus and function vectorization
S3 and S4 object oriented programming
Parallelization and optimization
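As a rough illustration of the Python topics listed above on built-in data structures, collections, lambda and json, the following minimal sketch groups and counts a few records using only the standard library. The enrolment records and all variable names are invented for this example and are not taken from the course material.

```python
import json
from collections import Counter, defaultdict

# Hypothetical data: a list of (student, language) tuples.
records = [("anna", "python"), ("bruno", "r"), ("anna", "r"), ("carla", "python")]

# set: the distinct languages that appear in the records.
languages = {lang for _, lang in records}

# dict (defaultdict): languages grouped by student, in insertion order.
by_student = defaultdict(list)
for student, lang in records:
    by_student[student].append(lang)

# Counter: how many records mention each language.
counts = Counter(lang for _, lang in records)

# lambda as a sort key: most languages first, ties broken by student name.
ranked = sorted(by_student.items(), key=lambda item: (-len(item[1]), item[0]))

# json: serialize to text and parse back (tuples come back as lists).
payload = json.dumps({"counts": dict(counts), "ranked": ranked}, sort_keys=True)
restored = json.loads(payload)
```

Note that the JSON round trip does not preserve Python tuples: each (student, languages) pair in `ranked` is returned as a two-element list in `restored`.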
Module Data Management
Course syllabus
Introduction to relational databases. Information and data; databases and database management systems (DBMS); the relational model; integrity constraints; key definition; primary key constraints; foreign key constraints.
Database languages. Data definition languages; data manipulation languages; queries with the SQL language; simple queries, group queries with aggregate operators; queries with set operators; nested queries.
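The query types named above (simple, grouped with aggregate operators, set operators, nested) can be tried out with a small in-memory database. The sketch below uses Python's standard sqlite3 module. The student and exam tables, their columns and the sample rows are hypothetical, chosen only for illustration, and SQLite's SQL dialect may differ in details from the DBMS used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Data definition: primary key and foreign key constraints (hypothetical schema).
conn.executescript("""
CREATE TABLE student (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE exam (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course     TEXT    NOT NULL,
    grade      INTEGER NOT NULL,
    PRIMARY KEY (student_id, course)
);
""")

# Data manipulation: insert a few sample rows.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Anna"), (2, "Bruno"), (3, "Carla")])
conn.executemany("INSERT INTO exam VALUES (?, ?, ?)",
                 [(1, "Databases", 30), (1, "Statistics", 27), (2, "Databases", 24)])

# The foreign key constraint rejects an exam for a student that does not exist.
try:
    conn.execute("INSERT INTO exam VALUES (99, 'Databases', 18)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Simple query: selection and ordering.
simple = conn.execute(
    "SELECT name FROM student WHERE id > 1 ORDER BY name").fetchall()

# Group query with aggregate operators.
grouped = conn.execute(
    "SELECT student_id, COUNT(*), AVG(grade) FROM exam "
    "GROUP BY student_id ORDER BY student_id").fetchall()

# Set operator: students with no recorded exam.
set_op = conn.execute(
    "SELECT id FROM student EXCEPT SELECT student_id FROM exam").fetchall()

# Nested query: students with at least one grade of 27 or more.
nested = conn.execute(
    "SELECT name FROM student WHERE id IN "
    "(SELECT student_id FROM exam WHERE grade >= 27) ORDER BY name").fetchall()
```

Each result is a list of tuples, one per row. With the sample rows above, `set_op` contains only the student with id 3, who has no row in the exam table.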
Introduction to NoSQL databases. Data models for NoSQL; CAP theorem (consistency, availability, partition tolerance); types of NoSQL systems; comparison with the relational model.
NoSQL database systems. The "document-oriented" data model; the MongoDB system; collections in MongoDB; collection queries in MongoDB; the aggregation pipeline in MongoDB; the "column-family" data model; the Cassandra system; keyspaces in Cassandra; keyspace queries in Cassandra; the CQL language.
Module Coding for Data Science
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Module Data Management
SECS-S/01 - STATISTICS - University credits: 6
Lessons: 40 hours
Professor: Montanelli Stefano
Wednesday 9:30AM-12:30PM
39, via Comelico. Room P101
Thursday from 11AM to 12PM (check news at the specified webpage)
Room 7015, Dipartimento di Informatica "Giovanni degli Antoni", Via Celoria 18 - 20133 Milano