Coding for Data Science and Data Management

A.Y. 2019/2020
Lesson for
Max ECTS
12
Overall hours
80
SSD
INF/01 SECS-S/01
Language
English
Learning objectives
Upon completion of the course, students will be able to:
- manage data using R and RStudio;
- solve coding challenges using R libraries and functions;
- perform statistical inference and produce graphics using R;
- write code using the apply family of functions in R;
- understand the Python data model and the flow control statements;
- use the built-in Python data structures;
- perform basic linear algebra operations using NumPy;
- perform basic data set manipulations using Pandas;
- perform simple machine learning experiments using Scikit-learn;
- understand and apply the core notions of data modeling in relational databases;
- use the SQL language for creating and querying relational database structures;
- understand and apply the principles of data organization in NoSQL systems;
- use MongoDB for data retrieval and aggregation in a document-oriented NoSQL system.

Course structure and Syllabus

Active edition
Yes
Responsible
Module Coding for Data Science
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Module Data Management
SECS-S/01 - STATISTICS - University credits: 6
Lessons: 40 hours
ATTENDING STUDENTS
Module Coding for Data Science
Syllabus
PYTHON unit
Introduction to the Language
Data structures
Control flow
File I/O
NumPy
Pandas and Matplotlib
Scikit-learn

DATA MANAGEMENT unit
Introduction to relational databases:
- information and data;
- databases and database management systems (DBMS);
- the relational model;
- integrity constraints;
- key definition and primary key constraints;
- foreign key constraints.
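
As a minimal illustration of primary key and foreign key constraints, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The tables department and student, their columns, and the helper functions are hypothetical names chosen for this example; SQLite is used only because it needs no server, not because the course prescribes it.

```python
import sqlite3


def build_schema():
    """Create a tiny in-memory database with one foreign key (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE student ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " dept_id INTEGER NOT NULL REFERENCES department(id))"
    )
    conn.execute("INSERT INTO department (id, name) VALUES (1, 'Statistics')")
    return conn


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
insert_student = "INSERT INTO student (id, name, dept_id) VALUES (?, ?, ?)"

ok_valid = try_insert(conn, insert_student, (10, "Ada", 1))      # department 1 exists
ok_orphan = try_insert(conn, insert_student, (11, "Bob", 99))    # no department 99
ok_duplicate = try_insert(conn, insert_student, (10, "Eve", 1))  # primary key 10 already used

print(ok_valid, ok_orphan, ok_duplicate)
```

Only the first insert is accepted: the second violates the foreign key constraint and the third violates the primary key constraint.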
Database languages:
- data definition languages;
- data manipulation languages;
- queries with the SQL language;
- simple queries, group queries with aggregate operators;
- queries with set operators;
- nested queries.
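
The query types listed above can be sketched with Python's built-in sqlite3 module. This is an illustrative example, not course material: the exam table and its rows are invented, and the SQL is limited to standard constructs that SQLite supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one row per exam taken.
conn.executescript("""
    CREATE TABLE exam (student TEXT, course TEXT, grade INTEGER);
    INSERT INTO exam VALUES
        ('Ada', 'Python', 30), ('Ada', 'SQL', 28),
        ('Bob', 'Python', 24), ('Eve', 'SQL', 27);
""")

# Simple query: selection and projection with a WHERE clause.
high_grades = conn.execute(
    "SELECT student, course FROM exam "
    "WHERE grade >= 28 ORDER BY student, course"
).fetchall()

# Group query with aggregate operators: exams taken and average grade per student.
per_student = conn.execute(
    "SELECT student, COUNT(*), AVG(grade) FROM exam "
    "GROUP BY student ORDER BY student"
).fetchall()

# Set operator: students who took Python but not SQL.
python_only = conn.execute(
    "SELECT student FROM exam WHERE course = 'Python' "
    "EXCEPT "
    "SELECT student FROM exam WHERE course = 'SQL'"
).fetchall()

# Nested query: exams graded above the overall average.
above_average = conn.execute(
    "SELECT student, course FROM exam "
    "WHERE grade > (SELECT AVG(grade) FROM exam) "
    "ORDER BY student, course"
).fetchall()

print(high_grades, per_student, python_only, above_average, sep="\n")
```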
Module Data Management
Syllabus
R unit
Introduction to the R framework and RStudio
Data structures: vectors, arrays, data frames, lists, environments
Statistical Inference with R
Plots and base graphics with R
Data Manipulation with R
Pointers to parallel computing with R

DATA MANAGEMENT unit
Introduction to NoSQL databases:
- data models for NoSQL;
- types of NoSQL;
- comparison against the relational model.
NoSQL database systems:
- the "document-oriented" data model;
- the MongoDB system;
- collections in MongoDB;
- collection queries in MongoDB;
- aggregation pipeline in MongoDB.
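
To give a rough idea of what an aggregation pipeline does, the sketch below writes a three-stage pipeline ($match, $group, $sort) in MongoDB's document syntax and then emulates those stages in plain Python on an in-memory list. The orders documents, their fields, and the function run_pipeline_locally are hypothetical; the emulation is not a MongoDB driver and covers only this one pipeline.

```python
from collections import defaultdict

# Hypothetical documents, as they might be stored in an "orders" collection.
orders = [
    {"customer": "ada", "status": "paid", "amount": 40},
    {"customer": "bob", "status": "paid", "amount": 15},
    {"customer": "ada", "status": "paid", "amount": 10},
    {"customer": "bob", "status": "cancelled", "amount": 99},
]

# The pipeline as it would be written for MongoDB (shown for reference only).
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def run_pipeline_locally(docs):
    """Emulate the three stages above in plain Python (illustration only)."""
    # $match: keep only the documents whose status is "paid".
    matched = [d for d in docs if d["status"] == "paid"]
    # $group: one output document per customer, summing the amounts.
    totals = defaultdict(int)
    for d in matched:
        totals[d["customer"]] += d["amount"]
    grouped = [{"_id": customer, "total": total} for customer, total in totals.items()]
    # $sort: descending by total.
    return sorted(grouped, key=lambda d: d["total"], reverse=True)


result = run_pipeline_locally(orders)
print(result)
```

With a driver such as PyMongo, the same pipeline list could be passed to a collection's aggregate() method; the local emulation only mirrors the stage-by-stage data flow.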
NON-ATTENDING STUDENTS
The syllabus is the same as for attending students.
Lesson period
Second trimester
Assessment methods
Exam
Assessment result
Grade recorded on a scale of thirty
Professor(s)
Reception:
Wednesday 9:30AM-12:30PM
Room P101 - Via Comelico, 39
Reception:
Thursday from 11AM to 12PM (check news at the specified webpage)
Room S209 (second floor) - Via Comelico, 39 - 20135 Milano