The course is divided into two modules. The Coding for Data Science module provides technical skills for data analysis, with a focus on coding in the Python programming language and the R framework. The Data Management module covers information and data management in modern information systems, with a focus on relational databases and emerging NoSQL systems.
Expected learning outcomes
Lesson period: Third trimester
(In case of multiple editions, please check the period, as it may vary)
Module Coding for Data Science
Python. Introduction to the language, environment, and setup. Data structures 1: numeric, string. Control flow structures. Data structures 2: list, set, tuple, dict. Data structures 3: collections, lambda, json. File I/O. Objects: introduction and basics. NumPy and linear algebra. Pandas 1: indexing and series. Pandas 2: dataframe. Pandas 3: graphics.
R. Introduction and first steps. Data structures: vectors, arrays, data.frames, lists, environments. Loops and flow control. Functions and objects. Vectorized calculus and function vectorization. S3 and S4 object-oriented programming. Parallelization and optimization.
Module Data Management
Introduction to relational databases. Information and data; databases and database management systems (DBMS); the relational model, integrity constraints, key definition, primary key constraints; foreign key constraints.
Database languages. Data definition languages; data manipulation languages; queries with the SQL language; simple queries, group queries with aggregate operators; queries with set operators; nested queries.
Introduction to NoSQL databases. Data models for NoSQL; CAP theorem (consistency, availability, partition tolerance); types of NoSQL; comparison against the relational model.
NoSQL database systems. The "document-oriented" data model; the MongoDB system; collections in MongoDB; collection queries in MongoDB; aggregation pipeline in MongoDB; the "column-family" data model; the Cassandra system; keyspaces in Cassandra; keyspace queries in Cassandra; the CQL language.