Algorithms for massive datasets

Academic year 2019/2020
Maximum credits: 6
Total hours: 48
SSD (scientific disciplinary sector): INF/01
Language: English
Learning objectives
Students:
- will be able to use technologies for the distributed storage of datasets;
- will know the map-reduce distributed processing framework and its main extensions;
- will know the main algorithms used to tackle classical big data problems, and will be able to implement them using a distributed processing framework;
- will be able to choose appropriate methods for solving big data problems.

Course structure and syllabus

Active edition
Coordinator
INF/01 - COMPUTER SCIENCE - Credits (CFU): 6
Lectures: 48 hours
Instructor: Malchiodi Dario
ATTENDING STUDENTS
Syllabus
Distributed file systems. HDFS.
Distributed computational frameworks. Map-reduce. Spark.
Analysis of distributed algorithms.
Link analysis.
Similar items.
NoSQL databases: MongoDB.
Frequent itemsets.
Clustering.
Deep learning and TensorFlow. Feed-forward NNs, CNNs, RNNs, LSTM.
PCA, t-SNE, and word2vec.
Prerequisites and examination procedures
Computer programming, calculus, probability and statistics.
Teaching methods
Classroom lectures
Teaching materials and bibliography
Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press (ISBN:9781107015357).

Lecture notes.

Suggested readings:

- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark. Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN:978-1-449-35862-4)
- Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark. Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN:978-1-491-91276-8)
NON-ATTENDING STUDENTS
Syllabus
Distributed file systems. HDFS.
Distributed computational frameworks. Map-reduce. Spark.
Analysis of distributed algorithms.
Link analysis.
Similar items.
NoSQL databases: MongoDB.
Frequent itemsets.
Clustering.
Deep learning and TensorFlow. Feed-forward NNs, CNNs, RNNs, LSTM.
PCA, t-SNE, and word2vec.
Prerequisites and examination procedures
Computer programming, calculus, probability and statistics.
Teaching materials and bibliography
Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press (ISBN:9781107015357).

Lecture notes.

Suggested readings:

- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark. Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN:978-1-449-35862-4)
- Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark. Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN:978-1-491-91276-8)
Period
Second semester
Assessment method
Exam
Grading
Grade recorded on a 30-point scale
Instructor(s)
Office hours:
By appointment
Room 5015, Department of Computer Science