Web Algorithmics | Università degli Studi di Milano Statale

A.Y. 2019/2020

Max ECTS

Overall hours

SSD

INF/01

Language

English

Included in the following degree programmes

Computer Science (Classe LM-18)-Enrolled from 2014/2015 Academic Year

Learning objectives

The course has the goal of putting the student in touch with advanced techniques concerning the harvesting of large-scale document collections, and with the construction of search engines, in particular in the case of the web, with its ranking algorithms.

Expected learning outcomes

The student should be able to design and implement a large-scale text indexer, with a particular focus on the ranking algorithms applied to the results.

Lesson period: First semester

Lessons timetable

Assessment methods: Esame
Assessment result: voto verbalizzato in trentesimi

Exams calendar

Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Search a single course

Course syllabus and organization

Single session

Responsible

Vigna Sebastiano

Lesson period

First semester

Syllabus

Course syllabus

- Anatomy of a search engine and indexing techniques.
- Information harvesting: algorithmic, technical and social problems in creating crawlers.
- Comparing visiting strategies.
- Global web structure and algorithmic tools for its analysis.
- Web graph representation: compression problems.
- Parallel and distributed crawlers. Consistent hashing techniques.
- Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
- Techniques for the construction of inverse indices.
- Methodologies for the construction of large distributed indices.
- Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
- Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
- Representation of Elias-Fano and quasi-succinct indexes.
- Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
- Substring indices: trie, suffix trees, and suffix vectors.

Prerequisites for admission

Basic graph algorithms (visits). Fundamental linear algebra (matrices, eigenvalues, eigenvectors).

Teaching methods

Frontal classes.

Teaching Resources

Since almost all the topics are relatively recent, or researched, it is not possible to give a unitary text. However, there are numerous books, articles and handouts online that cover the topics discussed, first of all the new book by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. The slides used for advanced-topic classes are also available. All materials are accessible from the teacher web site.

Assessment methods and Criteria

Oral exam. The exam is based on a subset of the course topics published each year on the teacher web site.

Course structure

INF/01 - INFORMATICS - University credits: 6

Lessons: 48 hours

Professor: Vigna Sebastiano

Shifts:

Professor: Vigna Sebastiano

Educational website(s)

Web Algorithmics

Professor(s)

Vigna Sebastiano

Web site