Web Algorithmics

A.Y. 2021/2022
Overall hours
Learning objectives
The course has the goal of putting the student in touch with advanced techniques concerning the harvesting of large-scale document collections, and with the construction of search engines, in particular in the case of the web, with its ranking algorithms.
Expected learning outcomes
The student should be able to design and implement a large-scale text indexer, with a particular focus on the ranking algorithms applied to the results.
Course syllabus and organization

Single session

Lesson period
First semester
Synchronous video lessons (if needed).
Course syllabus
The course aims to put the student in contact with advanced techniques relating to the collection of large document collections and the construction of search engines, with particular attention to the web and its ranking mechanisms. The course requires knowledge of basic algorithmic techniques and network protocols. A subset of the following topics will be covered, also based on the interests of the students:

Anatomy of a search engine and indexing techniques.
Information harvesting: algorithmic, technical and social problems in creating crawlers.
Comparing visiting strategies.
Global web structure and algorithmic tools for its analysis.
Web graph representation: compression problems.
Parallel and distributed crawlers. Consistent hashing techniques.
Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
Techniques for the construction of inverse indices.
Methodologies for the construction of large distributed indices.
Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
Representation of Elias-Fano and quasi-succinct indexes.
Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
Substring indices: trie, suffix trees and suffix vectors.
Prerequisites for admission
Discrete mathematics, linear algebra, programming, algorithms.
Teaching methods
Assessment methods and Criteria
Oral exam.
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor: Vigna Sebastiano