Web Algorithmics

A.Y. 2024/2025
6
Max ECTS
48
Overall hours
SSD
INF/01
Language
Italian
Learning objectives
The course has the goal of putting the student in touch with advanced techniques concerning the harvesting of large-scale document collections, and with the construction of search engines, in particular in the case of the web, with its ranking algorithms.
Expected learning outcomes
The student should be able to design and implement a large-scale text indexer, with a particular focus on the ranking algorithms applied to the results.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Responsible
Lesson period
First semester
Course syllabus
The course aims to put the student in contact with advanced techniques relating to the collection of large document collections and the construction of search engines, with particular attention to the web and its ranking mechanisms. The course requires knowledge of basic algorithmic techniques and network protocols. A subset of the following topics will be covered, also based on the interests of the students:

Anatomy of a search engine and indexing techniques.
Information harvesting: algorithmic, technical and social problems in creating crawlers.
Comparing visiting strategies.
Global web structure and algorithmic tools for its analysis.
Web graph representation: compression problems.
Parallel and distributed crawlers. Consistent hashing techniques.
Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
Techniques for the construction of inverse indices.
Methodologies for the construction of large distributed indices.
Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
Representation of Elias-Fano and quasi-succinct indexes.
Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
Substring indices: trie, suffix trees and suffix vectors.
Vector-product quantization and vector databases.
Prerequisites for admission
Discrete mathematics, linear algebra, programming, algorithms.
Teaching methods
Classes.
Assessment methods and Criteria
Esame orale.
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor: Vigna Sebastiano
Shifts:
Turno
Professor: Vigna Sebastiano
Professor(s)