Web Algorithmics
A.Y. 2019/2020
Learning objectives
The course has the goal of putting the student in touch with advanced techniques concerning the harvesting of large-scale document collections, and with the construction of search engines, in particular in the case of the web, with its ranking algorithms.
Expected learning outcomes
The student should be able to design and implement a large-scale text indexer, with a particular focus on the ranking algorithms applied to the results.
Lesson period: First semester
Assessment methods: Esame
Assessment result: voto verbalizzato in trentesimi
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Responsible
Lesson period
First semester
Course syllabus
- Anatomy of a search engine and indexing techniques.
- Information harvesting: algorithmic, technical and social problems in creating crawlers.
- Comparing visiting strategies.
- Global web structure and algorithmic tools for its analysis.
- Web graph representation: compression problems.
- Parallel and distributed crawlers. Consistent hashing techniques.
- Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
- Techniques for the construction of inverse indices.
- Methodologies for the construction of large distributed indices.
- Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
- Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
- Representation of Elias-Fano and quasi-succinct indexes.
- Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
- Substring indices: trie, suffix trees, and suffix vectors.
- Information harvesting: algorithmic, technical and social problems in creating crawlers.
- Comparing visiting strategies.
- Global web structure and algorithmic tools for its analysis.
- Web graph representation: compression problems.
- Parallel and distributed crawlers. Consistent hashing techniques.
- Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
- Techniques for the construction of inverse indices.
- Methodologies for the construction of large distributed indices.
- Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
- Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
- Representation of Elias-Fano and quasi-succinct indexes.
- Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
- Substring indices: trie, suffix trees, and suffix vectors.
Prerequisites for admission
Basic graph algorithms (visits). Fundamental linear algebra (matrices, eigenvalues, eigenvectors).
Teaching methods
Frontal classes.
Teaching Resources
Since almost all the topics are relatively recent, or researched, it is not possible to give a unitary text. However, there are numerous books, articles and handouts online that cover the topics discussed, first of all the new book by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. The slides used for advanced-topic classes are also available. All materials are accessible from the teacher web site.
Assessment methods and Criteria
Oral exam. The exam is based on a subset of the course topics published each year on the teacher web site.
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor:
Vigna Sebastiano
Shifts:
-
Professor:
Vigna SebastianoEducational website(s)
Professor(s)