Web Algorithmics

A.Y. 2018/2019
6
Max ECTS
48
Overall hours
SSD
INF/01
Language
Italian
Learning objectives
Il corso si propone di mettere lo studente a contatto con tecniche avanzate relative alla raccolta di grandi collezioni documentali e alla costruzione di motori di ricerca, con particolare attenzione al web e ai suoi meccanismi di ranking.
Expected learning outcomes
Undefined
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Milan

Responsible
Lesson period
First semester
ATTENDING STUDENTS
Course syllabus
- Anatomy of a search engine and indexing techniques.
- Collection of information: algorithmic, technical and social problems in creating crawlers.
- Comparison strategies for comparison.
- Global web structure and algorithmic tools for its analysis.
- Representation of the web graph: compression problems.
- Parallel and distributed crawlers. Coherent hashing techniques.
- Graph rankings: proximity, PageRank, HITS, etc.
- Link spamming and adversarial ranking mechanisms.
- Techniques for building inverted indices.
- Methods for the construction of large distributed indices.
- Methods for the construction of large dictionaries: perfect minimal ordered hash tables, prefix dictionaries, lexicographic compression.
- Instant codes for index compression: Elias, Golomb, arithmetic compression.
- Representation of Elias-Fano and quasi-succinct indices.
- Textual ranking techniques (LSI, cosine, Clarke-Cormack, etc.).
- Substring indexes: trie, suffix trees and suffix vectors.
NON-ATTENDING STUDENTS
Course syllabus
- Anatomy of a search engine and indexing techniques.
- Collection of information: algorithmic, technical and social problems in creating crawlers.
- Comparison strategies for comparison.
- Global web structure and algorithmic tools for its analysis.
- Representation of the web graph: compression problems.
- Parallel and distributed crawlers. Coherent hashing techniques.
- Graph rankings: proximity, PageRank, HITS, etc.
- Link spamming and adversarial ranking mechanisms.
- Techniques for building inverted indices.
- Methods for the construction of large distributed indices.
- Methods for the construction of large dictionaries: perfect minimal ordered hash tables, prefix dictionaries, lexicographic compression.
- Instant codes for index compression: Elias, Golomb, arithmetic compression.
- Representation of Elias-Fano and quasi-succinct indices.
- Textual ranking techniques (LSI, cosine, Clarke-Cormack, etc.).
- Substring indexes: trie, suffix trees and suffix vectors.
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor: Vigna Sebastiano
Professor(s)