Knowledge Extraction and Information Retrieval

A.Y. 2020/2021
Overall hours
Learning objectives
The course provides a general introduction to information retrieval research, covering both the state of the art and the main research trends in the field. In particular, it addresses document retrieval, document classification, topic discovery, and language modeling. Besides an up-to-date review of the literature, the course focuses on the evaluation of information retrieval systems, the use of machine learning techniques on textual data collections, and latent and probabilistic semantic indexing. Finally, the course also provides an introduction to the use of NoSQL databases for the implementation of information retrieval systems.
Expected learning outcomes
Students will acquire the following skills: 1) knowing and understanding the main topics, research issues, and future trends in the field of information retrieval; 2) applying natural language processing, indexing, clustering, and classification techniques to a corpus of texts for a specific information need; 3) judging the quality of different design and implementation choices; 4) designing, implementing, and evaluating a specific project focused on document search or document classification; 5) understanding the notion of a language model and detecting language specificities and topics in a corpus of text documents; 6) using the Python stack of libraries and tools required to develop a text analysis project.
Course syllabus and organization

Single session

Lesson period
Second trimester
During the emergency phase, teaching activities will be delivered as synchronous lectures on the Zoom platform, according to the course timetable. The lectures will also be recorded and uploaded to the ARIEL website. Students will be informed as soon as possible, through the ARIEL website, if it becomes possible to attend classes at the University. This possibility will depend on the evolution of the emergency and on compliance with the safety directives.
Course syllabus
The Information Retrieval course provides a general introduction to classical and web information retrieval. The course addresses web search and focuses in particular on text mining, analysis, classification, and clustering. It gives an up-to-date treatment of the design and implementation of systems for indexing and searching documents and of methods for evaluating information retrieval systems, together with an introduction to the use of machine learning methods on text collections, especially with regard to latent and probabilistic semantic indexing. Moreover, the course provides an introduction to novel and NoSQL software solutions for the implementation of modern information retrieval systems.

Boolean retrieval
Term vocabulary and dictionaries
Text indexing
Scores and weights for terms
The vector space model
Evaluation in information retrieval
Probabilistic systems and text classification
Matrix decomposition and latent semantic analysis
Word and sentence embedding
Statistical and neural language models
Prerequisites for admission
Basic data management skills. Basic knowledge of statistics.
Teaching methods
The course is given in the form of lectures with extensive use of examples and support materials such as Python notebooks. Slides and handouts are employed throughout the lectures and they are progressively published on the reference course website on the Ariel platform (
Lecture attendance is not mandatory, but it is strongly recommended.
Teaching Resources
- Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1, p. 496). Cambridge: Cambridge University Press. (Http://
- Notes, notebooks and materials provided by the lecturer and published on the website of the course (
Assessment methods and Criteria
Examination procedure
Development of a project. The project topic must be discussed with the lecturer in advance. The project should demonstrate an understanding of the lecture topics and the ability to propose and justify innovative solutions to specific research problems.

The project will be evaluated through a discussion with the lecturer about the project outcomes and the related topics of the course. The evaluation will take into account both the project and the interview.

Students who did not take part in the lessons
In-depth discussion of the course topics with the lecturer.

How to do the examination
Registration through the SIFA service is mandatory for taking the examination. After registering for an examination on SIFA, students are requested to contact the lecturer to schedule the discussion.
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professor: Ferrara Alfio
Friday, 15.00-16.00
Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor)