Natural Language Processing
A.Y. 2025/2026
Learning objectives
The course provides an extensive and in-depth introduction to the state of the art and the main research trends in Natural Language Processing (NLP). In particular, the course focuses on deep learning methods for NLP, with specific attention to large language models. Students will deal with fundamental tasks such as syntactic, semantic, and discourse analysis, as well as methods to solve these tasks. A specific focus will be on transfer learning methods and model architectures that address concrete tasks such as text classification, question answering, machine translation, and text generation. These goals will be pursued through a combination of theory, seminars on recent papers and methods, and practical examples. The program is intended for graduate students in computer science and data science who are familiar with machine learning basics. An introduction to deep learning and neural networks will be provided, together with a practical introduction to PyTorch. Coding in Python will also play an important role in the classes.
Expected learning outcomes
Through reading recent research papers, programming assignments, and a final project, students will acquire the following skills: 1) knowing and understanding the main topics, research issues, and future trends in the field of Natural Language Processing (NLP); 2) applying NLP methods to a corpus of texts for a specific need; 3) judging the quality of different design and implementation choices in an NLP project; 4) designing, implementing, and evaluating a specific project focused on NLP tasks; 5) understanding the notion of a language model and detecting language specificities and topics in a corpus of text documents; 6) using the Python stack of libraries and tools required to develop an NLP project.
Lesson period: First four-month period
Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Course syllabus
The lectures provide an in-depth introduction to the main research topics in the field of Deep Learning applied to Natural Language Processing. In addition to the lectures, there is a final project through which students will acquire the skills needed to design, implement, and understand the main neural network models for natural language, using Python and PyTorch.
Introduction to Natural Language Processing
- Vector Space
- Text tokenization and normalization
- TF-IDF
Introduction to neural networks
- Classification problems
- Introduction to Neural Networks
- Linear classifiers and neural networks
- Tutorial on PyTorch (basics)
- Tutorial on PyTorch (deep learning)
Language models
- Introduction to the notion of Language Modeling
- Markov Language Model
Neural networks as language models
- Using Neural Networks as language models
- Word2Vec and word embedding
Text encoding and sequence learning
- Encoding text and sequences
- Simple RNN example
- Sequence classification
- Sequence generation
Transformers
- Transformer architecture
- BERT
- GPT
Image Processing
- Foundations of image processing
Proprietary models and prompt engineering
- Prompt Engineering
Explainable AI, bias and ethics
- Explainability
- SHAP methods
- Saliency
- Concept Activation
Bias and Stereotypes
- Simple example
- Masking
- Text completion
Frameworks for LLMs
- llama.cpp
- vLLM
- MLX LM
Prerequisites for admission
Intermediate knowledge of Python. Basic knowledge of derivatives and an understanding of matrix/vector notation and operations. Basics of probability and Gaussian distributions.
Teaching methods
The course consists of lectures that make extensive use of examples and supporting materials such as Python notebooks. Slides and handouts are used throughout the lectures and are published progressively on the course website on the Ariel platform and in the GitHub repository (https://github.com/afflint/nlp).
Lecture attendance is not mandatory, but it is strongly recommended.
Teaching Resources
- Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press. (http://nlp.stanford.edu/IR-book/)
- Alfio Ferrara. Le macchine del linguaggio. L'uomo allo specchio dell'intelligenza artificiale. Einaudi, 2025.
- Notes, notebooks and materials provided by the lecturer and published on the website of the course (https://aferrarair.ariel.ctu.unimi.it)
Assessment methods and Criteria
Development of a project. The project topic must be discussed with the lecturer in advance. The project should demonstrate comprehension of the lecture topics and the ability to propose and motivate innovative solutions to specific research problems.
The project will be evaluated through a discussion with the lecturer about the project outcomes and the related topics. The evaluation will take into account both the project and the interview.
Registration through the SIFA service is mandatory to take the examination. After registering for an examination on SIFA, students must contact the lecturer to schedule the discussion.
Professor(s)
Reception:
By appointment, to be arranged by first contacting the professor by email. Meetings are held online.
Online. For meetings in person: Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor)