Reinforcement Learning | Università degli Studi di Milano Statale

A.Y. 2023/2024

Max ECTS: 6

Overall hours: 40

SSD: INF/01

Language: English

Included in the following degree programmes

Data Science for Economics (Class LM-data) - Enrolled from 2022/23 Academic Year

Learning objectives

Expected learning outcomes

Upon completion of the course students will be able to:
- formalize problems in terms of Markov Decision Processes,
- understand basic methods of strategic exploration,
- understand algorithms for direct policy optimization,
- run experiments in simulated environments.
These objectives are assessed through two components: the project report and the oral discussion. The final grade is based on the project report and is then fine-tuned through the oral discussion.

Lesson period: Second trimester

Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale

Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Responsible

Cesa Bianchi Nicolo' Antonio

Course syllabus

This course introduces the theoretical and algorithmic foundations of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents that take actions while interacting with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems and has been applied to a wide range of tasks, including self-driving cars, game playing, customer management, and healthcare. The course covers the following topics:
Introduction
What is reinforcement learning
Deterministic decision processes
Markov decision processes
Evaluation criteria: finite horizon, infinite horizon, discounted horizon
Markov policies and their properties
Finite horizon
State-value function
Action-value function
Bellman optimality equations for finite horizon
Discounted horizon
Bellman optimality equations for discounted horizon
Value iteration
Policy iteration
Linear programming interpretation
Model-based reinforcement learning
Model-free reinforcement learning
Q-learning
SARSA
Temporal difference algorithms
TD(0)
TD(λ)
Equivalence between forward and backward view
Value function approximation
Policy gradient methods
Developing a reinforcement learning project
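
As an illustration of the "Value iteration" topic listed above, the following is a minimal sketch in Python, not taken from the course materials. The two-state MDP, the dictionary layout `P[s][a]`, and the helper names `q_value` and `value_iteration` are all assumptions made for this example. It repeatedly applies the Bellman optimality update for the discounted horizon until the value estimates stop changing.

```python
# Hypothetical two-state, two-action MDP, invented for illustration only.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, given V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality update until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimates.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

In this toy MDP the greedy policy picks action 1 in both states, and the value estimates approach 19 for state 0 and 20 for state 1, since staying in state 1 yields a reward of 2 per step with discount factor 0.9.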

Prerequisites for admission

Knowledge of statistical methods, machine learning, and Python programming.

Teaching methods

The course consists of lectures that make extensive use of examples and support materials such as Python notebooks. Slides and handouts are used throughout the lectures and are progressively published on the course website on the Ariel platform.
Lecture attendance is not mandatory, but it is strongly recommended.

Teaching Resources

Notes, notebooks and materials provided by the lecturers and published on the Ariel website of the course.

Assessment methods and Criteria

Upon completion of the course students will be able to:
- formalize problems in terms of Markov Decision Processes,
- describe the basic performance criteria for MDPs,
- understand the main algorithms for model-based and model-free RL,
- understand the main RL approaches in large state spaces,
- run experiments in simulated environments.
These objectives are assessed through two components: the project report and the oral discussion. The final grade is based on the project report and is then fine-tuned through the oral discussion. The grade is expressed on a scale from 0 to 30.

Course structure

INF/01 - INFORMATICS - University credits: 6

Lessons: 40 hours

Professors: Cesa Bianchi Nicolo' Antonio, Ferrara Alfio

Educational website(s)

Reinforcement learning (a.a. 2023/24)

Professor(s)

Cesa Bianchi Nicolo' Antonio

Reception:

By appointment

18, via Celoria. Room 7007

Ferrara Alfio

Reception:

By appointment. Meetings are held online; contact the professor by email first to arrange one.

Online. In-person meetings take place at the Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor).