Text Mining and Sentiment Analysis

A.Y. 2023/2024
Max ECTS: 6
Overall hours: 40
SSD: INF/01
Language: English
Learning objectives
Understand the state of the art in text mining and sentiment analysis. Design and develop methods for text classification and topic modeling. Design and develop methods for sentiment classification and polarity detection. Understand the differences between sentiment analysis and emotion detection. Design and develop methods for emotion detection in text.
Expected learning outcomes
At the end of the course, students will be able to address a specific problem in the area of text mining and sentiment analysis. In particular, students will know the main notions needed to understand text processing, the foundations of natural language processing, text classification, and topic modeling. Moreover, students will deal with sentiment analysis in the context of opinion mining, covering both rule-based models and machine learning models for text.
Single course

This course can be attended as a single course.

Course syllabus and organization

Single session

Responsible
Lesson period
Second trimester
Course syllabus
Introduction to the course program and exam modalities
- Tokenization, normalization and scoring
- Vector Space Model
- Tools: introduction to NLTK and SpaCy
- N-grams and Skip-grams
Levels of natural language
- Morphological and Syntactical parsing
- SpaCy: Part-of-speech and syntax tree
Text Classification and Evaluation
- Unsupervised classification: clustering
- Supervised classification: vector-based vs probabilistic approaches
Evaluation
- Outcome classes: True Positive, False Positive, True Negative and False Negative
- Micro, Macro and Average Precision, Recall and F-score
- Confusion Matrix
Opinion Mining
- Aspect-based sentiment analysis
- Reported opinions and indirect, implicit and inferred opinions
- Sentiment Lexicon
- SentiWordNet, Vader and TextBlob
- The problem of word sense disambiguation
- Sentiment analysis as classification
- Aspect Based Sentiment Analysis
Latent Semantic Indexing
- Latent Dirichlet Allocation
- Topics and sentiment
- Named entity recognition
Word Embeddings
- Introduction to Neural networks
- Word2Vec, GloVe, FastText
Language Models
- Neural approaches to modeling language
- Sequence to sequence learning
- RNN and LSTM architectures
- Autoencoders and context embeddings
Transformers and autoencoders
- BERT
- Use of language models for ABSA
- Case studies and applications
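As an illustration of the Evaluation unit listed above, the following is a minimal sketch of how micro- and macro-averaged precision, recall and F-score can be derived from true positive, false positive and false negative counts. It is not taken from the course materials: the function names, the toy labels and the choice of plain Python are assumptions made only for this example.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[gold] += 1  # the gold class gains a false negative
    return tp, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * p * r, p + r)


def evaluate(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)

    # Macro: score each class separately, then average (classes weigh equally).
    prec = {c: safe_div(tp[c], tp[c] + fp[c]) for c in labels}
    rec = {c: safe_div(tp[c], tp[c] + fn[c]) for c in labels}
    n = len(labels)
    macro_p = sum(prec.values()) / n
    macro_r = sum(rec.values()) / n
    macro_f = sum(f1(prec[c], rec[c]) for c in labels) / n

    # Micro: pool the counts over all classes first (frequent classes dominate).
    t, p_err, n_err = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = safe_div(t, t + p_err)
    micro_r = safe_div(t, t + n_err)

    return {
        "macro_precision": macro_p, "macro_recall": macro_r, "macro_f1": macro_f,
        "micro_precision": micro_p, "micro_recall": micro_r,
        "micro_f1": f1(micro_p, micro_r),
    }


# Hypothetical toy data: gold and predicted polarity labels for six texts.
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the class "neu" is never predicted, so it lowers the macro averages while barely affecting the micro averages. This is why the two are usually reported together on imbalanced sentiment datasets.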
Prerequisites for admission
Knowledge of Python programming, basic statistical methods, and basic notions of machine learning.
Teaching methods
Content is delivered through lectures dedicated to the theoretical foundations, complemented by examples and case studies illustrated by the lecturer. Coding is a fundamental part of teaching: several use cases and programming examples will be presented. Students are required to integrate the theoretical study with a practical approach to the proposed case studies.
Teaching Resources
The course does not rely on a single reference text but rather on various materials and bibliographic references provided by the teacher during the lessons and through the Ariel website. The repository https://github.com/afflint/textsent, which contains several examples of Python software and case studies, is also available. The material is the same for attending and non-attending students.
Assessment methods and Criteria
The final project consists of a short study on one of the topics of the course, identifying a precise research question and measurable objectives. The project must propose a methodology for addressing the research question and provide an experimental verification of the results, assessed with appropriate evaluation metrics. The emphasis is not on obtaining high performance but rather on the critical discussion of the results, in order to understand the potential effectiveness of the proposed methodology.

The results must be documented in a short article of no fewer than 4 and no more than 8 pages. Students must also provide access to a GitHub repository containing the code and reproducible experimental results.

Finally, the project will be discussed after a 10-minute presentation in English with slides. The grade is expressed out of thirty and evaluates the ability to present the ideas and methods of the project, knowledge of the relevant literature, and the organization of the work and of the software produced.

The exam modalities are the same for attending and non-attending students.
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professor: Ferrara Alfio
Professor(s)
Reception:
By appointment. Meetings will be held online until the end of the Covid emergency.
Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor)