Methods in bioinformatics

Academic year: 2023/2024
Maximum credits: 6
Total hours: 48
SSD (scientific disciplinary sector): INF/01
Language: English
Learning objectives
High-throughput experimental assays generate large amounts of data that must be handled and processed appropriately in order to extract meaningful biological knowledge. Bioinformatics provides methods and tools to perform complex analyses of large-scale ("big") biological data, prompting novel testable hypotheses and allowing their verification. Proficiency in data handling and processing, and the ability to unravel and highlight complex relationships in biological data using adequate tools and methods, are crucial skills for the modern biotechnology researcher.
The aims of this course are (i) to introduce the basic principles of procedural and object-oriented programming, (ii) to present the R programming language and software environment as an effective instrument for the analysis of large-scale biological data, and (iii) to provide a primer on methods for the analysis of gene expression (RNA-Seq) data and their statistical foundations.
The course ideally complements the courses on genomics and bioinformatics.
Expected learning outcomes
After completing this course, students are expected to:
(1) Understand the basic principles of programming and be able to map those concepts to the specific features of the R programming language.
(2) Know the syntax of the R programming language and its basic data types, data structures, and functions.
(3) Become proficient in splitting simple data analysis procedures into elementary logical steps and translating them into R functions and scripts.
(4) Know how to import data into the R environment.
(5) Be able to represent data and their relationships using basic R plotting functions.
(6) Know how to manage R software packages and libraries.
(7) Produce impactful reports of an analysis workflow by integrating text, R code, and plots.
(8) Perform and interpret preliminary RNA-Seq data analysis: normalization, Principal Component Analysis (PCA), and quality control.
(9) Know how to execute differential expression analysis.
(10) Be able to perform post-processing and functional enrichment analysis of differentially expressed genes.
Single course

This course can be taken as a single course.

Syllabus and teaching organization

Single edition

Course coordinator
Period: First semester

Syllabus
The first part of the course introduces students to programming principles for data analysis, using the R programming language as the practice ground for understanding and applying them. In particular, students will become familiar with the following concepts:

- Data types and variables.
- Basic data structures: vectors, factors, matrices, arrays, lists.
- Essential standard functions of R.
- Control of the execution flow: blocks, conditional statements, loops.
- Environments, custom functions, and scripts.
- I/O operations: data import and export.
- Graphical representation of biological data: scatterplots, bar plots, histograms, heatmaps, boxplots, and Venn diagrams.
- Software packages, libraries, and repositories.
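
As an illustration only, the following minimal sketch touches several of the concepts listed above (vectors, factors, lists, a custom function, a loop, a conditional statement, and a basic plot). It uses base R only; the numbers are made up, and the names `expression`, `condition`, and `group_mean` are hypothetical and not taken from the course material.

```r
# Toy expression values for six samples (made-up numbers, illustration only)
expression <- c(5.1, 4.8, 5.3, 9.7, 10.2, 9.9)

# A factor recording the experimental condition of each sample
condition <- factor(c("control", "control", "control",
                      "treated", "treated", "treated"))

# A custom function: mean of the values that belong to one factor level
group_mean <- function(values, groups, level) {
  mean(values[groups == level])
}

# Loop over the factor levels and collect the results in a named list
means <- list()
for (lv in levels(condition)) {
  means[[lv]] <- group_mean(expression, condition, lv)
}

# Conditional statement on the computed values
if (means[["treated"]] > means[["control"]]) {
  message("Mean expression is higher in treated samples")
} else {
  message("Mean expression is not higher in treated samples")
}

# Basic plot: one boxplot per condition
boxplot(expression ~ condition, ylab = "Expression (arbitrary units)")
```

In practice the loop could be replaced by a single call such as `tapply(expression, condition, mean)`; the explicit version is shown here only because it makes the control flow visible.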

The first part of the course will be followed by an introduction to the analysis of Next-Generation Sequencing (NGS) data using R, with insights into the theoretical and practical principles underlying state-of-the-art methods for processing RNA-Seq assays to assess differential gene expression. In particular:

- Basics of NGS data analysis.
- Primer on dimensionality reduction techniques and descriptive statistics.
- Normalization, PCA and quality control of RNA-Seq data.
- Introduction to statistical tests for the comparison of gene expression levels.
- Differential gene expression analysis.
- Post-processing and functional enrichment analyses.
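
To give a flavour of the normalization and PCA steps listed above, here is a minimal sketch on simulated counts. It is an assumption-laden illustration, not the course's workflow: it uses base R only, with plain counts-per-million (CPM) scaling and `prcomp`, whereas dedicated packages such as edgeR (see the reference material) provide their own, more refined normalization methods. The count matrix, sample names, and the spiked-in fold change are all invented for the example.

```r
set.seed(1)  # reproducible simulated data

# Simulated count matrix: 200 genes (rows) x 6 samples (columns)
n_genes <- 200
samples <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
counts <- matrix(rnbinom(n_genes * length(samples), mu = 100, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Spike an artificial 8-fold change into the first 20 genes of treated samples
counts[1:20, 4:6] <- counts[1:20, 4:6] * 8

# Library-size normalization: counts per million (CPM),
# followed by a log2 transform with a pseudocount of 1
lib_sizes <- colSums(counts)
cpm <- t(t(counts) / lib_sizes) * 1e6
log_cpm <- log2(cpm + 1)

# PCA on the samples: prcomp expects observations in rows, hence t()
pca <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Quality-control style plot: samples on the first two principal components
plot(pca$x[, 1], pca$x[, 2],
     col = rep(c("blue", "red"), each = 3), pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

In a plot like this, samples from the same condition would be expected to cluster together; a sample lying far from its group is a common cue for closer quality-control inspection.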

Classes will consist of intuitive descriptions of programming principles, bioinformatic methods, and their underlying statistics, combined with practicals in which students apply the newly introduced concepts to data analysis use cases. Prof. Zambelli will cover the first part of the course (3 CFU), introducing R programming; the second part (3 CFU), delivered by Prof. Chiara, will follow seamlessly and focus on NGS data analysis.
Prerequisites
Knowledge of basic molecular biology topics, with particular reference to transcription, gene expression regulation, and nucleic acid sequencing, is highly recommended for attending the course.
Teaching methods
Teaching mode: classroom lectures supported by practicals on real or realistic datasets. Teachers will assign exercises at the end of most lessons to help students consolidate concepts between classes. Attendance is highly recommended.
Reference material
W. N. Venables, D. M. Smith and the R Core Team. An introduction to R.
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

Chen Y, McCarthy D, Ritchie M, Robinson M, Smyth G. edgeR: differential expression analysis of digital gene expression data.
https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Copies of the slides projected during the classes, as well as additional materials and datasets, will be made available through the course website on the myARIEL platform of the University of Milan (https://myariel.unimi.it/course/view.php?id=1214). This material is intended as a support for lectures, and studying it is not a full substitute for regular class attendance. The material is made available only to registered students of the Degree Course in Molecular Biotechnology and Bioinformatics and should not be distributed to others without the express consent of the teachers.
Assessment methods and evaluation criteria
Notions and skills acquired in this course will be evaluated through an oral exam. To qualify for an exam session, students must complete a small project consisting of the analysis of gene expression data from real experiments, and submit a report describing their results to the teachers. The report must be submitted at least 48 hours before the selected exam session. Projects will be undertaken in small groups (1-3 students per group).
The exam will consist of a brief individual discussion (approx. 15 minutes) of the project report and of the theoretical topics covered in the classes. The grade will result from the joint evaluation of each candidate by the two teachers, weighted as follows:
- Knowledge of the R programming language: 25%
- Theoretical principles of gene expression analysis: 25%
- Project report and its discussion: 50%
INF/01 - COMPUTER SCIENCE - CFU: 6
Lectures: 48 hours
Instructor(s)
Office hours:
Thursday 15:00-17:00
Second floor, Tower B
Office hours:
Friday 15:00-16:00, by appointment
Beacon Lab, 2nd floor, Tower B, Dept. of Biosciences, or on MS Teams