Methods in Bioinformatics
A.Y. 2019/2020
Learning objectives
High-throughput experimental assays generate large amounts of data that must be handled and processed appropriately to extract meaningful biological knowledge. Bioinformatics provides methods and tools for complex analyses of large-scale ("big") biological data, prompting novel testable hypotheses and allowing their verification. Proficiency in data handling and processing, together with the ability to uncover and highlight complex relationships in biological data using adequate tools and methods, is a crucial skill for the modern biotechnology researcher.
The aims of this course are (i) to introduce the basic principles of procedural and object-oriented programming, (ii) to present the R programming language and software environment as an effective instrument for the analysis of large-scale biological data, and (iii) to provide a primer on methods for the analysis of gene expression (RNA-Seq) data and their statistical foundations.
The course is ideally linked to those dealing with genomics and bioinformatics.
Expected learning outcomes
After completing this course, students are expected to:
(1) Understand the basic principles of programming and be able to map those concepts to the specific features of the R programming language.
(2) Know the syntax of the R programming language and its basic data types, data structures, and functions.
(3) Become proficient in splitting simple data analysis procedures into elementary logical steps and translating them into R functions and scripts.
(4) Know how to import data into the R environment.
(5) Be able to represent data and their relationships using basic R plotting functions.
(6) Know how to manage R software packages and libraries.
(7) Produce effective reports of an analysis workflow by integrating text, R code, and plots.
(8) Perform and interpret preliminary RNA-Seq data analysis: normalization, Principal Component Analysis (PCA), and quality control.
(9) Know how to execute differential expression analysis.
(10) Be able to perform post-processing and functional enrichment analysis of differentially expressed genes.
Lesson period: First semester
Assessment methods: Exam
Assessment result: grade recorded on a thirty-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Course syllabus
The first part of the course introduces students to the principles of programming for data analysis, using the R programming language as the practice ground for understanding and applying them. In particular, students will become familiar with the following concepts:
- Data types and variables
- Basic data structures: vectors, factors, matrices, arrays, lists.
- Essential standard functions of R.
- Control of the execution flow: blocks, conditional statements, loops.
- Environments, custom functions, and scripts.
- I/O operations: data import and export.
- Graphical representation of biological data: scatterplots, bar plots, histograms, heatmaps, boxplots, and Venn diagrams.
- Software packages, libraries, and repositories.
This first part of the course will be followed by an introduction to the analysis of Next Generation Sequencing (NGS) data using R, with insights into the theoretical and practical principles underlying state-of-the-art methods for processing RNA-Seq assays to assess differential gene expression. In particular:
- Basics of NGS data analysis.
- Primer on dimensionality reduction techniques and descriptive statistics.
- Normalization, PCA and quality control of RNA-Seq data.
- Introduction to statistical tests for the comparison of gene expression levels.
- Differential gene expression analysis.
- Post-processing and functional enrichment analyses.
Classes will consist of intuitive descriptions of programming principles, bioinformatic methods, and their underlying statistics, combined with practicals in which the newly introduced concepts are applied to data analysis use cases. Dr. Zambelli will cover the first, introductory part of the course; the second part, delivered by Dr. Chiara, will follow seamlessly and shift the focus towards NGS data analysis.
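As a purely illustrative sketch of the normalization step listed above: the course itself uses R, so the short Python example below is only a language-neutral illustration, not course material. It assumes a simple counts-per-million (CPM) scaling followed by a log2 transform with a pseudocount, which is one common way to make samples with different sequencing depths comparable; the methods actually taught may differ. The function names and the toy count matrix are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million."""
    n_samples = len(counts[0])
    # Library size: total number of counts in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for row in counts
    ]


def log2_cpm(counts, pseudocount=1.0):
    """Log2-transform CPM values; the pseudocount avoids taking log2(0)."""
    return [
        [math.log2(value + pseudocount) for value in row]
        for row in counts_per_million(counts)
    ]


# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
# The second sample was sequenced twice as deeply as the first.
toy_counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]

cpm = counts_per_million(toy_counts)
log_cpm = log2_cpm(toy_counts)
```

In this toy matrix the second sample has exactly twice the counts of the first for every gene, so after CPM scaling the two columns are identical: the difference in sequencing depth has been removed while the relative expression of the genes is preserved.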
Prerequisites for admission
Knowledge of basic molecular biology topics, with particular reference to transcription, gene expression regulation, and nucleic acid sequencing, is highly recommended for attending the course.
Teaching methods
Teaching mode: classroom lectures supported by practicals on real or realistic datasets. Exercises will be assigned at the end of most lessons to help consolidate concepts between classes. Attendance is highly recommended.
Teaching Resources
W. N. Venables, D. M. Smith and the R Core Team. An introduction to R.
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Chen Y, McCarthy D, Ritchie M, Robinson M, Smyth G. edgeR: differential expression analysis of digital gene expression data.
https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
Copies of the slides projected during the classes, as well as additional materials and datasets, will be made available through the course website on the ARIEL platform of the University of Milan (http://fzambellimb.ariel.ctu.unimi.it/v3/Home/). This material is intended to support the lectures, and studying it is not a full substitute for regular class attendance. The material is made available only to registered students of the Degree Course in Molecular Biotechnology and Bioinformatics and should not be distributed to others without the express consent of the teachers.
Assessment methods and Criteria
The notions and competencies acquired during this course will be evaluated through a small project in which students perform bioinformatics analyses on real gene expression data and produce a report describing their results. Projects will be undertaken in small groups (1-3 students per group). Delivery of the report will be followed by an oral discussion of the project itself and of all the topics covered in the classes. The grade will result from the joint evaluation of each candidate by the two teachers.
INF/01 - INFORMATICS - University credits: 6
Lectures: 48 hours
Professors:
Chiara Matteo, Zambelli Federico
Reception:
Friday 15.00-16.00 by appointment
Beacon Lab, 2nd floor, B Tower, Dept. of Biosciences / MS Teams