Bioinformatics

A.Y. 2018/2019
Lesson for
Max ECTS: 6
Overall hours: 48
SSD: INF/01
Language: Italian
Learning objectives
Introduction to Bioinformatics. Application of pattern recognition, machine learning and probabilistic models to biomolecular data analysis.

Course structure and Syllabus

Milan
Active edition
Yes
Responsible
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor: Valentini Giorgio
ATTENDING STUDENTS
Syllabus
Introduction.
A taxonomy of computational problems in computational biology. Types of data available in bioinformatics and their representation through computational data structures. Genomics and proteomics databases.

I. Machine Learning methods
0. Introduction to Machine Learning Methods
1. Different types of computational learning; generalization and evaluation of the learning capabilities of learning machines:
(a) Supervised, semi-supervised and unsupervised learning
(b) Learning, overfitting and underfitting, generalization.
(c) Experimental methods to estimate the generalization error
2. Supervised learning
- Look-up table and Nearest Neighbours.
- Probabilistic approaches and the Bayes Theorem; the dimensionality problem and the Naive Bayes approach.
- Neural networks: linear perceptrons, multi-layer perceptron and backpropagation algorithm
- Support Vector Machines and supervised kernel methods
- Ensemble methods
3. Unsupervised methods
Clustering methods: k-means, hierarchical clustering methods, graph-based clustering.
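
As an illustration of two topics from Part I (Nearest Neighbours and experimental estimation of the generalization error), the following is a minimal sketch and not part of the official course material. It assumes Euclidean distance, a 1-nearest-neighbour rule and a simple interleaved k-fold split. The function names and the toy data are made up for the example.

```python
import math

def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def k_fold_error(xs, ys, k):
    """Estimate the 1-NN generalization error by k-fold cross-validation."""
    n = len(xs)
    errors = 0
    for fold in range(k):
        # Interleaved split: example i is held out in fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if nearest_neighbour_predict(train_x, train_y, xs[i]) != ys[i]:
                errors += 1
    # Fraction of held-out examples that were misclassified.
    return errors / n

# Toy data (hypothetical): two well-separated groups in the plane.
xs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ys = ["a", "a", "a", "b", "b", "b"]
cv_error = k_fold_error(xs, ys, k=3)
```

Each example is used for testing exactly once and never appears in its own training set, so the estimate is not biased by testing on data the classifier has already seen.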

II. Machine Learning methods for Computational Biology
0. A survey of the applications of Machine Learning to the different areas of Computational Biology.
1. Supervised prediction of gene and protein function (AFP - Automated Function Prediction)
(a) AFP formalization as a multi-class, multi-label hierarchical classification problem
(b) Methods based on the joint kernelization of input and output spaces
(c) Ensemble and Bayesian-Net based approaches
(d) Hierarchical Ensemble Methods based on the True Path Rule
(e) Applications to the prediction of Gene Ontology terms and human abnormal phenotypes
2. Semi-supervised network-based prediction methods for the analysis of biological networks
(a) Modelling biomolecular networks as graphs
(b) Examples of computational biology problems formalized as node label ranking in graphs: gene function annotation, disease gene prioritization, drug repositioning.
(c) Semi-supervised learning algorithms for the analysis of biological networks: Random walk and random walk with restart, diffusion methods, kernelized score functions, cost-sensitive Hopfield neural networks. Parallelization of network-based algorithms for the analysis of big data in computational biology.
3. Machine Learning methods for Genomic Medicine
3.1 Prediction of deleterious and pathogenic genomic variants
(a) Regulation of gene expression. Genomic variants and mutations associated with genetic and tumoral diseases. Biotechnologies for the full-genome detection and analysis of genomic variants. Genomic features associated with Single Nucleotide Variants (SNV).
(b) Advanced Machine Learning methods for the detection and prioritization of potentially pathogenic genomic variants in coding and non-coding regions of the human genome. Applications in the context of Precision Medicine.
3.2. Patient stratification and biomarker selection
(a) Machine Learning methods for the detection of pathological subtypes of diseases and for the prediction of the response to drug treatments.
(b) Feature extraction and feature selection methods for dimensionality reduction and biomarker selection, and their application to computationally assisted biomolecular diagnosis/prognosis and patient stratification for individualized therapies.
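
As an illustration of topic II.2 (node label ranking with random walk with restart), the following is a minimal sketch and not part of the official course material. It assumes an undirected, unweighted graph in which every node has at least one neighbour. The function name, the restart probability of 0.3 and the toy gene network are made up for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Score nodes by their proximity to `seeds`.

    adj:   dict mapping each node to the list of its neighbours
           (assumption: every node has at least one neighbour).
    seeds: nodes with a known positive label, e.g. known disease genes.
    """
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(iters):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {v: restart * p0[v] for v in nodes}
        # Otherwise it moves to a neighbour chosen uniformly at random.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p

# Toy network (hypothetical): a chain g1 - g2 - g3 - g4 with g1 as the only seed.
adj = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(adj, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this chain the scores decrease with distance from the seed, so the ranking puts the unlabelled nodes closest to `g1` first. This is the sense in which gene prioritization is cast as a node ranking problem.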
NON-ATTENDING STUDENTS
Syllabus
Same syllabus as for attending students.
Lesson period
Second semester
Assessment methods
Exam
Assessment result
Grade recorded on a scale of thirty
Professor(s)
Reception:
Room S216 - second floor Computer Science Dept.