A.Y. 2017/2018
Learning objectives
Introduction to Bioinformatics. Application of pattern recognition, machine learning and probabilistic models to biomolecular data analysis.

Course structure and Syllabus

Active edition
Lessons: 48 hours
Professor: Valentini Giorgio
A taxonomy of computational problems in computational biology. Types of data available in bioinformatics and their representation through computational data structures. Genomic and proteomic databases.

I. Machine Learning methods
0. Introduction to Machine Learning Methods
1. Different types of computational learning; generalization and evaluation of the learning capabilities of learning machines:
(a) Supervised, semi-supervised and unsupervised learning
(b) Learning, overfitting and underfitting, generalization.
(c) Experimental methods to estimate the generalization error
2. Supervised learning
- Look-up tables and nearest neighbours.
- Probabilistic approaches and Bayes' theorem; the dimensionality problem and the Naive Bayes approach.
- Neural networks: linear perceptrons, multi-layer perceptrons and the backpropagation algorithm
- Support Vector Machines and supervised kernel methods
- Ensemble methods
3. Unsupervised methods
Clustering methods: k-means, hierarchical clustering methods, graph-based clustering.
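As an illustration of topics 1(c) and 2 above, here is a minimal sketch in plain Python of estimating the generalization error of a 1-nearest-neighbour classifier by k-fold cross-validation. It assumes numeric feature tuples and Euclidean distance; the function names, the choice of k = 5 and the toy data are hypothetical and only meant to show the idea.

```python
import math
import random

def nearest_neighbour_predict(train, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.dist(features, x)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

def k_fold_error(data, k=5, seed=0):
    """Estimate the generalization error of 1-NN by k-fold cross-validation.

    data: list of (features, label) pairs. Each fold is held out once as the
    test set while the remaining k - 1 folds form the training set.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = 0
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        errors += sum(nearest_neighbour_predict(train, x) != y for x, y in test)
    # Fraction of misclassified examples over all held-out predictions
    return errors / len(data)

# Toy data: two well-separated groups of points (hypothetical)
data = [((0.0, 0.1 * i), "A") for i in range(10)] + [((5.0, 0.1 * i), "B") for i in range(10)]
print(k_fold_error(data))  # prints 0.0
```

On well-separated data the estimate is zero; on real biomolecular data the same loop gives an estimate of the error on unseen examples, unlike the training error, which for 1-NN is always zero.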

II. Supervised, semi-supervised and unsupervised methods in Bioinformatics
0. Application of machine learning methods to the different areas of Computational Biology.
1. The problem of the supervised prediction of protein function (AFP - Automated Function Prediction)
(a) Formalization of AFP as a multiclass, multilabel and hierarchically structured classification problem
(b) Methods based on the joint kernelization of the input and output spaces
(c) Methods based on ensembles and Bayesian Networks
(d) Hierarchical ensembles based on the True Path Rule.
2. Semi-supervised learning with biomolecular networks
(a) Modeling biomolecular networks as graphs
(b) Modeling computational problems in biomolecular networks as node labeling and ranking in graphs: functional annotation of genes, search of gene-disease associations, drug repositioning.
(c) Algorithms based on random walk and random walk with restart
(d) Algorithms based on kernels and kernelized score functions
(e) Algorithms based on cost-sensitive Hopfield networks.
(f) Secondary memory-based technologies and vertex-centric implementation of algorithms to process big biomolecular networks
3. Prediction of genetic variants associated with genetic and tumoral diseases.
(a) Single nucleotide variants (SNVs) and their characterization through genomic features. Mendelian and complex genetic diseases. Somatic mutations and cancer.
(b) Supervised machine learning methods for the prediction of deleterious SNVs associated with Mendelian and tumoral diseases.
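To make topics 2(b) and 2(c) of Part II concrete, here is a minimal sketch of random walk with restart used to rank the nodes of a network by their proximity to a set of seed nodes (for example genes already annotated with a function). At each step the walker moves to a uniformly random neighbour with probability 1 - r and jumps back to a seed with probability r. The adjacency-dict representation, the function name `random_walk_with_restart`, the restart probability r = 0.3 and the four-gene path network are hypothetical choices for illustration, not the course's reference implementation.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank nodes by random-walk-with-restart proximity to the seed nodes.

    adjacency: dict mapping each node to the list of its neighbours (undirected graph).
    seeds: nodes known to carry the label of interest.
    Returns (ranking, scores), with nodes sorted by decreasing score.
    Isolated nodes do not propagate their probability mass.
    """
    nodes = list(adjacency)
    # Restart distribution: uniform over the seed nodes
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # p_{t+1}(v) = r * p0(v) + (1 - r) * sum over neighbours u of p_t(u) / deg(u)
        new_p = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adjacency[u]:
                share = (1 - restart) * p[u] / len(adjacency[u])
                for v in adjacency[u]:
                    new_p[v] += share
        converged = sum(abs(new_p[v] - p[v]) for v in nodes) < tol
        p = new_p
        if converged:
            break
    ranking = sorted(nodes, key=lambda v: p[v], reverse=True)
    return ranking, p

# Hypothetical toy network: a path g1 - g2 - g3 - g4, with g1 as the only seed
path = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
ranking, scores = random_walk_with_restart(path, {"g1"})
print(ranking)  # prints ['g1', 'g2', 'g3', 'g4']
```

Nodes closer to the seeds receive higher scores, so unlabeled genes near the top of the ranking are natural candidates for the label; the restart term keeps the walk anchored to the seeds and makes the iteration converge.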
Lesson period
Second semester
Assessment methods
Assessment result
Final grade recorded out of 30
Room S216 - second floor Computer Science Dept.