Introduction. A taxonomy of computational problems in computational biology. Types of data available in bioinformatics and their representation through computational data structures. Genomics and proteomics databases.
I. Machine Learning methods
  0. Introduction to Machine Learning methods
  1. Types of computational learning; generalization and evaluation of the learning capabilities of learning machines
     (a) Supervised, semi-supervised and unsupervised learning
     (b) Learning, overfitting and underfitting, generalization
     (c) Experimental methods to estimate the generalization error
  2. Supervised learning
     - Look-up tables and Nearest Neighbours
     - Probabilistic approaches and Bayes' theorem; the dimensionality problem and the Naive Bayes approach
     - Neural networks: linear perceptrons, multi-layer perceptrons and the backpropagation algorithm
     - Support Vector Machines and supervised kernel methods
     - Ensemble methods
  3. Unsupervised methods
     - Clustering methods: k-means, hierarchical clustering, graph-based clustering
II. Supervised, semi-supervised and unsupervised methods in Bioinformatics
  0. Application of machine learning methods to the different areas of Computational Biology
  1. Supervised prediction of protein function (AFP - Automated Function Prediction)
     (a) Formalization of AFP as a multiclass, multilabel and hierarchically structured classification problem
     (b) Methods based on the joint kernelization of the input and output spaces
     (c) Methods based on ensembles and Bayesian Networks
     (d) Hierarchical ensembles based on the True Path Rule
  2. Semi-supervised learning with biomolecular networks
     (a) Modeling biomolecular networks as graphs
     (b) Modeling computational problems in biomolecular networks as node labeling and ranking in graphs: functional annotation of genes, search for gene-disease associations, drug repositioning
     (c) Algorithms based on random walk and random walk with restart
     (d) Algorithms based on kernels and kernelized score functions
     (e) Algorithms based on cost-sensitive Hopfield networks
     (f) Secondary memory-based technologies and vertex-centric implementations of algorithms to process big biomolecular networks
  3. Prediction of genetic variants associated with genetic and tumoral diseases
     (a) Single nucleotide variants (SNVs) and their characterization through genomic features. Mendelian and complex genetic diseases. Somatic mutations and cancer
     (b) Supervised machine learning methods for the prediction of deleterious SNVs associated with Mendelian and tumoral diseases