Scientific programming

Academic year 2020/2021
Maximum credits
Total hours
Learning objectives
The objective of the course is to make students proficient in writing programs and scripts in the programming languages most widely used in modern genomic research: R and Python.
Expected learning outcomes
At the end of this course, students are expected to be able to design and write advanced programs in the Python and R programming languages and to apply them to case studies derived from the analysis of genomic data.
Course syllabus and organization

Single session

Second semester
Seminar-style lectures and practicals in a computer lab on the following topics:

Python programming language:
- Quick review of:
  - Variables, Expressions and Statements
  - Strings, Lists, Tuples, Dictionaries and indexes
  - Functions and Classes
- Higher-order functions for manipulating lists and data collections
- Principles of object-oriented programming
- Simple Abstract Data Types: Stacks, Queues, Trees and Graphs
- File management
- Libraries for the management of matrices and tables (e.g. Biopython, Pandas and NumPy). Common statistical libraries (e.g. SciPy).
- Techniques of data visualization through graphs and curves.
- Biopython tools and functions.
- Building Python libraries.
- Integration of information systems: an introduction to REST Web services and to invoking Web services from Python

Practicals on Python implementations of dynamic programming, statistical analyses of Next Generation Sequencing data, and/or others.

R programming language:
- Main data structures in R: vectors, factors, matrices, arrays, lists and environments
- Control of execution flow: blocks, conditional statements, loops
- Functions and scripts
- Input/Output functions and operators; R data import/export
- Graphical representation of the data, heatmaps, boxplots and Venn diagrams
- Vector operations
- Packages and R "extensions"
- Building packages in R and Bioconductor
- Analyzing Next Generation Sequencing data with R and Bioconductor packages

Practicals on R implementation of statistical analyses of gene expression data (differentially expressed genes, clustering, principal component analysis, and/or others).
There are no prerequisites beyond those required for admission to the Master's degree program.
Teaching methods
Class lectures and practical sessions in a computer lab or on the students' own laptop computers.
Reference material
The slides presented during the course and the estimated detailed schedule of lectures and practical sessions are available on "Be e-Poli" (BeeP), the portal for the network activities of students and professors at the Politecnico di Milano. The portal is accessible from the Politecnico di Milano Web site to students registered for the course in the current academic year.
Assessment methods and evaluation criteria
The assessment is based on a written exam at the end of the course, with exercises and open questions on all the topics presented during the lectures or practical sessions.
Practicals: 24 hours
Lectures: 36 hours
Instructor: Piro Rosario Michael