Genomic big data management and computing
Academic year 2023/2024
Learning objectives
Many genomics projects rely on increasingly large data sets, for example the genomes of thousands of individuals affected by a particular disease.
It is paramount to understand how large data sets can be managed and processed efficiently, and how Next-Generation Sequencing (NGS) processing pipelines and workflows can benefit such large-scale projects.
In this course, we will introduce some of the existing technologies, tools and platforms available for this purpose, including Apache Spark, GMQL, Amazon Web Services (AWS), Galaxy, and KNIME. We will also discuss examples of software libraries in different programming languages (mainly Python and R) that can help in designing one's own tools and pipelines.
Expected learning outcomes
Given the breadth of the topics covered, the ultimate goal of the course is not in-depth knowledge of specific data analysis approaches, but rather a broad overview of different solutions, together with an understanding of the strengths and weaknesses of the methodologies and computing environments used to manage scientific workflows for big data analysis in genomics.
Period: First semester
Assessment method: Exam
Grading: final mark recorded on a 30-point scale
Syllabus and course organization
Single edition
BIO/11 - MOLECULAR BIOLOGY
ING-INF/05 - INFORMATION PROCESSING SYSTEMS
Lectures: 48 hours
Lecturer:
Piro Rosario Michael