This course describes big data processing frameworks, both in terms of methodologies and technologies. Part of the lessons will focus on Apache Spark and distributed design patterns.
Expected learning outcomes
Students will learn:
- How to distribute computation over clusters using the MapReduce model
- How to write Apache Spark code
- How Hadoop works and why it works that way
- What a software architecture is
- How to design batch architectures to manage data workflows
- Several design patterns that can be used in a distributed environment
- The limits of traditional SQL with big data
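To give a flavor of the MapReduce model covered in the course, here is a minimal word-count sketch in plain Python. It has no Hadoop or Spark dependency, and the `map_phase`, `shuffle`, and `reduce_phase` names are illustrative only, not part of any framework API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map step: emit a (word, 1) pair for each word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle step: group values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce step: aggregate all counts for a single word.
    return key, sum(values)

documents = ["big data big clusters", "spark and hadoop process big data"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"])  # → 3
```

In a real cluster, the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; the programming model, however, is exactly this decomposition.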