GPU Computing
A.Y. 2025/2026
Learning objectives
The course aims to provide students with advanced training in the use of GPUs as high-performance computational platforms, with a dual focus: on one hand, the acquisition of skills in the CUDA parallel programming model for general-purpose computing; on the other, the application of those skills to developing and optimizing deep learning models using libraries such as PyTorch. Through a balance of theoretical lessons and practical activities, the course promotes the integration of GPU architectural foundations, key parallelism patterns, and acceleration strategies for neural networks, generative models, and models operating on non-Euclidean geometric structures. By developing design and experimental skills in concrete application scenarios, students gain a solid understanding of both the theoretical concepts and their operational implications, also in connection with current research trends in HPC and artificial intelligence.
Expected learning outcomes
The expected outcomes are as follows:
- Understand modern GPU architectures and the CUDA parallel computing model for HPC and AI.
- Gain knowledge of key deep learning paradigms (e.g., GANs, VAEs, Transformers, GNNs) and acceleration techniques for GPU-based training.
- Write and optimize CUDA C code for general-purpose parallel kernels.
- Implement and train advanced deep learning models in PyTorch, leveraging GPUs for computational acceleration.
- Apply profiling techniques and performance analysis to improve the efficiency of training and inference processes.
- Critically evaluate GPU-based computational solutions for AI and scientific computing problems.
- Identify optimal design choices when implementing models and algorithms on parallel architectures.
- Clearly and rigorously present and discuss architectures, acceleration techniques, and experimental results, in both written reports and oral presentations.
- Continue independently exploring emerging models and software libraries in the field of GPU computing and applied artificial intelligence.
Lesson period: Second four month period
Assessment methods: Exam
Assessment result: grade recorded out of thirty
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Course syllabus
The course is structured into three parts and aims to introduce and explore techniques and paradigms of parallel computing on GPUs in the context of High Performance Computing (HPC) and Artificial Intelligence (AI).
Part 1: GPU Computing for HPC (18 hours)
1. The CUDA parallel computing model for Nvidia GPUs - Hierarchical multithreading
2. Architectural elements of Nvidia GPUs - Memory model and process deployment
3. The CUDA parallel programming model - Use of C and Python (a minimal kernel sketch in Python follows this list)
4. Acceleration techniques for CUDA kernels in general-purpose or HPC computing
5. Performance analysis and profiling - Optimization and concurrency
6. Parallelism patterns - Scan, reduction, sorting
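To give a concrete flavor of points 1-4, the sketch below launches a minimal elementwise kernel over a grid of thread blocks. It uses Numba as one possible Python binding to CUDA; the choice of Numba, and all names in the snippet, are illustrative assumptions rather than course material, and the same grid/block hierarchy carries over directly to CUDA C.

    # Minimal sketch of a CUDA kernel from Python, assuming the Numba library
    # (one possible binding; the syllabus prescribes no specific one).
    import numpy as np
    from numba import cuda

    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)              # global thread index in a 1D grid
        if i < x.shape[0]:            # guard threads past the end of the array
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.empty_like(x)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block   # ceiling division
    # Host arrays are copied to and from the device implicitly at launch.
    saxpy[blocks, threads_per_block](np.float32(2.0), x, y, out)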
Part 2: GPU Programming for AI (9 hours)
1. Managing deep learning models on GPU - Optimized libraries such as PyTorch and cuDNN
2. Initialization and optimization - Cross-entropy loss, stochastic gradient descent, PyTorch optimizers, operator graphs and autograd (a minimal training-step sketch follows this list)
3. Acceleration techniques for deep learning - Architectural choices, training protocols, dropout and batch normalization, model fine-tuning
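As a concrete reference for point 2, the sketch below performs a single GPU training step in PyTorch, combining a cross-entropy loss, an SGD optimizer, and autograd. The model and batch are synthetic placeholders chosen for illustration, not part of the course material.

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Placeholder model and synthetic batch; sizes are illustrative only.
    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    inputs = torch.randn(64, 784, device=device)
    targets = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)   # forward pass records the operator graph
    loss.backward()                          # autograd backpropagates through the graph
    optimizer.step()                         # SGD update of the parameters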
Part 3: Deep Learning Models and Their Implementation in PyTorch (21 hours)
1. Normalizing flows and Neural Ordinary Differential Equations (NODEs)
2. Generative models (GAN, VAE) and diffusion models (DDPM)
3. Attention-based models and Transformer architectures (BERT, GPT)
4. Geometric Deep Learning: models on non-Euclidean structures with PyTorch Geometric (equivariant GNNs; a minimal graph-convolution sketch follows this list)
5. Overview of main GPU application domains: scientific and industrial research, physical and natural simulations, generative AI and LLMs.
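For point 4, the sketch below runs one round of message passing on a toy graph with PyTorch Geometric. GCNConv is used here only as a simple illustrative layer (an assumption made for brevity); the course itself covers equivariant GNN architectures.

    import torch
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv

    # Toy graph: 3 nodes with 8 features each, 4 directed edges.
    edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]], dtype=torch.long)
    x = torch.randn(3, 8)
    data = Data(x=x, edge_index=edge_index)

    conv = GCNConv(in_channels=8, out_channels=4)
    out = conv(data.x, data.edge_index)   # one round of neighborhood aggregation
    print(out.shape)                      # torch.Size([3, 4])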
Prerequisites for admission
A good knowledge of the Python programming language is required, with particular emphasis on object-oriented programming, array manipulation using NumPy, and the use of libraries for numerical computing. A basic understanding of the C language is also recommended, as it is helpful for grasping the CUDA C parallel programming model introduced in the first module of the course.
Teaching methods
The course is delivered through a combination of lectures, hands-on lab sessions, and guided discussions of scientific articles. The theoretical lessons aim to provide the conceptual and mathematical foundations needed to understand parallel algorithms and deep learning models on non-Euclidean and equivariant structures.
The laboratory activities focus on the practical development of classical parallelism patterns and deep learning models on graphs, using libraries such as PyTorch and PyTorch Geometric.
The course makes extensive use of the Ariel platform for sharing teaching materials, notebooks, and further readings. Attendance is strongly recommended, as the practical and interactive components are integral to achieving the expected learning outcomes.
Teaching Resources
Reference materials
Lecture slides, reference texts, and supporting technical documentation will be made available on the course webpage hosted on the Ariel platform. Additional scientific papers, manuals, or software resources will be indicated and discussed during the course.
Assessment methods and Criteria
The examination consists of two parts:
1. a written test covering the lecture topics (70% of the final grade)
2. a project developed in the CUDA C programming language (30% of the final grade)