Deep Learning at Scale Training

March 3 - 4, 2025

NERSC and NVIDIA are hosting a hybrid, hands-on Deep Learning at Scale training event on March 3-4 in Berkeley, CA. The training will help users explore distributed training of deep learning models on high-performance computing systems, specifically Perlmutter. It focuses on building a large-scale deep learning model for a real scientific application (transformers for weather forecasting) and walks users through three stages: profiling tools and performance optimization on a single GPU; scaling to multiple GPUs and nodes through distributed training with data parallelism, along with tips and techniques for scaling; and advanced parallelization of very large models with model parallelism.

We will provide example code and datasets so that attendees can experiment hands-on with optimized, scalable distributed training of our scientific deep learning model on Perlmutter. Because the hands-on experiments run on Perlmutter, attendance will be capped. However, all training material and the lecture recordings will be made available after the event. OLCF and ALCF users are welcome to attend. Training accounts will be provided if needed.

Logistics

This event will be hybrid. The onsite location is B59 (see visitor information for more details).

Materials

Agenda

Day 1: March 3

Time | Topic | Presenters
9 - 10 a.m. | Introduction + Perlmutter Setup | Shashank Subramanian (NERSC) and Steven Farrell (NERSC)
10 - 10:15 a.m. | Break
10:30 - 11 a.m. | Deep Learning Performance on a GPU | Josh Romero (NVIDIA)
11 a.m. - 12 p.m. | Hands-on: Profiling and Optimizing GPU Training | Josh Romero (NVIDIA) and NERSC
12 - 1 p.m. | Discussions | NERSC

Day 2: March 4

Time | Topic | Presenters
9 - 9:30 a.m. | Scaling with Data Parallelism | Steven Farrell (NERSC)
9:30 - 10:30 a.m. | Hands-on: Data Parallelism | NERSC
10:30 - 10:45 a.m. | Break
10:45 - 11:15 a.m. | Scaling with Model Parallelism | Shashank Subramanian (NERSC)
11:15 a.m. - 12:15 p.m. | Hands-on: Model Parallelism | NERSC
12:15 - 1 p.m. | Discussions | NERSC