OLCF AI Training Series: AI for Science at Scale – Part 2, Oct 12, 2023

October 12, 2023

Held October 12, 2023, this session is the second part of the OLCF’s AI for Science at Scale training series and is open to NERSC users. Part one covered how to train a deep learning model in a distributed fashion across multiple GPUs of the Summit supercomputer using data parallelism. Building on that, this session focuses on training a model on multiple GPUs across nodes of the Frontier supercomputer and demonstrates model parallelism techniques and frameworks such as DeepSpeed, FSDP, and Megatron.

Time (EDT)       Topic                     Speaker
1 - 1:45 p.m.    Scaling, LLMs             Sajal Dash (OLCF, Analytics & AI Methods at Scale)
1:45 - 2 p.m.    Scientific Applications   Sajal Dash
2 - 3 p.m.       Hands-on Examples         Sajal Dash