
Data Analytics in Python on GPUs with NVIDIA RAPIDS Training (ONLINE ONLY), April 14, 2020

April 14, 2020

Introduction

Perlmutter will be NERSC's very first supercomputer featuring a non-testbed GPU partition. This timely architectural shift promises to bring NERSC users many new and exciting opportunities to accelerate their science analyses, but also new challenges in terms of tools, techniques, and libraries to learn.

To prepare its users to make efficient and productive use of Perlmutter's GPU partition from day one, NERSC is hosting a series of GPU training events and hackathons in 2020. There have also been a number of GPU hackathons for developers of codes written in compiled languages like C, C++, and Fortran, but what about high-level languages like Python?

Python is an integral programming language used for data analytics, data analysis pipelines, and machine learning at NERSC. Python is the platform of choice for programmers who value productivity, prototyping models, and rapid iterative data exploration. Increasingly we see experimental and observational data facilities run by DOE turning to high-level productivity languages like Python. So why should the compiled languages have all the GPU fun?

As part of the Perlmutter strategy, NERSC has partnered with NVIDIA to close the gap between the software needs of NERSC users and current state-of-the-art GPU analytics libraries. Part of this effort involves sharing selected (CPU-based) application codes from real NERSC users with NVIDIA engineers, both to inform product development and to give users feedback from NVIDIA engineers on optimizing their GPU utilization.

NERSC and NVIDIA are pleased to announce a one-day, ONLINE ONLY training event on Tuesday, April 14, 2020 to teach NERSC users about the NVIDIA RAPIDS software ecosystem for GPU-accelerated data analytics and machine learning. The RAPIDS data science framework includes a collection of libraries for executing end-to-end data science pipelines entirely on the GPU, and is designed to have a familiar look and feel to data scientists working in Python. We expect RAPIDS to become the most productive way for Python users to do data analytics on Perlmutter's GPUs. Join us online and learn directly from developers how to transition your data analytics workflows to GPUs.

As part of the training, attendees will be given temporary access to the Cori GPU testbed to work through the exercises and try porting their own codes.

Since this event will be fully online, we will try a new strategy to help participants get the most out of it. We will use the "Flipped Classroom" approach, in which we give participants the material in advance and expect them to have tried the exercises before the workshop. RAPIDS experts will be available during the workshop to help answer questions about the exercises.

Schedule

Date and Time: 9 AM - 4 PM (Pacific time), Tuesday, April 14, 2020

Presenters:

  • Nick Becker (NVIDIA RAPIDS Engineering)
  • Ayush Dattagupta (NVIDIA RAPIDS Engineering)
  • Vibhu Jawa (NVIDIA RAPIDS Engineering)
  • Zahra Ronaghi (NVIDIA Solutions Architecture)

Notebooks

https://github.com/beckernick/nersc-rapids-workshop

9:00-9:15 AM Welcome
9:15-9:30 AM Introduction to GPU Computing
9:30-10:15 AM Introduction to RAPIDS
  • Focused on cuDF and cuML
10:15-11:00 AM Introduction to cuDF (flipped classroom)
11:00-11:15 AM Break
11:15-12:00 PM Introduction to cuML (flipped classroom)
12:00-1:00 PM Break
1:00-1:30 PM Introduction to Dask (flipped classroom)
1:30-2:00 PM Introduction to Dask + GPUs
2:00-2:15 PM Evaluating CPU Workflows for the GPU
  • Thinking columnar, rather than row-wise
2:15-3:15 PM Demo: Accelerate a Real Workflow
3:15-3:30 PM Break
3:30-4:00 PM Open Q&A, closing remarks, attendee survey
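
The agenda item "Thinking columnar, rather than row-wise" can be illustrated roughly as follows. This is a minimal sketch in plain Python, not taken from the workshop material: it needs no GPU or RAPIDS installation, and the record layout and field names (station, temp_c) are hypothetical. It computes the same result twice, once by visiting records one at a time and once as a single operation over a whole column, which is the shape of computation that DataFrame libraries are built around.

```python
# Illustration only: plain Python stand-ins for a DataFrame.
# The field names and values below are hypothetical.

# Row-wise layout: one dict per record.
rows = [
    {"station": "A", "temp_c": 10.0},
    {"station": "B", "temp_c": 20.0},
    {"station": "C", "temp_c": 30.0},
]


def to_fahrenheit_rowwise(records):
    """Visit each record in turn and convert its temperature."""
    out = []
    for rec in records:
        out.append(rec["temp_c"] * 9.0 / 5.0 + 32.0)
    return out


# Columnar layout: one sequence per field.
columns = {
    "station": ["A", "B", "C"],
    "temp_c": [10.0, 20.0, 30.0],
}


def to_fahrenheit_columnar(cols):
    """Express the conversion as one operation over the whole column."""
    return [t * 9.0 / 5.0 + 32.0 for t in cols["temp_c"]]


rowwise = to_fahrenheit_rowwise(rows)
columnar = to_fahrenheit_columnar(columns)
print(rowwise == columnar)
```

In a DataFrame library such as pandas or cuDF, the columnar version would typically be written as elementwise arithmetic on a column, for example `df["temp_c"] * 9 / 5 + 32`, rather than as an explicit Python loop over rows.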

Videos

See this playlist on the NERSC YouTube channel.