
Using HPCToolkit to Measure and Analyze the Performance of GPU-accelerated Applications Tutorial, Mar-Apr 2021

March 29, 2021

The developers of HPCToolkit from Rice University will present a two-part training series for NERSC and OLCF users on using HPCToolkit to measure and analyze the performance of GPU-accelerated applications. This tutorial will (1) introduce HPCToolkit’s general capabilities for performance measurement and analysis, (2) highlight new capabilities for performance measurement and analysis of GPU-accelerated codes, (3) contrast HPCToolkit’s capabilities with those of other tools, (4) describe how to use HPCToolkit’s user interfaces effectively, and (5) describe how to use HPCToolkit to analyze GPU-accelerated applications on platforms with NVIDIA GPUs. Hands-on exercises will use Cori GPU for NERSC users (to prepare for Perlmutter) and Summit for OLCF users.

This tutorial will introduce the capabilities of Rice University’s HPCToolkit performance tools for measurement and analysis of both CPU and GPU-accelerated applications. Today, HPCToolkit can measure and analyze the performance of GPU-accelerated applications on AMD, Intel, and NVIDIA GPUs. To help developers understand the performance of accelerated applications as a whole, HPCToolkit’s measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling on NVIDIA GPUs and instrumentation on Intel GPUs to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To help developers understand the performance of complex GPU code generated from high-level programming models such as OpenMP or template-based programming abstractions, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations from flat PC samples collected by NVIDIA GPUs. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces of CPU and GPU activity timelines within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples.

This event will be presented online only, using Zoom. Please see below for remote connection information.


Schedule: Dates, Times, Topics, and Speakers

Day 1, Monday, March 29

10:00 am - 1:00 pm PDT

  • Welcome to HPCToolkit Training (Helen He, Suzanne Parete-Koon)
  • Lecture: Introduction to HPCToolkit (John Mellor-Crummey)
Presentations with live demos:
  • Analyzing GPU-accelerated Applications (Keren Zhou)
  • Using HPCToolkit’s Graphical User Interfaces (Laksono Adhianto)
  • Hands-on work by attendees with example codes and/or their own applications

Day 2, Friday, April 2

10:00 am - 1:00 pm PDT
  • Analyzing CPU-accelerated Applications (John Mellor-Crummey)
  • Walkthrough of using HPCToolkit with example codes 
  • More hands-on work, plus questions and answers about experiences with the example codes
  • Help for developers applying HPCToolkit to their own codes


Please use this form to register. There is no registration fee.