NERSC: Powering Scientific Discovery Since 1974

Using HPCToolkit to Measure and Analyze the Performance of GPU-accelerated Applications Tutorial, Mar-Apr 2021

The developers of HPCToolkit from Rice University will present a two-part training series for NERSC and OLCF users about using HPCToolkit to measure and analyze the performance of GPU-accelerated applications. This tutorial will (1) introduce HPCToolkit’s general capabilities for performance measurement and analysis, (2) highlight new capabilities for performance measurement and analysis of GPU-accelerated codes, (3) contrast HPCToolkit’s capabilities with those of other tools, (4) describe how to use HPCToolkit’s user interfaces effectively, and (5) describe how to use HPCToolkit to analyze GPU-accelerated applications on platforms with NVIDIA GPUs. Hands-on exercises will use Cori GPU for NERSC users (to prepare for Perlmutter) and Summit for OLCF users.

This tutorial will introduce the capabilities of Rice University’s HPCToolkit performance tools for measurement and analysis of both CPU and GPU-accelerated applications. Today, HPCToolkit can measure and analyze the performance of GPU-accelerated applications on AMD, Intel, and NVIDIA GPUs. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling on NVIDIA GPUs and instrumentation on Intel GPUs to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To help developers understand the performance of complex GPU code generated from high-level programming models such as OpenMP or template-based programming abstractions, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations from flat PC samples collected by NVIDIA GPUs. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces of CPU and GPU activity timelines within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples.

This event will be presented online only, via Zoom. Please see below for remote connection information.


Date & Time | Topics and Speakers

Day 1, Monday, March 29

10:00 am - 1:00 pm PDT

  • Welcome to HPCToolkit Training (Helen He, Suzanne Parete-Koon)
  • Lecture: Introduction to HPCToolkit (John Mellor-Crummey)
Presentations with live demos:
  • Analyzing GPU-accelerated Applications (Keren Zhou)
  • Using HPCToolkit’s Graphical User Interfaces (Laksono Adhianto)
  • Hands-on work by attendees with example codes and/or their own applications

Day 2, Friday, April 2

10:00 am - 1:00 pm PDT
  • Analyzing CPU-accelerated Applications (John Mellor-Crummey)
  • Walkthrough of using HPCToolkit with example codes
  • More hands-on work, with time for questions about experiences with the example codes
  • Help for developers applying HPCToolkit to their own codes


Please use this form to register. There is no registration fee.

Remote Connection Information

Join Zoom Meeting
Meeting ID: 510 486 5180
Passcode: hpctoolkit
One tap mobile
+16699006833,,5104865180#,,,,*3403249885# US (San Jose)
+12532158782,,5104865180#,,,,*3403249885# US (Tacoma)

Dial by your location
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
+1 301 715 8592 US (Washington DC)
Meeting ID: 510 486 5180
Passcode: 3403249885
Find your local number: