NERSC: Powering Scientific Discovery Since 1974

Performance and Debugging Tools

NERSC provides many popular debugging and profiling tools. Some of them are general-purpose tools and others are geared toward more specific tasks.

A quick guideline on when to use which debugging tool is as follows:

  • DDT and TotalView: general purpose parallel debuggers allowing users to interactively control the pace of execution of a program using a graphical user interface
  • gdb: serial command-line debugger; useful for quickly examining core files to see where the code crashed (DDT and TotalView can be used for this purpose, too)
  • STAT: used for obtaining call backtraces for all parallel tasks from a live parallel application and displaying a call tree graphically, showing where each task is executing; useful in debugging a hung application

  • ATP: used for generating call backtraces for all parallel tasks when a code crashes; a good starting point if a code crashes with little hint left behind

  • CCDB and lgdb: their distinctive feature is comparative debugging, that is, running two versions of a code (e.g., a working version and an incorrect version, or the same code run with two different numbers of tasks) side by side to find where the two runs start to generate diverging results

  • Valgrind: a suite of debugging and profiling tools; the best-known tool is Memcheck, which can detect memory errors and memory leaks; other tools include cache profiling and heap memory profiling tools, among others

A "Getting Started" tutorial on some debugging tools:

A quick guideline on when to use which performance analysis tool is as follows:

  • IPM: a low-overhead, easy-to-use tool for getting hardware counter data, MPI function timings, and memory usage
  • CrayPat: a suite of sophisticated Cray tools for more detailed performance analysis, which can show routine-based hardware counter data, MPI message statistics, I/O statistics, etc.; in addition to collecting performance data by sampling, it can trace selected routines (or library routines) to give a better understanding of the performance statistics associated with them
  • MAP: a sampling tool for performance metrics; a time series of the data collected over the entire run of the code is displayed graphically, and the source code lines are annotated with performance metrics
  • Intel VTune Amplifier XE: a GUI-based tool that can find performance bottlenecks

A "Getting Started" tutorial on some performance tools:


A brief description of each tool follows.


DDT is a parallel debugger that can be run with up to 8,192 processors. It has features similar to TotalView and a similarly intuitive user interface.


TotalView, from Rogue Wave Software, is a parallel debugging tool that can be run with up to 512 processors. It provides an X Window-based graphical user interface and a command-line interface.


GDB can be used to quickly examine a core file produced when a program crashed and to obtain an approximate traceback.


STAT (the Stack Trace Analysis Tool) is a highly scalable, lightweight tool that gathers and merges stack traces from all of the processes of a parallel application. ATP (Abnormal Termination Processing) automatically runs STAT when the code crashes.
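The merging of stack traces described above can be illustrated with a minimal conceptual sketch. This is not STAT's implementation or its interface; the function names `merge_stack_traces` and `format_tree`, and the example traces, are hypothetical and chosen only for illustration. The idea is that frames shared by many tasks collapse into one tree node labeled with the ranks that passed through it, so an outlier task stands out.

```python
def merge_stack_traces(traces):
    """Merge per-rank stack traces into one call-prefix tree.

    traces maps a rank to its list of frame names, outermost frame first.
    Each tree node records which ranks have that frame at that position.
    """
    root = {}
    for rank in sorted(traces):
        level = root
        for frame in traces[rank]:
            node = level.setdefault(frame, {"ranks": [], "children": {}})
            node["ranks"].append(rank)
            level = node["children"]
    return root


def format_tree(tree, depth=0):
    """Render the merged tree as indented lines, one frame per line."""
    lines = []
    for frame, node in tree.items():
        lines.append(f"{'  ' * depth}{frame} {node['ranks']}")
        lines.extend(format_tree(node["children"], depth + 1))
    return lines


# Hypothetical traces from a 4-task job in which rank 3 never reaches the barrier.
example_traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "MPI_Barrier"],
    3: ["main", "solve", "compute_flux"],
}

print("\n".join(format_tree(merge_stack_traces(example_traces))))
```

In this made-up example the merged tree shows ranks 0 to 2 waiting in `MPI_Barrier` while rank 3 is still in `compute_flux`, which is the kind of picture that helps when diagnosing a hung application.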

CCDB and lgdb

Parallel Debugging with lgdb

lgdb (Cray Line Mode Parallel Debugger) is a GDB-based parallel debugger developed by Cray. It allows programmers either to launch an application or to attach to an already-running application that was launched with aprun, and then to debug the parallel code in command-line mode. These features can be useful, but you will probably want to use a more powerful GUI-based debugger instead. Below is an example of running lgdb for a parallel application: % qsub -I -lmppwidth=24 -q…
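The comparative-debugging idea behind CCDB (running two versions side by side and finding where their results begin to diverge) can be sketched in a few lines. This is only a conceptual illustration under stated assumptions, not CCDB's interface: the function `first_divergence`, the tolerance defaults, and the checkpoint values are all hypothetical, and each run is assumed to have recorded one scalar value per checkpoint.

```python
import math


def first_divergence(run_a, run_b, rel_tol=1e-9, abs_tol=0.0):
    """Return the index of the first checkpoint where two runs disagree.

    Values are compared within a tolerance, since runs with different task
    counts may legitimately differ in the last bits. Returns None if the
    runs agree everywhere and have the same length.
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    if len(run_a) != len(run_b):
        # One run stopped early: the divergence is where the shorter run ends.
        return min(len(run_a), len(run_b))
    return None


# Hypothetical residuals recorded at the same checkpoints in two runs.
reference = [1.0, 0.5, 0.25, 0.125, 0.0625]
suspect = [1.0, 0.5, 0.25, 0.126, 0.07]

step = first_divergence(reference, suspect)
print(f"runs first diverge at checkpoint {step}")
```

Comparing within a tolerance rather than for exact equality is a design choice: floating-point reductions can differ slightly between runs without indicating a bug.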


The Valgrind tool suite provides several debugging and profiling tools that can help make your programs faster and more correct. The most popular tool is Memcheck, which can detect many memory-related errors that are common in C and C++ programs.


IPM is a portable profiling infrastructure that provides a high-level report on the execution of a parallel job. IPM reports hardware counter data, MPI function timings, and memory usage.


CrayPat is a performance analysis tool provided by Cray for the XT and XE platforms.
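Tracing of selected routines, as opposed to sampling, means recording every entry to and exit from the chosen routines. The sketch below illustrates that idea only; it is not how CrayPat instruments code and does not use any CrayPat interface. The decorator `traced`, the table `trace_stats`, and the routine `dot` are hypothetical names introduced for the example.

```python
import functools
import time
from collections import defaultdict

# Per-routine totals: number of calls and accumulated wall time in seconds.
trace_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def traced(func):
    """Record every call to func, unlike sampling, which only sees a subset."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = trace_stats[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@traced
def dot(x, y):
    """A stand-in for a routine selected for tracing."""
    return sum(a * b for a, b in zip(x, y))


for _ in range(3):
    dot([1.0, 2.0], [3.0, 4.0])

for name, entry in trace_stats.items():
    print(f"{name}: {entry['calls']} calls, {entry['seconds']:.6f} s")
```

Tracing gives exact call counts at the cost of per-call overhead, which is why it is usually limited to a chosen set of routines.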


Allinea MAP is a parallel profiler with a simple GUI. It can be run with up to 512 processors. perf-report is a new tool from Allinea, which may be available for a limited time, that characterizes code performance by the percentage of wall time spent in different categories of performance metrics.


Intel VTune is a GUI-based tool for identifying performance bottlenecks and getting performance metrics.