NERSC: Powering Scientific Discovery Since 1974

CCDB and lgdb

Parallel Debugging with lgdb

lgdb (Cray Line Mode Parallel Debugger) is a GDB-based parallel debugger developed by Cray. It can either launch an application or attach to an already-running application that was launched with aprun, and it lets you debug the parallel code from the command line. These features can be useful, but you will probably want to use a more powerful GUI-based debugger instead.

Below is an example of running lgdb for a parallel application:

% qsub -I -lmppwidth=24 -q debug
…
% cd $PBS_O_WORKDIR
% module rm altd                       # Remove altd; it interferes with lgdb
% module load cray-lgdb
% lgdb
…
dbg_all> launch $pset{8} ./hello_mpi   # Launch 'hello_mpi' with 8 tasks in a PE set named '$pset'
dbg_all> break hello_mpi.c:21          # Set a breakpoint at line 21 of hello_mpi.c
dbg_all> continue                      # Run until the breakpoint is reached

dbg_all> print $pset::myRank           # Print the value of 'myRank' for all processes in $pset
pset[0]: 0
…
pset[7]: 7
dbg_all> print $pset{3}::myRank        # Print the value of 'myRank' for process 3 ($pset[3]) only
pset[3]: 3

Comparative Debugging

What makes lgdb (and CCDB) unique is comparative debugging, which lets programmers run two applications side by side and compare data structures between them. You can run two versions of the same application simultaneously, one that you know produces correct results and one that produces incorrect results, and identify the location where the two codes start to deviate from each other.
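The core idea can be illustrated without a debugger. The Python sketch below is only a conceptual model and does not show how lgdb works internally. It traces one variable through a hypothetical "working" and a hypothetical "broken" version of the same loop. The functions `working_step`, `broken_step`, `trace`, and `first_divergence` are all made up for this illustration. The sketch reports the first checkpoint at which the two traces disagree, which is the kind of location comparative debugging helps you find.

```python
import math


def working_step(x: float, i: int) -> float:
    """Hypothetical 'correct' update rule for iteration i."""
    return x + 1.0 / (i + 1)


def broken_step(x: float, i: int) -> float:
    """Hypothetical 'broken' update rule: wrong from iteration 5 onward."""
    if i >= 5:
        return x + 1.0 / (i + 2)  # deliberately injected bug
    return x + 1.0 / (i + 1)


def trace(step, n: int, x0: float = 0.0) -> list[float]:
    """Apply `step` n times and record the value after each iteration (a checkpoint)."""
    values = []
    x = x0
    for i in range(n):
        x = step(x, i)
        values.append(x)
    return values


def first_divergence(a: list[float], b: list[float], rel_tol: float = 1e-12):
    """Return the index of the first checkpoint where the traces differ, or None."""
    for i, (va, vb) in enumerate(zip(a, b)):
        if not math.isclose(va, vb, rel_tol=rel_tol):
            return i
    return None


if __name__ == "__main__":
    good = trace(working_step, 10)
    bad = trace(broken_step, 10)
    print(first_divergence(good, bad))  # 5
```

Running the script prints `5`, the first iteration at which the injected bug changes the result. In a real comparative debugging session, the comparison is made on the live data of the two running applications rather than on recorded traces.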

CCDB is a GUI tool for comparative debugging that runs lgdb underneath. Its interface makes it easier to drive lgdb, so users are advised to use CCDB rather than lgdb directly.

To compare something between two applications, you must tell lgdb or CCDB the name of the variable, the location where the comparison is to be made, and how the data are distributed over the MPI processes. For this, lgdb and CCDB use three entities:

  • PE set: A set of MPI processes
  • Decomposition: How a variable is distributed over the MPI processes in a PE set
  • Assertion script: A collection of mathematical relationships (e.g. equality) to be tested
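The three entities can be modeled in a few lines. The Python sketch below is a conceptual illustration and does not use lgdb syntax. It assumes a 1-D block decomposition in which lower-numbered ranks receive one extra element when the array length does not divide evenly. The helper names (`block_bounds`, `scatter`, `gather_global`, `assert_equal`) are hypothetical. The sketch shows why a decomposition is needed: two applications that run on PE sets of different sizes can still be compared element by element once each application's per-process pieces are reassembled into the global array.

```python
import math


def block_bounds(n: int, nprocs: int, rank: int) -> tuple[int, int]:
    """Half-open range [start, stop) owned by `rank` in a 1-D block decomposition.

    Assumption: the first n % nprocs ranks each hold one extra element.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def scatter(global_array: list[float], nprocs: int) -> list[list[float]]:
    """Split a global array into the local pieces each process in a PE set would hold."""
    n = len(global_array)
    return [global_array[slice(*block_bounds(n, nprocs, r))] for r in range(nprocs)]


def gather_global(pieces: list[list[float]]) -> list[float]:
    """Apply the decomposition in reverse: concatenate local pieces in rank order."""
    result: list[float] = []
    for piece in pieces:
        result.extend(piece)
    return result


def assert_equal(a: list[float], b: list[float], rel_tol: float = 1e-12) -> list[int]:
    """An 'equality' assertion: return the global indices where a and b differ."""
    if len(a) != len(b):
        raise ValueError("global arrays have different lengths")
    return [i for i, (x, y) in enumerate(zip(a, b))
            if not math.isclose(x, y, rel_tol=rel_tol)]


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    app1 = scatter(data, 4)   # first application: PE set of 4 processes
    app2 = scatter(data, 3)   # second application: PE set of 3 processes
    app2[1][0] += 1.0         # inject a hypothetical error on rank 1 of app2
    print(assert_equal(gather_global(app1), gather_global(app2)))  # [4]
```

The mismatch is reported as global index 4, even though the two applications split the array across their processes differently.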

See the lgdb man page ("man lgdb") for usage information about lgdb's comparative debugging feature. Cray also provides a tutorial manual that documents this feature. The tutorial uses example codes that are included in the lgdb distribution package. You can build the example executables with the provided script as follows:

% module load cray-lgdb
% cp -R $CRAY_LGDB_DIR/demos/hpcc_demo .  # copy the entire directory to the current directory
% cd hpcc_demo
% module swap PrgEnv-intel PrgEnv-cray    # its Makefile uses the Cray compiler
% ./build_demo.sh

This will build two binaries, 'hpcc_working' and 'hpcc_broken'.

CCDB Example

To use CCDB:

% qsub -I -v DISPLAY -lmppwidth=48           # request enough nodes for launching two applications
% cd $PBS_O_WORKDIR
% module load cray-ccdb
% ccdb

Then, launch two applications from the CCDB window.

In this example, an assertion script tests whether 6 variables have the same values in the two applications at line 418 of HPL_pdtest.c. With both applications stopped at line 418, the test shows that resid0 and XnormI have different values in the two applications.
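An assertion script is essentially a list of relationships evaluated on data from both applications at one location. The Python sketch below models only that idea and does not use CCDB's assertion-script syntax. The variable values are invented for illustration and are not HPL output. `n` is a hypothetical extra variable. `close` uses a relative tolerance, which is one reasonable choice for floating-point equality.

```python
import math
from typing import Callable

# Hypothetical snapshots of scalar variables in each application when both
# are stopped at the same source line. The numbers are invented for illustration.
working = {"resid0": 1.2e-3, "XnormI": 4.5, "n": 100.0}
broken = {"resid0": 7.9e1, "XnormI": 3.1, "n": 100.0}


def close(a: float, b: float) -> bool:
    """Floating-point 'equality' with a relative tolerance."""
    return math.isclose(a, b, rel_tol=1e-9)


def run_assertions(app1: dict[str, float], app2: dict[str, float],
                   names: list[str],
                   relation: Callable[[float, float], bool] = close) -> list[str]:
    """Test relation(app1[name], app2[name]) for each name; return the names that fail."""
    return [name for name in names if not relation(app1[name], app2[name])]


if __name__ == "__main__":
    print(run_assertions(working, broken, ["resid0", "XnormI", "n"]))
    # ['resid0', 'XnormI']
```

Passing a different `relation` (for example, a stricter tolerance) changes what counts as a mismatch without changing the list of variables being checked.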

 

Availability

Package  Platform  Category           Version  Module           Install Date  Date Made Default
lgdb     hopper    libraries/general  2.0.3    cray-lgdb/2.0.3  2013-05-24    2013-06-20
lgdb     hopper    libraries/general  2.2.1    cray-lgdb/2.2.1  2013-09-25    2013-12-11
lgdb     hopper    libraries/general  2.2.3    cray-lgdb/2.2.3  2013-12-18    2013-12-18
lgdb     hopper    libraries/general  2.3.1    cray-lgdb/2.3.1  2014-06-12