

Distributed Debugging Tool (DDT) from Allinea Software is a parallel debugger installed on Hopper, Edison, Cori and Babbage.


DDT is a parallel debugger that can be run on up to 8192 processors. It can be used to debug serial, OpenMP, MPI, Coarray Fortran (CAF), and UPC (Unified Parallel C) codes. It also supports GPU debugging, but NERSC doesn't currently have a license for it on Dirac.

Totalview users will find DDT has very similar functionality and an intuitive user interface. All of the primary parallel debugging features from Totalview are available with DDT.

The Allinea DDT web page and the 'Allinea DDT and MAP User Guide' (available as $ALLINEA_TOOLS_DOCDIR/userguide.pdf after loading an allineatools module) are good resources for learning more about the advanced DDT features.

Loading the Allinea Tools Module

To use DDT at NERSC, first load the 'allineatools' module to set the correct environment settings:

% module load allineatools

Compiling Code to Run with DDT

To use DDT, code must be compiled with the -g option. With the Intel compiler, also add the -O0 flag. We also recommend that you do not compile with optimization turned on; that is, avoid flags such as -fast.
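The flag advice above can be checked mechanically before a debugging session. The following is a minimal sketch, not a NERSC-provided tool: check_debug_flags is a hypothetical helper name, and the set of flags it treats as optimization (-fast and anything starting with -O other than -O0) is an assumption.

```shell
# check_debug_flags: hypothetical helper, not part of the allineatools module.
# It scans a compile command line for -g and for common optimization flags.
check_debug_flags() {
    has_g=no
    has_opt=no
    for arg in "$@"; do
        case "$arg" in
            -g)        has_g=yes ;;
            -O0)       ;;               # explicitly unoptimized: fine
            -O*|-fast) has_opt=yes ;;   # assumed list of optimization flags
        esac
    done
    if [ "$has_g" = no ]; then
        echo "missing -g"
        return 1
    fi
    if [ "$has_opt" = yes ]; then
        echo "optimization enabled"
        return 2
    fi
    echo "ok"
}
```

For example, 'check_debug_flags ftn -g -o testDDT_ex testDDT.f' prints ok, while dropping -g from the same command prints missing -g.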

A Fortran example:

% ftn -g -o testDDT_ex testDDT.f             # on Hopper, Edison or Cori
% mpiifort -g -mmic -o testDDT_ex testDDT.f # on Babbage for MIC

A C example:

% cc -g -o testDDT_ex testDDT.c             # on Hopper, Edison or Cori
% mpicc -g -mmic -o testDDT_ex testDDT.c # on Babbage for MIC

Starting a Job with DDT

Running an X Window GUI application can be painfully slow when it is launched from a remote system over the internet. NERSC recommends using the free NX software, which can greatly improve the performance of the X Window-based DDT GUI. Another way to cope with the problem is to use Allinea's remote client, which is discussed in the 'Remote Client' section.

Be sure to log in with X window forwarding enabled. This could mean using the -X or -Y option to ssh. The -Y option often works better for Mac OS X.

% ssh -Y

After loading the allineatools module and compiling with the -g option, request an interactive session on Hopper, Edison, Cori or Babbage; be sure to export your DISPLAY environment variable into the batch environment if a qsub command is used.

% qsub -I -v DISPLAY -q debug -lmppwidth=numCores       # Hopper or Edison
% salloc -p debug -N numNodes # SLURM scheduler on Cori or Babbage
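Because the request command differs between the qsub and SLURM systems, a small wrapper can pick the right one. This is only a sketch: interactive_cmd is a hypothetical helper, and it prints the command rather than running it so that you can review it first.

```shell
# interactive_cmd: hypothetical helper; it only prints the command to run.
interactive_cmd() {
    host=$1    # hopper, edison, cori or babbage
    size=$2    # number of cores (qsub) or number of nodes (salloc)
    case "$host" in
        hopper|edison) echo "qsub -I -v DISPLAY -q debug -lmppwidth=$size" ;;
        cori|babbage)  echo "salloc -p debug -N $size" ;;
        *)             echo "unknown host: $host" >&2; return 1 ;;
    esac
}
```

For example, 'interactive_cmd edison 48' prints the qsub line shown above with numCores replaced by 48, and 'interactive_cmd cori 2' prints the salloc line with numNodes replaced by 2.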

Then launch the debugger with the ddt command followed by the name of the executable to debug:

% ddt ./testDDT_ex

or, starting from the 5.x version,

% forge ./testDDT_ex

The Allinea Forge GUI will pop up, showing a start-up menu from which you select what to do. For basic debugging, choose the Run option with the 'allinea DDT' tool. You can also choose 'ATTACH' to attach DDT to an already running program, or 'OPEN CORE' to view a core dump file from a previous job.

Then a submission window will appear with a prefilled path to the executable to debug. Select the number of processors on which to run and press Run. To pass command-line arguments to a program, enter them in the aprun arguments box.

DDT Submit window

Running DDT on Babbage requires some additional steps. Let's assume that two MIC cards, bc1003-mic0 and bc1003-mic1, are assigned to your batch job:

% get_micfile                            # to get hostfile for MIC cards
% cat micfile.$SLURM_JOB_ID # hostfile generated by get_micfile

Load the allineatools module and start ddt as before. One way that DDT works is to run the executable in MPMD (Multiple Program Multiple Data) mode when you want to run on more than one MIC card, as though you were running separate executables, one on each MIC card. The following shows how to specify in the Run window to run 4 MPI tasks on each MIC card. Please note that the value in the Number of Processes box (8 in this example) should be the sum of the MPI tasks over all cards as specified in the mpiexec.hydra arguments box.
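The value for the Number of Processes box can be worked out from the hostfile. The sketch below assumes that the hostfile lists one MIC card per line and that every card runs the same number of MPI tasks; total_mpi_tasks is a hypothetical helper name.

```shell
# total_mpi_tasks: hypothetical helper.
# Assumption: the hostfile has one MIC card per line; blank lines are ignored.
total_mpi_tasks() {
    hostfile=$1
    tasks_per_card=$2
    cards=$(grep -c . "$hostfile")     # count non-empty lines
    echo $((cards * tasks_per_card))
}
```

With the two cards of the example and 4 MPI tasks per card, 'total_mpi_tasks micfile.$SLURM_JOB_ID 4' gives 8, the value entered in the Number of Processes box.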

DDT Submit window

Attaching to a Running Application

Sometimes you suspect your application is hanging somewhere, not progressing as expected. In this case you can attach DDT to the running application. In this section we show how to do this on Edison and Hopper.

This feature is not available on Cori.

To attach to such a running application on the Cray machines, you first need to log in to a MOM node from a login node (MOM nodes are intermediary nodes, sitting between login nodes and compute nodes, that execute submitted user batch scripts and launch applications onto compute nodes). If you have already set up a passwordless ssh connection with agent forwarding, you can log in to any MOM node to attach to a running application. If you have not, you need to know which MOM node launched your application. This can be determined as follows. Let's assume that your job ID is 2897769 and that you are on a login node.

% qstat -f 2897769
    login_node_id = nid02433

Although the output confusingly uses the term "login_node_id", it refers to the MOM node for this batch job (more precisely, the internal node ID of the MOM node). Log in to the MOM node from the login node that you're on (you cannot log in to a MOM node directly), and start DDT there. If you have set up a passwordless ssh connection with agent forwarding, you can log in to any MOM node (that is, edimom01, ..., or edimom24 on Edison; hmom01, ..., or hmom24 on Hopper), as explained above.

% ssh -Y nid02433         # or any MOM node with passwordless ssh; from a login node
% module load allineatools
% ddt
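The MOM node lookup described above can be scripted. The following sketch assumes that 'qstat -f' prints the field in the 'login_node_id = value' form shown above; mom_node_of is a hypothetical helper name.

```shell
# mom_node_of: hypothetical helper that reads 'qstat -f' output on stdin
# and prints the value of the first login_node_id field it finds.
mom_node_of() {
    sed -n 's/^[[:space:]]*login_node_id[[:space:]]*=[[:space:]]*//p' | head -n 1
}
```

It could then be used from a login node as 'ssh -Y "$(qstat -f 2897769 | mom_node_of)"'.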

Select the 'ATTACH' button. A window pops up, prompting you to select the application to attach DDT to:

DDT Submit window

When you click the 'Attach to ...' button at the bottom, DDT connects to the running processes and displays where the code is currently executing, waiting for your debugging actions.

DDT Submit window

Remote Client

Allinea provides remote clients for Windows, OS X and Linux that can run on your local desktop to connect via SSH to NERSC systems to debug, profile, edit and compile files directly on the remote NERSC machine. You can download the clients from Allinea and install on your laptop/desktop. Please note that the client version must be the same as the Allinea version that you're going to use on the NERSC machines.
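Since a version mismatch is easy to overlook, the comparison can be made explicit. This is a minimal sketch under the assumption that you know your local client's version string and the name of the allineatools module you load; same_forge_version is a hypothetical helper.

```shell
# same_forge_version: hypothetical helper.
# Succeeds only if the client version equals the module's version.
same_forge_version() {
    client=$1                    # e.g. 5.1-43967, from the local client
    module=${2#allineatools/}    # strip the module-name prefix if present
    [ "$client" = "$module" ]
}
```

For example, 'same_forge_version 5.1-43967 allineatools/5.1-43967' succeeds, while 'same_forge_version 5.0.1 allineatools/5.1-43967' fails.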

First, we need to configure the client for running a debugging session on a NERSC machine. Start the client, and select 'Configure...' in the 'Remote Launch' pull-down menu.

DDT Submit window

That will open the 'Configure Remote Connections' window.

Using the 'Add', 'Edit' and other buttons, create a configuration for a machine, as shown in the following example.

For the 'Remote Installation Directory', use the path for the default allineatools module. The value for the 'Remote Script' field should be exactly the same as shown above.

You can repeat this for other NERSC machines. However, the remote client works only for Edison and Hopper at this time.

To start a debugging session on a machine, choose the configuration for the machine from the same 'Remote Launch' menu.

DDT Submit window

You'll be prompted to enter the passphrase if you have an SSH key set up, or the NIM password otherwise.

Since you're going to run the debugging/profiling session in a batch job, you need to configure additional settings for batch job submission the first time you run a job. After logging into a NERSC machine, click 'Options' in the Allinea DDT main window. Then, set the options as shown below. The example below is for the Cray machines.

DDT Submit window

DDT Submit window

That is, select 'Cray X-Series (MPI/shmem/CAF)' for the 'MPI/UPC Implementation' and choose 'pbs-xt4.qtf' for the 'Submission template file'. Please note that the path shown above is for the default allineatools version (5.0.1) at the time of this writing. If the default version changes, the path will point to the file in the correct installation directory.

Then, click the 'RUN' button in the main Allinea DDT window, and fill out the fields appropriately. Please make sure that the 'Submit to Queue' item is selected. Also, you will have to set the 'Working Directory' field correctly when the remote client is used. Otherwise, the application will run in your home directory.

DDT Submit window

Click the 'Submit' button to submit this batch job. When the job starts, you will see the familiar DDT window.

DDT Submit window

Troubleshooting

If you are having trouble launching DDT, try these steps.

Make sure you have the most recent version of the system.config configuration file. The first time you run DDT, you pick up a master template, which is then stored locally in your home directory in ~/.allinea/${NERSC_HOST}/system.config, where ${NERSC_HOST} is the machine name: hopper, edison, cori or babbage. If you are having problems launching DDT, you could be using an older version of the system.config file, and you may want to remove the entire directory:

% rm -rf ~/.allinea/${NERSC_HOST}  

Remove any stale temporary files that may have been left by DDT.

% rm -rf $TMPDIR/allinea-$USER 

In case of a font problem where every character is displayed as a square, please delete the .fontconfig directory in your home directory and restart ddt.

% rm -rf ~/.fontconfig
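Because these steps all use 'rm -rf', it can help to list the targets before deleting anything. The sketch below only prints the three paths named above and refuses to run if a variable is unset (an empty ${NERSC_HOST} would otherwise expand to the whole ~/.allinea directory); ddt_cleanup_targets is a hypothetical helper name.

```shell
# ddt_cleanup_targets: hypothetical helper; prints paths, deletes nothing.
ddt_cleanup_targets() {
    if [ -z "${NERSC_HOST:-}" ] || [ -z "${TMPDIR:-}" ] || [ -z "${USER:-}" ]; then
        echo "NERSC_HOST, TMPDIR and USER must all be set" >&2
        return 1
    fi
    printf '%s\n' \
        "$HOME/.allinea/$NERSC_HOST" \
        "$TMPDIR/allinea-$USER" \
        "$HOME/.fontconfig"
}
```

Once the printed list looks right, the same paths can be removed by hand with the commands shown above.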

Make sure you are requesting an interactive batch session on Hopper, Edison, Cori, or Babbage; be sure to export your DISPLAY environment variable into the batch environment. NERSC has configured DDT to run from the interactive batch jobs.

% qsub -I -v DISPLAY -q debug -lmppwidth=numCores       # Hopper or Edison
% salloc -p debug -N numNodes # SLURM scheduler on Cori or Babbage

Finally make sure you have compiled your code with -g. A large number of users who are having trouble running with parallel debuggers forget to compile their codes with debugging flags turned on. If none of these tips help, please contact the consultants at

Basic Debugging Functionality

The DDT GUI should be intuitive to anyone who has used a parallel debugger like Totalview before. Users can set breakpoints, step through code, set watches, examine and change variables, dive into arrays, dereference pointers, view variables across processors, step through processors, etc. Please see the Allinea Forge User Guide if you have trouble with any of these basic features.

Useful DDT Features

Process Groups

With DDT, the user can easily change the debugger to focus on a single process or group of processes. If Focus on current Processor is chosen, then stepping through the code, setting a breakpoint etc will occur only for a given processor. If Focus on current Group is chosen then the entire group of processors will advance when stepping forward in a program and a breakpoint will be set for all processors in a group.

Similarly, when Focus on current Thread is chosen, all actions apply to a single OpenMP thread. DDT doesn't allow you to create a thread group. However, you can click the Step Threads Together box to make all threads move together inside a parallel region. In the image shown above, this box is grayed out simply because the code is not an OpenMP code.

A user can create new sub-groups of processors in several ways. One way is to click on the Create Group button at the bottom of the Process Group Window. Another way is to right-click in the Process Group Window to create a group and then drag the desired processors to the group. Groups can also be created more efficiently using sub-groups from the Parallel Stack View described below. The below image shows 3 different groups of processors, the default All group, a group with only a single master processor Master and a group with the remaining Workers processors.

Parallel Stack View

A feature which should help users debug at high concurrencies is DDT's Parallel Stack View window found in the lower left area, which allows the user to see the position of all processors in a code at the same time from the main window. A program is displayed as a branching tree with the number and location of each processor at each point. Instead of clicking through windows to determine where each processor has stopped, the Parallel Stack View presents a quick overview which easily allows users to identify stray processes. Users can also create sub-groups of processors from a branch of the tree by right clicking on the branch. A new group will appear in the Process Group Window at the top of the GUI.

Message Queues

DDT allows you to examine the status of the internal MPI message buffers. With this feature, you can detect a coding error that causes a communication deadlock, where all processes are waiting for each other but no message has been sent.

Currently message queue debugging is not provided on Hopper, Edison and Cori.

There are three types of message queues:

  • Send: Calls to MPI send functions that have not yet completed; shown in red arrows
  • Receive: Calls to MPI receive functions that have not yet completed; shown in green arrows
  • Unexpected Message: Messages received by the system for which the corresponding receive function call has not yet been made; shown in blue arrows

To view message queues, you need to select Message Queues from the View pull-down menu. Below are some examples: 

DDT Receive Message Queue

DDT Send and Receive Message Queues

Memory Debugging

DDT has a memory debugging tool that can show heap memory usage across processors.

To access the memory debugging feature, you must first build your code for memory debugging. On Babbage, you can build it as usual. However, on Hopper, Edison and Cori, you have to follow certain steps. Below is a table showing steps for building a static executable using different compilers for memory debugging on Hopper, Edison and Cori. For the compilers other than PGI, the linking step is made of two parts. The first is to run in verbose mode using the -v flag to show all the linking steps taken. The second step is to rerun the last linker line after inserting some more options.

Compiler For static linking
PGI

% ftn -g -c prog.f
% ftn -Bstaticddt -o prog prog.o

GNU

% ftn -g -c prog.f
% ftn -v -o prog prog.o          # -v to get the last linker line

Rerun the last linker line after inserting '-zmuldefs' right after the command and putting ${DDT_LINK_DMALLOC} just before -lc:
% /opt/gcc/4.7.1/snos/libexec/gcc/x86_64-suse-linux/4.7.1/collect2 -zmuldefs ... ${DDT_LINK_DMALLOC} -lc ...

Cray

% ftn -g -c prog.f
% ftn -v -o prog prog.o

Proceed as for GNU:
% /opt/cray/cce/8.0.7/cray-binutils/x86_64-unknown-linux-gnu/bin/ld -zmuldefs ... ${DDT_LINK_DMALLOC} -lc ...

Intel

% ftn -g -c prog.f
% ftn -v -o prog prog.o

Proceed as for GNU. There are two places to put ${DDT_LINK_DMALLOC}, as there are two -lc's:
% ld -zmuldefs ... ${DDT_LINK_DMALLOC} -lc ... ${DDT_LINK_DMALLOC} -lc ...

The example commands are shown for a Fortran case. cc and CC should be used similarly for C and C++ codes. In case of a C++ code, ${DDT_LINK_DMALLOCXX} is to be used instead of ${DDT_LINK_DMALLOC}.

A simple script, static_linking_ddt_md, is provided in your $PATH to help you complete the somewhat complicated steps shown above.

% module load allineatools
% ftn -g -c prog.f
% static_linking_ddt_md ftn -o prog prog.o # instead of 'ftn -o prog prog.o'
% ls -l prog
-rwx------ 1 wyang wyang 6701908 2012-10-15 15:19 prog

You need to separate the compile and link stages. That is, you need to create *.o files using the -c compile flag first; otherwise, you may see an error message like the following:

/usr/bin/ld: cannot find /scratch/scratchdirs/wyang/ifortnr7R21.o: No such file or directory

 For multi-threaded codes, DDT_LINK_DMALLOCTH and DDT_LINK_DMALLOCTHCXX are used in place of DDT_LINK_DMALLOC and DDT_LINK_DMALLOCXX, respectively. Again, a utility script, static_linking_ddt_md_th, is provided to help with linking:

% static_linking_ddt_md_th ftn -mp -o prog prog.o   # instead of 'ftn -mp -o prog prog.o' 

Below is a table showing how to prepare your code using dynamic linking on Hopper, Edison and Cori. The example is provided for a Fortran code. Adjustments should be made for C and C++ codes as above. Again, in case of a C++ code, ${DDT_LINK_DMALLOC} must be replaced with ${DDT_LINK_DMALLOCXX}.

Compiler For dynamic linking
PGI, Cray

% ftn -g -c prog.f
% ftn -dynamic -o prog prog.o ${DDT_LINK_DMALLOC} -Wl,--allow-multiple-definition

GNU, Intel

% ftn -g -c prog.f
% ftn -dynamic -o prog prog.o ${DDT_LINK_DMALLOC} -zmuldefs

For multi-threaded codes, ${DDT_LINK_DMALLOCTH} or ${DDT_LINK_DMALLOCTHCXX} should be used instead.
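The four link variables named above differ only by language and threading, so the choice can be written as a small lookup. This is a sketch; dmalloc_link_var is a hypothetical helper that prints the name of the variable to use, not its value.

```shell
# dmalloc_link_var: hypothetical helper.
# $1 = language (fortran, c or c++), $2 = threaded (yes or no)
dmalloc_link_var() {
    case "$1:$2" in
        c++:yes) echo DDT_LINK_DMALLOCTHCXX ;;
        c++:no)  echo DDT_LINK_DMALLOCXX ;;
        *:yes)   echo DDT_LINK_DMALLOCTH ;;
        *:no)    echo DDT_LINK_DMALLOC ;;
        *)       echo "usage: dmalloc_link_var LANGUAGE yes|no" >&2; return 1 ;;
    esac
}
```

For example, 'dmalloc_link_var c++ no' prints DDT_LINK_DMALLOCXX, and 'dmalloc_link_var fortran yes' prints DDT_LINK_DMALLOCTH.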

Next, when DDT starts, you must click the "Memory Debugging" checkbox in the DDT run menu that first comes up.

DDT Groups

To set detailed memory debugging options, click the 'Details...' button on the far right side, which will open the 'Memory Debugging Options' window. There you can set the heap debugging level, the number of guard pages before or after arrays (but not both) for detection of heap overflow or underflow in the program, etc. The default page size is 4 KB.

DDT - memory debugging option

When running ddt with a statically built code, please deselect the 'Preload the memory debugging library' item. Otherwise, ddt can hang indefinitely during startup on Cray machines.

Also, leave the 'Preload ...' checkbox with a dynamically linked executable unchecked on the Cray machines if a C++ version of Allinea's dmalloc library was used (that is, when $DDT_LINK_DMALLOCXX or $DDT_LINK_DMALLOCTHCXX was used). Otherwise, ddt hangs during startup.

Several features are enabled with memory debugging. Select Current Memory Usage or Memory Statistics under the Tools menu. With the following buggy code that generates memory leaks:

      program memory_leaks

!...  Buggy code prepared by NERSC User Service Group for a debugging tutorial
!...  February, 2012

      implicit none
      include 'mpif.h'
      integer, parameter :: n = 1000000
      real val
      integer i, ierr
      call mpi_init(ierr)
      val = 0.
      do i=1,10
         call sub_ok(val,n)
      end do
      do i=1,10
         call sub_bad(val,n)
      end do
      do i=1,10
         call sub_badx2(val,n)
      end do
      print *, val
      call mpi_finalize(ierr)
      end program memory_leaks

      subroutine sub_ok(val,n)      ! no memory leak
      integer n
      real val
      real, allocatable :: a(:)
      allocate (a(n))
      call random_number(a)
      val = val + sum(a)
!     deallocate(a)                 ! ok not to deallocate
      end subroutine sub_ok

      subroutine sub_bad(val,n)     ! memory leak of 4*n bytes per call
      integer n
      real val
      real, pointer :: a(:)
      allocate (a(n))
      call random_number(a)
      val = val + sum(a)
!     deallocate(a)                 ! not ok not to deallocate
      end subroutine sub_bad

      subroutine sub_badx2(val,n)   ! memory leak of 8*n bytes per call
      integer n
      real val
      real, pointer :: a(:)
      allocate (a(n))
      call random_number(a)
      val = val + sum(a)
      allocate (a(n))               ! not ok to allocate again
      call random_number(a)
      val = val + sum(a)
!     deallocate(a)                 ! not ok not to deallocate
      end subroutine sub_badx2

you can easily see heap memory information (such as how much is being used, how much has been allocated, how much is freed, etc.), from which you can deduce where memory leaks occur. Below is a window shown when the Current Memory Usage menu is selected:

DDT - Current Memory Usage

It displays current heap memory usage of the program and the routines where it is allocated. Clicking on a histogram bar on the right, you will see the 'Allocation Details' box on the left filled up with information about where the memory allocation was made. By clicking on one of the pointers in the 'Allocation Details' list you can get information mapped to source code:

DDT - Pointer Details
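As a rough check on what the memory_leaks example should report, the expected leak per MPI task can be computed from the comments in the code. This sketch assumes a 4-byte default REAL (as the '4*n bytes' comments imply) and ignores any allocator overhead.

```shell
n=1000000          # array length used in the example
bytes_per_real=4   # assumption: default REAL is 4 bytes
calls=10           # each leaking routine is called 10 times

leak_bad=$((calls * bytes_per_real * n))          # sub_bad: one array per call
leak_badx2=$((calls * 2 * bytes_per_real * n))    # sub_badx2: two arrays per call
leak_total=$((leak_bad + leak_badx2))

echo "expected leak per MPI task: $leak_total bytes"
```

Under these assumptions the total is 120000000 bytes per task, with twice as much coming from sub_badx2 as from sub_bad.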

Memory debugging is known to fail in some cases with the error message "A tree node closed prematurely. One or more proceses may be unusable.", especially with MPI_Bcast. A workaround is to disable the 'store stack backtraces for memory allocations' option in the 'Enable Memory Debugging' setting. This problem will be fixed in the next release.

Offline Debugging

Offline debugging means running DDT in a command-line mode, without the GUI. This mode may be useful if all you want is tracepoint output (a tracepoint is a specified location in the code where requested values are printed) or stack backtraces, without directly interacting with DDT. This can be good for a "parameter study" where you want to check for an error condition over a range of parameter values, which would be tedious with the GUI.

To run DDT in this mode, you submit a batch job using a batch script that looks like:

% cat runit
#PBS ...

module load allineatools
ddt --offline=filename.html --np=4 myprogram arg1 ... # to get HTML output file
ddt --offline=filename      --np=4 myprogram arg1 ... # to get plain text output file

% qsub runit
6350051.hopque01

Please note that we are using 'ddt --offline ...' in place of 'aprun' or 'mpirun' for launching an application. Output of the debugging session is saved in the specified file ('filename.html' or 'filename' in the above example).
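For the kind of parameter study mentioned above, the batch script can loop over parameter values and write one output file per run. The following is a dry-run sketch that only prints the commands; offline_sweep is a hypothetical helper, and passing the parameter as the program's first argument, the run_VALUE.html file names and the fixed --np=4 are all illustrative assumptions.

```shell
# offline_sweep: hypothetical helper; prints one 'ddt --offline' command
# per parameter value instead of running it.
offline_sweep() {
    prog=$1
    shift
    for p in "$@"; do
        echo "ddt --offline=run_$p.html --np=4 $prog $p"
    done
}
```

For example, 'offline_sweep ./myprogram 10 20 30' prints three commands, each writing to its own HTML file. Inside a real batch script the echo would be replaced by the command itself.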

Some options can be used for the ddt command:

  • --session=sessionfile: run using settings saved using the Save Session option during a previous GUI run session
  • --np=numTasks: run with numTasks (MPI) tasks
  • --mem-debug: enable memory debugging
  • --trace-at=LOCATION[,N:M:P],VAR1,VAR2,... [if CONDITION]: set a tracepoint at location LOCATION (given by either 'filename:linenumber' or 'functionname', as in 'main.c:22' or 'myfunction'), beginning recording after the N-th visit of each process to the location, and recording every M-th subsequent pass until it has been triggered P times; record the values of the variables VAR1, VAR2, ...; the if clause lets you specify a boolean CONDITION that must be satisfied to trigger the tracepoint
  • --break-at=LOCATION[,N:M:P] [if CONDITION]: set a breakpoint at a location using the format explained above; the stack back traces of pausing processes will be recorded at the breakpoint before they are then made to continue

An example using the following simple code is shown below:

      program offline
!...  Prepared for a debugger tutorial by NERSC
      include 'mpif.h'
      integer, parameter :: n = 24
      real, allocatable :: a(:)
      integer i, me
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,me,ierr)
      allocate (a(n))
      call random_number(a)
      do i=1,n
         if (mod(i,2) == 1) call sub(i,n,a) ! 'sub' called when i=1,3,5,...
      end do
      print *, me, sum(a)
      call mpi_finalize(ierr)
      end program offline

      subroutine sub(i,n,a)
      integer n, i, j
      real a(n)
      do j=1,n
         a(j) = cos(a(j))
      end do
      end subroutine sub

The following sets a tracepoint at the beginning of the routine 'sub', where the values of i and a(1) are to be printed, and sets a breakpoint at line 23 using the activation scheme '5:3:2':

ddt --offline=offline.html --np=4 --trace-at=sub,i,a\(1\) --break-at=offline.f:23,5:3:2 ./offline

The output file is broken into three sections: Messages (showing process activities such as startup and termination etc., as well as call backtrace at breakpoints), Tracepoints (showing output from activated tracepoints), and Output (program output).

Introductory Video Tutorial

Watch the video.

Tutorial Materials

Previous tutorial presentation files are available on how to use DDT although some contents may be out-dated:

Installed Versions

Package        Platform      Category                Version    Module                  Install Date  Date Made Default
Allinea tools  babbage       applications/debugging  5.1-43629  allineatools/5.1-43629  2015-08-25    2015-08-25
Allinea tools  babbage       applications/debugging  5.1-43857  allineatools/5.1-43857  2015-09-10    2015-09-10
Allinea tools  babbage       applications/debugging  5.1-43967  allineatools/5.1-43967  2015-10-19    2015-10-19
Allinea tools  cori          applications/debugging  5.1-43967  allineatools/5.1-43967  2015-10-15    2015-10-15
Allinea tools  edison        applications/debugging  5.1-43629  allineatools/5.1-43629  2015-08-25    2015-08-25
Allinea tools  edison        applications/debugging  5.1-43857  allineatools/5.1-43857  2015-09-10    2015-09-10
Allinea tools  edison        applications/debugging  5.1-43967  allineatools/5.1-43967  2015-10-19    2015-10-19
Allinea tools  hopper_cle52  applications/debugging  5.1-43629  allineatools/5.1-43629  2015-08-25    2015-08-26
Allinea tools  hopper_cle52  applications/debugging  5.1-43857  allineatools/5.1-43857  2015-09-10    2015-09-10
Allinea tools  hopper_cle52  applications/debugging  5.1-43967  allineatools/5.1-43967  2015-10-19    2015-10-19