National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 
Package    Platform  Version  Module             Docs
totalview  bassi     7.1.1-1  totalview/7.1.1-1  NERSC, Vendor
totalview  bassi     8.0.1-0  totalview          NERSC, Vendor
totalview  bassi     8.2.0-0  totalview/8.2.0-0  NERSC, Vendor
totalview  jacquard  8.1.0-0  totalview          NERSC, Vendor
totalview  jacquard  8.2.0-0  totalview/8.2.0-0  NERSC, Vendor
(*) Denotes limited support

The TotalView debugger on Jacquard is provided and supported by Etnus. Etnus provides extensive documentation, including the TotalView User's Guide, and web-based tutorials are available from Etnus and LLNL.

Using TotalView on Jacquard

Compiling an example program

In order to use the debugger, code must be compiled with the -g option. This will produce a larger executable that may run relatively slowly, so be sure to recompile without the -g option once you are ready to execute production runs.

The attached example program, totex.f, can be compiled with the following command:

% mpif90 -o totex -g totex.f
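As a minimal sketch of the debug/production distinction described above, the hypothetical shell function compile_cmd below prints (but does not run) the compile command for each case. The production variant simply drops -g; no other optimization flags are assumed.

```shell
#!/bin/sh
# Hypothetical helper: print, without running, the mpif90 command for a
# debug build (with -g, for TotalView) or a production build (without -g).
compile_cmd() {
    case "$1" in
        debug) echo "mpif90 -o totex -g totex.f" ;;
        *)     echo "mpif90 -o totex totex.f" ;;
    esac
}

compile_cmd debug        # build for a TotalView session
compile_cmd production   # rebuild once the bug is fixed
```

To actually run one of the builds, pass the printed command to the shell, for example: eval "$(compile_cmd debug)"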

Running the program without the debugger produces the following output:

% mpirun -np 4 ./totex
 All these values should be the same:
 Processor Number : 0  Before send  = 0.761171758
 x1(3) on Processor No. : 0  After  recv  = 0.761171758
 x1(3) on Processor No. : 1  After  recv  = 0.E+0
 x1(3) on Processor No. : 2  After  recv  = 0.E+0
 x1(3) on Processor No. : 3  After  recv  = 0.761171758

Processors 1 and 2 print 0.E+0 for x1(3) instead of the value sent by processor 0, so array x1 contains unexpected values on those processors.

Starting the debugger

To run the program under TotalView on Jacquard, you must start a batch job that runs an xterm on the desired number of nodes. The attached example batch script, totex.pbs, runs an xterm on 2 nodes and reserves 2 processors on each node, so that a 4-process job can be run.
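The attached totex.pbs is not reproduced here. The following is only a sketch of what such a script might look like, assuming standard PBS resource syntax (nodes=2:ppn=2 for 2 nodes with 2 processors each). The walltime value, the job name, and the use of -V to pass your DISPLAY setting to the job are assumptions, not the contents of the original script.

```shell
#!/bin/sh
# Hypothetical sketch of totex.pbs; the real attached script may differ.
# Request 2 nodes with 2 processors each, enough for a 4-process MPI run.
#PBS -l nodes=2:ppn=2
# Wall-clock limit for the debugging session (assumed value).
#PBS -l walltime=00:30:00
# Export the submitting environment, including DISPLAY, to the job (assumption).
#PBS -V
#PBS -N totex

# Start in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR"

# Open an xterm on your workstation; TotalView is launched from this window.
xterm
```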

Submit this job to PBS:

% qsub totex.pbs

When PBS starts the xterm job, an xterm window will pop up on your workstation. In this window, load the totalview module and launch the job under TotalView using the -tv argument to the mpiexec job launcher:

% module load totalview
% mpiexec -n 4 -tv ./totex

This command runs the executable totex as four processes on two nodes and opens two TotalView windows: the root window and the process window.

Setting breakpoints

At this point, you can set breakpoints in the process window at the lines where you want the program to stop during the debug run. To set a breakpoint, left-click the line number. For this example, left-click line 18 to create a breakpoint there; the breakpoint is set for all of the MPI tasks. The process window will look like this once the breakpoint is set successfully.

Advancing to breakpoints

To start the program, go to the process window and left-click the Go button.

Three windows will pop up in succession, and you must click the appropriate response on each before the program actually starts.

At the first window, left-click No. We have found that stopping the program here to set breakpoints can lead to unpredictable results.

At the second window, left-click No again.

At the third window, left-click OK.

After this last click, the program will start executing on all processors and will stop at line 18. By default the process window shows the state of MPI process 0.

TotalView processes on Jacquard

TotalView's process labelling conventions on Jacquard can be confusing. Within MPI, an N-process job has processes (called ranks within the MPI program) numbered from 0 to N-1. TotalView, on the other hand, numbers the processes from 1 to N. In our 4-process totex run, the processes are labelled totex.1, totex.2, totex.3, and totex.4, and TotalView process totex.m corresponds to MPI process m-1.

In addition, each MPI process on Jacquard consists of 3 threads: one running the user code and two running in the system routine ioctl. Only the thread running the user code is of interest to you. For TotalView process 1 (MPI process 0), the user-code thread is 1.1. For every other process, the user-code thread is number 3, so to look at the state of MPI process 2 (TotalView process 3), click on thread 3.3.
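The labelling rules above can be summarized in two small hypothetical shell functions: tv_to_mpi_rank converts a TotalView process number m to its MPI rank, and tv_user_thread gives the process.thread id of the user-code thread. The loop prints the mapping for the 4-process totex run.

```shell
#!/bin/sh
# Hypothetical helpers encoding the Jacquard labelling rules.

# MPI rank of TotalView process totex.M (TotalView counts from 1, MPI from 0).
tv_to_mpi_rank() {
    echo $(( $1 - 1 ))
}

# process.thread id of the user-code thread in TotalView process M:
# thread 1 for process 1, thread 3 for every other process.
tv_user_thread() {
    if [ "$1" -eq 1 ]; then
        echo "1.1"
    else
        echo "$1.3"
    fi
}

# Print the mapping for the 4-process totex run.
m=1
while [ "$m" -le 4 ]; do
    echo "totex.$m  MPI rank $(tv_to_mpi_rank "$m")  user thread $(tv_user_thread "$m")"
    m=$((m + 1))
done
```

For example, tv_user_thread 3 prints 3.3, the thread to select when inspecting MPI process 2.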

Debugging the example program

Once the MPI program has started executing and has reached a breakpoint, the root window shows the status of each MPI process.

In the root window, left-click the plus (+) sign beside the process with ID 3 and Rank 2 to see the status of each thread of MPI process 2.

To see what is happening in this process, right-click the user thread 3.3 (running MAIN_) and choose Dive in New Window from the popup menu; a new process window opens for MPI process 2. (Choosing Dive instead would change the existing process window.)

You can also step to adjacent processes with the "P-" and "P+" buttons in the lower right corner of the process window.

Finding the error

Now let's find the problem. To let the program advance through all of the MPI calls and stop immediately afterward, set a breakpoint at line 52 in a process window and click "Go" in that window. All MPI processes of the program will advance to line 52 and stop.

Examining variables

To examine a variable, right-click its name in the process window and select Dive from the popup menu. For example, right-click the x1 variable name on line 46 of the process window. A new data window opens, containing the values of x1 on the thread represented by the process window.

In the upper left-hand corner of this data window, the process.thread combination 1.1 indicates that this is the user thread of process 1 (MPI process 0). By clicking the arrow to the left of this process.thread number, you can advance to process 2 (MPI process 1) at thread 2.3, and the values of x1 on processor 1 are then displayed.

Similarly, we can examine the values on process 3 (MPI process 2, thread 3.3) and values on process 4 (MPI process 3, thread 4.3).

Solving the problem

The values contained in the x1 array give a big clue to the solution. Since the number of non-zero elements on each processor depends on its MPI rank, we suspect the problem lies in one of the loops that perform the MPI_SEND and/or MPI_RECV calls.

Once we convince ourselves that line 46 is correct, we turn to line 34. There we see that we are sending i elements of the array x1, not all im1 elements as intended. After making this change, recompiling, and rerunning the program, we get the following output:
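The following sketch models why only some processes see the wrong value. It assumes, based on the symptoms above rather than on the text of totex.f, that MPI process 0 sends x1 to each process i = 1..3 and that the buggy send count is the loop index i; the array length im1=10 is an arbitrary placeholder. With a count of i, process i receives only the first i elements, so x1(3) arrives only at process 3, which matches the first run.

```shell
#!/bin/sh
# Hypothetical model of the send-count bug. Assumes process 0 sends x1 to each
# process i = 1..nprocs-1, with a buggy count of i instead of im1.

nprocs=4   # the 4-process run shown above
im1=10     # placeholder array length; the real value is in totex.f

# report MODE: say whether x1(3) reaches each receiving process.
report() {
    i=1
    while [ "$i" -lt "$nprocs" ]; do
        if [ "$1" = buggy ]; then count=$i; else count=$im1; fi
        # x1(3) is delivered only if at least 3 elements are sent.
        if [ "$count" -ge 3 ]; then
            echo "process $i: x1(3) received"
        else
            echo "process $i: x1(3) stays 0"
        fi
        i=$((i + 1))
    done
}

report buggy   # processes 1 and 2 keep 0, as in the first run
report fixed   # every process receives x1(3)
```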

 All these values should be the same:
 Processor Number : 0  Before send  = 0.761171758
 x1(3) on Processor No. : 0  After  recv  = 0.761171758
 x1(3) on Processor No. : 1  After  recv  = 0.761171758
 x1(3) on Processor No. : 3  After  recv  = 0.761171758
 x1(3) on Processor No. : 2  After  recv  = 0.761171758

And the code has been fixed!


Page last modified: Thu, 09 Nov 2006 19:10:06 GMT
Page URL: http://www.nersc.gov/nusers/systems/jacquard/software/totalview/
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
