NERSC: Powering Scientific Discovery Since 1974

MAP

MAP from Allinea Software is a parallel profiler with a simple graphical user interface. It is installed on Hopper, Edison and Carver.

Note that the performance of the X Windows-based MAP Graphical User Interface can be greatly improved if used in conjunction with the free NX software.

Introduction

Allinea MAP is a parallel profiler with a simple graphical user interface. MAP can profile serial, OpenMP and MPI codes, and can be run with up to 512 processors.

The Allinea MAP web page and the 'Allinea Forge User Guide' are good resources for learning more about some of the advanced MAP features. The User Guide is available as $ALLINEA_TOOLS_DOCDIR/userguide-forge.pdf after loading an allineatools module, or from the Allinea Forge User Guide web page.

Loading the Allinea Tools Module

To use MAP, first load the 'allineatools' module to set the correct environment settings:

% module load allineatools

Compiling Code to Run with MAP

To collect performance data, MAP uses two small libraries: the MAP sampler (map-sampler) and the MPI wrapper (map-sampler-pmpi). These must be used with your program, which is relatively easy with dynamic linking on Carver. There are fairly strict rules on the link order of object files and these libraries (see the User Guide for details), but if you follow the instructions printed by the MAP utility scripts, your code will very likely run with MAP.

Your program must be compiled with the -g option to keep debugging symbols, together with optimization flags that you would normally use. If you use the Cray compiler on the Cray machines, we recommend the -G2 option.

Below we show build instructions using a Fortran case, but the C or C++ usage is the same.

On Carver

To minimize problems you should stay with the default dynamic linking mode on Carver.  The libraries are built and preloaded at runtime, so the build procedure is relatively simple in this case.

% mpif90 -g -c testMAP.f
% mpif90 -o testMAP_ex testMAP.o

If you use the PGI compiler, you will need to add an additional option:

% mpif90 -o testMAP_ex testMAP.o -Wl,--eh-frame-hdr        # with PGI

The MAP sampler and MPI wrapper libraries needed by MAP are built during runtime in the ~/.allinea/${NERSC_HOST}/wrapper directory. However, if you prefer to build them in the current directory (to save runtime, for example), run the 'make-profiler-libraries' command and link your code following the instructions printed by the command:

% make-profiler-libraries
Created the libraries in /your/directory:
   libmap-sampler.so       (and .so.1, .so.1.0, .so.1.0.0)
   libmap-sampler-pmpi.so  (and .so.1, .so.1.0, .so.1.0.0)

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -dynamic -L/your/directory -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your program.

Before running your program (interactively or from a queue), set
LD_LIBRARY_PATH:
   export LD_LIBRARY_PATH=/your/directory:$LD_LIBRARY_PATH
   mpirun  ...
or add -Wl,-rpath=/your/directory when linking your program.

% mpif90 -g -c testMAP.f
% mpif90 -o testMAP_ex testMAP.o -dynamic -L/your/directory -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

You can optionally provide an argument to the 'make-profiler-libraries' command for a directory where the libraries are to be built. For more info, type

% make-profiler-libraries --help

Save the information about how to modify the LD_LIBRARY_PATH environment variable because you will need it when you run the program with MAP. Csh/tcsh users should use the equivalent 'setenv' command.

On Babbage

As on Carver, the libraries are built and preloaded at runtime when the default dynamic linking is used, so building an executable is straightforward:

% mpiifort -mmic -g -c testMAP.f
% mpiifort -mmic -o testMAP_ex testMAP.o

On Cray Machines

Note: MAP does not work with pgi/14.2.0, the default PGI compiler version on Hopper, due to a bug in the compiler, which is fixed in 14.9.

Building an executable for MAP is more complicated on Cray machines. First, you need to explicitly build the MAP sampler and MPI wrapper libraries using 'make-profiler-libraries', and then link your executable against them.

To build a statically linked executable, follow the procedure below. The 'make-profiler-libraries' command creates a plain-text file, 'allinea-profiler.ld', which contains the suggested options for linking the MAP libraries. To apply those options, add the flag '-Wl,@/your/directory/allinea-profiler.ld' to your link command.

% make-profiler-libraries --lib-type=static
Created the libraries in /your/directory:
   libmap-sampler.a
   libmap-sampler-pmpi.a

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -Wl,@/your/directory/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
   If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
   these must appear *after* the Allinea sampler and MPI wrapper libraries in
   the link line.  There's a comprehensive description of the link ordering
   requirements in the 'Preparing a Program for Profiling' section of either
   userguide-forge.pdf or userguide-reports.pdf, located in
   /usr/common/usg/allineatools/5.0.1/doc/.

% ftn -g -c testMAP.f        # Use -G2 instead of -g for the Cray compiler
% ftn -o testMAP_ex testMAP.o -Wl,@/your/directory/allinea-profiler.ld

To build a dynamically-linked executable, follow this procedure:

% make-profiler-libraries
Created the libraries in /your/directory:
   libmap-sampler.so       (and .so.1, .so.1.0, .so.1.0.0)
   libmap-sampler-pmpi.so  (and .so.1, .so.1.0, .so.1.0.0)

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -dynamic -L/your/directory -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your program.

Before running your program (interactively or from a queue), set
LD_LIBRARY_PATH:
   export LD_LIBRARY_PATH=/your/directory:$LD_LIBRARY_PATH
   mpirun  ...
or add -Wl,-rpath=/your/directory when linking your program.

% ftn -c -g testMAP.f          # Use -G2 for the Cray compiler
% ftn -dynamic -o testMAP_ex testMAP.o -L/your/directory -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

Save the information about how to reset the LD_LIBRARY_PATH because you will need it before you run MAP.

Remember that you can provide an optional argument to 'make-profiler-libraries' to build the libraries in a directory other than the current working directory.

Starting a Job with MAP

Running an X Window GUI application can be painfully slow when it is launched from a remote system over the internet. NERSC recommends using the free NX software, which can greatly improve the performance of the X Window-based MAP GUI. Another option is Allinea's remote client, which is discussed in the 'Remote Client' section below.

You must log in with X11 forwarding enabled. One way of ensuring this is to use the -XY flag with the ssh command.

% ssh -XY username@hopper.nersc.gov

After loading the allineatools module and compiling with the -g option, request an interactive batch session on Hopper, Edison, or Carver; be sure to export your DISPLAY environment variable into the batch environment.

% qsub -I -v DISPLAY -q debug -l mppwidth=numCores                      # on Hopper or Edison

% qsub -I -v DISPLAY -q debug -l nodes=numNodes:ppn=numTasksPerNode     # on Carver

% qsub -I -v DISPLAY -q debug -l nodes=numNodes                         # on Babbage
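
The per-machine differences in the qsub resource request can be captured in a small helper. This is only a sketch: 'map_qsub_cmd' is a hypothetical name, not a NERSC or Allinea tool, and it prints the command rather than running it. The options are the ones shown above.

```shell
# Print (do not run) the interactive qsub command for a given NERSC host.
# "map_qsub_cmd" is a hypothetical helper, not part of MAP or the batch system.
# Arguments: host, core or node count, and (Carver only) tasks per node.
map_qsub_cmd() {
    case "$1" in
        hopper|edison) echo "qsub -I -v DISPLAY -q debug -l mppwidth=$2" ;;
        carver)        echo "qsub -I -v DISPLAY -q debug -l nodes=$2:ppn=$3" ;;
        babbage)       echo "qsub -I -v DISPLAY -q debug -l nodes=$2" ;;
        *)             echo "unknown host: $1" >&2; return 1 ;;
    esac
}

map_qsub_cmd carver 2 8    # prints: qsub -I -v DISPLAY -q debug -l nodes=2:ppn=8
```

Calling it as 'map_qsub_cmd $NERSC_HOST 24' would pick the form for the machine you are logged in to, assuming NERSC_HOST holds one of the names handled above.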

Load the 'allineatools' module if you haven't loaded it yet:

% module load allineatools

If you are profiling a dynamically linked executable and you explicitly created the libraries that MAP needs with the 'make-profiler-libraries' command, set LD_LIBRARY_PATH as instructed in the output that you saved when you ran the command:

% setenv LD_LIBRARY_PATH /your/directory:$LD_LIBRARY_PATH     # for csh/tcsh

$ export LD_LIBRARY_PATH=/your/directory:$LD_LIBRARY_PATH     # for bash/sh/ksh
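
For bash/sh/ksh users who set this repeatedly (for example in several batch scripts), a small function can avoid adding the same directory twice. This is a minimal sketch; 'prepend_ld_library_path' is a hypothetical helper name, and /your/directory is the same placeholder used above.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# "prepend_ld_library_path" is a hypothetical helper, not part of MAP.
prepend_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present: leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# /your/directory stands for the path printed by make-profiler-libraries.
prepend_ld_library_path /your/directory
```

The `${LD_LIBRARY_PATH:+:...}` expansion avoids leaving a trailing colon when the variable was previously empty or unset.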

Then, run the map command followed by the name of the executable to profile:

% map ./testMAP_ex     # or 'map -n ... ./testMAP_ex', 'map -np ... ./testMAP_ex'

or, starting from version 5.0,

% allinea-forge ./testMAP_ex

The Allinea Forge GUI will pop up with a start up menu. For profiling choose the option PROFILE with the 'allinea MAP' tool.  You can also choose to LOAD PROFILE DATA FILE to view profiling results saved in a file created in a previous MAP run.

Next a submission window will appear with a prefilled path to the executable to run. Select the number of processors on which to run and press Run. To pass command line arguments to a program enter them in the aprun arguments box.

Running MAP on Babbage requires some additional steps. Let's assume that two MIC cards, bc1012-mic0 and bc1012-mic1, are assigned to your batch job:

% get_micfile                            # to get hostfile for MIC cards
% cat micfile.$PBS_JOBID # hostfile generated by get_micfile
bc1012-mic0
bc1012-mic1

Load the allineatools module and start map as before. To run on more than one MIC card, MAP can launch the executable in MPMD (Multiple Program Multiple Data) mode, as though you were running separate executables, one on each MIC card. Specify this in the Run window. Note that the total number of processes (8 in this example) must equal the sum of the MPI tasks over all cards.
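
The process total for the Run window can be worked out from the hostfile. The sketch below assumes every MIC card runs the same number of MPI tasks (4 per card is an assumption chosen to match the total of 8 in this example); 'total_mic_tasks' is a hypothetical helper name.

```shell
# Total MPI tasks = (number of MIC cards in the hostfile) x (tasks per card).
# "total_mic_tasks" is a hypothetical helper; equal tasks per card is an assumption.
# The body runs in a subshell, ( ... ), so its variables do not leak into the caller.
total_mic_tasks() (
    hostfile=$1
    tasks_per_card=$2
    cards=$(grep -c . "$hostfile")     # count non-empty lines, one per MIC card
    echo $((cards * tasks_per_card))
)

# Example (inside the batch job, after running get_micfile):
#   total_mic_tasks micfile.$PBS_JOBID 4
```

With the two-card hostfile shown above and 4 tasks per card, the function prints 8.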

On Babbage you may see the following error message:

Other: ERROR: ld.so: object 'libmap-sampler.so' from LD_PRELOAD cannot be preloaded: ignored.
...
Other: ERROR: ld.so: object '/global/homes/w/wyang/.allinea/wrapper/libmap-sampler-pmpi-bc1004-19292.so' from LD_PRELOAD cannot be preloaded: ignored.
...

Allinea suggests ignoring this message for now.

MAP will start your program and collect performance data from all processes.

By default, MAP lets your program run to completion and will display data for the entire run.  You can also use the 'Stop and Analyze' button and the menu beneath it to control how long to profile your program.

Remote Client

Allinea provides remote clients for Windows, OS X and Linux that can run on your local desktop to connect via SSH to NERSC systems to debug, profile, edit and compile files directly on the remote NERSC machine. You can download the clients from Allinea and install on your laptop/desktop. Please note that the client version must be the same as the Allinea version that you're going to use on the NERSC machines.

To configure the client for NERSC systems, follow steps similar to those shown on the DDT web page. If you have already configured the client for using DDT on a NERSC machine, the same configuration will be used for running MAP.

You can start MAP similarly. Select a NERSC machine for the 'allinea MAP' tool and log in to the machine.

Click the 'PROFILE' button for the 'allinea MAP' tool. Set the run parameters and click 'Submit'. 

Profiling Results

After the run completes, MAP displays the collected performance data in the GUI.

The window is made of a few sections, providing different view points in presenting collected performance data.

Metrics View

The top section shows the "Metrics view," which displays timelines of a few selected performance metrics. By default it shows 'Main thread activity', 'CPU floating-point (%)' for the percentage of time each rank spends in floating-point CPU instructions, and 'Memory usage (MB)' for each task's memory usage.

Each vertical slice shows the distribution of values across (MPI) tasks at that moment. The minimum, maximum and mean are displayed, and shading gives you an idea of how the data is clustered. A region of large load imbalance can be identified visually as a wide shaded band.

You can add more metrics (such as 'CPU floating-point' (instructions), 'CPU fp vector' (instructions), 'CPU time', 'Kernel-mode CPU time', 'MPI call duration', 'MPI point-to-point', etc.) to the view area by clicking the 'Metrics' button at the bottom and then adding the ones from the list that interest you. The metrics are available under metric menu groups: 'Activity Timelines', 'CPU Instructions', 'CPU Time', 'IO', 'Memory', and 'MPI'.

Source Code View

The center pane shows the source code, annotated with performance information to the left of each line. It shows how much of the total time was spent on that line computing (dark green), communicating (blue) and performing I/O (orange). In an OpenMP parallel region, light green is used for multi-threaded computation time and dark grey is used for thread idle time. This coloring scheme applies to the other views, too. Only lines that account for at least 0.1% of the total time get charts.

Stacks View

The "Parallel Stacks View" area (shown when selecting the 'Main Thread Stacks' tab in the bottom pane) lists the lines where the most wallclock time was spent, sorted by wallclock time. Clicking on any line jumps to that position in the source code pane.

Functions View

The "Functions View", which is displayed when selecting the 'Functions' tab, shows a flat profile of the functions in your program. This is what you would see with a typical profiler tool. The value in the 'Self' column is the time spent in the function itself (the so-called "exclusive" time), the value in the 'Total' column is the time spent in the function itself and all its callees (the so-called "inclusive" time), and the value in the 'Child' column is the time spent in the callees only.

Project Files View

The "Project Files View" area (shown when selecting the 'Project Files' tab) offers a way to browse and navigate through the code. You can view functions arranged under source files. 'External Code' is typically system libraries.

When you hover your mouse over the metrics view area, a thin hairline will appear and distribution information for the selected performance metric (i.e. the metric window where your mouse's cursor is located) will be displayed at the bottom of the metrics view area. Similar hairlines will appear in the source code pane and the bottom pane, and they move in sync with the top hairline.

You can also select a region of interest along the horizontal axis (wallclock time) by pressing the left mouse button, dragging the mouse and then releasing the button. The selected region will appear highlighted, and the contents of the center and bottom panes will be adjusted to the selection.

MAP saves profiling results in a file named 'executablename_#p_yyyy-mm-dd_HH-MM.map', where '#' is the process count and 'yyyy-mm-dd_HH-MM' is the time stamp.

% ls -l
-rw-------  1 wyang wyang   273822 Apr  4 17:16 jacobi_mpi_24p_2015-04-04_17-16.map

You can keep this file and open it with MAP later to examine the profiling results:

% map jacobi_mpi_24p_2015-04-04_17-16.map
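
Because the file name encodes the run parameters, it can be split back into its parts, for example when sorting many result files. This is a minimal sketch based only on the naming pattern described above; 'map_file_info' is a hypothetical helper name, and the parsing works from the right so that underscores in the executable name are preserved.

```shell
# Split 'executablename_#p_yyyy-mm-dd_HH-MM.map' into its fields.
# "map_file_info" is a hypothetical helper, not part of MAP.
# The body runs in a subshell, ( ... ), so its variables do not leak into the caller.
map_file_info() (
    base=${1##*/}          # drop any leading directory
    base=${base%.map}      # drop the .map suffix
    hhmm=${base##*_}       # last field: HH-MM
    base=${base%_*}
    day=${base##*_}        # next field: yyyy-mm-dd
    base=${base%_*}
    procs=${base##*_}      # next field: process count with trailing 'p'
    exe=${base%_*}         # whatever remains is the executable name
    echo "$exe ${procs%p} $day $hhmm"
)

map_file_info jacobi_mpi_24p_2015-04-04_17-16.map   # prints: jacobi_mpi 24 2015-04-04 17-16
```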

Running in Command Line Mode

MAP can be run from the command line without the GUI by using the '--profile' option. You can submit a batch job as follows:

% cat runit
#!/bin/csh
#PBS -l mppwidth=24
#PBS -q debug
#PBS -l walltime=10:00
#PBS -j oe

cd $PBS_O_WORKDIR
module load allineatools
map --profile --np=24 ./jacobi_mpi

% qsub runit
2671863.edique02

% cat runit.o2671863
Allinea Forge 5.0.1 - Allinea MAP
Profiling             : /global/scratch2/sd/wyang/debugging/jacobi_mpi 
Allinea sampler       : statically linked
MPI implementation    : Auto-Detect (Cray X-Series (MPI/shmem/CAF))
* number of processes : 24
* Allinea MPI wrapper : statically linked

MAP analysing program...
MAP gathering samples...
MAP generated /global/scratch2/sd/wyang/debugging/jacobi_mpi_24p_2015-04-04_18-29.map
           1   60.37418
...
          10   11.93056
...

% ls -l
...
-rw-r-----  1 wyang wyang   262194 Apr  4 18:30 jacobi_mpi_24p_2015-04-04_18-29.map
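
If you profile the same code at several process counts, the batch script can be generated rather than edited by hand. The sketch below prints a script with the same directives as 'runit' above for a given process count and executable; 'make_map_job' is a hypothetical helper name, and the mppwidth directive assumes Hopper or Edison. Nothing is submitted by the function itself.

```shell
# Print a MAP batch script for Hopper/Edison; nothing is submitted.
# "make_map_job" is a hypothetical helper; the directives mirror the runit script.
# Arguments: number of MPI processes, path to the executable.
make_map_job() {
    cat <<EOF
#!/bin/csh
#PBS -l mppwidth=$1
#PBS -q debug
#PBS -l walltime=10:00
#PBS -j oe

cd \$PBS_O_WORKDIR
module load allineatools
map --profile --np=$1 $2
EOF
}

# Example: write the script to a file, then submit it with qsub as before.
#   make_map_job 24 ./jacobi_mpi > runit
```

The backslash in `\$PBS_O_WORKDIR` keeps that variable unexpanded in the generated script, so it is resolved by the batch job rather than by the shell that generates the file.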

Troubleshooting

If you are having trouble launching MAP, try these steps.

Make sure you have the most recent version of the system.config configuration file. The first time you run MAP, you pick up a master template, which is then stored locally in your home directory as ~/.allinea/${NERSC_HOST}/system.config, where ${NERSC_HOST} is the machine name: hopper, edison or carver. If you are having problems launching MAP, you could be using an older version of the system.config file, and you may want to remove the entire directory:

% rm -rf ~/.allinea/${NERSC_HOST}  

Remove any stale temporary files that may have been left by MAP.

% rm -rf $TMPDIR/allinea-$USER 

In case of a font problem where every character is displayed as a square, delete the .fontconfig directory in your home directory and restart MAP.

% rm -rf ~/.fontconfig

Make sure you are requesting an interactive batch session on Hopper, Edison and Carver. NERSC has configured MAP to run from the interactive batch jobs.

% qsub -I -v DISPLAY -q debug -lmppwidth=numCores                      # Hopper or Edison
% qsub -I -v DISPLAY -q debug -lnodes=numNodes:ppn=numTasksPerNode     # on Carver

Finally make sure you have compiled your code with -g. If none of these tips help, please contact the consultants at consult@nersc.gov. 

Installed Versions

Package        Platform      Category                Version      Module                    Install Date  Date Made Default
Allinea tools  babbage       applications/debugging  5.0-40932    allineatools/5.0-40932    2015-02-26
Allinea tools  babbage       applications/debugging  5.0.1        allineatools/5.0.1        2015-04-02    2015-04-03
Allinea tools  babbage       applications/debugging  5.0.1-42253  allineatools/5.0.1-42253  2015-06-05    2015-06-05
Allinea tools  babbage       applications/debugging  5.0.1-42607  allineatools/5.0.1-42607  2015-06-19    2015-06-19
Allinea tools  carver        applications/debugging  4.2.1-36484  allineatools/4.2.1-36484  2014-05-22
Allinea tools  carver        applications/debugging  5.0-40932    allineatools/5.0-40932    2015-05-13
Allinea tools  carver        applications/debugging  5.0.1        allineatools/5.0.1        2015-05-13    2015-05-13
Allinea tools  carver        applications/debugging  5.0.1-42253  allineatools/5.0.1-42253  2015-05-13
Allinea tools  carver        applications/debugging  5.0.1-42607  allineatools/5.0.1-42607  2015-06-19
Allinea tools  carver_sl6    applications/debugging  5.0-40932    allineatools/5.0-40932    2015-02-26
Allinea tools  carver_sl6    applications/debugging  5.0.1        allineatools/5.0.1        2015-04-02    2015-04-03
Allinea tools  carver_sl6    applications/debugging  5.0.1-42253  allineatools/5.0.1-42253  2015-05-13
Allinea tools  edison        applications/debugging  5.0-40932    allineatools/5.0-40932    2015-02-12
Allinea tools  edison        applications/debugging  5.0.1        allineatools/5.0.1        2015-04-02    2015-04-03
Allinea tools  edison        applications/debugging  5.0.1-42253  allineatools/5.0.1-42253  2015-05-13    2015-06-03
Allinea tools  edison        applications/debugging  5.0.1-42591  allineatools/5.0.1-42591  2015-06-09
Allinea tools  edison        applications/debugging  5.0.1-42607  allineatools/5.0.1-42607  2015-06-19    2015-06-19
Allinea tools  hopper_cle52  applications/debugging  5.0-40932    allineatools/5.0-40932    2015-03-03
Allinea tools  hopper_cle52  applications/debugging  5.0-41047    allineatools/5.0-41047    2015-03-03
Allinea tools  hopper_cle52  applications/debugging  5.0.1        allineatools/5.0.1        2015-04-02    2015-04-03
Allinea tools  hopper_cle52  applications/debugging  5.0.1-42253  allineatools/5.0.1-42253  2015-05-13    2015-06-03
Allinea tools  hopper_cle52  applications/debugging  5.0.1-42591  allineatools/5.0.1-42591  2015-06-09
Allinea tools  hopper_cle52  applications/debugging  5.0.1-42607  allineatools/5.0.1-42607  2015-06-19