NERSC: Powering Scientific Discovery Since 1974

Reveal

Description

Cray Reveal is part of the Cray Perftools software package. It uses the Cray CCE program library (hence it works only under PrgEnv-cray) for loopmark and source code analysis, combined with performance data collected by CrayPat. Reveal helps identify the most time-consuming loops and provides compiler feedback on dependencies and vectorization. Its loop scope analysis suggests variable scoping and compiler directives for adding OpenMP parallelism to a serial or pure MPI code.
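
To make the "dependency and vectorization" feedback more concrete, here is a minimal C sketch of the two kinds of loops such feedback typically distinguishes. The function names scale_add and prefix_sum are hypothetical; they are not taken from Reveal or from the example program used in the later steps. Which loops a given compiler actually vectorizes depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each y[k] depends only on x[k] and the old y[k].
   "restrict" promises the compiler that x and y do not overlap. */
void scale_add(double *restrict y, const double *restrict x, double a, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t k = 0; k < n; k++)
        y[k] = y[k] + a * x[k];
}

/* Loop-carried dependency: iteration k reads the value written by
   iteration k-1, so the iterations cannot simply run independently. */
void prefix_sum(double *x, size_t n)
{
    assert(n == 0 || x != NULL);
    for (size_t k = 1; k < n; k++)
        x[k] = x[k] + x[k - 1];
}
```

A loop like the first is a natural candidate for an OpenMP directive, while a loop like the second generally has to be restructured before it can be parallelized.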

Steps to use Cray Reveal

Reveal is available on Hopper and Edison under PrgEnv-cray by loading the Cray perftools module. It conflicts with the "darshan" module (used for collecting I/O information), which NERSC loads by default, so darshan must be unloaded first. Reveal has a graphical interface, so it needs X11 access: log in to Hopper or Edison with the "ssh -X -Y" options.

1. The basic steps to set up the user environment are as follows:

% module swap PrgEnv-intel PrgEnv-cray         # on Edison
  or: % module swap PrgEnv-pgi PrgEnv-cray     # on Hopper
% module unload darshan
% module load perftools

2. Generate loop work estimates

2a) Build with -h profile_generate

Fortran code example:

% ftn -c -h profile_generate myprogram.f90
% ftn -o myprogram -h profile_generate myprogram.o

C code example (this code is used in the following steps):

% cc -c -h profile_generate myprogram.c
% cc -o myprogram -h profile_generate myprogram.o 

Note: It is a good idea to compile and link separately so that the object files are preserved. It is also recommended to keep this step separate from generating the program library (with -hpl), since -h profile_generate disables all optimizations.

2b) Build the CrayPat executable

% pat_build -w myprogram
The executable "myprogram+pat" will be generated. The "-w" flag enables tracing.

2c) Run the program to generate raw performance data in *.xf format. Below is a simple interactive batch session example. A regular batch script can also be used to launch the "myprogram+pat" program.
% qsub -I -v DISPLAY -lmppwidth=24
...  
% cd $PBS_O_WORKDIR
% aprun -n 24 ./myprogram+pat
...
% exit 

It generates one or more raw data files in *.xf format.  For example: myprogram+pat+2882429-4180t.xf

2d) Generate *.ap2 and *.rpt files via pat_report

% pat_report myprogram+pat+....xf > myprogram+pat.rpt
or
% pat_report -o myprogram+pat.rpt myprogram+pat+....xf

 3. Generate a program library

% cc -O3 -hpl=myprogram.pl -c myprogram.c

Notes:

a) -O3 can be used here to keep the optimization level
b) Compile one source file at a time, using "-c"
c) If there are multiple source code directories, the program library directory needs to be an absolute path
d) myprogram.pl is a directory; users need to clean it up from time to time

4. Save an original copy of your source code, since the Reveal suggested code may overwrite your original version.

% cp myprogram.c myprogram.c.orig      

5. Launch Reveal. (It needs X11 access; make sure to log in to Hopper or Edison with "ssh -X -Y".)

% reveal myprogram.pl myprogram+pat+....ap2      (# use the exact *.ap2 file name)

6. Choose the "Top Loops" view (instead of the "Program View") from the "Navigation" scroll menu, pick some of the most time-consuming loops, start scoping, and insert directives.

The left panel lists the most time-consuming loops. The top right panel displays the source code. The bottom right panel displays the compiler information about a loop.

Double-clicking a line in the "Info" section displays a more detailed explanation of the compiler's decision about that loop, for example whether it was vectorized or unrolled.

Right-click a loop in the left panel and choose "Scope Loop"; a new "Reveal OpenMP Scoping" window will pop up.

Click the "Start Scoping" button at the bottom left, and the scoping result for each variable will be shown. Some variables may be marked in red as "Unresolved", together with the reason why scoping failed.

Click "Show Directive" to display the OpenMP directive suggested by Reveal.

Click "Insert Directive" to add the suggested directive to the source code.

Click the "Save" button at the top right of the main window, and a "Save Source" window will pop up. Choose "Yes", and a file with the same name as the original file, with the OpenMP directives inserted, will be created.

The above steps can be repeated one loop at a time, saving after each. Note that the newly saved file has the same file name as your original code.

% cp myprogram.c myprogram.c.reveal (# this is the code with OpenMP directives generated by Reveal)
% cp myprogram.c.orig myprogram.c (# this restores the copy of your original code saved in step 4)
% cp myprogram.c.reveal myprogram_omp.c

7.  Now work with myprogram_omp.c:
a) Resolve all unresolved variables by giving each one an explicit scope, such as private or shared

For example, Reveal provides the directives for the following loop:

// Directive inserted by Cray Reveal.  May be incomplete.
#pragma omp parallel for default(none)                                   \
        unresolved (my_change,j)                                         \
        shared  (my_n,N,u_new,u,i)
      for ( j = 1; j <= N; j++ )
      {
        if ( u_new[INDEX(i,j)] != 0.0 )
        {
          my_change = my_change
            + fabs ( 1.0 - u[INDEX(i,j)] / u_new[INDEX(i,j)] );

          my_n = my_n + 1;
        }
      }

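Notice the keyword "unresolved" above: Reveal could not determine the data scope of my_change and j, so the clause must be replaced with explicit scopes. The loop index j is private. Because my_change is accumulated across iterations, a plain "private (my_change,j)" would compile but lose the accumulated sum; "private (j)" together with "reduction (+:my_change)" keeps the total. The counter my_n, which Reveal listed as shared, is updated in the same way and should likewise be moved from the shared list into the reduction clause. Save a new copy of myprogram_omp.c.
b) Compile with OpenMP enabled. This can be done under any PrgEnv. Resolve any compilation warnings and errors.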
c) Compare performance between myprogram and myprogram_omp
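
As a sketch of what the resolved loop from step a) might look like, the following self-contained C function wraps the loop shown above. The function name row_change, its parameter list, and the three-argument INDEX macro with its row-major layout are assumptions made for illustration; the original program's definitions are not shown on this page. Because my_change and my_n are accumulated across iterations, this sketch places them in a reduction clause rather than in private or shared, so that the per-thread partial results are combined at the end of the loop.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for the INDEX macro of the original program:
   row-major storage with rows of width N+2 is an assumption. */
#define INDEX(i, j, N) ((size_t)(i) * ((size_t)(N) + 2) + (size_t)(j))

/* Sums the relative change over row i and counts the nonzero entries.
   Returns the sum; the count is written to *n_out. */
double row_change(const double *u, const double *u_new, int i, int N, int *n_out)
{
    double my_change = 0.0;
    int my_n = 0;
    int j;

    assert(u != NULL && u_new != NULL && n_out != NULL);

    /* "unresolved" replaced by explicit scopes: the loop index is private,
       the two accumulators are reductions, everything else is shared. */
#pragma omp parallel for default(none)                                   \
        private (j)                                                      \
        reduction (+ : my_change, my_n)                                  \
        shared  (N, u_new, u, i)
    for (j = 1; j <= N; j++) {
        if (u_new[INDEX(i, j, N)] != 0.0) {
            my_change = my_change
                + fabs(1.0 - u[INDEX(i, j, N)] / u_new[INDEX(i, j, N)]);

            my_n = my_n + 1;
        }
    }

    *n_out = my_n;
    return my_change;
}
```

If the compiler is invoked without OpenMP enabled, the pragma is ignored and the function runs serially, which gives a convenient reference result when comparing the two versions in step c).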

More Information