NERSC: Powering Scientific Discovery Since 1974

Advisor

Introduction

Intel Advisor provides two workflows to help ensure that Fortran, C and C++ applications can make the most of today's processors:

  • Vectorization Advisor identifies loops that will benefit most from vectorization, specifies what is blocking effective vectorization, finds the benefit of alternative data reorganizations, and increases the confidence that vectorization is safe.

  • Threading Advisor supports threading design and prototyping. It lets you analyze, design, tune, and check threading design options without disrupting normal code development.

For more information on Intel Advisor visit https://software.intel.com/en-us/intel-advisor-xe

Using Intel Advisor on Edison and Cori

To launch Advisor, use the Lustre file system instead of GPFS. You can use either the command line tool, 'advixe-cl', or the GUI, 'advixe-gui'. We recommend collecting data with 'advixe-cl' in batch jobs, and then displaying the results with 'advixe-gui' on an Edison login node.

Compiling Codes to Run with Advisor

Additional Compiler Flags

In order to compile the code to work with Advisor, some additional flags need to be used.

Cray Compiler Wrapper (ftn, cc, CC)

When using the Cray compiler wrappers to compile codes for Advisor, use the '-g' and '-dynamic' flags. We also recommend an optimization level of at least -O2 for codes that will be analyzed with Intel Advisor. To compile a C code that uses both MPI and OpenMP, use the following command:

cc -g -dynamic -openmp -O2 -o mycode.exe mycode.c

Here, the -g option lets Advisor map addresses to source lines. The -dynamic option is needed to build a dynamically linked application on Edison, because the compiler wrappers ftn, cc, and CC link applications statically by default.

Without the -dynamic option, the following error is generated:

% module load advisor
% cc -g -openmp -o mycode.exe mycode.c
% srun -n 1 -c 8 advixe-cl --collect survey --project-dir ./myproj  -- ./mycode.exe
advixe: Error: Binary file of the analysis target does not contain symbols required for profiling. See the 'Analyzing Statically Linked Binaries' help topic for more details.
advixe: Error: Valid pthread_spin_trylock symbol is not found in the binary of the analysis target.
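
To catch this problem before submitting a job, you can check on a login node whether an executable is dynamically linked. The sketch below is an illustration, not a NERSC or Intel tool. The function name is_dynamic_binary is hypothetical. It assumes that 'ldd' is available and exits with a non-zero status for statically linked binaries, as the glibc version does.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is the executable dynamically linked?
# Returns 0 if ldd can list its shared libraries, non-zero otherwise
# (ldd reports "not a dynamic executable" for static binaries).
is_dynamic_binary() {
  bin=$1
  if [ ! -f "$bin" ]; then
    echo "is_dynamic_binary: $bin: no such file" >&2
    return 2
  fi
  ldd "$bin" >/dev/null 2>&1
}
```

For example, 'is_dynamic_binary ./mycode.exe || echo "rebuild with -dynamic"' prints a reminder when the binary was linked statically.
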

Intel Native Compilers (mpiifort, mpiicc, mpiicpc)

When using the Intel native compilers to compile codes for Advisor, use the '-g' flag. The '-dynamic' flag is not needed because these compilers build dynamically linked executables by default. To compile a C code that uses both MPI and OpenMP, use the following command:

% mpiicc -g -openmp -O3 -o mycode.impi mycode.c

Launching Advisor with Single MPI Rank

We recommend running the following commands from the Lustre file system.

Using Cray compiler wrappers

To launch Advisor for an MPI plus OpenMP code, use the following commands:

% salloc -N 1 -t 30:00 -q debug
% module load advisor
% export OMP_NUM_THREADS=8
% cc -g -dynamic -openmp -O2 -o mycode.exe mycode.c
% srun -n 1 -c 8 advixe-cl --collect survey --project-dir ./myproj  -- ./mycode.exe
This will store the results of the analysis performed by Advisor in the 'myproj' directory.
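
You can also run the commands above as a batch job, which is the approach recommended earlier for collecting data. The snippet below writes an equivalent job script. The file name advisor_survey.sh is an arbitrary choice, and the #SBATCH settings simply mirror the salloc options above. The executable is assumed to be built already with the flags described earlier.

```shell
#!/bin/sh
# Write a batch-script version of the interactive session above.
# Submit the generated file with: sbatch advisor_survey.sh
cat > advisor_survey.sh <<'EOF'
#!/bin/bash -l
#SBATCH -N 1
#SBATCH -q debug
#SBATCH -t 30:00

module load advisor
export OMP_NUM_THREADS=8

# mycode.exe is assumed to be built beforehand with -g -dynamic -O2.
srun -n 1 -c 8 advixe-cl --collect survey --project-dir ./myproj -- ./mycode.exe
EOF
```

Submit the generated script from a directory on the Lustre file system so that the 'myproj' results land there.
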
 

Using Intel native compilers

To launch Advisor for an MPI plus OpenMP code, use the following commands:

% salloc -N 1 -t 30:00 -q debug
% module load advisor
% export OMP_NUM_THREADS=8
% module load impi
% mpiicc -g -openmp -O2 -o mycode.exe mycode.c
% export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib/pmi/libpmi.so
% srun -n 1 -c 8 advixe-cl --collect survey --project-dir ./myproj  -- ./mycode.exe

This will store the results of the analysis performed by Advisor in the 'myproj' directory. 
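
When an Intel MPI binary is launched with srun, I_MPI_PMI_LIBRARY must point to Slurm's PMI library. A small guard like the one below can catch a missing or mistyped path before the job starts. check_pmi_library is a hypothetical helper, not part of Intel MPI or Slurm. Verify that the path used above actually exists on the system you are using.

```shell
#!/bin/sh
# Hypothetical guard: fail early if I_MPI_PMI_LIBRARY is unset or
# does not name an existing file.
check_pmi_library() {
  if [ -z "${I_MPI_PMI_LIBRARY:-}" ]; then
    echo "I_MPI_PMI_LIBRARY is not set" >&2
    return 1
  fi
  if [ ! -f "$I_MPI_PMI_LIBRARY" ]; then
    echo "I_MPI_PMI_LIBRARY=$I_MPI_PMI_LIBRARY does not exist" >&2
    return 1
  fi
}
```

For example: 'check_pmi_library && srun -n 1 -c 8 advixe-cl --collect survey --project-dir ./myproj -- ./mycode.exe'.
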

Launching Advisor with Multiple MPI Ranks

We recommend running the following commands from the Lustre file system.

Using MPMD

This works with code compiled by either the Cray compiler wrappers or the Intel native compilers.

Using Cray Compiler Wrappers

To launch Advisor using MPMD for an MPI plus OpenMP code with multiple MPI ranks, use the following commands, which involve creating an 'mpmd.conf' file:

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% vi mpmd.conf

Contents of mpmd.conf:

0 advixe-cl --collect survey --project-dir ./myproj -- ./mycode.exe
1-3 ./mycode.exe

Compilation and Execution:

% cc -g -dynamic -openmp -O3 -o mycode.exe mycode.c
% srun -n 4 -c 8 --multi-prog ./mpmd.conf
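
For other rank counts, you can generate the mpmd.conf file instead of editing it by hand. The function below is a hypothetical helper that follows the layout shown above: rank 0 runs under advixe-cl and the remaining ranks run the application directly. The task count passed to srun must match the number of ranks covered by the file.

```shell
#!/bin/sh
# Hypothetical helper: print an mpmd.conf in which rank 0 runs under
# Advisor and ranks 1..NTASKS-1 run the application directly.
# Usage: write_mpmd_conf NTASKS EXECUTABLE PROJECT_DIR > mpmd.conf
write_mpmd_conf() {
  ntasks=$1 exe=$2 proj=$3
  echo "0 advixe-cl --collect survey --project-dir $proj -- $exe"
  if [ "$ntasks" -eq 2 ]; then
    echo "1 $exe"
  elif [ "$ntasks" -gt 2 ]; then
    echo "1-$((ntasks - 1)) $exe"
  fi
}
```

For example, 'write_mpmd_conf 4 ./mycode.exe ./myproj > mpmd.conf' reproduces the file shown above.
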

Using Intel Native Compiler

To launch Advisor using MPMD for an MPI plus OpenMP code with multiple MPI ranks, use the following commands, which involve creating an 'mpmd.conf' file:

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% vi mpmd.conf

Contents of mpmd.conf:

0 advixe-cl --collect survey --project-dir ./myproj -- ./mycode.exe
1-3 ./mycode.exe

Compilation and Execution:

% module load impi
% mpiicc -g -openmp -O3 -o mycode.exe mycode.c
% export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib/pmi/libpmi.so
% srun -n 4 -c 8 --multi-prog ./mpmd.conf
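
srun --multi-prog expects the configuration file to cover every task, so it can help to derive the -n value from mpmd.conf itself. The sketch below is a hypothetical helper that counts the ranks listed in the first column. It handles single ranks ('0'), ranges ('1-3'), and comma-separated lists. It deliberately rejects the '*' wildcard, because that entry's rank count depends on srun's -n.

```shell
#!/bin/sh
# Hypothetical helper: count the MPI ranks listed in an srun --multi-prog
# configuration file. The first field of each line is a rank list such as
# "0", "1-3" or "4,6-7"; blank lines and '#' comments are skipped.
mpmd_task_count() {
  total=0
  while read -r ids rest || [ -n "$ids" ]; do
    case $ids in
      '' | \#*) continue ;;
      *'*'*) echo "mpmd_task_count: '*' wildcard not supported" >&2; return 1 ;;
    esac
    # Split comma-separated lists into individual items.
    for item in $(printf '%s\n' "$ids" | tr ',' ' '); do
      case $item in
        *-*) lo=${item%-*}; hi=${item#*-}; total=$((total + hi - lo + 1)) ;;
        *) total=$((total + 1)) ;;
      esac
    done
  done < "$1"
  echo "$total"
}
```

For example: 'srun -n "$(mpmd_task_count mpmd.conf)" -c 8 --multi-prog ./mpmd.conf'.
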

Using a script

This works with code compiled by either the Cray compiler wrappers or the Intel native compilers.

Using Cray Compiler Wrappers

To launch Advisor using a script for an MPI plus OpenMP code with multiple MPI ranks, use the following commands, which involve creating a wrapper script named 'ascript':

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% vi ascript

Contents of ascript:

#!/bin/bash
if [ $SLURM_PROCID -eq 0 ]
then
advixe-cl --collect survey   --search-dir src:r=./ -- ./mycode.exe
else
./mycode.exe
fi

Compilation and Execution:

% cc -g -dynamic -openmp -O3 -o mycode.exe mycode.c
% chmod +x ascript
% srun -n 4 -c 8 ./ascript
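
Below is a sketch of a variant of ascript that takes the application and its arguments on the command line, so one wrapper serves any executable. It uses 'exec' so the wrapper shell does not stay resident. ADVISOR_RANK is a hypothetical variable introduced here to choose which rank runs under Advisor; it defaults to 0, as in the script above. The snippet writes the wrapper and makes it executable.

```shell
#!/bin/sh
# Write a generalized ascript and make it executable.
cat > ascript <<'EOF'
#!/bin/bash
# Rank to profile; ADVISOR_RANK is a hypothetical variable (default 0).
profile_rank=${ADVISOR_RANK:-0}
if [ "${SLURM_PROCID:-0}" -eq "$profile_rank" ]; then
  # This rank runs the application under Advisor.
  exec advixe-cl --collect survey --project-dir ./myproj --search-dir src:r=./ -- "$@"
else
  # All other ranks run the application directly.
  exec "$@"
fi
EOF
chmod +x ascript
```

Launch it with the application as arguments, for example 'srun -n 4 -c 8 ./ascript ./mycode.exe'.
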

Using Intel Native Compiler

To launch Advisor using a script for an MPI plus OpenMP code with multiple MPI ranks, use the following commands, which involve creating a wrapper script named 'ascript':

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% vi ascript

Contents of ascript:

#!/bin/bash
if [ $SLURM_PROCID -eq 0 ]
then
advixe-cl --collect survey   --search-dir src:r=./ -- ./mycode.exe
else
./mycode.exe
fi

Compilation and Execution:

% module load impi
% mpiicc -g -openmp -O3 -o mycode.exe mycode.c
% export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib/pmi/libpmi.so
% chmod +x ascript
% srun -n 4 -c 8 ./ascript

Using 'mpirun' 

This works only with code compiled by the Intel native compilers.

Using Intel Native Compiler

To launch Advisor using 'mpirun' for an MPI plus OpenMP code with multiple MPI ranks, use the following commands:

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% module load impi
% mpiicc -g -openmp -O3 -o mycode.exe mycode.c
% mpirun -n 4 advixe-cl --collect survey --project-dir ./myproj  -- ./mycode.exe

The I_MPI_PMI_LIBRARY environment variable needs to be unset for this.
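
Unsetting the variable in your shell also affects any later srun launches in the same session. One way to avoid that is to unset it only in a subshell that runs the mpirun command, as sketched below with a hypothetical helper named without_pmi_lib. With GNU coreutils, 'env -u I_MPI_PMI_LIBRARY mpirun ...' has the same effect.

```shell
#!/bin/sh
# Hypothetical helper: run a command with I_MPI_PMI_LIBRARY removed from
# its environment. The unset happens in a subshell, so the calling shell
# keeps the variable for later srun launches.
without_pmi_lib() {
  (
    unset I_MPI_PMI_LIBRARY
    "$@"
  )
}
```

For example: 'without_pmi_lib mpirun -n 4 advixe-cl --collect survey --project-dir ./myproj -- ./mycode.exe'.
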

Using the '--trace-mpi' flag

This works with code compiled by either the Cray compiler wrappers or the Intel native compilers. However, this option is not available in the current version of Advisor and is expected in a future version.

Using Cray Compiler Wrappers

To launch Advisor using the '--trace-mpi' flag for an MPI plus OpenMP code with multiple MPI ranks, use the following commands:

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% cc -g -dynamic -openmp -O3 -o mycode.exe mycode.c
% srun -n 4  -c 8 advixe-cl --collect survey --trace-mpi --project-dir ./myproj  -- ./mycode.exe

Using Intel Native Compiler

To launch Advisor using the '--trace-mpi' flag for an MPI plus OpenMP code with multiple MPI ranks, use the following commands:

% salloc -N 1 -t 30:00 -q debug  
% module load advisor
% export OMP_NUM_THREADS=8
% module load impi
% mpiicc -g -openmp -O3 -o mycode.exe mycode.c
% export I_MPI_PMI_LIBRARY=/opt/slurm/default/lib/pmi/libpmi.so
% srun -n 4 -c 8 advixe-cl --collect survey --trace-mpi --project-dir ./myproj  -- ./mycode.exe

Using the GUI to View Results

Note that the performance of the X Windows-based Graphical User Interface can be greatly improved if used in conjunction with the free NX software.

Launching Advisor in GUI Mode

Log in to Edison using the following command:

ssh -XY edison.nersc.gov

On the login node, load the Advisor module and then open the GUI:

% module load advisor
% advixe-gui

Viewing Results using the GUI 

Use the 'Open Result' button to browse for and open the '.advixeexp' file in the result directory. You should then see a screen similar to the following, which lists the most time-consuming loops:

To exit the GUI, click the cross in the top left-hand corner of the Advisor window.

Some Important Command Line Options for Intel Advisor

The general Intel Advisor 'advixe-cl' command syntax is:

advixe-cl <-action> [-project-dir PATH] [-action-options] [-global-options] [[--] target [target options]]

At NERSC, this command is launched through 'srun' or 'mpirun'. Here, <-action> specifies the action to perform, such as collect. Only one action is allowed per command. Several actions are available, but 'collect' and 'report' are the most common. The analysis types available for these two actions are listed below.

Options for the Collect Action

  • survey: Surveys the application and collects data about sites in the code that may benefit from parallelism.

  • suitability: Collects suitability data by executing the annotated code to analyze the proposed parallelism opportunities and estimate where performance gains are most likely.

  • correctness: Collects correctness data from annotated code and helps to predict and eliminate data sharing problems.

Use the search-dir option to specify the directories that contain the source, symbol, and binary files used during analysis. For the 'suitability' collection, Advisor can find the annotations only if it knows where the source files are. To perform a Suitability Analysis, use a command like the following, which also specifies the source directory:

srun -n 1 advixe-cl -search-dir src:=/scratch2/scratchdirs/ahanarc --collect suitability --project-dir ./o3c  -- ./mulmvma 10000

Options for the Report Action

  • survey: Generates a report on the data obtained from the Survey analysis.

  • suitability: Generates a report on the data obtained from the Suitability analysis.

  • correctness: Generates a report on the data obtained from the Correctness analysis.

  • annotations: Generates an Annotation report, which displays the location of annotations in the source code.

  • summary: Generates a Summary report, which summarizes the analysis.
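
Reports are generated with the same advixe-cl command, using the report action and the project directory from the collection step, for example 'advixe-cl --report survey --project-dir ./myproj'. The loop below runs several report types in turn. advisor_reports is a hypothetical helper. DRY_RUN is also a made-up variable; when it is set, the commands are printed instead of run.

```shell
#!/bin/sh
# Hypothetical helper: generate several Advisor reports for one project.
# Usage: advisor_reports PROJECT_DIR REPORT_TYPE...
# With DRY_RUN=1 (a made-up variable) the commands are printed, not run.
advisor_reports() {
  proj=$1
  shift
  for rpt in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "advixe-cl --report $rpt --project-dir $proj"
    else
      advixe-cl --report "$rpt" --project-dir "$proj" || return 1
    fi
  done
}
```

For example, 'advisor_reports ./myproj survey summary' produces the Survey report followed by the Summary report.
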

Using the Advisor GUI 

To launch Advisor in GUI mode so that the code runs on the compute nodes, use the following commands:

% ssh -XY edison.nersc.gov
% cd $SCRATCH
% salloc -N 1 -t 30:00 -q debug
% module load advisor
% advixe-gui

Creating a Project

To create a project, click on the 'New Project' button in the Welcome screen.

Then enter the name of the project and click the 'Create Project' button.

Next, browse for and select the binary file to be executed. If required, also specify the parameters to pass to the application and any required environment variables and their values.

You might also want to modify the working directory and the directory where the results will be stored. By default, the result directory is the same as the project directory. 

In the 'Source Search' tab, browse for and select the directory that contains your source file. 

Then click on the 'OK' button to create the project.

Opening a Project

To open an existing project, click on the 'Open Project' button on the welcome screen.

Browse for and select the '.advixeproj' file in the project directory, and then click the 'Open' button.

Collecting Survey Data

After opening a project, click the 'Collect Survey data' button in the Workflow, or the 'Collect' button in the 'Survey Target' box.

This runs the code and reports an analysis that lists the loops in decreasing order of execution time. It also shows the source code for the selected loop.

Inserting Annotations

Double-click the line representing a specific loop in the survey output to open the following window:

The lower part of the window shows annotation suggestions. Use the 'Copy to Clipboard' button in order to copy the annotation suggestion. The annotation suggestions provide a description of exactly how the annotations should be placed in the source code.

After copying the annotations, double-click any part of the code in the upper half of the window to open the source file in an editor.

Insert the annotations in the correct positions and save the file. Here, the annotation indicates the intent to parallelize a simple loop. Then, build the application again using the following command:

icpc -g -openmp -I${ADVISOR_XE_2016_DIR}/include -o mulmvs10 mulmv_fp.c

The '-I${ADVISOR_XE_2016_DIR}/include' option is used so that the annotations for Advisor can be recognized.
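
If ADVISOR_XE_2016_DIR is empty (presumably because the advisor module is not loaded), the compiler cannot find the annotation header. The hypothetical check below looks under the given directory for advisor-annotate.h, the header that defines the ANNOTATE_* macros, before you rebuild. The directory variable name may differ for other Advisor versions.

```shell
#!/bin/sh
# Hypothetical check: does DIR/include contain the Advisor annotation header?
# Usage: have_annotate_header "$ADVISOR_XE_2016_DIR"
have_annotate_header() {
  dir=$1
  if [ -z "$dir" ]; then
    echo "Advisor install directory not set; is the advisor module loaded?" >&2
    return 1
  fi
  [ -f "$dir/include/advisor-annotate.h" ]
}
```

For example: 'have_annotate_header "$ADVISOR_XE_2016_DIR" && icpc -g -openmp -I${ADVISOR_XE_2016_DIR}/include -o mulmvs10 mulmv_fp.c'.
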

NOTE: To access all of the following types of analysis, you may have to click the 'Threading Workflow' or 'Vectorization Workflow' button in the bottom left-hand corner of the window.

Performing Suitability Analysis for the Annotations

After compiling the annotated source file, collect the 'Survey' report once again. Then, click on the 'Collect' button in the 'Check Suitability' box in order to analyze the annotated program to check its predicted parallel performance.

Once the analysis has been performed, you will see the details of the result as follows:

By default, Advisor models a CPU target system with 8 threads and uses Intel TBB as the threading model. You can increase or decrease the number of threads, change the threading model to one of the other available options (including OpenMP), and change the target system to Intel Xeon Phi. For Intel Xeon Phi, you can also select the number of coprocessor threads.

The suitability analysis result also shows:

  1. Expected speedup.
  2. Scalability graph: The green region indicates that the program scales well and that parallelizing the code is well worth the effort. The yellow region indicates that parallelizing the annotated part gives some benefit, which may or may not justify the required effort. The red region indicates that parallelizing the annotated part might even degrade performance and is not worth the effort. The small red circle marks the currently selected conditions on the graph.
  3. Runtime Environment: This shows how much performance can be gained by choosing a runtime environment that minimizes various types of overhead or allows task chunking. Select the checkboxes next to these options to see the resulting improvement or the best possible performance.
  4. Task Modeling: This lets you model different data set sizes (by changing the number of iterations) and different durations for each iteration, which shows how the parallel code will scale.
  5. The current percentages of Load Imbalance, Lock Contention and Runtime Overhead.

Comparison of Advisor Estimated Performance and Actual, Measured Performance

Comparison between Advisor-estimated and measured wall clock times. The variation ranges from 3% to 15% and increases with the number of threads.
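
The page does not say how the percentage variation is defined. One common definition, assumed here, is the absolute difference divided by the measured time. The hypothetical helper below applies it to a pair of wall clock times.

```shell
#!/bin/sh
# Percent difference between an estimated and a measured time:
#   |estimated - measured| / measured * 100
# This definition is an assumption for illustration; measured must be non-zero.
pct_variation() {
  awk -v est="$1" -v meas="$2" 'BEGIN {
    d = est - meas
    if (d < 0) d = -d
    printf "%.1f\n", d / meas * 100
  }'
}
```

For example, an estimate of 1.15 s against a measured 1.00 s gives 15.0 ('pct_variation 1.15 1.00').
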

Performing Trip Count Analysis

To find how many iterations of each loop are executed, click on the 'Collect' button in the 'Trip Count' box. However, this should only be done after the Survey information has been collected.

The number of times each loop was executed is displayed as follows:

Marking Loops for Deeper Analysis

To perform a dependency analysis of a loop or check its memory access pattern, first select the checkbox next to that loop in the Survey report.

Check Dependences

To check loop-carried dependences in the loops that have been marked for deeper analysis, click on the 'Collect' button in the 'Check Dependences' box.

As in the screenshot above, if there is no loop-carried dependency, the report says so. If there is a loop-carried dependency, the report specifies its type. Selecting that row shows the line of code that causes the dependency in the bottom part of the window.

Checking Memory Access Patterns

To check the memory access patterns in the loops that have been marked for deeper analysis, click on the 'Collect' button in the 'Check Memory Access Patterns' box. 

This analysis specifies the stride at which the data is accessed in the loop and helps in optimizations that can improve memory access, prefetching and locality.

The access pattern is displayed in the form 'x%/y%/z%'. To see what these percentages mean, hover the mouse pointer over any cell in the 'Strides Distribution' column.

Roofline tool on Cori

The latest version of Advisor (2018) automates the Roofline model. A Roofline analysis requires two collections: survey and tripcounts. We ran into some issues using this feature on Cori, especially on Cori KNL; the feature is still at an early stage of development. While Intel and Cray work to resolve these issues, the following job scripts have worked for collecting Roofline data.

A sample job script to collect data for Roofline analysis on Cori KNL with an application linked to Cray MPICH

#!/bin/bash -l 
#SBATCH -q regular
#SBATCH -C knl,quad,cache
#SBATCH -N 4
#SBATCH -t 8:00:00                                                            

export OMP_PROC_BIND=true
export OMP_PLACES=threads
export OMP_NUM_THREADS=8                                                                  

module swap craype-haswell craype-mic-knl                                                                                                                                      
module load advisor/2018.integrated_roofline                                                                                                                            

export PMI_MMAP_SYNC_WAIT_TIME=3600
export PMI_CONNECT_RETRIES=3600

srun -N 4 -n 64 -c 16 --cpu_bind=cores run_survey.sh 

srun -N 4 -n 64 -c 16 --cpu_bind=cores run_tripcounts.sh
                                                                                                                              
where run_survey.sh and run_tripcounts.sh look like the following (both must be executable, for example after 'chmod +x run_survey.sh run_tripcounts.sh'):

% cat run_survey.sh

#!/bin/bash

if [[ $SLURM_PROCID == 0 ]];then
advixe-cl -collect=survey --project-dir knl-result -data-limit=0 -- ./a.out                                                                   
else
sleep 30
./a.out
fi
                                
% cat run_tripcounts.sh
#!/bin/bash
if [[ $SLURM_PROCID == 0 ]];then
advixe-cl -collect=tripcounts -flop --project-dir knl-result -data-limit=0 -- ./a.out                                                               
else                                                                
./a.out                                                                   
fi 
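
The srun options above follow from the KNL node layout. This page does not state the rule, so treat it as an assumption: each Cori KNL node has 68 cores with 4 hardware threads each, and NERSC's usual guidance gives each rank 4 * floor(68 / ranks per node) logical CPUs. Under that rule, 64 ranks on 4 nodes means 16 ranks per node and therefore -c 16, matching the scripts above. The hypothetical helper below does this arithmetic for other layouts.

```shell
#!/bin/sh
# Hypothetical helper: srun -c value for Cori KNL, assuming 68 cores and
# 4 hardware threads per core, i.e. -c = 4 * floor(68 / ranks_per_node).
# Usage: knl_cpus_per_task TOTAL_RANKS NODES
knl_cpus_per_task() {
  ranks=$1 nodes=$2
  if [ $((ranks % nodes)) -ne 0 ]; then
    echo "ranks ($ranks) do not divide evenly across $nodes nodes" >&2
    return 1
  fi
  per_node=$((ranks / nodes))
  if [ "$per_node" -gt 68 ]; then
    echo "more than 68 ranks per node" >&2
    return 1
  fi
  # Shell integer division truncates, which is the floor for positive values.
  echo $((4 * (68 / per_node)))
}
```

For example, 'knl_cpus_per_task 64 4' prints 16, the -c value used above.
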

                    
A sample job script to collect data for Roofline analysis on Cori KNL with an application linked to Intel MPI 

#!/bin/bash -l
#SBATCH -q regular
#SBATCH -C knl,quad,cache
#SBATCH -N 4
#SBATCH -t 6:00:00

export OMP_PROC_BIND=true
export OMP_PLACES=threads
export OMP_NUM_THREADS=8

module load impi

export I_MPI_PMI_LIBRARY=/usr/lib64/slurmpmi/libpmi.so
export I_MPI_FABRICS=shm:tcp

module load advisor/2018.integrated_roofline

export PMI_MMAP_SYNC_WAIT_TIME=1800

srun -N 4 -n 64 -c 16 --cpu_bind=cores run_survey.sh

srun -N 4 -n 64 -c 16 --cpu_bind=cores run_tripcounts.sh

% cat run_survey.sh
#!/bin/bash
if [[ $SLURM_PROCID == 0 ]];then
advixe-cl -collect=survey --project-dir impi-knl3 -data-limit=0 -- ./a.out
else
./a.out
fi
% cat run_tripcounts.sh                                                                       
#!/bin/bash                                                              
if [[ $SLURM_PROCID == 0 ]];then                                                             
advixe-cl -collect=tripcounts -flops-and-masks --project-dir impi-knl3 -data-limit=0 -- ./a.out                                                             
else                                                                
./a.out                                                                      
fi          
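
The two wrapper scripts differ only in the collection options, so a single wrapper can replace both. The sketch below writes a hypothetical run_advisor.sh. Its first argument selects the analysis and the remaining arguments are the application command. It uses the options and project directory from the Intel MPI scripts above, and unlike the Cray MPICH version it does not add a 'sleep 30' on the other ranks.

```shell
#!/bin/sh
# Write one wrapper that replaces run_survey.sh and run_tripcounts.sh.
# Usage: run_advisor.sh survey|tripcounts COMMAND [ARGS...]
cat > run_advisor.sh <<'EOF'
#!/bin/bash
analysis=$1
shift
# Validate the analysis name on every rank, then pick Advisor options.
case $analysis in
  survey)     opts=(-collect=survey) ;;
  tripcounts) opts=(-collect=tripcounts -flops-and-masks) ;;
  *) echo "usage: $0 survey|tripcounts command [args...]" >&2; exit 1 ;;
esac
if [[ ${SLURM_PROCID:-0} == 0 ]]; then
  # Only rank 0 runs under Advisor, as in the original scripts.
  exec advixe-cl "${opts[@]}" --project-dir impi-knl3 -data-limit=0 -- "$@"
else
  exec "$@"
fi
EOF
chmod +x run_advisor.sh
```

In the job script, the two srun lines would then become 'srun -N 4 -n 64 -c 16 --cpu_bind=cores ./run_advisor.sh survey ./a.out' and the same line with 'tripcounts'.
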



Intel has posted a video on YouTube about how to use this functionality.

 

Downloads

  • mulmv-annotated.c.txt: This file contains the annotations.
  • mulmv.c.txt: This is the sample code used for the Advisor analysis, a matrix-vector multiplication code.