National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
Package  Platform  Category                  Version  Module            Install Date  Date Made Default
papi     bassi     applications/performance  3.2.1    papi
papi     bassi     applications/performance  3.5.0    papi
papi     bassi     applications/performance  3.6.0    papi
papi     davinci   applications/performance  3.1.0    papi
papi     franklin  applications/performance  3.5.99b  xt-papi/3.5.99b
papi     franklin  applications/performance  3.5.99c  xt-papi/3.5.99c
papi     franklin  applications/performance  3.6.1a   xt-papi
papi     franklin  applications/performance  3.6.2    xt-papi/3.6.2
papi     jacquard  applications/performance  3.1.0    papi

PAPI at NERSC

Introduction

The Performance API (PAPI) specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. PAPI was developed at the Innovative Computing Laboratory at the University of Tennessee.

PAPI provides two interfaces to the underlying counter hardware: a simple, high-level interface for acquiring basic measurements, and a fully programmable, low-level interface aimed at users with more sophisticated needs.

PAPI provides portability across different platforms. It uses the same routines with similar argument lists to control and access the counters for every architecture. As part of PAPI, there is a predefined set of events that represents the lowest common denominator of most counter implementations. The intent is that the same source code will count similar and possibly comparable events when run on different platforms.

Using PAPI

To use PAPI to examine the performance of your program, you must insert calls to one or more PAPI routines into your code and compile with the PAPI library. The full functionality of PAPI is only available to C programs, although many routines are callable from Fortran starting with version 1.2.

The PAPI library is made available through the module command. Use the following to compile and link a program.

% module load papi

% cc -c a.c ${PAPI}
% cc -o a.out a.o b.o ... ${PAPI}

% xlf -c a.f ${PAPI}
% xlf -o a.out a.o b.o ... ${PAPI}

Getting Started

One of the high-level PAPI routines is PAPI_flops. It counts flips (floating-point instructions), measures real and process time, and reports the Mflips/s rate. The first call to PAPI_flops initializes PAPI, sets up the counters to monitor the PAPI_FP_INS and PAPI_TOT_CYC events, and starts the counters. Each subsequent call reads the counters and returns the total real time, the total process time, the total number of floating-point instructions since the start of the measurement, and the Mflips/s rate since the last call to PAPI_flops.
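To make the "rate since the last call" concrete, here is a minimal sketch of the arithmetic, not PAPI's own implementation. The function name mflips_rate and its parameters are hypothetical. The text above does not say which time base PAPI divides by, so the sketch simply takes the elapsed seconds between two readings as an argument.

```c
/* Hypothetical helper: Mflips/s between two successive counter readings.
 * flpins_prev / flpins_now are floating-point instruction totals taken at
 * the earlier and later reading; elapsed_seconds is the time between them.
 * Assumption: the caller chooses the time base (real or process time). */
double mflips_rate(long long flpins_prev, long long flpins_now,
                   double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;  /* avoid dividing by zero on back-to-back readings */

    /* instructions per second, scaled to millions */
    return (double)(flpins_now - flpins_prev) / elapsed_seconds / 1.0e6;
}
```

For example, 5,000,000 floating-point instructions counted over 2 seconds would give mflips_rate(0, 5000000, 2.0), which is 2.5 Mflips/s.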

The calling sequence for PAPI_flops is:

#include <papi.h> 

int PAPI_flops (float *rtime, float *ptime, 
			long long *flpins, float *mflips);

for C programs, and:

include "f90papi.h"

integer irc
real (kind=4) rtime, ptime, mflips
integer (kind=8) flpins

call PAPIF_flops(rtime, ptime, flpins, mflips, irc)

for Fortran programs.

There are two other PAPI header files for Fortran:

  • fpapi.h - this interface relies on a preprocessing step that IBM's xlf Fortran compiler will perform using filenames with suffix .F or .F90. With this file, you must use the '#include' syntax.
  • f77papi.h - this file is formatted for Fortran 77 fixed format source code.

To obtain performance data for your program, insert a call to PAPI_flops at the beginning and end of the main program, and print the values produced by the second call.
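The begin-and-end pattern described above can be sketched in C as follows. This is an illustration under stated assumptions, not NERSC's or PAPI's code: the names flops_fn, flops_report, measure_region, print_report, demo_flops and demo_work are all hypothetical. The counter routine is passed in as a function pointer with the same shape as the PAPI_flops prototype shown earlier, so the sketch builds without PAPI; demo_flops is a stand-in that returns fixed numbers.

```c
#include <stdio.h>

/* Function-pointer type with the same shape as the PAPI_flops prototype. */
typedef int (*flops_fn)(float *rtime, float *ptime,
                        long long *flpins, float *mflips);

/* Values produced by the closing call, plus its return code. */
struct flops_report {
    float rtime;
    float ptime;
    long long flpins;
    float mflips;
    int status;
};

/* Call `flops` once before `work` and once after it; keep the second result. */
struct flops_report measure_region(flops_fn flops, void (*work)(void))
{
    struct flops_report r = {0.0f, 0.0f, 0, 0.0f, 0};

    r.status = flops(&r.rtime, &r.ptime, &r.flpins, &r.mflips);
    if (r.status != 0)   /* assumption: 0 means success, as PAPI_OK does */
        return r;

    work();

    r.status = flops(&r.rtime, &r.ptime, &r.flpins, &r.mflips);
    return r;
}

/* Print the values produced by the second call. */
void print_report(const struct flops_report *r)
{
    printf("real time %f  process time %f  flpins %lld  Mflips/s %f\n",
           r->rtime, r->ptime, r->flpins, r->mflips);
}

/* Hypothetical stand-in counter so the sketch runs without PAPI:
 * the first call reports zeros, later calls report fixed totals. */
static int demo_calls = 0;
int demo_flops(float *rtime, float *ptime, long long *flpins, float *mflips)
{
    int started = (demo_calls++ > 0);
    *rtime  = started ? 2.0f : 0.0f;
    *ptime  = started ? 1.0f : 0.0f;
    *flpins = started ? 1000000LL : 0LL;
    *mflips = started ? 1.0f : 0.0f;
    return 0;
}

/* Hypothetical workload; replace with the code to be measured. */
void demo_work(void) { }
```

In a real program, assuming the installed papi.h declares PAPI_flops with the prototype shown earlier, one would include papi.h and pass PAPI_flops in place of demo_flops, then call print_report on the result.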

The IBM PWR3 hardware has a compound multiply-and-add instruction, FMA. PAPI_flops counts each FMA as a single instruction, even though it performs two floating-point operations. To count the number of flops in a program, we therefore need to count the FMA instructions and the floating-point instructions and add the two totals. The PAPI_presets man page lists all the available events; the two events we need are PAPI_FMA_INS and PAPI_FP_INS.

In Fortran, this could be programmed as:

#include "fpapi.h"
...
  integer*8 values(2)
  integer counters(2), ncounters, irc

  irc=PAPI_VER_CURRENT
  call papif_library_init(irc)
  counters(1)=PAPI_FMA_INS
  counters(2)=PAPI_FP_INS
  ncounters=2
  call papif_start_counters(counters, ncounters, irc)

...

  ! put your code here

...
  call papif_stop_counters(values, ncounters, irc)
  write(6,*) 'Total FMA ', values(1), ' Total FP ', values(2)
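
The two totals printed above still have to be added to obtain the flop count. A minimal sketch of that last step, written in C with the hypothetical name total_flops, assuming (as described above) that the PAPI_FP_INS total already counts each FMA once:

```c
/* Hypothetical helper: combine the two counter totals into a flop count.
 * Assumption (from the text above): fp_ins counts every floating-point
 * instruction once, including each FMA, and an FMA performs two
 * operations, so each FMA contributes one extra operation. */
long long total_flops(long long fma_ins, long long fp_ins)
{
    return fp_ins + fma_ins;
}
```

For example, a run that reports 300 FMA instructions and 1000 floating-point instructions would be credited with total_flops(300, 1000), that is 1300 floating-point operations.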

PAPI Documentation

The PAPI Home Page has links to a number of presentations on PAPI, and a set of PAPI Manual Reference Pages.

In addition, man pages are available. See man PAPI or man PAPIF for an introduction.


Page last modified: Fri, 06 Mar 2009 22:24:22 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/tools/papi.php