PAPI at NERSC

Introduction

The Performance API (PAPI) specifies a standard application programming interface (API) for accessing the hardware performance counters available on most modern microprocessors. PAPI was developed at the Innovative Computing Laboratory at the University of Tennessee. PAPI provides two interfaces to the underlying counter hardware: a simple, high-level interface for acquiring basic measurements, and a fully programmable, low-level interface aimed at users with more sophisticated needs.

PAPI provides portability across platforms: it uses the same routines, with similar argument lists, to control and access the counters on every architecture. PAPI also includes a set of predefined events that represents the lowest common denominator of most counter implementations. The intent is that the same source code will count similar, and possibly comparable, events when run on different platforms.

Using PAPI

To examine the performance of your program with PAPI, insert calls to one or more PAPI routines into your code and link with the PAPI library. The full functionality of PAPI is available only to C programs, although many routines are callable from Fortran starting with version 1.2. The PAPI library is made available through the module command. Use the following commands to compile and link a program:
% module load papi
% cc -c a.c ${PAPI}
% cc -o a.out a.o b.o ... ${PAPI}
% xlf -c a.f ${PAPI}
% xlf -o a.out a.o b.o ... ${PAPI}
Getting Started

One of the high-level PAPI routines is PAPI_flops. This routine counts flips (floating-point instructions) and real and processor time, and reports the rate in Mflips/s. The first call to PAPI_flops initializes PAPI, sets up the counters to monitor the PAPI_FP_INS and PAPI_TOT_CYC events, and starts the counters. Subsequent calls read the counters and return the total real time, the total process time, and the total number of floating-point instructions since the start of the measurement, along with the Mflips/s rate since the last call to PAPI_flops.

The calling sequence for PAPI_flops is:

    #include <papi.h>
    int PAPI_flops(float *rtime, float *ptime, long long *flpins, float *mflips);

for C programs, and:

    include "f90papi.h"
    integer irc
    real (kind=4) rtime, ptime, mflips
    integer (kind=8) flpins
    call PAPIF_flops(rtime, ptime, flpins, mflips, irc)

for Fortran programs. There are two other PAPI header files for Fortran: fpapi.h (used in the example below) and f77papi.h.
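The difference between the cumulative totals and the per-interval rate can be made concrete with a small bookkeeping sketch. This code is not part of PAPI and does not call the library: the type flops_state and the function interval_mflips are hypothetical, and the sketch assumes the rate is the change in floating-point instructions divided by the change in process time, expressed in millions per second.

```c
#include <assert.h>

/* Hypothetical bookkeeping, not PAPI code. It mimics the semantics
 * described for PAPI_flops: instruction counts and process time are
 * cumulative, while the rate covers only the interval since the
 * previous call. Assumes the rate is taken over process time. */
typedef struct {
    long long last_ins;  /* cumulative FP instructions at the previous call */
    float last_ptime;    /* cumulative process time (s) at the previous call */
} flops_state;

/* Returns Mflips/s for the interval since the previous call and
 * records the new cumulative values in *s. */
float interval_mflips(flops_state *s, long long total_ins, float ptime)
{
    long long d_ins = total_ins - s->last_ins;
    float d_time = ptime - s->last_ptime;

    assert(d_time > 0.0f);  /* process time must advance between calls */

    s->last_ins = total_ins;
    s->last_ptime = ptime;
    return (float)((double)d_ins / ((double)d_time * 1.0e6));
}
```

Feeding it the flpins and ptime values from successive calls shows why the totals keep growing while the rate reflects only the most recent interval: totals of 2,000,000 instructions at 1 s and 5,000,000 at 2 s give rates of 2 and then 3 Mflips/s.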
To obtain performance data for your program, insert a call to PAPI_flops at the beginning and at the end of the main program, and print the values returned by the second call.

The IBM POWER3 (PWR3) hardware has a compound multiply-add instruction, FMA. PAPI_flops counts each FMA as a single instruction, even though it performs two floating-point operations. To count the number of flops in a program, count the FMA instructions and the floating-point instructions separately and add the two counts. The PAPI_presets man page lists all available events; the two events needed here are PAPI_FMA_INS and PAPI_FP_INS. In Fortran, this could be programmed as:

    #include "fpapi.h"
    ...
    integer*8 values(2)
    integer counters(2), ncounters, irc

    ! irc carries PAPI_VER_CURRENT in and the return code out
    irc = PAPI_VER_CURRENT
    call papif_library_init(irc)

    counters(1) = PAPI_FMA_INS
    counters(2) = PAPI_FP_INS
    ncounters = 2
    call papif_start_counters(counters, ncounters, irc)

    ... put your code here ...

    call papif_stop_counters(values, ncounters, irc)
    write(6,*) 'Total FMA ', values(1), ' Total FP ', values(2)
    ! Each FMA is two operations, so flops = FP instructions + FMA instructions
    write(6,*) 'Total flops ', values(1) + values(2)

PAPI Documentation

The PAPI Home Page has links to a number of presentations on PAPI and to a set of PAPI Manual Reference Pages. In addition, man pages are available; see man PAPI or man PAPIF for an introduction.
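The FMA correction for POWER3 reduces to a single addition once the two counter values are available. The C sketch below assumes, as the counting procedure implies, that PAPI_FP_INS on this hardware already counts each FMA once, so adding PAPI_FMA_INS credits every FMA with its second operation. The functions flops_from_counts and mflops_rate are hypothetical helpers, not PAPI routines.

```c
#include <assert.h>

/* Hypothetical helper, not part of PAPI.
 * Assumes fp_ins (PAPI_FP_INS) counts every floating-point instruction,
 * counting each FMA once, and fma_ins (PAPI_FMA_INS) counts only FMAs.
 * Each FMA performs two operations, so adding fma_ins supplies the
 * operation that fp_ins misses. */
long long flops_from_counts(long long fma_ins, long long fp_ins)
{
    /* Under the assumption above, FMAs are a subset of FP instructions. */
    assert(fma_ins >= 0 && fp_ins >= fma_ins);
    return fp_ins + fma_ins;
}

/* Hypothetical helper: flop rate in Mflop/s for an elapsed time in seconds. */
double mflops_rate(long long flops, double seconds)
{
    assert(seconds > 0.0);
    return (double)flops / (seconds * 1.0e6);
}
```

For example, a loop that issues 1,000,000 FMAs and 500,000 other floating-point instructions gives PAPI_FP_INS = 1,500,000 and PAPI_FMA_INS = 1,000,000, so flops_from_counts returns 2,500,000, which equals 2 × 1,000,000 + 500,000. Over 0.5 s of process time that is 5 Mflop/s.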
Page last modified: Fri, 06 Mar 2009 22:24:22 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/tools/papi.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov