IBM Compiler Optimization Argument Examples

This page describes the compile-time and run-time impact of a variety of compiler optimization arguments on several publicly available benchmarks.

Introduction

Publicly available benchmarks are compiled and run with several different sets of optimization options, and the performance is recorded. The time required to compile and link the code is also recorded, and the results are summarized. The following information is given for each benchmark:
The numbers given in the tables below for the individual benchmarks are the best of several dedicated runs in batch mode.

Linpack

The Linpack benchmark solves a dense system of linear equations. The version tested here is the 1000x1000 double precision version obtained from 1000d. It is contained in a single 755-line source file with 11 subroutines and functions in addition to the main driver. It is a simple Fortran 77 code originally written in 1978 and last modified in 1992.
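For orientation, a Linpack MFLOPS figure is a fixed nominal operation count divided by the measured solve time. The sketch below assumes the conventional Linpack count of 2n^3/3 + 2n^2 floating-point operations for a system of order n. The program name, the elapsed time, and the use of free-form Fortran 90 are illustrative assumptions and are not taken from the benchmark source. The 1.5 GFlops peak is the POWER3 figure quoted on this page.

```fortran
program linpack_rate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000                  ! matrix order used on this page
  real(dp), parameter :: peak_mflops = 1500.0_dp  ! POWER3 peak quoted on this page
  real(dp) :: ops, seconds, mflops

  ! Conventional Linpack operation count (factor + solve); an assumption here.
  ops = 2.0_dp * real(n, dp)**3 / 3.0_dp + 2.0_dp * real(n, dp)**2

  ! Hypothetical wall-clock time for factor + solve; not a measured value.
  seconds = 2.5_dp

  mflops = ops / (1.0e6_dp * seconds)

  print '(a, f14.1)', 'nominal operations : ', ops
  print '(a, f8.1)',  'MFLOPS             : ', mflops
  print '(a, f6.1)',  'percent of peak    : ', 100.0_dp * mflops / peak_mflops
end program linpack_rate
```

With these assumed inputs, a 2.5 second solve corresponds to roughly 267 MFLOPS, a little under 18% of the quoted peak. Replacing the hypothetical time with a measured one gives the corresponding figure for a real run.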
Source Code Changes

All references to second() in the source were replaced with references to rtc().

Compile Changes

The code was compiled with the xlf compiler with no options beyond the optimization options.

Times

These runs were done with version 8.1.1.3 of xlf in December 2003. The Compile Time is the wall-clock time for the compile and link, as returned by the unix time command. The MFLOPS result is the one returned by the internal timer in the code.

Results
Comments

There was no significant difference in the code's performance at any optimization level when the threaded compiler (xlf_r) was used or when the mass library was included. In this case, the recommended optimization options, -O3 -qstrict -qarch=pwr3 -qtune=pwr3, give close to the best performance obtained with any of the optimization options tested. This example shows the limits of using compiler options alone to improve a code's performance: the best single-processor performance obtained here is only 18% of the processor's theoretical peak of 1.5 GFlops. Most of the work in this example is done by four BLAS routines, daxpy, ddot, dscal, and idamax, which are also provided in IBM's high-performance ESSL library. However, when the benchmark versions of these routines are replaced with the ESSL routines, the performance attained is no better than 250 MFlops, which is worse than with the benchmark versions.

Fortran Livermore Loops

This version of the double precision Livermore Loops benchmark was obtained from livermore. It is the 1991 update of a benchmark whose earliest version dates from the 1970s. It contains 24 numeric kernels written in fairly straightforward Fortran 77. Several summary figures are returned by the program at the end of the run.

Source Code Changes

The only changes required to the original source were to the timing routines. The SECOND function declaration in the main routine at line 556 was uncommented:

      REAL*8 SECOND

Three lines, 4469-4471, in the SECOND function were commented out and replaced with a call to the system elapsed-time function rtc():

C     REAL*4 CPUTYM(4), ETIME
C     XT= ETIME( CPUTYM)
C     SECOND= CPUTYM(1)
      second=rtc()

Compile Changes

The code was compiled with the xlf compiler with no additional arguments beyond those for optimization.

Timers

The Compile Time result is the number of seconds required to compile and link the test program.
To compare the effects of the various optimization levels, the Average (mean), Minimum, and Maximum MFLOPS rates returned by the code for the loops are listed.

Livermore Loop MFLOPS
Comments

The performance of this benchmark is significantly degraded when the -qhot option is specified. Not only is the compile time greatly increased, but both the Average and Maximum MFLOPS are significantly worse than at the corresponding optimization level without -qhot. This may be because all of the loops are fairly small and uncomplicated, so the sophisticated analysis and loop restructuring done by this option add too much overhead at execution time.

Another interesting feature is that two of the higher optimization levels that do not include -qhot, -O4 -qnohot and -O5 -qnohot, are reported as attaining a MFLOPS rate greater than the theoretical peak performance of the POWER3 processor, 1.5 GFLOPS. The loop that attains this speed is Kernel 7, a very short loop representing an equation of state fragment:
 1007 DO 7 k= 1,n
      X(k)=     U(k  ) + R*( Z(k  ) + R*Y(k  )) +
     1      T*( U(k+3) + R*( U(k+2) + R*U(k+1)) +
     2      T*( U(k+6) + Q*( U(k+5) + Q*U(k+4))))
    7 CONTINUE
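As written, each iteration of Kernel 7 contains 8 additions and 8 multiplications, so a rate for the loop can be derived by dividing 16 nominal operations per iteration by the measured time. The sketch below illustrates that calculation in free-form Fortran 90 with the standard system_clock intrinsic. The program name, loop length, repetition count, and input values are assumptions chosen for illustration, and this is not the timing code used by the benchmark itself.

```fortran
program kernel7_rate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 995               ! loop length, assumed for illustration
  integer, parameter :: nrep = 200000         ! repetitions, assumed for illustration
  integer, parameter :: flops_per_iter = 16   ! 8 adds + 8 multiplies as written
  real(dp) :: x(n), y(n), z(n), u(n+6)
  real(dp) :: q, r, t, seconds, checksum
  integer :: k, rep
  integer(i8) :: c0, c1, rate

  ! Arbitrary small inputs so the results stay finite.
  q = 0.5_dp;  r = 0.25_dp;  t = 0.125_dp
  do k = 1, n
     y(k) = 1.0e-3_dp * real(k, dp)
     z(k) = 2.0e-3_dp * real(k, dp)
  end do
  do k = 1, n + 6
     u(k) = 3.0e-3_dp * real(k, dp)
  end do

  checksum = 0.0_dp
  call system_clock(c0, rate)
  do rep = 1, nrep
     do k = 1, n
        x(k) = u(k) + r*(z(k) + r*y(k)) +                  &
               t*(u(k+3) + r*(u(k+2) + r*u(k+1)) +         &
               t*(u(k+6) + q*(u(k+5) + q*u(k+4))))
     end do
     ! Use one result per repetition so the loop cannot be discarded outright.
     checksum = checksum + x(mod(rep, n) + 1)
  end do
  call system_clock(c1)

  if (rate > 0_i8 .and. c1 > c0) then
     seconds = real(c1 - c0, dp) / real(rate, dp)
     ! Nominal rate: operations as written in the source divided by elapsed time.
     print '(a, f12.1)', 'nominal MFLOPS : ', &
          real(flops_per_iter, dp) * real(n, dp) * real(nrep, dp) / (1.0e6_dp * seconds)
  else
     print '(a)', 'timer resolution too coarse; increase nrep'
  end if
  print '(a, es12.4)', 'checksum       : ', checksum
end program kernel7_rate
```

Because the numerator is a fixed count taken from the source text, any work the compiler manages to avoid shortens the elapsed time and raises the reported rate, even though the hardware executes fewer operations. A rate computed this way is therefore not bounded by the processor's peak, unlike a rate taken from hardware counters.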
Very likely, the -O4 and -O5 optimizations exploit the fact that several of the terms in the equation are used in more than one iteration of the loop and need to be computed only once. When the POWER3 hardware performance monitor is applied to this loop by means of hpmcount, the measured rate for this loop is around 750 MFLOPS.

NAS Kernels

The NAS Kernel Benchmark consists of seven Fortran test kernels that perform calculations typical of scientific applications run at the NASA Ames Research Center. It was written in the 1980s and consists of approximately 1000 lines of Fortran code, organized into seven separate tests.

Source Code Changes

The only changes made were to the CPTIME internal timer routine. The original version was replaced by this SP-specific version:
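C     tx holds the rtc() value saved by the previous call
      common /savetime/tx
      real*8 rtc,tx,t
      T = rtc()
C     Guard against a saved value later than the current time
      if (tx.gt.t) tx=0
C     Return the elapsed seconds since the saved value
      CPTIME = real(T - TX)
      TX = T
      RETURN
      END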
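The same interval-timer pattern can be sketched without the IBM-specific rtc() function. The version below is a minimal illustration in free-form Fortran 90 that uses the standard system_clock intrinsic. The module name interval_timer, the demo program, and the choice to return 0.0 on the first call are assumptions of this sketch and differ from the routine shown above, which returns the time since its saved value even on the first call.

```fortran
module interval_timer
  implicit none
  private
  public :: cptime
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8), save :: last_count = -1_i8   ! negative means "no previous call"
contains
  ! Seconds of wall-clock time since the previous call (0.0 on the first call).
  function cptime() result(dt)
    real :: dt
    integer(i8) :: now, rate
    call system_clock(now, rate)
    if (rate <= 0_i8 .or. last_count < 0_i8 .or. now < last_count) then
       dt = 0.0          ! no clock, first call, or counter wrapped
    else
       dt = real(real(now - last_count, dp) / real(rate, dp))
    end if
    last_count = now
  end function cptime
end module interval_timer

program demo
  use interval_timer, only: cptime
  implicit none
  real :: dt
  integer :: i
  double precision :: s

  dt = cptime()                  ! start the interval
  s = 0.0d0
  do i = 1, 10000000
     s = s + sqrt(dble(i))       ! some work to time
  end do
  dt = cptime()                  ! seconds since the previous call
  print '(a, f10.4, a, es12.4)', 'elapsed seconds: ', dt, '  sum: ', s
end program demo
```

Keeping the saved count inside the module plays the role of the /savetime/ common block: each call reports the interval since the previous one and then resets the reference point.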
Compile Changes

The code was compiled with the xlf compiler with no options beyond the optimization options.

Timers

The Compile Time result is the wall-clock time in seconds for the compile and link, as returned by the unix time command. The table gives the average MFLOPS over all the kernels, as returned by the program's internal timer.

Timings
Comments

This benchmark provides a contrast with the Livermore kernels: the -qhot option significantly improves performance when it is added to other optimization options, at the cost of an almost fivefold increase in compile time in some cases.

Individual Kernels

This benchmark also provides timings for the seven individual kernels.
Individual Kernel MFLOPS
Page last modified: Mon, 24 May 2004 19:26:15 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/ibm/opt_options/optex.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov