National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

-On

The compilers let you select a general level of optimization with the -O flag followed by a numeric level. The higher the number, the more optimization the compiler does, the longer the compile takes, and the more memory the compile uses. The lowest numeric optimization level is -O2; the compilers do not currently support -O0 or -O1.

-O2 (-O)

The -O2 option is designed to provide an intermediate level of optimization that does not require an excessive amount of time to perform the compile and will produce numeric results identical to those produced by an unoptimized compile. It avoids certain types of optimizations that have the potential to produce different numeric results. See the section on the -qstrict argument below for a discussion on how the exact equality of numeric results is accomplished.

The -O option is identical to the -O2 option.

The optimizations done at the -O2 level include:

  • Value numbering - folding several instructions into a single instruction.
  • Branch straightening - rearranging program code to minimize branch logic and combining physically separate blocks of code.
  • Common expression elimination - eliminating duplicate expressions.
  • Code motion - performing calculations outside a loop (if the variables in the loop are not altered within it) and using those results within the loop.
  • Reassociation and strength reduction - rearranging calculation sequences in loops in order to replace less efficient instructions with more efficient ones.
  • Global constant propagation - combining constants used in an expression and generating new ones.
  • Store motion - moving store operations out of loops.
  • Dead store elimination - eliminating stores when the value stored is never referred to again.
  • Dead code elimination - eliminating code for calculations that are not required and portions of the code that can never be reached.
  • Global register allocation - keeping variables and expressions in registers instead of memory.
  • Instruction scheduling - reordering instructions to minimize program execution time.
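
Three of the transformations listed above (common expression elimination, code motion, and strength reduction) can be sketched by hand in C. The function names, parameters, and loop body below are hypothetical and chosen only for illustration; they are not taken from the compiler documentation, and the code the compiler actually generates may look quite different.

```c
#include <stddef.h>

/* As written: x + y appears twice, scale * scale never changes inside
   the loop, and the index is recomputed with a multiply each iteration. */
double sum_as_written(const double *a, size_t n, size_t stride,
                      double x, double y, double scale)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += a[i * stride] * (x + y) + (x + y) * (scale * scale);
    }
    return total;
}

/* Hand-applied equivalents of three -O2 style transformations.
   The arithmetic is performed in the same order as above, so the
   numeric result is unchanged. */
double sum_hand_optimized(const double *a, size_t n, size_t stride,
                          double x, double y, double scale)
{
    const double t = x + y;                       /* common expression elimination */
    const double invariant = t * (scale * scale); /* code motion out of the loop */
    double total = 0.0;
    size_t idx = 0;                               /* strength reduction: add, not multiply */
    for (size_t i = 0; i < n; i++) {
        total += a[idx] * t + invariant;
        idx += stride;
    }
    return total;
}
```

Both functions evaluate each floating-point operation on the same operands in the same order, which is the property that lets this class of optimization keep results identical to an unoptimized compile.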

This is a dot product example of the store motion optimization done at the -O2 level.

Fortran

	x=0.0
	do i=1,ilim
	         x=a(i)*b(i)+x
	enddo

C

	x = 0.0;
	for (i = 0; i < ilim; i++)
	{
		x += a[i] * b[i];
	}

The unoptimized, default compile follows all the source code instructions literally. In this case, for each iteration of the loop, there would be a new load and a new store of the variable x. With -O2 optimization, the compiler would recognize that there is no need to store the value of x until the loop is completed, and intermediate values would be kept in registers. Even if the loads and stores are cached, the optimization could lead to an order of magnitude or better improvement in the performance of this loop.
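
The effect of store motion on this loop can be approximated by hand. In the sketch below, the function names are hypothetical, and the volatile qualifier is used only as an assumption to model the load and store on every iteration that an unoptimized compile would perform; it is not how the compiler represents the loop internally.

```c
#include <stddef.h>

/* Models the unoptimized loop: the running sum lives in memory, so every
   iteration loads *x, adds one product, and stores *x again.
   volatile forces those memory accesses to happen. */
void dot_store_each_iteration(const double *a, const double *b,
                              size_t n, volatile double *x)
{
    *x = 0.0;
    for (size_t i = 0; i < n; i++) {
        *x += a[i] * b[i];
    }
}

/* Models the loop after store motion: the running sum stays in a local
   variable (a register candidate) and is stored once, after the loop. */
void dot_store_after_loop(const double *a, const double *b,
                          size_t n, double *x)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    *x = acc;
}
```

The additions happen in the same order in both versions, so the two functions produce the same value; only the number of memory accesses differs.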

-O3

The -O3 level of optimization performs all of the optimizations done at the -O2 level as well as several other optimizations that require more memory or time to accomplish.

Some of these optimizations may change the semantics of the program slightly and might cause numeric differences between the results of the program and those of the same program compiled at the -O2 optimization level or with no optimization. To disable the optimizations that might produce different results, include the -qstrict option on the compile line after -O3 is specified.

These are the types of optimizations done at this level that are not done at the -O2 level:

  • Rewriting floating-point expressions:

    Computations such as a*b*c may be rewritten as a*c*b if, for example, an opportunity exists to get a common subexpression by the rearrangement. This is not done at the -O2 level, since it may give different numeric results.

    In addition, divides are replaced by multiplies by the reciprocal at this level. Divides on the POWER3, as on most processors, are very expensive operations: they require 14 cycles for 32-bit floating-point operands and 18 cycles for 64-bit floating-point operands. The floating-point reciprocal estimate function and multiply are much cheaper operations, at the cost of potentially different numeric results.

  • Aggressive code motion and scheduling:

    At this optimization level the compiler rearranges code and instruction sequences much more aggressively. In particular, computations that could raise an exception and whose execution is conditional in the program may be scheduled unconditionally if doing so might improve performance. In other words, loads and floating-point computations may be placed onto execution paths where they will be executed even though, according to the actual semantics of the program, they might not have been.

    Loop-invariant floating-point computations that are found on some, but not all, paths through a loop will not be moved at -O2 because the computations may cause an exception. At -O3, the compiler will move the computations if the move is not certain to cause an exception.

    The same principle is followed when it comes to moving many kinds of loads. Although a load by means of a pointer will never be moved, at the -O3 optimization level the compiler will move other types of loads for a potential performance improvement. Loads in general are not movable at the -O2 level of optimization because a program can declare a static array and then load an element far beyond the declared boundary, which might cause a segmentation violation.

    The same principle is followed when it comes to scheduling instructions, as this example shows.

    Example: At the -O2 optimization level, the computation of b+c in the following code is not moved out of the loop for two reasons: it is considered dangerous because it is a floating-point operation that could cause an exception, and it does not occur on every path through the loop, so it might never be executed. As a result, at -O2 the loop-invariant b+c computation may be performed many times, depending on the values of the elements of the array a. At -O3, the computation is moved outside the loop and done only once.

    Fortran

    	do i=1,ilim-1
    	  if(a(i).lt.a(i+1)) a(i)=b+c
    	enddo
    

    C

             for (i = 0 ; i < ilim-1 ; i++)
               {
               if (a[i] < a[i+1])
                 a[i] = b + c ;
               }
    
  • Incorrect sign for zero:

    -O3 also performs some optimizations not performed at -O2 because they may produce an incorrect sign when the result is zero. For example, the expression "x + 0.0" is not replaced with "x" at the -O2 level. The redundant add of 0.0 to x is kept because x might be equal to -0.0 and, under IEEE rules, -0.0 + 0.0 = 0.0, which is -x rather than x. Since, in the overwhelming majority of cases, this has no significant impact on a program's results, -O3 will substitute "x" for "x + 0.0".
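
The numeric effects described in this list can be reproduced by hand. The C sketch below is illustrative only: the function names are hypothetical, volatile is used as an assumption to keep a compiler from simplifying the expressions at compile time, and the reciprocal is computed with an ordinary 1.0 / y rather than the hardware reciprocal estimate instruction, which is typically less accurate.

```c
#include <math.h>

/* (a * b) * c, evaluated in source order. */
double product_source_order(double a, double b, double c)
{
    volatile double t = a * b;   /* intermediate rounded to double */
    return t * c;
}

/* (a * c) * b, the kind of rearrangement -O3 may choose. */
double product_reassociated(double a, double b, double c)
{
    volatile double t = a * c;
    return t * b;
}

/* x / y as written in the source. */
double divide_exact(double x, double y)
{
    return x / y;
}

/* x * (1 / y): a multiply by the reciprocal in place of the divide. */
double divide_by_reciprocal(double x, double y)
{
    volatile double r = 1.0 / y;
    return x * r;
}

/* x + 0.0 with the add actually performed, as at -O2. */
double add_zero_kept(double x)
{
    volatile double zero = 0.0;
    return x + zero;
}

/* x + 0.0 replaced by x, as -O3 may do. */
double add_zero_removed(double x)
{
    return x;
}
```

On a platform with IEEE 754 double precision, calling the two product functions with a = 1e308, b = 10.0, c = 0.1 is an extreme case: the source order overflows to infinity while the rearranged order stays finite. The two divide functions may differ in the last bit for some operands, and for x = -0.0 the sign bit of the result is clear in add_zero_kept but set in add_zero_removed.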

Some limitations of the -O3 level are:

  • It does not do any processor specific optimizations like those done by the -qarch or -qtune options.
  • It does not optimize complex loops or those with 3 or more loop indices very well. See the discussion under -qhot.
  • Integer divide instructions are not optimized.

-O4

The -O4 level of optimization performs all of the optimizations done at the -O3 level as well as several others. This argument is equivalent to:

% -O3 -qarch=auto -qtune=auto -qcache=auto -qhot -qipa 

This flag should be specified both at compile and link time.

-O5

The -O5 level of optimization performs all of the optimizations done at the -O4 level as well as the optimization specified by -qipa=level=2, which is described below. This argument is equivalent to:

% -O4 -qipa=level=2

This flag should be specified both at compile and link time.


Page last modified: Mon, 24 May 2004 19:26:15 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/ibm/opt_options/on.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
