1. Input file
2. Output files
The only differences in the input file are the keyword number_bands_fft and some new output flags (timing and diagperf). The diagperf flag reports the MFLOPS achieved for certain operations during the iterative scheme. The timing flag gives a breakdown of the time spent in diagonalization at the end of the run. The number_bands_fft keyword allows for better scaling across SMP nodes, because it groups data into larger chunks by doing the parallel FFT of several bands at once. An analysis of its use can be found here.
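The grouping behind number_bands_fft can be pictured with a small Python sketch. This is only an illustration of the idea, not the code's actual implementation: the function name band_batches and the band counts used are hypothetical, and the sketch assumes bands are simply grouped in consecutive order.

```python
def band_batches(n_bands, number_bands_fft):
    """Split band indices 0..n_bands-1 into consecutive groups.

    Hypothetical helper: each group stands for one parallel FFT call
    that transforms number_bands_fft bands at once.
    """
    if number_bands_fft < 1:
        raise ValueError("number_bands_fft must be at least 1")
    return [
        list(range(start, min(start + number_bands_fft, n_bands)))
        for start in range(0, n_bands, number_bands_fft)
    ]


if __name__ == "__main__":
    # With 8 bands (an arbitrary example size), compare the default of 1
    # band per FFT call against 4 bands per call.
    for nb in (1, 4):
        groups = band_batches(8, nb)
        print(f"number_bands_fft {nb}: {len(groups)} FFT calls -> {groups}")
```

With a larger number_bands_fft there are fewer calls, each handling a larger chunk of data, which is the effect described above.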
Here are the sample output files. The value before "pr" is the number of processors used, and the last number in the file name is the value of the number_bands_fft keyword: OUT_16pr, OUT_32pr_1b, OUT_32pr_4b, OUT_64pr_16b. Comparing the two 32-processor runs shows that using number_bands_fft 4 is faster than just using the default of 1 band.
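The file naming convention can be decoded mechanically. The following Python sketch is a convenience illustration only. The function name parse_output_name is hypothetical, and the sketch assumes that a name without a trailing "b" part (such as OUT_16pr) simply does not record the number_bands_fft value, so it returns None in that case.

```python
import re

# Matches names such as OUT_32pr_4b: the processor count comes before "pr",
# and an optional number_bands_fft value comes before a trailing "b".
_NAME = re.compile(r"^OUT_(\d+)pr(?:_(\d+)b)?$")


def parse_output_name(name):
    """Return (processors, number_bands_fft or None) for a sample output name."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized output file name: {name!r}")
    processors = int(match.group(1))
    bands = int(match.group(2)) if match.group(2) is not None else None
    return processors, bands


if __name__ == "__main__":
    for name in ["OUT_16pr", "OUT_32pr_1b", "OUT_32pr_4b", "OUT_64pr_16b"]:
        print(name, parse_output_name(name))
```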