Homework 3 (10/22 - 11/15): Parallel Preconditioned Conjugate Gradients

[ Intro ] [ Details ] [ Submit ] [ Grades ] [ References ]
[ Discussion ] [ FAQ ]

[ Millennium Essentials ] [ NERSC Essentials ]

Intro

The method of conjugate gradients (CG) is an iterative technique for solving symmetric positive-definite linear systems. The conjugate gradient algorithm, popular in practice, is similar in structure to many other linear and nonlinear optimization and equation-solving algorithms, and is relatively simple to code. All these points make CG an attractive benchmark kernel. Indeed, CG appears in both the NAS parallel benchmarks and the SPEC floating point benchmark suite.

Read discussion notes for more details of the assignment and description of the mathematics.

Details

You may work in groups of up to 3.

The implementations we give you are capable of solving a simple model problem (the 1-d Poisson equation) or more general sparse matrix problem. The code can read sparse matrix files in the Matrix Market format; for instance
  cg gr_30_30.mtx
will solve the 30-by-30 two-dimensional Poisson problem from the Matrix Market file gr_30_30.mtx.

We provide a serial implementation in C and Titanium (readme). Examine and profile one of the serial codes on a platform of your choice (does not have to be Millennium or seaborg). If you choose C for this portion, you may still choose Titanium for your parallel version (or visa versa). Familiarize yourself with the implementation of the CG algorithm. What operation(s) seem to take a long time? What kind of pattern do the memory accesses follow? What might these observations tell you about how to parallelize the code? Try downloading some other files from the Matrix Market. How are the memory access patterns changed with a different matrix structure?

Select either Millennium or seaborg (or your own if you are using UPC) as the parallel machine on which you want to optimize CG.

We provide a parallel implementation in UPC (warning) and Titanium (readme). Select either UPC (warning) or Titanium as the language in which you wish to parallelize CG. Our expectations for UPC optimizations and extensions will be less than our expectations for Titanium optimizations and extensions. We provide initial Titanium and UPC code. If you are feeling ambitious, choose a different parallel language and implement your own CG algorithm. 

Run, time, and evaluate your chosen CG code on your chosen platform. Are there obvious bottlenecks? What do your timings tell you about where to target your improvements?

Do some optimizations. Possible targets of optimization include the serial performance of the sparse matrix-vector multiply (or other parts of the code) and the communication patterns. Experiment with different partitioning schemes as discussed in lecture.

Extend the program. Possible extensions include optimizations targeted toward matrices with a specific structure or implementation of a differently-organized version of CG. For re-ordering the matrix, you may try to use the RCM (reverse cuthill McKee) algorithm to improve locality for the matrix vector multiplication. Pick something that looks interesting, and if you are unsure about it, just ask. If you are ambitious, you may try implementing a better preconditioner for some problem of interest. How does changing the preconditioner change the iteration count, the time each iteration takes, and the overall processing time? 

If you have time, try compiling your code on the other platform and see if it has the same performance characteristics. Why might they be different/similar?

UPC Warning: There does not seem to be support for UPC currently on either Millennium or seaborg. If you have access to a parallel system that can compile UPC code, you may use that system.

Titanium: If you want more discussion on Titanium, possibly see last year's slides.

Submit

Your group needs to submit your code, along with a write-up describing what you did and why you did it. Tell us about the modifications you made to the code; but more importantly, tell us about any profiling, experiments, models, or other considerations that lead you to make your changes.

A web page for your write-up would be preferable, but we will accept other formats, too. When you are ready, e-mail your submission (or a pointer to it) to the TA.

Grades

Grades will be based on your explanations and problem-solving attempts. The optimization and extension guidelines are flexible. If you mainly focus on interesting extensions, then less expectation will be given to optimizations.

References

FAQ

[ Assignment ] [ Compilation ] [ Execution ] [ Miscellaneous ]

Assignment Questions

UPC?

UPC is not ready on Seaborg or Millennium, so this option is unsupported. If you know of a machine that runs UPC and can do performance testing on that machine, then those results will be counted. Remember that you can choose to implement CG in a different parallel language.

Fresh hand-picked matrices?

http://math.nist.gov/MatrixMarket/data/misc/xlatmr/matgen.html - up to size 250. -- thanks to Scott Hafeman

Reducing bandwidth of sparse matrices? Try ParMETIS:

http://www-users.cs.umn.edu/~karypis/metis/parmetis/  -- thanks to Sukun Kim

Other software you may want to try:

TI-PETSC - http://www.cs.berkeley.edu/~liblit/ti-petsc/  -- thanks to Wim Vanroose (it unfortunately does not work on seaborg or Millennium)

The ACTs toolkit on seaborg provides PETSc. Have a look at http://acts.nersc.gov/petsc/main.html  . 

Compilation Problems

CSR.ti has an incorrect implementation of toString -- Wim Vanroose

public String toString () {
  String result = "";

  int n = rowStarts.domain().size()-1;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      int flag=-1;
      for(int curr = rowStarts[i];
        curr < rowStarts[i+1]; curr++) {
         if ( rowIndex[curr] == j) {
            flag = curr;
         }
      }
      if ( flag != -1) {
        result += data[flag] + "\t";
      }
      else {
        result += "0\t";
      }
    }
    result += "\n";
  }
  return result;
}

Execution Problems

Miscellaneous