Horst Simon / Associate Laboratory Director, Computing Sciences / Director, National Energy Research Scientific Computing Center / Director, Computational Research Division

CS267: Lecture 3

Matrix Multiplication

September 3, 2002

Lecturer: Horst D. Simon

Abstract

We begin with a continuation of Lecture 2, applying some of the optimization techniques to matrix-matrix multiplication. We examine matrix multiplication (search-based blocking and Strassen's algorithm), as well as a set of techniques for optimizing serial programs.
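
For readers who have not seen Strassen's algorithm before, the following is a minimal sketch of the recursion, not the tuned version discussed in the readings. It assumes square matrices whose order is a power of two, stored as Python lists of lists, and it recurses all the way down to 1-by-1 blocks (a practical code would switch to a conventional blocked multiply below some cutoff). The helper names mat_add, mat_sub, and split are illustrative only. The point is that each level uses 7 recursive multiplications instead of 8, which gives roughly O(n^2.81) arithmetic operations instead of O(n^3).

```python
def mat_add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even order into its four quadrants."""
    h = len(M) // 2
    M11 = [row[:h] for row in M[:h]]
    M12 = [row[h:] for row in M[:h]]
    M21 = [row[:h] for row in M[h:]]
    M22 = [row[h:] for row in M[h:]]
    return M11, M12, M21, M22


def strassen(A, B):
    """Multiply A and B (order must be a power of two) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # The seven half-size products.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M2 = strassen(mat_add(A21, A22), B11)
    M3 = strassen(A11, mat_sub(B12, B22))
    M4 = strassen(A22, mat_sub(B21, B11))
    M5 = strassen(mat_add(A11, A12), B22)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))

    # Combine the products into the four quadrants of C.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    # Glue the quadrants back together row by row.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

For example, strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]], the same result as the conventional product.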

2002 Lecture Notes

PowerPoint

Readings

  • Sourcebook Chapter 3 (note that Chapters 2 and 3 cover the material of Lectures 2 and 3, but not in the same order).
  • "Performance Optimization of Numerically Intensive Codes", by Stefan Goedecker and Adolfy Hoisie, SIAM 2001. (If you would like to read up on some of the optimization techniques briefly mentioned in class, you may consider buying this book.)
  • BeBOP Homepage
  • ATLAS Homepage
  • BLAS (Basic Linear Algebra Subprograms), reference (unoptimized) implementations of the BLAS, with documentation
  • LAPACK (Linear Algebra PACKage), a standard linear algebra library optimized to use the BLAS effectively on uniprocessors and shared memory machines (software, documentation and reports)
  • ScaLAPACK (Scalable LAPACK), a parallel version of LAPACK for distributed memory machines (software, documentation and reports)
  • "Tuning Strassen's Matrix Multiplication for Memory Efficiency", by Mithuna S. Thottethodi, Siddhartha Chatterjee, and Alvin R. Lebeck, in Proceedings of Supercomputing '98, November 1998 (postscript)

Questions that you should be able to answer:

  1. What is the key to understanding algorithm efficiency in our simple memory model?
  2. What is the key to understanding machine efficiency in our simple memory model?
  3. What is tiling?
  4. Why does block matrix multiply reduce the number of memory references?
  5. What are the BLAS?
  6. What is LAPACK? ScaLAPACK?
  7. Why does loop unrolling improve uniprocessor performance?
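
As a starting point for questions 3 and 4, here is a minimal sketch of a tiled (blocked) matrix multiply next to the textbook triple loop. It is an illustration under stated assumptions, not the course's reference code: matrices are square Python lists of lists, the block size b divides n evenly, and the function names are made up for this example. The helper slow_memory_refs assumes a two-level (fast/slow) memory model in which every word moved between the two levels counts once and three b-by-b blocks fit in fast memory; under that assumption the unblocked loop moves n^3 + 3n^2 words, while the blocked loop with N = n/b blocks per dimension moves (2N + 2)n^2 words.

```python
def matmul_naive(A, B):
    """Textbook triple loop: C = A * B for square n-by-n lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, b):
    """Tiled version: the same arithmetic, reordered into b-by-b blocks.

    Assumes b divides n evenly to keep the sketch short.
    """
    n = len(A)
    assert n % b == 0, "this sketch assumes the block size divides n"
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):          # block row of C
        for jj in range(0, n, b):      # block column of C
            for kk in range(0, n, b):  # blocks of A and B that contribute
                # Multiply one b-by-b block of A by one b-by-b block of B
                # and accumulate into the current block of C.
                for i in range(ii, ii + b):
                    for j in range(jj, jj + b):
                        s = C[i][j]
                        for k in range(kk, kk + b):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C


def slow_memory_refs(n, b=None):
    """Words moved between slow and fast memory under the assumed model.

    Unblocked (b is None): n**3 + 3*n**2.
    Blocked with N = n // b blocks per dimension: (2*N + 2) * n**2.
    """
    if b is None:
        return n**3 + 3 * n**2
    N = n // b
    return (2 * N + 2) * n**2
```

Under these assumptions, slow_memory_refs(1000) is 1,003,000,000 while slow_memory_refs(1000, 50) is 42,000,000: the arithmetic is unchanged, but each block loaded into fast memory is reused b times before it is evicted.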

Assignments

Assignment 1 (due 9/23/01). We have assigned "multidisciplinary" teams of 2-3 students for this assignment. If you are not in a team, please contact David Garmire.
