CS267: Lecture 2

Principles of Parallel Computation

Uniprocessor Optimizations

 

Th, August 29, 2002

Lecturer: Horst D. Simon

Abstract

We consider some basic principles of parallel computing, such as speed-up, scaled speed-up, Amdahl's law, data locality, load balancing, and synchronization. We study the structure and performance properties of modern processors, with special attention to their memory hierarchies. We describe a type of memory benchmark that can be used to expose performance features of the memory hierarchy, and look at some examples from specific machines.
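As a minimal sketch of the two speed-up notions named in the abstract, the code below assumes the usual textbook formulas: Amdahl's law, S(p) = 1 / (s + (1 - s)/p), for a fixed problem size with serial fraction s, and the scaled speed-up of Gustafson et al., S(p) = s + (1 - s)p, where s is the serial fraction measured on the parallel run. The function names are illustrative and do not come from the lecture.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Fixed-size speed-up on `processors` processors (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # The serial part is not sped up; the parallel part is divided by p.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speed-up: the parallel work grows with the processor count."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    # Assumed example value: 5% of the run time is serial.
    for p in (1, 16, 1024):
        print(p, amdahl_speedup(0.05, p), scaled_speedup(0.05, p))
```

With a 5% serial fraction, the fixed-size speed-up stays below 1/0.05 = 20 however many processors are used, while the scaled speed-up keeps growing roughly linearly in the processor count.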

2002 Lecture Notes

PowerPoint

Readings

  • Sourcebook, Chapter 2
  • "Development Of Parallel Methods For A 1,024-Processor Hypercube," J.L. Gustafson, G.R. Montry and R.E. Benner, SIAM Journal on Scientific and Statistical Computing, Vol. 9, No. 4, July 1988. (HTML version) (PDF version)
  • "Empirical Evaluation of the Cray T3D - A Compiler Perspective" (explains the memory benchmark plot used in lecture)
  • BeBOP Homepage
  • ATLAS Homepage
  • BLAS (Basic Linear Algebra Subprograms), reference (unoptimized) implementations of the BLAS, with documentation.
  • LAPACK (Linear Algebra PACKage), a standard linear algebra library optimized to use the BLAS effectively on uniprocessors and shared memory machines (software, documentation and reports)
  • ScaLAPACK (Scalable LAPACK), a parallel version of LAPACK for distributed memory machines (software, documentation and reports)
  • Questions that you should be able to answer:

    1. What is Amdahl's Law?
    2. What does scaled speed-up imply for the selection of the number of processors for an application?
    3. What are the physical limits to scalability?
    4. Why do we have memory hierarchies?
    5. What type of parallelism can be found on a uniprocessor?
    6. How can it be exploited for performance?
    7. What is pipelining?
    8. What are the limits to instruction level parallelism?
    9. How does the memory hierarchy (cache) influence performance?
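For the questions on memory hierarchies, the following is a rough sketch of a strided-access memory benchmark of the general kind mentioned in the abstract: for each array size and each power-of-two stride, it repeatedly touches every stride-th element and reports the average time per access. The function names, the default access count, and the choice of sizes are assumptions made for illustration. In CPython the interpreter overhead will largely hide cache effects, so a real measurement would use a compiled language; the sketch only shows the structure of the sweep.

```python
import time
from array import array


def time_strided_access(size: int, stride: int, min_accesses: int = 200_000) -> float:
    """Average seconds per access when updating every stride-th element."""
    data = array("d", [0.0]) * size          # contiguous array of doubles
    indices = range(0, size, stride)
    per_sweep = len(indices)
    # Repeat the sweep so that roughly min_accesses updates are timed.
    sweeps = max(1, min_accesses // per_sweep)
    start = time.perf_counter()
    for _ in range(sweeps):
        for i in indices:
            data[i] += 1.0                   # read and write one element
    elapsed = time.perf_counter() - start
    return elapsed / (sweeps * per_sweep)


def sweep(sizes, min_accesses: int = 200_000) -> dict:
    """Time every (size, stride) pair with strides 1, 2, 4, ..., size // 2."""
    results = {}
    for size in sizes:
        stride = 1
        while stride <= size // 2:
            results[(size, stride)] = time_strided_access(size, stride, min_accesses)
            stride *= 2
    return results


if __name__ == "__main__":
    # Array sizes in elements (8 bytes each); assumed values for illustration.
    for (size, stride), seconds in sorted(sweep([2**12, 2**16, 2**20]).items()):
        print(f"size={size:8d} stride={stride:7d} {seconds * 1e9:8.1f} ns/access")
```

Plotting time per access against stride, with one curve per array size, is the usual way to read such a benchmark: on compiled code, jumps between curves suggest cache capacities, and changes along a curve suggest line sizes and associativity.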