IBM SP Parallel Scaling - Perspective

In the abstract, scaling is about changes in size or extent. Here is a view from 30,000 feet of how the jobs run on seaborg extend in time and in level of parallelism.

In the following graphs the horizontal axis is time (labeled by month and year) and each colored rectangle is a parallel job. A box's horizontal position and width are fixed by the job's start and stop times, and its height is the job's level of parallelism (number of nodes). The boxes are stacked vertically as compactly as possible without overlapping. The color of each box encodes some property of the job, e.g., the user who ran it, how much memory it used, or how long it waited.

Although it could use more annotation, it shows reasonably well the changes in job mix, in the overall level of parallelism in use, and in the extent (scale) of the machine itself (see December 2002).

Have fun thinking about this. If you find it useful, or have questions or ideas on how to improve it, feel free to let me know.

-David Skinner (dskinner@nersc.gov)


Jobs by User

Jobs by Memory Usage per Task

(blue = less memory, red = more memory)

Jobs by Wait Time

(blue = shorter wait, red = longer wait)
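The page does not say how a job property is turned into a color. As a minimal sketch, assuming a simple linear ramp between an observed minimum and maximum (the actual scale, which could just as well be logarithmic for skewed quantities like wait time, is not stated), the blue-to-red encoding could look like this. The `Rgb` type and `blue_to_red` function are hypothetical; the original rendering was done in PHP.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical RGB triple; the original PHP color handling is not shown.
struct Rgb {
    std::uint8_t r, g, b;
};

// Map value in [lo, hi] linearly onto a blue (low) to red (high) ramp.
// Out-of-range values are clamped; a degenerate range maps to pure blue.
Rgb blue_to_red(double value, double lo, double hi) {
    double t = (hi > lo) ? (value - lo) / (hi - lo) : 0.0;
    t = std::clamp(t, 0.0, 1.0);  // requires C++17
    auto red = static_cast<std::uint8_t>(t * 255.0 + 0.5);  // round to nearest
    return Rgb{red, 0, static_cast<std::uint8_t>(255 - red)};
}
```

For example, with memory per task ranging from 0 to 10 units, a job at 5 units gets a mid-ramp purple, and anything at or beyond the maximum is drawn pure red.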

Questions

What can one learn from this?
Some trends in job size are notable with regard to scaling. When seaborg doubled in size, the number of jobs increased correspondingly; over time the number of jobs then decreased as the parallelism of individual jobs increased. The wait time for high-concurrency jobs also changes in step with changes in queueing policy. Recurring temporal usage patterns can be identified as well (see the weekly variation in wait time during 08/02-10/02).

Other trends and usage patterns can be picked out by eye. Viewing all or most of the data in a sensible way is meant to give an abstract overview, from which quantitative metrics based on reductions of the data can then be considered.
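One such reduction, aimed at the weekly wait-time pattern mentioned above, is to average wait time by day of the week. This is a minimal sketch under stated assumptions: the `JobRecord` fields are hypothetical (not LoadLeveler's actual record format), timestamps are non-negative Unix epoch seconds, and weekdays are computed in UTC, whereas a real analysis would likely convert to the site's local time first.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical per-job record; field names are illustrative only.
struct JobRecord {
    std::int64_t submit_epoch;  // submission time, seconds since 1970-01-01 UTC (>= 0)
    double wait_seconds;        // time from submission to start
};

// Day of week in UTC, 0 = Sunday ... 6 = Saturday (1970-01-01 was a Thursday).
int weekday_utc(std::int64_t epoch_seconds) {
    std::int64_t days = epoch_seconds / 86400;
    return static_cast<int>((days + 4) % 7);
}

// Mean wait per weekday; weekdays with no jobs report 0.
std::array<double, 7> mean_wait_by_weekday(const std::vector<JobRecord>& jobs) {
    std::array<double, 7> sum{};
    std::array<int, 7> count{};
    for (const auto& j : jobs) {
        int d = weekday_utc(j.submit_epoch);
        sum[d] += j.wait_seconds;
        count[d] += 1;
    }
    std::array<double, 7> mean{};
    for (int d = 0; d < 7; ++d) {
        mean[d] = count[d] ? sum[d] / count[d] : 0.0;
    }
    return mean;
}
```

Restricting the input to jobs submitted during 08/02-10/02 would turn the visual weekly pattern into seven numbers that can be compared directly.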

What does the overall height or space between the boxes mean?
They have no direct meaning, since they depend on the algorithm used to do the packing, the job mix, and the number of jobs. The goal is not to provide machine utilization data but rather to depict the extent, both temporal and parallel, of jobs on the machine. Interpreting the height as utilization is a common confusion among people new to this sort of display.
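To see why stack height is not utilization, compare it with what utilization actually measures: node-seconds used divided by node-seconds available. A minimal sketch, assuming a hypothetical `JobSpan` record and a machine size that is constant over the window (a window spanning the December 2002 change in seaborg's size would have to be split):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical job record: time extent [start, stop) and node count.
struct JobSpan {
    std::int64_t start, stop;  // seconds
    int nodes;
};

// Fraction of available node-seconds in [t0, t1) consumed by jobs,
// clipping each job to the window. machine_nodes is assumed constant.
double utilization(const std::vector<JobSpan>& jobs, int machine_nodes,
                   std::int64_t t0, std::int64_t t1) {
    if (t1 <= t0 || machine_nodes <= 0) return 0.0;
    double used = 0.0;
    for (const auto& j : jobs) {
        std::int64_t s = std::max(j.start, t0);
        std::int64_t e = std::min(j.stop, t1);
        if (e > s) used += static_cast<double>(e - s) * j.nodes;
    }
    return used / (static_cast<double>(t1 - t0) * machine_nodes);
}
```

This number depends only on the jobs' areas, not on where the packer happens to stack them, which is exactly what the picture's overall height does not capture.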
Why are there gaps?
If there are no jobs during some period of time, either the machine was down or the LoadLeveler data is missing.
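Such gaps can also be located programmatically by sweeping over job intervals sorted by start time. This is a minimal sketch: the `Interval` alias, the `find_gaps` function, and the `min_gap` threshold are hypothetical, and, like the picture, it cannot tell machine downtime apart from missing accounting data.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical: each job reduced to its [start, stop) interval in epoch seconds.
using Interval = std::pair<std::int64_t, std::int64_t>;

// Return the sub-intervals of [t0, t1) during which no job was running.
// Gaps shorter than min_gap seconds are ignored.
std::vector<Interval> find_gaps(std::vector<Interval> jobs,
                                std::int64_t t0, std::int64_t t1,
                                std::int64_t min_gap) {
    std::sort(jobs.begin(), jobs.end());  // order by start time
    std::vector<Interval> gaps;
    std::int64_t covered_until = t0;  // end of the union of jobs seen so far
    for (const auto& [start, stop] : jobs) {
        if (start >= t1) break;
        if (start > covered_until && start - covered_until >= min_gap) {
            gaps.emplace_back(covered_until, start);
        }
        covered_until = std::max(covered_until, stop);
    }
    if (t1 > covered_until && t1 - covered_until >= min_gap) {
        gaps.emplace_back(covered_until, t1);
    }
    return gaps;
}
```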
How was this done?
I wrote a small C++ program, Industrious Box Packer (ibp.cc), that packs the boxes in 2D. The graphical output is produced with PHP's PNG image functions.
Why are the tall jobs at the bottom and the short jobs at the top?
The algorithm is, roughly, to place the taller (more parallel) boxes first and then fit the shorter (less parallel) jobs around or between them. This gives a reasonably compact packing and neatly stratifies jobs into the capability and capacity realms.
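The source of ibp.cc is not shown here, so the following is only a sketch of one greedy strategy consistent with the description above, not the program itself: sort boxes by height (node count) in descending order, then drop each box to the lowest vertical offset where it overlaps no already-placed box that shares some of its time span. The `Box` struct and `pack` function are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical job box: horizontal extent is [start, stop), height is nodes.
struct Box {
    std::int64_t start, stop;  // seconds
    int nodes;                 // level of parallelism (box height)
    int y = 0;                 // vertical offset assigned by the packer
};

static bool time_overlap(const Box& a, const Box& b) {
    return a.start < b.stop && b.start < a.stop;
}

// Greedy packing: place taller boxes first, each at the lowest vertical
// offset where it does not overlap any already-placed box.
void pack(std::vector<Box>& boxes) {
    std::stable_sort(boxes.begin(), boxes.end(),
                     [](const Box& a, const Box& b) { return a.nodes > b.nodes; });
    std::vector<const Box*> placed;  // safe: boxes is not resized below
    for (Box& box : boxes) {
        // Only boxes sharing some time with this one can block it.
        std::vector<const Box*> blockers;
        for (const Box* p : placed)
            if (time_overlap(box, *p)) blockers.push_back(p);
        // Candidate offsets: the floor, or the top of any blocker.
        std::vector<int> candidates{0};
        for (const Box* p : blockers) candidates.push_back(p->y + p->nodes);
        std::sort(candidates.begin(), candidates.end());
        for (int y : candidates) {
            bool fits = std::none_of(blockers.begin(), blockers.end(),
                [&](const Box* p) { return y < p->y + p->nodes && p->y < y + box.nodes; });
            if (fits) { box.y = y; break; }  // the highest top always fits
        }
        placed.push_back(&box);
    }
}
```

Because the tallest boxes claim the floor first, highly parallel (capability) jobs end up low in the picture and narrow-concurrency (capacity) jobs fill in above and between them, which is the stratification described above. Note that the result also depends on input order among equal-height boxes, one reason the overall height carries no direct meaning.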
Who is Piet Mondrian?
A neoplasticist artist who was fond of rectangles.
Where did this idea come from?
Hard to tell for sure, but someone mentioned it might have something to do with the area where I grew up.
