Historical: Running on Seaborg
Seaborg was decommissioned in January 2008.
Overview: POE and LoadLeveler
The IBM SP uses two software packages to run parallel programs:
the Parallel Operating Environment (POE) executes parallel programs,
and LoadLeveler schedules jobs. Users can interact with this IBM
software in a number of different ways and at a number of different
levels. This can be confusing, so a brief discussion of POE and
LoadLeveler follows.
- Parallel Operating Environment (POE)
POE is used to run parallel programs on the SP. This product
augments the basic AIX operating system with software needed to
run parallel programs.
The command poe (all lower case) executes parallel programs.
However, depending on which options were used to compile an
executable, the poe command may not be explicitly required to run it.
POE recognizes environment variables and poe command line flags
that specify how a parallel program should run. Please see
"Operation and Use" in the
IBM Manuals for more on POE.
- LoadLeveler
LoadLeveler is used in addition to POE to run parallel jobs.
LoadLeveler is a "job management system" that schedules all
parallel jobs on the NERSC SP, regardless of whether the jobs are
batch or interactive. More information on LoadLeveler
can be found on the IBM Batch page.
When running a job in batch mode, a user submits to LoadLeveler a script
that contains commands and LoadLeveler keywords. The values of
the LoadLeveler keywords determine how the code executes (e.g., the number
of nodes used and the number of tasks).
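As an illustration of the batch workflow just described, the following is a minimal sketch that writes a small LoadLeveler script to a file and submits it. The file name myjob.ll, the output file names, the 30-minute limit, and the node and task counts are placeholders chosen for this example, not NERSC recommendations. A real script would likely also need site-specific keywords (such as a job class) that are not shown here.

```shell
#!/bin/sh
# Sketch only: write a minimal LoadLeveler batch script to a file.
# File names, limits, and counts below are illustrative placeholders.
cat > myjob.ll <<'EOF'
#@ job_type         = parallel
#@ node             = 4
#@ tasks_per_node   = 2
#@ wall_clock_limit = 00:30:00
#@ output           = myjob.out
#@ error            = myjob.err
#@ queue

# Commands after the keywords run when the job starts.
poe ./a.out
EOF

# Submit only if LoadLeveler is available on this machine.
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit myjob.ll
else
    echo "llsubmit not found; wrote myjob.ll without submitting"
fi
```

In this sketch, node and tasks_per_node together request 8 tasks (4 nodes with 2 tasks each). Lines beginning with #@ are read by LoadLeveler as keywords; the remaining lines are ordinary shell commands.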
You control how your parallel job executes by specifying
- LoadLeveler keyword values (batch mode), and/or
- values passed to POE on the command line, and/or
- environment variables
In batch mode you should completely specify how your job
should run using LoadLeveler keywords exclusively, if possible.
NERSC recommends that you
be as explicit as possible in your specifications in order
to avoid confusion.
In interactive mode, poe command-line options override environment variable settings.
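The precedence rule for interactive runs can be sketched as follows. This assumes MP_NODES is the POE environment variable corresponding to the -nodes flag, and it uses POSIX sh syntax (csh users would use setenv instead of export). The node counts are arbitrary.

```shell
#!/bin/sh
# Environment setting: request 2 nodes by default.
export MP_NODES=2

# Command-line setting: request 4 nodes for this run only.
CLI_NODES=4
CMD="poe ./a.out -nodes $CLI_NODES"

# Per the rule above, -nodes on the command line takes precedence over
# MP_NODES, so this run would use 4 nodes, not 2.
if command -v poe >/dev/null 2>&1; then
    $CMD
else
    echo "poe not found; would run: $CMD"
fi
```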
Avoid confusion! POE vs. LoadLeveler keywords and options
It is important to make the distinction between
LoadLeveler keywords and poe command line options.
They do not have the same names in general.
For example, node is a LoadLeveler keyword, but
is not a poe command-line option. The poe option is
called nodes and is not a LoadLeveler keyword.
total_tasks is a LoadLeveler keyword, but not
a poe command-line switch. Therefore
poe will completely ignore -total_tasks on the command line
without warning or comment.
For example, the following will run 4 tasks, rather than
8 tasks as might be expected:
% poe ./a.out -nodes 4 -total_tasks 8 (does not work as expected!)
Because the default value of the
MP_TASKS_PER_NODE POE environment variable is
1, this command line will run 1 task on each of 4 nodes,
and ignore the total_tasks specification on the command
line because it is not a valid poe command line option.
See Interactive jobs below.
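One way to get the intended 8 tasks interactively is sketched below. It assumes the poe flag -tasks_per_node, the command-line counterpart of the MP_TASKS_PER_NODE environment variable mentioned above; check the IBM manuals for the flags supported by your POE version.

```shell
#!/bin/sh
NODES=4
TASKS_PER_NODE=2
# Total tasks = nodes * tasks per node.
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))

CMD="poe ./a.out -nodes $NODES -tasks_per_node $TASKS_PER_NODE"
echo "requesting $TOTAL_TASKS tasks with: $CMD"

# Run the command only where POE is installed.
if command -v poe >/dev/null 2>&1; then
    $CMD
fi
```

Setting the task count through a valid POE option, rather than the LoadLeveler keyword total_tasks, avoids the silently ignored flag shown in the failing example.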