NERSC Powering Scientific Discovery Since 1974

Interactive Jobs

To run an interactive job on Hopper's compute nodes, you must request the number of nodes you want and let the system allocate them from the pool of free nodes. The following command requests 2 nodes in the debug queue.

hopper% qsub -I -q debug -l mppwidth=48 

The -I flag specifies an interactive job. The -q flag specifies the name of the queue. The -l mppwidth option determines the number of nodes allocated to your job, though only indirectly: the system allocates whole nodes, not individual cores, so the number of nodes given to your job is the value of mppwidth divided by the number of cores per node, rounded up. On Hopper, with 24 cores per node, that is mppwidth/24, plus one more node if there is a remainder. Other job directives, including the account name (-A repo), can also be passed as arguments.
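The rounding rule above can be checked with a little shell arithmetic. The following is only a sketch: nodes_for_mppwidth is a hypothetical helper name, not a NERSC or PBS command, and it assumes the 24 cores per node stated above.

```shell
#!/bin/sh
# Cores per Hopper compute node, as stated in the text above.
CORES_PER_NODE=24

# nodes_for_mppwidth WIDTH
# Hypothetical helper: prints the number of whole nodes an mppwidth
# request maps to, i.e. WIDTH / CORES_PER_NODE rounded up.
nodes_for_mppwidth() {
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

nodes_for_mppwidth 48   # prints 2
nodes_for_mppwidth 50   # prints 3 (two full nodes plus one for the remainder)
nodes_for_mppwidth 96   # prints 4
```

Adding CORES_PER_NODE - 1 before the integer division is a common way to round up without floating-point arithmetic.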

Assuming there are free nodes, the qsub command will log you into a MOM node and return your prompt. You will be in your home directory, but can reference the directory from which you submitted the job as $PBS_O_WORKDIR. From the shell prompt, you can start your program on the compute nodes using the "aprun" command. You can find out more about aprun in this section of pages.

hopper% cd $PBS_O_WORKDIR
hopper% aprun -n 48 ./a.out

The aprun command above launches 48 instances of the executable "a.out" on 2 Hopper nodes, using all 48 cores. If your code is parallelized using MPI, then each instance is an MPI task, and the tasks can communicate via MPI routines. If a.out is a serial program, then you will have 48 identical copies of it running concurrently.

You don't have to launch 24 instances of your code on each node. The -N option to aprun specifies the number of instances (MPI tasks) to run on each node. The following sequence of commands will grab 4 Hopper nodes and run 48 total instances (MPI tasks) of your code, 12 on each node.

hopper% qsub -I -q debug -l mppwidth=96 
(wait for job to start ...)

hopper% cd $PBS_O_WORKDIR
hopper% aprun -n 48 -N 12 ./a.out
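Working backward from a desired task layout to the mppwidth value to request can be sketched the same way. This is an illustration only: mppwidth_for is a hypothetical helper name, and it assumes Hopper's 24 cores per node and that mppwidth should cover every core on each node the job occupies, as in the example above.

```shell
#!/bin/sh
# Cores per Hopper compute node, as stated earlier on this page.
CORES_PER_NODE=24

# mppwidth_for TOTAL_TASKS TASKS_PER_NODE
# Hypothetical helper: prints the mppwidth needed to run TOTAL_TASKS
# instances (aprun -n) with TASKS_PER_NODE instances per node (aprun -N).
mppwidth_for() {
    # Nodes needed: TOTAL_TASKS / TASKS_PER_NODE, rounded up.
    nodes=$(( ($1 + $2 - 1) / $2 ))
    # Request every core on each of those nodes.
    echo $(( nodes * CORES_PER_NODE ))
}

mppwidth_for 48 12   # prints 96 (4 nodes, matching the example above)
mppwidth_for 48 24   # prints 48 (2 fully packed nodes)
```

The result is the value to pass to -l mppwidth on the qsub command line, while the two arguments are the values given to aprun's -n and -N options.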