NERSC: Powering Scientific Discovery Since 1974

Interactive Jobs

Serial Code or Commands

Franklin is a massively parallel high-performance computing platform and is intended and designed to run large parallel codes. While it is possible to run serial jobs on Franklin, it is discouraged.

Any code or command that is not preceded by the aprun command will execute serially on a service (usually login) node. The login nodes are for executing general UNIX shell commands, building code, and submitting jobs intended to run on the compute nodes.

The service nodes are shared by many users, so please do not run your compute- or memory-intensive jobs on these nodes. NERSC may kill running processes that severely degrade service node performance. If your job will run for more than 5 minutes or use more than 1 GB of memory, it should not be run serially on a service node. If you need to run a big job that cannot run on the compute nodes, please contact the NERSC consultants for advice.

It is possible to run a serial job on the compute nodes, but you will be charged for all 4 cores on the node, that is, 4 * walltime. The procedure to run a single instance of the code is the same as described in the next section, "Parallel Codes", with -n 1 on the aprun command and mppwidth=1 in the qsub request.
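The charging rule above can be sketched as a small shell calculation. This is only an illustration: the helper name charged_core_seconds is hypothetical, the constant of 4 cores per node comes from the quad-core description on this page, and actual NERSC accounting may apply further factors that are not covered here.

```shell
#!/bin/sh
# Illustrative sketch: estimate the core-seconds charged for a job on
# quad-core compute nodes, where every core of each allocated node is billed.
CORES_PER_NODE=4   # quad-core nodes, as described on this page

# charged_core_seconds NODES WALLTIME_SECONDS   (hypothetical helper)
charged_core_seconds() {
    echo $(( $1 * $CORES_PER_NODE * $2 ))
}

# A serial job that holds one node for 20 minutes (1200 seconds)
# is charged for all four cores: 4 * 1200 core-seconds.
charged_core_seconds 1 1200
```

The same helper shows why a serial job is an expensive use of a compute node: three of the four cores it pays for sit idle.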

Parallel Codes

Most computing on Franklin occurs under the control of batch scripts. However, for code development, debugging, and testing, interactive computing is convenient and appropriate. You use the compute nodes interactively by first requesting resources with the qsub -I command, waiting for a batch shell to start, and then launching your executable with the aprun command. For example,

franklin% qsub -I -V -q interactive -l mppwidth=8 -l mppnppn=4 -l walltime=00:20:00 

tells the system that you want to run interactively (-I) in the interactive queue for 20 minutes and have 8 total instances of your code execute, with 4 instances per node (one on each core). That works out to 2 quad-core Franklin nodes. The -V option tells the system to import your currently defined environment variables into the batch computing context. Other Torque options, such as a different queue or an account name, can also be passed as arguments.
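The node arithmetic and the optional account argument can be sketched as follows. The helper names nodes_needed and interactive_request are hypothetical, and myrepo is a placeholder account name, not a real one. The sketch only prints the qsub command line rather than submitting anything, and it assumes the account is passed with Torque's -A option.

```shell
#!/bin/sh
# nodes_needed MPPWIDTH MPPNPPN   (hypothetical helper)
# Number of nodes implied by a request: the ceiling of MPPWIDTH / MPPNPPN.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# interactive_request MPPWIDTH MPPNPPN WALLTIME [ACCOUNT]   (hypothetical helper)
# Prints, but does not run, an interactive qsub command line.
interactive_request() {
    cmd="qsub -I -V -q interactive"
    # Assumption: the account name is given with Torque's -A option.
    if [ -n "$4" ]; then
        cmd="$cmd -A $4"
    fi
    echo "$cmd -l mppwidth=$1 -l mppnppn=$2 -l walltime=$3"
}

nodes_needed 8 4                          # 8 instances, 4 per node
interactive_request 8 4 00:20:00          # the request shown above
interactive_request 8 4 00:20:00 myrepo   # same, with a placeholder account
```

Rounding up matters when mppwidth is not a multiple of mppnppn: a request for 9 instances at 4 per node would occupy 3 nodes.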

After a wait that is usually brief (the interactive queue is designed to have a short wait time), a new shell will be started for you. This shell runs on a service node dedicated to launching parallel jobs (called a "MOM" node). You are now in what we call a batch computing environment, or context, even though you are entering commands at a shell prompt.

You now have dedicated access to the resources you requested with qsub and you are being charged for those resources.

The new shell starts with you in your home directory. To get back into the directory from which you ran the qsub command, use

% cd $PBS_O_WORKDIR
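If the same commands are kept in a script that may also be run outside a batch shell, a guarded version of this step can be useful. The following is a minimal sketch; the function name go_to_submit_dir is hypothetical, and it relies only on the PBS_O_WORKDIR variable mentioned above.

```shell
#!/bin/sh
# go_to_submit_dir   (hypothetical helper)
# Change to the directory from which qsub was run, if the batch system
# has recorded it in PBS_O_WORKDIR; otherwise report and stay put.
go_to_submit_dir() {
    if [ -n "${PBS_O_WORKDIR:-}" ] && [ -d "$PBS_O_WORKDIR" ]; then
        cd "$PBS_O_WORKDIR" || return 1
    else
        echo "PBS_O_WORKDIR is not set or not a directory; staying in $PWD" >&2
        return 1
    fi
}

if go_to_submit_dir; then
    pwd
fi
```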

Any command you type at the prompt will execute on the MOM service node (which is shared by other users). To launch a parallel job, use the aprun command.

% aprun -n 8 -N 4 ./a.out

In this example we launched 8 instances of the binary executable a.out on the compute nodes, 4 per node, so that each core on a node runs a single instance of the executable. If a.out is a pure MPI code, this is the same as running 8 MPI tasks on 2 nodes.
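Before spending allocation time, it can help to check the task layout you are about to launch. The wrapper below is a sketch, not a NERSC-provided tool: the function name launch and the DRY_RUN variable are hypothetical. With DRY_RUN=1 it only prints the aprun command line; otherwise it calls aprun with the same -n and -N options used above.

```shell
#!/bin/sh
# launch TASKS PER_NODE COMMAND [ARGS...]   (hypothetical wrapper around aprun)
# With DRY_RUN=1 the command line is printed instead of executed.
launch() {
    tasks=$1
    per_node=$2
    shift 2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "aprun -n $tasks -N $per_node $*"
    else
        aprun -n "$tasks" -N "$per_node" "$@"
    fi
}

DRY_RUN=1
launch 8 4 ./a.out   # pure MPI case: 8 tasks, 4 per node, 2 nodes
launch 1 1 ./a.out   # a single serial instance on one compute node
```

Unset DRY_RUN (or set it to 0) inside the batch shell to run the commands for real.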

Once you are in this interactive environment, you can do anything you can do using a batch script, which is described in further detail in the Batch Jobs section. To quit the job, type exit at the shell prompt (assuming you have no background processes running). If you exceed the walltime you requested, your shell will be terminated abruptly.