To run interactive jobs on Cori, you must request the number of nodes you want and have the system allocate resources from the pool of free nodes.
The salloc command is used to launch an interactive job. The syntax is
salloc -N <number of nodes> -q <QOS> -C <node type> -t <wall time> -L <file system>
where node type is one of haswell or knl and the format for wall time is HH:MM:SS. Available QOSs and job limits are available on the Cori Queues and Policies page.
In the following example we request two Haswell nodes with the debug QOS for 30 minutes, using the $SCRATCH file system. More details on requesting file system licenses can be found on the Specifying File Systems page.
% salloc -N 2 -q debug -C haswell -t 00:30:00 -L SCRATCH
salloc accepts many options. For the complete list, refer to the SLURM salloc page or run "man salloc".
Commonly used options and more can also be found in the Torque/Moab vs. SLURM Comparisons page.
Using the Dedicated Interactive Queue
We have deployed experimental functionality to support medium-length interactive work on Cori. This queue is intended to deliver nodes for interactive use within 6 minutes of the job request. To access the interactive queue add the "-q interactive" flag to your salloc command, as in the following example that uses a Haswell node.
% salloc -N 1 -C haswell -q interactive -t 01:00:00
To run on KNL nodes, use "-C knl" instead of "-C haswell".
Users in this queue are limited to a single running job on up to 64 nodes for up to 4 hours. Additionally, each NERSC allocation (MPP repo) is limited to a total of 64 nodes across all of its interactive jobs. This means that if UserA in repo m9999 has a job using one Haswell node, UserB, who is also in repo m9999, can have a simultaneous job using up to 63 Haswell nodes. Since this queue is intended for interactive work, each user can submit only one job at a time. KNL nodes are currently limited to quad,cache mode only. Jobs in this queue cannot share individual nodes with other jobs, as can be done in the shared queue.
We have configured this queue to reject a job if it cannot be scheduled within a few minutes. This can happen because the job violates the one-job-per-user limit or the total-nodes-per-allocation limit, or because there are not enough nodes available to satisfy the request. In rare cases, jobs may also be rejected because the batch system is overloaded and was not able to process your job in time. If that happens, please resubmit.
Since there is a limit on the number of nodes used per repo, you may be unable to run a job because other users who share your repo are already using those nodes. To see who in your repo is using the interactive queue, use
% squeue -q interactive -A <reponame> -O jobid,username,starttime,timelimit,maxnodes,account
NERSC can only deploy a limited number of nodes to serve this queue, so if you don't see any other jobs from your repo, it could be that all of the available nodes are in use. We will monitor the load on this queue and make adjustments in response to the level of usage. We are still experimenting with this service, so please bear with us while we get the wrinkles ironed out.
Running jobs inside an interactive session
Once your session is granted, you can run your jobs with the srun command. Refer to the SLURM srun page or "man srun" for more information. Most options are accepted by both salloc and srun, as illustrated in the following example:
To request an interactive session with 2 nodes, 8 total MPI tasks and 4 MPI tasks per node on Cori Haswell nodes, and to run on the $SCRATCH filesystem, do:
% salloc -N 2 -n 8 -C haswell -q debug -t 00:30:00 -L SCRATCH
Note that you can request the number of nodes with salloc and then specify the number of MPI tasks via the srun command, as in the following example:
% salloc -N 2 -C haswell -q debug -t 00:30:00 -L SCRATCH
% srun -n 8 -c 16 ./myprogram
The "-c" option assigns MPI tasks to cores in a way that performs optimally when there are fewer MPI tasks on a node than the number of hardware threads the node supports. Set "-c" to the number of hardware threads the node supports divided by the number of MPI tasks per node. For example, a Haswell node has a total of 32 physical cores, each capable of 2 hyperthreads, so in this example "-c" should be set to 64/#MPI_tasks_per_node = 64/4 = 16.
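The arithmetic above can be sketched as a short shell calculation. This is only an illustration of the rule, using the Haswell figures quoted above (the variable names are ours, not SLURM's):

```shell
#!/bin/sh
# Compute the srun "-c" (cpus-per-task) value for a Cori Haswell node:
# 32 physical cores x 2 hyperthreads = 64 hardware threads per node.
THREADS_PER_NODE=64
TASKS_PER_NODE=4    # 8 total MPI tasks spread over 2 nodes

CPUS_PER_TASK=$(( THREADS_PER_NODE / TASKS_PER_NODE ))
echo "srun -n 8 -c ${CPUS_PER_TASK} ./myprogram"
```

Running this prints the srun line used in the example above, with -c set to 16.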
More details on requesting file system licenses can be found on the Specifying File Systems page.
Selecting the KNL Nodes
In some examples above, we selected the Haswell nodes. To select the KNL nodes, we need to change the -C flag's argument from "haswell" to "knl".
Only the "quad cache" KNL mode is available for interactive use. To use other modes, please make a request for a reservation at https://help.nersc.gov/.
Our bare-bones interactive batch command for running 32 MPI processes across two nodes becomes:
% salloc -N 2 -C knl -q debug -t 00:30:00 -L SCRATCH
% srun -n 32 -c 16 ./my_executable
Burst Buffer interactive use
You can access the Burst Buffer during an interactive session from either Haswell or KNL nodes on Cori, either as scratch space (usable only for the duration of your session) or through a persistent reservation. The simplest way is to put the #DW or #BB directives you would otherwise use in your batch script into a configuration file, and point salloc at that file when you request the interactive session:
salloc -N 1 -C haswell -q interactive -t 00:10:00 --bbf="bbf.conf"
where the file bbf.conf contains lines such as:
#DW persistentdw name=myBBname
#DW jobdw capacity=10GB access_mode=striped type=scratch
The path to the Burst Buffer space will then be available in your interactive session via the environment variables $DW_JOB_STRIPED or $DW_PERSISTENT_STRIPED_myBBname, as usual. You can also stage data in and out of your Burst Buffer reservation - multi-line configuration files are permitted.
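As an illustration of a multi-line configuration file, the sketch below combines a scratch allocation with DataWarp stage_in/stage_out directives. The source and destination paths are placeholders you would replace with your own, and the exact directive options should be checked against the Burst Buffer documentation for your system:

```
#DW jobdw capacity=10GB access_mode=striped type=scratch
#DW stage_in source=/path/to/my/input destination=$DW_JOB_STRIPED/input type=directory
#DW stage_out source=$DW_JOB_STRIPED/output destination=/path/to/my/output type=directory
```

With a file like this passed via --bbf, the input directory is copied into the Burst Buffer before your session starts, and the output directory is copied back out when it ends.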