To run interactive jobs on Cori, you must request the number of nodes you want and have the system allocate resources from the pool of free nodes.
The salloc command is used to launch an interactive job. The syntax is
salloc -N <number of nodes> -p <partition> -C <node type> -t <wall time> -L <file system>
where node type is one of haswell or knl and the format for wall time is HH:MM:SS. Available partitions and job limits are available on the Cori Queues and Policies page.
In the following example we request two Haswell nodes in the debug partition for 30 minutes, using the $SCRATCH file system. More details on requesting file system licenses can be found on the Specifying File Systems page.
% salloc -N 2 -p debug -C haswell -t 00:30:00 -L SCRATCH
salloc accepts many options. For the complete list, refer to the SLURM salloc page or run "man salloc". Commonly used options can also be found on the Torque/Moab vs. SLURM Comparisons page.
Using the Dedicated Interactive Queue
We have deployed experimental functionality to support medium-length interactive work on Cori. This queue is intended to deliver nodes for interactive use within 6 minutes of the job request. To access the interactive queue, add the --qos flag to your salloc command:
% salloc -N 1 -C haswell --qos=interactive -t 01:00:00
To run on KNL nodes, use "-C knl" instead of "-C haswell".
Users in this queue are limited to a single running job on up to 64 nodes for up to 4 hours. Additionally, each NERSC allocation (MPP repo) is limited to a total of 64 nodes across all of its interactive jobs (Haswell or KNL). This means that if UserA in repo m9999 has a job using 1 Haswell node, UserB (who is also in repo m9999) can have a simultaneous job using 63 Haswell nodes or 63 KNL nodes, but not 64. Since this queue is intended for interactive work, each user can submit only one job at a time (either Haswell or KNL). KNL nodes are currently limited to quad,cache mode only. Only whole-node jobs are possible; sub-node jobs like those in the shared queue are not.
We have configured this queue to reject the job if it cannot be scheduled within a few minutes. This could be because the job violates the single job per user limit, the total number of nodes per NERSC allocation limit, or because there are not enough nodes available to satisfy the request. In some rare cases, jobs may also be rejected because the batch system is overloaded and wasn't able to process your job in time. If that happens, please resubmit.
Since there is a limit on the number of nodes used per allocation, you may be unable to run a job because other users who share your allocation are already using it. To see who in your allocation is using the interactive queue, run
% squeue --qos=interactive -A <reponame> -O jobid,username,starttime,timelimit,maxnodes,account
If the number of nodes in use by your repo sums up to 64 nodes, please contact the other group members if you feel they need to release interactive resources.
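To check how close your repo is to the 64-node cap, you can total the node counts reported by squeue. The pipeline below is a sketch; the job IDs and node counts are hypothetical sample output standing in for what `squeue --qos=interactive -A <reponame> -O jobid,maxnodes --noheader` would print on a real system.

```shell
# Sum the second column (maxnodes) of squeue output with awk.
# The printf here simulates two interactive jobs from the same repo;
# in practice you would pipe the real squeue command instead.
printf '123456 1\n123457 63\n' \
  | awk '{sum += $2} END {print sum " nodes in use"}'
```

If the total reaches 64, any further interactive job from that repo will be rejected until nodes are released.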
NERSC can only deploy a limited number of nodes to serve this queue, so if you don't see any other jobs from your repo, it could be that all of the available nodes are in use. We will monitor the load on this queue and make adjustments in response to the level of usage. We are still experimenting with this service, so please bear with us while we get the wrinkles ironed out.
Running jobs inside an interactive session
Once your session is granted, you can run your jobs using the srun command. Refer to the SLURM srun page or "man srun" for more information. Most options can be used with both the salloc and srun commands, as illustrated in the following example:
To request an interactive session with 2 nodes, 8 total MPI tasks and 4 MPI tasks per node on Cori Haswell nodes, and to run on the $SCRATCH filesystem, do:
% salloc -N 2 -n 8 -C haswell -p debug -t 00:30:00 -L SCRATCH
Note that you can request the number of nodes with salloc and then specify the number of MPI tasks via the srun command, as in the following example:
% salloc -N 2 -C haswell -p debug -t 00:30:00 -L SCRATCH
% srun -n 8 -c 16 ./myprogram
The -c option is used to assign MPI tasks to cores in a way that performs optimally when there are fewer MPI tasks on a node than the number of hardware threads a node can support. The "-c" value should be set as the "number of hardware threads the node can support" divided by "the number of MPI tasks per node". For example, on Haswell, there are a total of 32 physical cores, each capable of 2 hyperthreads, so the value of "-c" should be set to 64/#MPI_tasks_per_node = 64/4 = 16 in this example.
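The arithmetic above can be sketched as a small shell calculation (the values are taken from the Haswell example in the text: 32 physical cores with 2 hyperthreads each, and 8 MPI tasks spread over 2 nodes):

```shell
# Compute the "-c" value for srun: hardware threads per node
# divided by MPI tasks per node.
HW_THREADS_PER_NODE=64   # 32 physical cores x 2 hyperthreads (Haswell)
TASKS_PER_NODE=4         # 8 total MPI tasks / 2 nodes
echo $(( HW_THREADS_PER_NODE / TASKS_PER_NODE ))   # prints 16
```

The result, 16, matches the `-c 16` used in the srun example above.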
More details on requesting file system licenses can be found on the Specifying File Systems page.
Selecting the KNL Nodes
In the examples above, we selected the Haswell nodes. To select the KNL nodes, change the constraint from "haswell" to "knl".
There are multiple memory modes for the KNL nodes, the details of which are covered elsewhere. If you specify only "knl" as the constraint, you will receive the default configuration, which is currently "quad cache" mode. We recommend this mode as most codes perform well under this mode.
Our bare-bones interactive batch command for running 32 MPI processes across two nodes becomes:
% salloc -N 2 -C knl,quad,cache -p debug -t 00:30:00 -L SCRATCH
% srun -n 32 -c 16 ./my_executable
Burst Buffer interactive use
You can access the Burst Buffer during an interactive session from either Haswell or KNL nodes on Cori, as either scratch space (usable only during your session), or to access your persistent reservation. The simplest way to do this is to specify a configuration script that contains the #DW or #BB directives you would otherwise use in your batch script, and specify that script when you request the interactive session with salloc:
% salloc -N 1 -C haswell --qos=interactive -t 00:10:00 --bbf="bbf.conf"
where the file bbf.conf contains lines such as:
#DW persistentdw name=myBBname
#DW jobdw capacity=10GB access_mode=striped type=scratch
The path to the Burst Buffer space will then be available in your interactive session via the environment variables $DW_JOB_STRIPED or $DW_PERSISTENT_STRIPED_myBBname, as usual. You can also stage data in and out of your Burst Buffer reservation; multi-line config files are permitted.
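As a sketch of such a multi-line configuration, the file below combines a scratch allocation with DataWarp stage_in/stage_out directives. The file paths are placeholders, not real paths; substitute your own $SCRATCH locations.

```
# bbf.conf (sketch): scratch allocation plus data staging.
# Paths below are hypothetical examples only.
#DW jobdw capacity=10GB access_mode=striped type=scratch
#DW stage_in source=/path/to/my/input.dat destination=$DW_JOB_STRIPED/input.dat type=file
#DW stage_out source=$DW_JOB_STRIPED/output destination=/path/to/my/results type=directory
```

With this file passed via --bbf, the input file is copied into the Burst Buffer space before your session starts, and the output directory is copied back out when the allocation ends.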