Cori, like most supercomputers, is a distributed-memory, massively parallel processing (MPP) machine. The system consists of many independent nodes, each with its own processor cores, memory, and network interface. Memory on a node is directly addressable only by the cores on that node. See Cori Configuration for more information.
Most codes run on Cori in parallel using SPMD (single program, multiple data) mode, where a single binary executable is broadcast to each node and executed independently by each parallel task (more information can be found in the pages below). If a processor needs data that resides in the memory of a different node, programs typically use the Message Passing Interface (MPI) to transfer data from one node to another.
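As a rough illustration of the SPMD model (the task count and the shell one-liner here are purely for demonstration, not a recommended workload), every task launched by srun runs the identical program and is distinguished only by its rank:

    srun -n 4 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'

Each of the four tasks executes the same command but reports a different SLURM_PROCID; in the same way, an SPMD code typically uses its MPI rank to decide which portion of the data it owns and when to exchange data with tasks on other nodes.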
Most jobs are run in batch mode, although interactive computing is available for code development and testing using a small number of nodes for a short time. In batch mode, the user prepares a text script file – usually a shell script – that contains batch directives and Linux commands. The batch directives tell the batch system how to run the job (see Batch Jobs below). The batch system software used at NERSC is SLURM.
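For example, a minimal batch script might look like the sketch below; the QOS, node count, walltime, and the executable name ./my_app are placeholders rather than recommendations for any particular workload:

    #!/bin/bash
    #SBATCH --qos=regular          # batch directive: which queue/QOS to use
    #SBATCH --nodes=2              # batch directive: number of nodes to allocate
    #SBATCH --time=00:30:00        # batch directive: walltime limit
    #SBATCH --constraint=haswell   # batch directive: node type on Cori

    # Linux commands run on the first allocated node;
    # srun launches the parallel executable across all allocated nodes
    srun -n 64 ./my_app

The script is submitted with "sbatch my_script.sh" and its progress can be checked with "squeue".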
Running xfer Jobs

The xfer queue is intended for transferring data between Cori and HPSS. xfer jobs run on one of the login nodes and are therefore free of charge. If you want to transfer data to the HPSS archive system at the end of a job, you can submit an xfer job at the end of your batch job script via "sbatch -M escori hsi put <my_files>", so that you are not charged for the duration of the data transfer. xfer jobs can be monitored with "squeue -M escori".
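An xfer job can also be submitted as its own small batch script. The sketch below is illustrative: the walltime and the file name my_results.tar are placeholders, and the xfer QOS name is an assumption to adjust to your site configuration; -M escori routes the job to the login-node cluster as described above.

    #!/bin/bash
    #SBATCH -M escori              # run on the escori (login node) cluster
    #SBATCH --qos=xfer             # QOS for HPSS data-transfer jobs (assumed name)
    #SBATCH --time=02:00:00        # walltime; adjust to the size of the transfer

    # Archive results to HPSS using hsi
    hsi put my_results.tar

Submit it with "sbatch my_xfer.sh" and check its status with "squeue -M escori".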