
Running Jobs

Cori, like most supercomputers, is a distributed-memory, massively parallel processor (MPP) machine. The system consists of many independent nodes, each with its own processor cores, memory, and network interface. Memory on a node is directly addressable only by the cores on that node. See Cori Configuration for more information.

Most codes run on Cori in parallel using SPMD (single program, multiple data) mode, where a single binary executable is broadcast to each node and executed independently by each parallel task (more information can be found in the pages below). If a processor needs data that resides in the memory of a different node, programs typically use the Message Passing Interface (MPI) to transfer data from one node to another.
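For example, a pure MPI code built as a single binary can be launched in SPMD mode with srun (a minimal sketch; the task count and executable name are placeholders):

    # Launch 64 copies of the same binary; each MPI rank runs independently
    # and exchanges data with ranks on other nodes via MPI calls.
    srun -n 64 ./my_mpi_app.x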

Most jobs are run in batch mode, although interactive computing is available for code development and testing using a small number of nodes for a short time. In batch mode, the user prepares a text script file (usually a shell script) that contains batch directives and Linux commands. The batch directives tell the batch system how to run the job (see Batch Jobs below). The batch system software used at NERSC is called SLURM.
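As a minimal sketch of such a script (the queue, time limit, and executable name are placeholders), a two-node Haswell job might look like:

    #!/bin/bash
    #SBATCH -N 2                # request two nodes
    #SBATCH -C haswell          # select the Haswell architecture
    #SBATCH -q regular          # submit to the regular queue
    #SBATCH -t 00:30:00         # 30-minute wall-clock limit

    # Launch 64 MPI tasks (32 per node) across the allocation
    srun -n 64 ./my_mpi_app.x

The script is submitted with "sbatch my_job.sh", and SLURM returns a job ID that can be used to track it.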

Batch Jobs

How to run batch jobs on Cori. Read More »

General Running Jobs Recommendations

This page provides general recommendations for running MPI and hybrid MPI/OpenMP jobs to achieve optimal process and thread affinity on the Haswell and KNL compute nodes. Read More »
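As a hedged illustration of the kind of settings that page recommends (node and task counts are placeholders), a hybrid MPI/OpenMP launch on Haswell combines srun binding flags with OpenMP environment variables:

    # 2 Haswell nodes, 8 MPI tasks per node, 4 OpenMP threads per task.
    # Each 32-core node exposes 64 logical CPUs, so -c 8 reserves
    # 4 physical cores per task; --cpu_bind=cores pins tasks to cores.
    export OMP_NUM_THREADS=4
    export OMP_PROC_BIND=spread
    export OMP_PLACES=threads
    srun -N 2 -n 16 -c 8 --cpu_bind=cores ./hybrid_app.x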

Example Batch Scripts

Use these batch script examples as templates to get started. Read More »
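As one hedged example of such a template (the queue and executable name are placeholders), a single-node pure-OpenMP job on Haswell could look like:

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -C haswell
    #SBATCH -q debug            # short queue for testing
    #SBATCH -t 00:10:00

    # One task spanning the whole node: 32 OpenMP threads, one per
    # physical core; -c 64 assigns all logical CPUs to the single task.
    export OMP_NUM_THREADS=32
    srun -n 1 -c 64 --cpu_bind=cores ./omp_app.x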

Example Batch Scripts for KNL

Use these batch script examples as templates to get started on KNL. Read More »
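As a hedged sketch (the queue, node mode, and executable name are placeholders), a pure-MPI script for a KNL node in the common quad/cache mode might be:

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -C knl,quad,cache   # KNL node in quad/cache mode
    #SBATCH -q regular
    #SBATCH -t 01:00:00

    # 68 MPI tasks, one per physical core; each core has 4 hardware
    # threads, so -c 4 assigns all of a core's logical CPUs to its task.
    srun -n 68 -c 4 --cpu_bind=cores ./my_mpi_app.x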

Advanced Running Jobs Options

Running xfer Jobs The intended use of the xfer queue is to transfer data between Cori and HPSS. xfer jobs run on one of the login nodes and are therefore free of charge. If you want to transfer data to the HPSS archive system at the end of a job, you can submit an xfer job from your batch script via "sbatch -M escori hsi put <my_files>", so that you are not charged for the duration of the data transfer. xfer jobs can be monitored via "squeue -M escori". Do… Read More »
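Along the lines described above, a minimal xfer script might look like the following sketch (the queue name, time limit, and file name are assumptions):

    #!/bin/bash
    #SBATCH -M escori           # submit to the escori (xfer) cluster
    #SBATCH -q xfer             # assumed xfer queue name
    #SBATCH -t 06:00:00

    # Archive results to HPSS; this runs on a login node, so it incurs
    # no compute charge.
    hsi put my_results.tar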

Interactive Jobs

Learn how to run interactive jobs on Cori. Read More »
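As a quick sketch (node count, time, and executable name are placeholders), an interactive allocation is typically obtained with salloc, after which srun launches commands on the allocated node:

    # Request one Haswell node for 30 minutes in the interactive queue
    salloc -N 1 -C haswell -q interactive -t 30:00

    # From the shell salloc provides, launch 32 MPI tasks on the node
    srun -n 32 ./my_app.x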

Monitoring Jobs

Learn commands to query, hold, submit and monitor jobs on Cori. Read More »
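A few representative SLURM commands covered there (the job ID is a placeholder):

    # Show your queued and running jobs
    squeue -u $USER

    # Detailed information about one job
    scontrol show job 123456

    # Hold, release, or cancel a job
    scontrol hold 123456
    scontrol release 123456
    scancel 123456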

Specifying Required File Systems

Users can now specify the file systems required for their jobs by requesting "licenses" for them in their batch scripts. This protects your jobs from failures or degraded performance caused by known file system issues. Read More »
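A hedged sketch of the directive involved (the license names here are examples; the page lists the valid names for each file system):

    # Request licenses for the file systems the job reads or writes, so
    # SLURM will not start the job while those systems have known issues.
    #SBATCH -L SCRATCH,project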

Cori Queues and Policies

Queue limits, policies and tips for getting your job through the queue faster. Read More »

Changes for Running on Haswell Nodes

October 31, 2016

After the merge of the Cori Phase 1 Haswell and Phase 2 KNL cabinets, the way to obtain correct process and thread affinity when running jobs on the Haswell nodes has changed. Read More »
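For instance (a minimal sketch with a placeholder executable), a 32-task job on a single 32-core Haswell node now needs explicit -c and binding flags to keep one task per physical core, since each core exposes two hyperthreads:

    srun -n 32 -c 2 --cpu_bind=cores ./my_mpi_app.x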

SLURM at NERSC Overview

Introduction to native SLURM: its advantages and the SLURM features available on Cori, with pointers to further NERSC documentation on SLURM. Read More »

Cori Haswell Nodes for Edison Users

The Cori Haswell nodes are very similar to those on Edison. Read More »