Overview

Jobs on Edison execute on one or more "compute" nodes dedicated to that job. These nodes are distinct from the shared "login" nodes where you do interactive work such as compiling code and managing files. Typically, users write a batch script with a text editor and submit it to the batch system, which is a resource manager and scheduler. Once your job is submitted, the batch system decides when and on which nodes it will run. Edison runs Slurm as its batch system; see the Introduction to Slurm page to learn more. A batch script contains a number of job control directives as well as the srun command, the parallel job launcher, which actually launches the program onto the compute nodes. See the Example Batch Scripts page to learn how to write a batch script for your job. You submit a batch job with the sbatch command and then monitor its status in the queue with the squeue command. It is also possible to run small, short parallel jobs interactively, as described on the Interactive Jobs page.
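As a concrete illustration, here is a minimal sketch of such a batch script; the partition, node count, time limit, job name, and executable are placeholders you would replace with your own values:

    #!/bin/bash -l
    #SBATCH -p debug         # partition (queue) to submit to
    #SBATCH -N 2             # number of compute nodes
    #SBATCH -t 00:10:00      # wall-clock time limit (hh:mm:ss)
    #SBATCH -J my_job        # job name (placeholder)

    # srun launches the program on the allocated compute nodes:
    # 2 nodes x 24 cores/node = 48 tasks
    srun -n 48 ./my_executable

Saved as, say, my_job.sl, the script would be submitted with "sbatch my_job.sl", and its status could then be checked with "squeue -u $USER".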

Pages in this section explain the process in more detail.

Types of Nodes on Edison

Before running your job, it is useful to understand the different types of nodes on Edison.

Compute Nodes

The 5,576 compute nodes are dedicated to running scientific applications. A job is given exclusive access to each node it requests for the entirety of its run time. Since Edison has 24 cores per node, the maximum allocatable number of cores is 133,824. By default, the compute nodes run a somewhat restricted Linux operating system for performance reasons. If a system call or dynamic library needed by your code is not available in this environment, you can enable a more complete Linux environment with cluster compatibility mode; see Cluster Compatibility Mode for more detail.
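Because nodes are allocated whole, you request nodes rather than cores and size the srun launch to match; a minimal sketch, assuming a hypothetical executable:

    #SBATCH -N 4                # request 4 whole nodes (96 cores total)

    srun -n 96 ./my_executable  # 4 nodes x 24 cores/node = 96 tasks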

Login Nodes

Edison's login nodes run a full Linux operating system and provide the standard Linux services for the system. When you connect to Edison with SSH, you land on a login node. These nodes are shared by many users; please do not run applications on the login nodes.

File Systems to Use for Running Jobs

There are multiple file systems available on Edison. In general, it is not recommended to run from the home file system, since it is not tuned for parallel applications, especially applications with large I/O footprints. Use the Lustre file systems ($SCRATCH or /scratch3) or the global parallel file systems (/project, /projectb) for production runs.
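One common pattern, sketched here with a hypothetical run directory and script name, is to stage the job in $SCRATCH and submit from there so that the job's I/O lands on Lustre:

    cd $SCRATCH
    mkdir -p my_run && cd my_run   # hypothetical run directory
    cp ~/my_job.sl .               # batch script prepared elsewhere
    sbatch my_job.sl               # the job's output is written here, on Lustre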

Interactive Access to Compute Nodes

Only the login nodes are available for direct interactive access; you cannot SSH directly to compute nodes. However, you can reach a compute node via an interactive batch job (salloc -p debug). See Interactive Jobs for more information. An interactive job gives you access to the head compute node only.
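A minimal interactive session might look like the following sketch; the node count, time limit, and executable name are illustrative:

    salloc -p debug -N 1 -t 00:30:00   # request 1 node for 30 minutes
    # ... once the allocation is granted, you get a shell on the head node
    srun -n 24 ./my_executable         # launch across the node's 24 cores
    exit                               # release the allocation when finished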