Overview and Basic Description

Jobs on Edison execute on one or more "compute" nodes dedicated to that job. These nodes are distinct from the shared "login" nodes, which host interactive sessions, and from the shared "MOM" nodes, which execute the commands in the "batch script" that controls how the job runs. Typically, users write the batch script with a text editor and submit it to the system with the "qsub" command. The batch script contains a number of job control directives as well as the "aprun" command, which actually launches the program onto the compute nodes. Small, short parallel jobs can also be run interactively, as described in the pages in this section.
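The shape of such a batch script can be sketched with a small Python helper that renders the script text. This is only an illustration under stated assumptions: the directive names (#PBS -q, #PBS -l mppwidth, #PBS -l walltime), the queue name "regular", the $PBS_O_WORKDIR variable, the "-n" option to aprun, and the executable ./my_app do not come from this page, so check them against the pages in this section before relying on them.

```python
def render_batch_script(total_cores, walltime, executable, queue="regular"):
    """Return the text of a batch script that launches `executable` with aprun.

    All directive names and the default queue are assumptions made for
    illustration; they are not taken from this page.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -q {queue}",                 # assumed: queue selection directive
        f"#PBS -l mppwidth={total_cores}",  # assumed: total cores requested
        f"#PBS -l walltime={walltime}",     # assumed: wall-clock limit (HH:MM:SS)
        "",
        "# Change to the directory the job was submitted from (assumed variable).",
        "cd $PBS_O_WORKDIR",
        "",
        "# These commands run on a MOM node; aprun launches the program",
        "# onto the compute nodes.",
        f"aprun -n {total_cores} {executable}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 48 cores for 30 minutes running ./my_app.
    print(render_batch_script(48, "00:30:00", "./my_app"))
```

Saving the printed text to a file (for example a hypothetical my_job.pbs) and passing that file to qsub would submit the job.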

Pages in this section explain the process in more detail.

Types of Nodes on Edison

Before running your job, it is useful to understand the different types of nodes on Edison.

Compute Nodes

The 5,200 compute nodes are dedicated to running scientific applications. A job is given exclusive access to each node it requests for the entirety of the job's run time. Because each node has 24 cores, at most 124,800 cores can be allocated. By default, the compute nodes run a somewhat restricted Linux operating system for performance reasons. If a system call or dynamic library needed by your code is not available in this environment, you can enable a more expansive Linux environment by using cluster compatibility mode. See "Cluster Compatibility Mode" and "Queues and Scheduling Policies" to select the correct queue.
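The node and core figures above can be checked with a few lines of Python. The sketch also shows a consequence of exclusive, whole-node allocation: a core count that is not a multiple of 24 leaves cores idle on the last node. The function names are hypothetical and chosen only for this example.

```python
CORES_PER_NODE = 24    # from the text: 24 cores on each node
COMPUTE_NODES = 5200   # from the text: 5,200 compute nodes

# Maximum allocatable cores: 5,200 nodes x 24 cores per node.
MAX_CORES = COMPUTE_NODES * CORES_PER_NODE


def nodes_needed(cores):
    """Number of whole nodes required to provide `cores` cores."""
    if cores < 1 or cores > MAX_CORES:
        raise ValueError(f"cores must be between 1 and {MAX_CORES}")
    # Ceiling division: nodes are allocated whole, never shared.
    return -(-cores // CORES_PER_NODE)


def idle_cores(cores):
    """Cores left unused on the allocated nodes for a job using `cores`."""
    return nodes_needed(cores) * CORES_PER_NODE - cores


if __name__ == "__main__":
    print(MAX_CORES)         # 124800
    print(nodes_needed(48))  # 2
    print(idle_cores(25))    # 23
```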

Login Nodes

Edison's login nodes run a full Linux operating system and provide support services for the system. When you connect to Edison with SSH, you land on a login node. These nodes are shared by many users; please do not run applications on the login nodes.

Job Host (MOM) Nodes

MOM nodes are servers that execute batch job commands. These nodes are shared by many users and thus are not intended for compute- or memory-intensive applications. If your batch script runs an executable without aprun, that executable runs on the MOM node rather than on the compute nodes, so be sure to use aprun even for serial programs (unless they are run in the serial queue).

File Systems to Use for Running Jobs

There are multiple file systems available on Edison. In general, it is not recommended to run from the home file system, since it is not tuned for parallel applications, especially applications with large I/O footprints. For production runs, users should use the Lustre file systems ($SCRATCH or /scratch3) or the Global Parallel File Systems ($GSCRATCH, /project, /projectb).
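As a minimal sketch of the recommendation above, a job-setup script might choose its run directory from the $SCRATCH or $GSCRATCH environment variables and refuse to fall back to the home file system. The helper name pick_run_dir and the preference order (Lustre scratch first) are assumptions made for illustration, not a NERSC policy.

```python
import os


def pick_run_dir(env=None):
    """Return a run directory on a scratch file system.

    Checks $SCRATCH first, then $GSCRATCH (the order is an assumption of
    this sketch). Raises instead of silently using the home file system.
    """
    env = os.environ if env is None else env
    for var in ("SCRATCH", "GSCRATCH"):
        path = env.get(var)
        if path:
            return path
    raise RuntimeError(
        "Neither $SCRATCH nor $GSCRATCH is set; "
        "do not run production jobs from the home file system."
    )
```

Passing a dictionary instead of the real environment, as in pick_run_dir({"GSCRATCH": "/some/path"}), makes the choice easy to check without being logged in to Edison.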

Interactive Access

Only the login nodes are available for direct interactive access. You cannot SSH directly to compute nodes or MOM nodes.