NERSC: Powering Scientific Discovery Since 1974

Overview

Edison is a supercomputer designed for Massively Parallel Processing (MPP): the parallel execution of a program using multiple tasks and/or threads on multiple processors. Processors on Edison are grouped into "nodes". Each node has 64 GB of memory and 24 processor cores and runs its own instance of the operating system. Tasks typically exchange data with tasks outside their own memory space through a message-passing API such as the Message Passing Interface (MPI). Other parallel execution and communication models are also possible (e.g., OpenMP or a PGAS language such as UPC).

Before a job begins execution, a resource manager and batch scheduler reserve nodes for the job based on instructions given to the scheduler by the user. When the nodes are ready, a job launcher distributes the executable code to the nodes allocated to the job, then starts and manages execution of the code on multiple nodes in a coordinated fashion. The scheduler and job launcher work together throughout the process, including program termination and cleanup.

Working on Edison

Jobs on Edison execute on one or more "compute" nodes dedicated to that job. These nodes are distinct from the shared "login" nodes, where you do interactive work such as compiling code and managing files. Typically, users write a batch script with a text editor and submit it to the batch system. The batch script contains specifications and instructions for the batch scheduler and job launcher. Edison uses SLURM as its batch system/resource manager and job launcher; see the Introduction to Slurm page to learn more.
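As a rough illustration of what such a batch script might contain, the sketch below writes a minimal one to a file. The file name my_job.sh, the job name, and the executable ./my_app are hypothetical, and the node count and time limit are arbitrary choices; only the "debug" partition name comes from this page. The options used (-p, -N, -t, -J for sbatch and -n for srun) are standard Slurm options.

```shell
# Write a minimal batch script to my_job.sh (hypothetical file name).
cat > my_job.sh <<'EOF'
#!/bin/bash
# Partition (queue) to submit to; "debug" is the one named on this page.
#SBATCH -p debug
# Number of compute nodes to reserve (arbitrary example value).
#SBATCH -N 2
# Wall-clock time limit, hh:mm:ss (arbitrary example value).
#SBATCH -t 00:10:00
# Job name (hypothetical).
#SBATCH -J my_job

# Launch 48 tasks, i.e. 24 per node on the 2 nodes reserved above.
# ./my_app is a placeholder for your own executable.
srun -n 48 ./my_app
EOF

# Submit from a login node (uncomment where Slurm is available):
# sbatch my_job.sh
```

The #SBATCH lines are the instructions for the scheduler, and the srun line is the instruction for the job launcher.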

Before running your job it is useful to understand the different types of nodes on Edison.

Compute Nodes

Edison has 5,576 compute nodes dedicated to running scientific applications. A job is given exclusive access to each node it requests for the entirety of the job's run time. Since each node has 24 cores, the maximum number of allocatable cores is 5,576 × 24 = 133,824. By default, the compute nodes run a somewhat restricted Linux operating system for performance reasons.
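Because nodes are allocated whole, a job's node count is its task count rounded up to a multiple of 24 cores. The sketch below works through that arithmetic, assuming one task per core; the helper name nodes_needed is hypothetical and not part of any NERSC tool.

```shell
cores_per_node=24    # from the text above
compute_nodes=5576   # from the text above

# Ceiling division: the smallest number of nodes whose cores cover the
# requested number of tasks, assuming one task per core.
nodes_needed() {
    echo $(( ($1 + cores_per_node - 1) / cores_per_node ))
}

echo "Total cores: $(( compute_nodes * cores_per_node ))"
echo "Nodes for 100 tasks: $(nodes_needed 100)"
```

This prints a total of 133824 cores and 5 nodes for 100 tasks. Those 5 nodes provide 120 cores, so 20 cores stay idle for the duration of that job.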

Login Nodes

Edison's login nodes run a full Linux operating system and provide the standard Linux services for the system. When you connect to Edison with SSH, you land on a login node. These nodes are shared by many users; please do not run applications on the login nodes.

File Systems to Use for Running Jobs

Multiple file systems are available on Edison. Running jobs from the home file system is generally not recommended, since it is not tuned for parallel applications, especially those with large I/O footprints. For production runs, use the Lustre file systems ($SCRATCH or /scratch3) or the Global Parallel File Systems (/project, /projectb).
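One way to follow this advice is to create a run directory under $SCRATCH and launch from there. This is a minimal sketch: the directory name my_run is hypothetical, and the fallback to the current directory is an assumption added only so the snippet also runs on machines where $SCRATCH is not defined.

```shell
# Use $SCRATCH when it is defined (as on Edison). The fallback to the
# current directory is only there so this sketch runs elsewhere too.
run_dir="${SCRATCH:-$PWD}/my_run"   # "my_run" is a hypothetical name

# Create the run directory and move into it before launching the job.
mkdir -p "$run_dir"
cd "$run_dir" && echo "Running from: $PWD"
```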

Interactive Access to Compute Nodes

Only the login nodes are available for direct interactive access; you cannot SSH directly to compute nodes. However, you can reach a compute node through an interactive batch job (salloc -p debug). An interactive job gives you access to the head compute node only. See Interactive Jobs for more information.