Overview and Basic Description
Jobs on Edison execute on one or more "compute" nodes dedicated to that job. These nodes are distinct from the shared "login" nodes that host interactive sessions and the shared "MOM" nodes that execute the commands in a "batch script" that controls how the job runs. Typically, users write the batch script with a text editor and submit it to the system with the "qsub" command. The batch script contains a number of job control directives as well as the "aprun" command that actually launches the program onto the compute nodes. It is also possible to run small, short parallel jobs interactively.
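As a minimal sketch, a batch script for Edison might look like the following. The executable name my_app, the script name, and the specific directive values (queue, core count, wall time) are illustrative assumptions, not fixed requirements:

```shell
#!/bin/bash
#PBS -q debug                 # queue to submit to
#PBS -l mppwidth=48           # total cores requested: 2 nodes x 24 cores each
#PBS -l walltime=00:10:00     # wall-clock limit for the job
#PBS -N myjob                 # job name
#PBS -j oe                    # join stdout and stderr into one file

cd $PBS_O_WORKDIR             # start in the directory qsub was run from
aprun -n 48 ./my_app          # launch 48 MPI tasks on the compute nodes
```

You would submit this with, e.g., "qsub myscript.pbs". The #PBS directives are read by the batch system, while the aprun command runs on a MOM node and places the executable onto the job's compute nodes.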
Pages in this section explain the process in more detail.
Types of Nodes on Edison
Before running your job it is useful to understand the different types of nodes on Edison.
Compute Nodes
The 5,200 compute nodes are dedicated to running scientific applications. A job is given exclusive access to each node it requests for the entirety of the job's run time. Since each Edison node has 24 cores, the maximum allocatable number of cores is 124,800 (5,200 nodes × 24 cores). By default the compute nodes run a somewhat restricted Linux operating system for performance reasons. If a system call or dynamic library needed by your code is not available in this environment, you can enable a more complete Linux environment by using Cluster Compatibility Mode (CCM). See Cluster Compatibility Mode, and see Queues and Scheduling Policies for selecting the correct queue.
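As a hedged sketch, using Cluster Compatibility Mode typically involves loading a module, submitting to a dedicated queue, and launching with a CCM-specific launcher. The queue and module names below follow NERSC's CCM documentation for Cray systems and may differ; my_app is a placeholder executable:

```shell
# Illustrative sketch of using Cluster Compatibility Mode (CCM);
# names of the module and queue are assumptions from NERSC CCM docs.
module load ccm
qsub -I -q ccm_queue -l mppwidth=24,walltime=00:30:00
# ...once the interactive session starts, launch with ccm_run
# instead of aprun:
ccm_run ./my_app              # my_app is a placeholder executable
```

See the Cluster Compatibility Mode page for the authoritative queue names and usage.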
Login Nodes
Edison's login nodes run a full Linux operating system and provide support services for the system. When you connect to Edison with SSH, you land on a login node. These nodes are shared by many users; please do not run applications on the login nodes.
Job Host (MOM) Nodes
MOM nodes are servers that execute batch job commands. These nodes are shared by many users and thus are not intended for compute- or memory-intensive applications.
File Systems to Use for Running Jobs
There are multiple file systems available on Edison. In general, it is not recommended to run jobs from the home file system, since it is not tuned for parallel applications, especially applications with large I/O. Users should choose the Lustre file systems ($SCRATCH or /scratch3) or the Global Parallel File Systems ($GSCRATCH, /project, /projectb) for production runs.
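In practice this means a batch script should change to a scratch directory before launching the executable. A minimal sketch, where the directory and executable names are illustrative assumptions:

```shell
cd $SCRATCH/my_run_dir        # run from Lustre scratch, not $HOME
aprun -n 24 ./my_app          # my_app is a placeholder executable
```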
Only the login nodes are available for direct interactive access. You cannot SSH directly to compute nodes or MOM nodes.
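Although you cannot SSH to compute nodes, you can request an interactive batch session that allocates them for you. A sketch of such a request, where the queue name, core count, and wall time are illustrative assumptions:

```shell
qsub -I -q debug -l mppwidth=24 -l walltime=00:30:00
# When the prompt returns, you are on a MOM node with one compute node
# (24 cores) allocated; use aprun to run on that compute node:
aprun -n 24 ./my_app          # my_app is a placeholder executable
```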