Overview and Basic Description
Jobs on Hopper execute on one or more "compute" nodes dedicated to that job. These nodes are distinct from the shared "login" nodes that host interactive sessions and from the shared "MOM" nodes, which execute the commands in the "batch script" that controls how the job runs. Typically, users write the batch script with a text editor and submit it to the system using the "qsub" command. The batch script contains a number of job control directives as well as the "aprun" command, which actually runs the program in parallel on the compute nodes. It is also possible to run small, short parallel jobs interactively.
Pages in this section explain the process in more detail.
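As a rough illustration of the workflow described above, the following sketch writes a small batch script to a file and shows how it might be submitted. It assumes Torque/PBS-style "#PBS" directives; the queue name, the "mppwidth" core count, the wall-clock limit, and the names my_job.pbs and my_executable are illustrative placeholders, not values taken from this page.

```shell
# Write a hypothetical batch script to my_job.pbs.
# Every value below is a placeholder chosen for illustration.
cat > my_job.pbs <<'EOF'
#PBS -q debug
#PBS -l mppwidth=48
#PBS -l walltime=00:10:00
#PBS -N my_job

# Commands from here on are executed on a MOM node.
cd $PBS_O_WORKDIR

# aprun launches the program in parallel on the compute nodes.
aprun -n 48 ./my_executable
EOF

# Submit from a login node (left commented out so the sketch has no side effects):
# qsub my_job.pbs
```

In this sketch the core count requested in the directive matches the count passed to aprun, and it is a multiple of 24 so that whole nodes are used.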
Types of Nodes on Hopper
Before running your job it is useful to understand the different types of nodes on Hopper.
- The 6,384 compute nodes are dedicated to running scientific applications. A job is given exclusive access to each node it requests for the entirety of the job's run time. Because each Hopper node has 24 cores, the smallest number of cores that can be allocated is 24. By default, the compute nodes run a somewhat restricted Linux operating system for performance reasons. If a system call or dynamic library needed by your code is not available in this environment, there is an option to enable a more expansive Linux environment.
- Hopper's login nodes run a full Linux operating system and provide support services for the system. When you connect to Hopper with SSH, you land on the login nodes. These nodes are shared by many users; please do not run applications on the login nodes.
- Job host (MOM) nodes are servers that execute batch job commands. These nodes are shared by many users and thus are not intended for compute- or memory-intensive applications.
Only the login nodes are available for direct interactive access. You cannot SSH directly to compute nodes or MOM nodes.
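Because compute nodes are allocated whole, a request for a given number of cores is effectively rounded up to a multiple of 24. The arithmetic can be sketched as follows; the helper names nodes_needed and cores_allocated are hypothetical and are not commands provided on Hopper.

```shell
# Cores per compute node, as stated above for Hopper.
CORES_PER_NODE=24

# nodes_needed CORES: print the number of whole nodes required for CORES cores.
nodes_needed() {
    # Integer ceiling division: a partially used node still counts as a whole node.
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# cores_allocated CORES: print the number of cores actually reserved for the job.
cores_allocated() {
    echo $(( $(nodes_needed "$1") * CORES_PER_NODE ))
}

nodes_needed 100      # prints 5
cores_allocated 100   # prints 120
```

A program that uses 100 cores therefore occupies 5 nodes, and the remaining 20 cores on the last node sit idle for the duration of the job.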