SLURM at NERSC Overview
SLURM (Simple Linux Utility for Resource Management) is an open-source, fault-tolerant, highly available, and highly scalable resource manager and job scheduler, currently developed by SchedMD. Originally developed for large Linux clusters at Lawrence Livermore National Laboratory, SLURM is used on Cori and Edison.
A Brief Summary of SLURM Commands
- NERSC provides a custom queue monitor sqs.
- Job control and monitoring are performed by scontrol and squeue.
- Batch jobs are submitted with sbatch.
- Interactive job sessions are requested through salloc.
- Applications are launched on the allocated nodes with srun.
- Node information and cluster status can be requested with sinfo.
- Accounting data for jobs and job steps can be accessed with sacct.
- Useful environment variables are $SLURM_NODELIST and $SLURM_JOBID.
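As a rough illustration of how these pieces fit together, the following is a minimal sketch of a batch script that reads the two environment variables listed above. The function name describe_job, the "none" fallback, and the resource values in the #SBATCH lines are assumptions made for this example, not NERSC recommendations; the srun step is guarded so the script also runs harmlessly outside a SLURM job.

```shell
#!/bin/bash
#SBATCH --nodes=1          # illustrative value, not a NERSC recommendation
#SBATCH --time=00:05:00    # illustrative value, not a NERSC recommendation

# Hypothetical helper: print the job ID and node list that SLURM exports
# inside a batch job or interactive allocation. Outside a job the variables
# are unset, so fall back to the placeholder "none".
describe_job() {
    local job_id="${SLURM_JOBID:-none}"
    local nodes="${SLURM_NODELIST:-none}"
    echo "job=${job_id} nodes=${nodes}"
}

describe_job

# Launch a trivial command on the allocated nodes, but only when running
# inside a SLURM job and srun is actually available.
if [ -n "${SLURM_JOBID:-}" ] && command -v srun >/dev/null 2>&1; then
    srun hostname
fi
```

If this were saved as job.sh (a hypothetical file name), it could be submitted with sbatch job.sh and then watched with squeue or sqs; to bash the #SBATCH lines are ordinary comments, so only sbatch interprets them.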
Using SLURM at NERSC
- Running and submitting batch jobs
- Interactive sessions
- Monitoring jobs under SLURM
- Torque/Moab vs SLURM Comparisons
- Example Batch Scripts