Running Grid Jobs
How to submit a grid job to NERSC
The following NERSC resources support job submission via Grid interfaces. Remote job submission is based on Globus GRAM.
Jobs can be submitted either to the fork jobmanager (default) which will fork and execute the job immediately, or to the batch system jobmanager which interfaces with the underlying batch queue.
|Hostname||Available Jobmanagers||Software Configuration||GRAM Resource Endpoints|
|pdsfgrid.nersc.gov||Fork, SGE||OSG CE 3.1.43 (Globus 5.2.0)||pdsfgrid.nersc.gov/jobmanager|
|corigrid.nersc.gov||Fork, SLURM||Globus 6.0||corigrid.nersc.gov/jobmanager, corigrid.nersc.gov/jobmanager-slurm|
|edisongrid.nersc.gov||Fork, SLURM||Globus 6.0||
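The Cori endpoints in the table follow a simple pattern: the bare /jobmanager suffix selects the fork jobmanager, and /jobmanager-slurm selects the batch system. As a minimal sketch, a script could build these contact strings with a small helper. The name gram_contact is hypothetical, and only the two Cori endpoints are confirmed by the table; other host and scheduler combinations are assumptions.

```shell
# Hypothetical helper: build a GRAM contact string from a host and an
# optional batch scheduler suffix (no suffix selects the fork jobmanager).
gram_contact() {
    host="$1"
    scheduler="${2:-}"
    if [ -n "$scheduler" ]; then
        printf '%s/jobmanager-%s\n' "$host" "$scheduler"
    else
        printf '%s/jobmanager\n' "$host"
    fi
}

gram_contact corigrid.nersc.gov          # corigrid.nersc.gov/jobmanager
gram_contact corigrid.nersc.gov slurm    # corigrid.nersc.gov/jobmanager-slurm
```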
To submit a job to one of the above hosts:
1. Initialize your grid certificate:
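With the standard Globus client tools, a proxy credential is typically created with grid-proxy-init and inspected with grid-proxy-info. The sketch below assumes those tools are on your PATH and that your user certificate is in its default location; whether NERSC expects this mechanism or another one (for example MyProxy) is an assumption here. The function name init_grid_proxy is hypothetical.

```shell
# Hypothetical wrapper: create a proxy credential and report its lifetime.
# Assumes the Globus client tools are installed and on PATH.
init_grid_proxy() {
    if ! command -v grid-proxy-init >/dev/null 2>&1; then
        echo "grid-proxy-init not found; load the Globus client environment first" >&2
        return 127
    fi
    grid-proxy-init || return 1      # prompts for the certificate passphrase
    grid-proxy-info -timeleft        # seconds of validity remaining
}

# Call it interactively before submitting jobs:
#   init_grid_proxy
```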
2. Submit job to the batch system:
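Here we use /bin/hostname as the test job and corigrid.nersc.gov as the target host. The contact string corigrid.nersc.gov/jobmanager used below selects the fork jobmanager; use corigrid.nersc.gov/jobmanager-slurm to route the job to the SLURM batch queue.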
We will use globus-job-submit to submit jobs to the batch system. The syntax for this command is:
Syntax : globus-job-submit [-help] <contact string> [-np N] <executable> [<arg>...]
Use -help to display full usage.
To submit the job:
% globus-job-submit corigrid.nersc.gov/jobmanager /bin/hostname
You will receive a contact URL that you may use to query the job. Here are some sample queries that you can make:
To query job status:
% globus-job-status https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954
To get the output of your job:
% globus-job-get-output https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954
To clean up:
% globus-job-clean https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954
WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource
Are you sure you want to cleanup the job now (Y/N) ? Y
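The submit, query, and fetch steps above can be chained in a script. The following is a minimal sketch, assuming globus-job-status prints a single state word such as PENDING, ACTIVE, DONE, or FAILED. The names wait_for_job, STATUS_CMD, and POLL_SECONDS are hypothetical and introduced only for illustration; they are not part of Globus.

```shell
# Hypothetical helper: poll a GRAM job contact URL until the job finishes.
# STATUS_CMD and POLL_SECONDS are illustrative knobs, not Globus settings.
STATUS_CMD="${STATUS_CMD:-globus-job-status}"
POLL_SECONDS="${POLL_SECONDS:-30}"

wait_for_job() {
    contact="$1"
    while :; do
        # Stop with status 2 if the status query itself fails.
        state=$("$STATUS_CMD" "$contact") || return 2
        case "$state" in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
            *)      sleep "$POLL_SECONDS" ;;   # PENDING, ACTIVE, and so on
        esac
    done
}

# Example use (requires a valid proxy and the Globus client tools):
#   contact=$(globus-job-submit corigrid.nersc.gov/jobmanager /bin/hostname)
#   wait_for_job "$contact" && globus-job-get-output "$contact"
#   globus-job-clean "$contact"     # asks for confirmation, as shown above
```

Keeping the status command in a variable makes the loop easy to exercise with a stub before pointing it at a real job.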
3. Submit a job for immediate execution:
Here we use /bin/hostname as the test job and corigrid.nersc.gov as the target host:
% globus-job-run corigrid.nersc.gov/jobmanager /bin/hostname
- For jobs that produce large amounts of output data, direct the output to a file (the -stdout option to globus-job-submit or globus-job-run) and use GridFTP (see below) to transfer the data.
- You may use the -s option to stage the job executable from your local workstation.
% globus-job-submit corigrid.nersc.gov/jobmanager-slurm -s /home/myhomedir/myjob.sh
For more information on Globus GRAM job submission, refer to the Globus GRAM documentation.
How to submit a parallel job
To use the parallel MPP resources at NERSC (Edison, Cori), first build your MPI application on the appropriate platform, then create a job specification using the Globus RSL.
Assuming your parallel application is called cpi, a very simple RSL file to set up a 4-way run would look something like this:
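A minimal sketch of such a file, written from the shell, might be the following. The executable path and the output file names are placeholders, and whether the SLURM jobmanager at NERSC honors jobType=mpi in exactly this form is an assumption; executable, jobType, count, stdout, and stderr are standard GRAM RSL attributes.

```shell
# Write a minimal RSL job description to cpi.rsl in the current directory.
# Assumptions: /path/to/cpi, cpi.out, and cpi.err are placeholder names.
cat > cpi.rsl <<'EOF'
&(executable="/path/to/cpi")
 (jobType=mpi)
 (count=4)
 (stdout="cpi.out")
 (stderr="cpi.err")
EOF
```

Here count=4 requests four processes, and the stdout and stderr attributes keep the job output in files on the remote side.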
If this file is called cpi.rsl, you would submit your job as follows:
% globusrun -r corigrid.nersc.gov/jobmanager-slurm -f cpi.rsl -b
GRAM Job submission successful
The job status and output can be queried using the contact URL that globusrun prints. Refer to the example queries in the "How to submit a grid job to NERSC" section.
In the above example, we use the following flags:
|-r||Resource contact information||e.g. -r corigrid.nersc.gov/jobmanager-slurm|
|-f||RSL file name||e.g. -f cpi.rsl|
|-b||Batch mode: globusrun returns once the job has been submitted instead of waiting for it to finish|
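When globusrun is used from a script, the contact URL has to be picked out of its output before it can be passed to the query commands. A minimal sketch, assuming the contact is printed as an https:// URL like the ones shown earlier; extract_contact is a hypothetical helper, not a Globus command.

```shell
# Hypothetical helper: print the first https:// URL found on standard input.
extract_contact() {
    awk 'match($0, /https:\/\/[^ \t]+/) { print substr($0, RSTART, RLENGTH); exit }'
}

# Example use (requires a valid proxy and the Globus client tools):
#   contact=$(globusrun -r corigrid.nersc.gov/jobmanager-slurm -f cpi.rsl -b 2>&1 | extract_contact)
#   globus-job-status "$contact"
```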
For more information on RSL refer to the Globus RSL documentation.