NERSC: Powering Scientific Discovery Since 1974

Running Grid Jobs

How to submit a grid job to NERSC

The following NERSC resources support job submission via Grid interfaces. Remote job submission is based on Globus GRAM.

Jobs can be submitted either to the fork jobmanager (the default), which forks and executes the job immediately, or to the batch system jobmanager, which interfaces with the underlying batch queue.

Hostname              Available Jobmanagers  Software Configuration        GRAM Resource Endpoints
pdsfgrid.nersc.gov    Fork, SGE              OSG CE 3.1.43 (Globus 5.2.0)  pdsfgrid.nersc.gov/jobmanager
                                                                           pdsfgrid.nersc.gov/jobmanager-sge
corigrid.nersc.gov    Fork, SLURM            Globus 6.0                    corigrid.nersc.gov/jobmanager
                                                                           corigrid.nersc.gov/jobmanager-slurm
edisongrid.nersc.gov  Fork, SLURM            Globus 6.0                    edisongrid.nersc.gov/jobmanager
                                                                           edisongrid.nersc.gov/jobmanager-slurm

To submit a job to one of the above hosts:
1. Initialize your grid certificate:

% grid-proxy-init

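2. Submit a job to the batch system:
Here we use /bin/hostname as the test job and corigrid.nersc.gov as the target host. The contact string selects the jobmanager: corigrid.nersc.gov/jobmanager is the default (fork) jobmanager, and corigrid.nersc.gov/jobmanager-slurm routes the job through the SLURM batch queue. The example below uses the default contact string; substitute the jobmanager-slurm endpoint to go through the batch queue.
We will use globus-job-submit to submit the job. The syntax for this command is: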

Syntax : globus-job-submit [-help] <contact string> [-np N] <executable> [<arg>...]

Use -help to display full usage.

To submit the job:

% globus-job-submit corigrid.nersc.gov/jobmanager /bin/hostname

https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954

You will receive a contact URL that you may use to query the job. Here are some sample queries that you can make:

To query job status:

% globus-job-status https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954

DONE

To get the output of your job:

% globus-job-get-output https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954 

cori17

To clean up:

% globus-job-clean https://corigrid.nersc.gov:49625/16506027876253269761/6411767159294573954
 WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource

Are you sure you want to cleanup the job now (Y/N) ? Y

Cleanup successful.
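
The submit, status, and get-output steps above can be chained in a small script. The following is a minimal sketch, not an official NERSC tool. It assumes that the globus-job-* commands exit with a non-zero status on error and that a job that cannot complete reports the state FAILED. The function names wait_for_job and run_and_fetch and the POLL_INTERVAL variable are hypothetical names chosen for this illustration.

```shell
#!/bin/sh
# wait_for_job CONTACT_URL [INTERVAL_SECONDS]
# Polls globus-job-status until the job reports DONE or FAILED
# (FAILED is an assumed terminal state), then prints the final state.
wait_for_job() {
    contact=$1
    interval=${2:-30}
    while :; do
        # Give up if the status query itself fails (assumed non-zero exit).
        state=$(globus-job-status "$contact") || return 1
        case $state in
            DONE|FAILED) echo "$state"; return 0 ;;
        esac
        sleep "$interval"
    done
}

# run_and_fetch CONTACT_STRING EXECUTABLE [ARG...]
# Submits the job, waits for it to finish, and prints its output.
# POLL_INTERVAL (hypothetical) sets the seconds between status queries.
run_and_fetch() {
    contact_string=$1
    shift
    # globus-job-submit prints the contact URL used by the other commands.
    job_url=$(globus-job-submit "$contact_string" "$@") || return 1
    final=$(wait_for_job "$job_url" "${POLL_INTERVAL:-30}") || return 1
    if [ "$final" != "DONE" ]; then
        echo "job ended in state $final" >&2
        return 1
    fi
    globus-job-get-output "$job_url"
}
```

After sourcing these functions, run_and_fetch corigrid.nersc.gov/jobmanager /bin/hostname would submit the test job and print its output once it finishes. Cleanup is left as a manual step because globus-job-clean asks for confirmation.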

3. Submit a job for immediate execution:
Here we use /bin/hostname as the test job and corigrid.nersc.gov as the target host. globus-job-run waits for the job to finish and prints its output directly:

% globus-job-run corigrid.nersc.gov/jobmanager /bin/hostname

cori17
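
Because globus-job-run returns the job output directly, it is a convenient way to check that a gatekeeper accepts your proxy before submitting real work. The sketch below runs /bin/hostname against each contact string it is given. The function name check_gatekeepers is hypothetical, and the script assumes that globus-job-run exits with a non-zero status when the job cannot be run.

```shell
#!/bin/sh
# check_gatekeepers CONTACT...
# Runs /bin/hostname through each contact string and reports one line
# per contact: "<contact> OK <hostname>" or "<contact> FAILED".
check_gatekeepers() {
    for contact in "$@"; do
        # Assumes a non-zero exit status means the job could not be run.
        if host=$(globus-job-run "$contact" /bin/hostname 2>/dev/null); then
            echo "$contact OK $host"
        else
            echo "$contact FAILED"
        fi
    done
}
```

For example, check_gatekeepers corigrid.nersc.gov/jobmanager edisongrid.nersc.gov/jobmanager prints one result line for each of the two fork jobmanagers.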


Notes

  1. For jobs with large amounts of output data, we recommend that you direct the output to a file (-stdout option to globus-job-submit/run) and use GridFTP (see below) to transfer the data.

     

  2. You may use the -s option to stage the job executable from your local workstation.
    % globus-job-submit corigrid.nersc.gov/jobmanager-slurm -s /home/myhomedir/myjob.sh

For more information on Globus GRAM job submission, refer to the Globus GRAM documentation.

How to submit a parallel job

To use the parallel MPP resources at NERSC (Edison, Cori), first build your MPI application on the target platform, then create a job specification in the Globus Resource Specification Language (RSL).

Assuming your parallel application is called cpi, a very simple RSL file to set up a 4-way run would look something like this:

& (count=4)
(jobtype=mpi)
(directory=/home/your_home_dir)
(executable=/home/your_home_dir/bin/cpi)
(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)
(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)

If this file is called cpi.rsl, you would submit your job as follows:

% globusrun -r corigrid.nersc.gov/jobmanager-slurm -f cpi.rsl -b 

globus_gram_client_callback_allow successful
GRAM Job submission successful
https://corigrid.nersc.gov:49625/16506028973481129106/6411767159294573954/
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING

The job status and output can be queried using the contact URL. Refer to the example queries in the "How to submit a grid job to NERSC" section.

In the above example, we use the following flags:

-r Resource contact string, e.g. -r corigrid.nersc.gov/jobmanager-slurm
-f RSL file name, e.g. -f cpi.rsl
-b Batch mode: globusrun returns once the job has been submitted instead of waiting for it to finish
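
If you need the same RSL with a different process count or executable, the file can be generated rather than edited by hand. The following is a minimal sketch that reproduces the cpi example from a shell function; the function name write_mpi_rsl is hypothetical. Note that $(GLOBUS_GRAM_JOB_CONTACT) is an RSL substitution resolved by GRAM, so it must be escaped to keep the shell from treating it as a command substitution.

```shell
#!/bin/sh
# write_mpi_rsl COUNT DIRECTORY EXECUTABLE
# Prints an RSL job description like the cpi example to standard output.
write_mpi_rsl() {
    count=$1
    directory=$2
    executable=$3
    # \$ keeps the shell from expanding $(GLOBUS_GRAM_JOB_CONTACT);
    # that substitution is performed by GRAM, not by the shell.
    cat <<EOF
& (count=${count})
(jobtype=mpi)
(directory=${directory})
(executable=${executable})
(stdout=x-gass-cache://\$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)
(stderr=x-gass-cache://\$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)
EOF
}
```

Redirecting the output of write_mpi_rsl 4 /home/your_home_dir /home/your_home_dir/bin/cpi to cpi.rsl yields the file used in the example above, which can then be passed to globusrun with -f cpi.rsl.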

For more information on RSL refer to the Globus RSL documentation.