NERSC: Powering Scientific Discovery Since 1974

Running Grid Jobs

How to submit a grid job to NERSC

The following NERSC resources support job submission via Grid interfaces. Remote job submission is based on Globus GRAM.

Jobs can be submitted either to the fork jobmanager (the default), which forks and executes the job immediately, or to the batch-system jobmanager, which interfaces with the underlying batch queue.

Hostname              Available Jobmanagers  Software Configuration     GRAM Resource Endpoints
pdsfgrid.nersc.gov    Fork, SGE              OSG CE 1.2 (Globus 4.0.8)  pdsfgrid.nersc.gov/jobmanager
                                                                        pdsfgrid.nersc.gov/jobmanager-sge
carvergrid.nersc.gov  Fork, PBS              OSG CE 1.2 (Globus 4.0.8)  carvergrid.nersc.gov/jobmanager
                                                                        carvergrid.nersc.gov/jobmanager-pbs
edisongrid.nersc.gov  Fork                   Globus 5.0.5               edisongrid.nersc.gov/jobmanager
hoppergrid.nersc.gov  Fork, PBS              Globus 5.0.3               hoppergrid.nersc.gov/jobmanager
                                                                        hoppergrid.nersc.gov/jobmanager-pbs
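
The endpoints in the table follow one pattern: the host name followed by /jobmanager for the fork jobmanager, or /jobmanager-<name> for a batch system. As a minimal sketch of that pattern, the shell function below assembles a contact string. gram_contact is a hypothetical helper name (not a Globus command), and the lowercase suffixes (sge, pbs) are taken from the table.

```shell
# Hypothetical helper, not part of the Globus tools: build a GRAM contact
# string from a host name and an optional jobmanager name.
# With no second argument, the fork jobmanager (the default) is used.
gram_contact() {
    host=$1
    manager=${2:-fork}
    case "$manager" in
        fork) printf '%s/jobmanager\n' "$host" ;;
        *)    printf '%s/jobmanager-%s\n' "$host" "$manager" ;;
    esac
}

# Example use (requires the Globus client tools and a valid proxy):
#   globus-job-run "$(gram_contact carvergrid.nersc.gov)" /bin/hostname
```

The function only formats strings; it does not check that the chosen jobmanager is actually offered by the host, so the table remains the reference.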

To submit a job to one of the above hosts:
1. Create a proxy credential from your grid certificate:

% grid-proxy-init

2. Submit a job to the batch system:
Here we use /bin/hostname as the test job, carvergrid.nersc.gov as the target host, and PBS as the jobmanager.
We will use globus-job-submit to submit jobs to the batch system. The syntax for this command is:

Syntax: globus-job-submit [-help] <contact string> [-np N] <executable> [<arg>...]

Use -help to display full usage.

To submit the job:

% globus-job-submit carvergrid.nersc.gov/jobmanager-pbs /bin/hostname

https://carvergrid.nersc.gov:60001/14658/1182211679/

The command prints a contact URL (shown above) that you can use to query the job. Here are some sample queries:

To query job status:

% globus-job-status https://carvergrid.nersc.gov:60001/14658/1182211679/

DONE

To get the output of your job:

% globus-job-get-output https://carvergrid.nersc.gov:60001/14658/1182211679/

carvergrid

To clean up:

% globus-job-clean https://carvergrid.nersc.gov:60001/14658/1182211679/

WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource

Are you sure you want to cleanup the job now (Y/N) ? Y

Cleanup successful.

3. Submit a job for immediate execution:
Here we use /bin/hostname as the test job, carvergrid.nersc.gov as the target host, and the fork jobmanager (the default). The job is submitted with globus-job-run:

% globus-job-run carvergrid.nersc.gov/jobmanager /bin/hostname

crvsvc06
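
Unlike globus-job-run, a job submitted with globus-job-submit returns immediately, so a script has to poll the contact URL before fetching the output. The following is a minimal sketch of such a loop. It assumes that globus-job-status prints a single state word such as DONE (as in step 2) and that a failed job is reported as FAILED; wait_for_job and STATUS_CMD are hypothetical names introduced here for illustration.

```shell
# Command used to query the job state. It can be overridden, for example
# with a stub, to try the loop without Globus installed (an assumption of
# this sketch, not a Globus feature).
STATUS_CMD=${STATUS_CMD:-globus-job-status}

# wait_for_job CONTACT_URL [INTERVAL_SECONDS]
# Returns 0 when the job reports DONE, 1 when it reports FAILED,
# and 2 when the status command itself fails.
wait_for_job() {
    contact=$1
    interval=${2:-30}
    while :; do
        state=$("$STATUS_CMD" "$contact") || return 2
        case "$state" in
            DONE)   return 0 ;;
            FAILED) return 1 ;;   # assumed name of the failure state
        esac
        sleep "$interval"
    done
}

# Example use with the contact URL printed by globus-job-submit:
#   wait_for_job "$contact" 60 && globus-job-get-output "$contact"
```

A polling interval of 30 seconds or more is a conservative choice here, since every query contacts the remote gatekeeper.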


Notes

  1. For jobs with large amounts of output data, we recommend directing the output to a file (the -stdout option to globus-job-submit and globus-job-run) and using GridFTP (see below) to transfer the data.

  2. You may use the -s option to stage the job executable from your local workstation.
    % globus-job-submit carvergrid.nersc.gov/jobmanager-pbs -s /home/myhomedir/myjob.sh
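
To make note 1 concrete, the sketch below prints (without running) the two commands such a workflow might use: a submission that sends standard output to a file, and a GridFTP transfer with globus-url-copy. The -stdout option is the one named in note 1. The function name print_output_workflow, the GridFTP host, and all paths are placeholders, and the output file is assumed to be an absolute path on the remote system.

```shell
# Hypothetical helper: print the commands for a file-based output workflow.
# Arguments: contact string, executable, remote output file (absolute path),
# GridFTP host (placeholder), local destination file (absolute path).
print_output_workflow() {
    contact=$1
    exe=$2
    remote_out=$3
    ftp_host=$4
    local_out=$5
    # Options go between the contact string and the executable.
    printf 'globus-job-submit %s -stdout %s %s\n' "$contact" "$remote_out" "$exe"
    # gsiftp:// is the GridFTP URL scheme; file:// names the local copy.
    printf 'globus-url-copy gsiftp://%s%s file://%s\n' "$ftp_host" "$remote_out" "$local_out"
}
```

Printing the commands first makes it easy to review the paths and host before running anything against the remote resource.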

For more information on Globus GRAM job submission, refer to the Globus GRAM documentation.

How to submit a parallel job

To take advantage of the parallel (MPP) resources at NERSC (Franklin, Carver, Hopper, Euclid), you must first build your MPI application on the appropriate platform and then create a job specification in the Globus Resource Specification Language (RSL).

Assuming your parallel application is called cpi, a very simple RSL file to set up a 4-way run would look something like this:

& (count=4)
(jobtype=mpi)
(directory=/home/your_home_dir)
(executable=/home/your_home_dir/bin/cpi)
(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)
(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)
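
When the processor count or paths change between runs, the RSL can be generated rather than edited by hand. The function below is a minimal sketch that reproduces the attributes shown above; make_mpi_rsl is a hypothetical helper name, and only count, directory, and executable are parameterized.

```shell
# Hypothetical helper: print an RSL job description for an MPI run.
# Arguments: processor count, working directory, executable path.
make_mpi_rsl() {
    count=$1
    dir=$2
    exe=$3
    printf '& (count=%s)\n' "$count"
    printf '(jobtype=mpi)\n'
    printf '(directory=%s)\n' "$dir"
    printf '(executable=%s)\n' "$exe"
    # Single quotes keep $(GLOBUS_GRAM_JOB_CONTACT) literal, so it is
    # substituted by GRAM at submission time and not by the shell.
    printf '%s\n' '(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)'
    printf '%s\n' '(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)'
}

# Example: write the 4-way job description shown above to a file.
#   make_mpi_rsl 4 /home/your_home_dir /home/your_home_dir/bin/cpi > cpi.rsl
```

Redirecting the output as in the commented example yields the same RSL file as the hand-written version.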

If this file is called cpi.rsl, you would submit your job as follows:

% globusrun -r carvergrid.nersc.gov/jobmanager-pbs -f cpi.rsl -b 

globus_gram_client_callback_allow successful
GRAM Job submission successful
https://carvergrid.nersc.gov:60001/8827/1182377532/
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING

The job status and output can be queried using the contact URL. Refer to the example queries in the "How to submit a grid job to NERSC" section.

In the above example, we use the following flags:

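-r  Resource contact string, e.g. -r carvergrid.nersc.gov/jobmanager-pbs
-f  RSL file name, e.g. -f cpi.rsl
-b  Batch mode: globusrun returns as soon as the job is submitted and prints the contact URL; the batch system itself is selected by the jobmanager-pbs contact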
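
The globusrun output shown earlier mixes status messages with the contact URL. To pass the URL on to the status and output commands from a script, it can be filtered out of that output. The function below is a minimal sketch, assuming the contact URL is the first output line that begins with https:// (as in the sample output); extract_contact is a hypothetical helper name.

```shell
# Hypothetical helper: read globusrun output on standard input and print
# the first line that starts with https://, which is taken to be the
# job contact URL.
extract_contact() {
    sed -n '/^https:\/\//{p;q;}'
}

# Example use (requires the Globus client tools and a valid proxy):
#   contact=$(globusrun -r carvergrid.nersc.gov/jobmanager-pbs -f cpi.rsl -b | extract_contact)
#   globus-job-status "$contact"
```

If no line matches, the function prints nothing, so a script should check that the captured value is non-empty before using it.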

For more information on RSL, refer to the Globus RSL documentation.