NERSC: Powering Scientific Discovery Since 1974

MySGE

Overview

  • MySGE allows users to create a private Sun GridEngine (SGE) cluster on large parallel systems like Hopper.  Once the cluster is started, users can submit serial jobs, array jobs, and other throughput-oriented workloads to the personal SGE scheduler.  The jobs are then run within the user's private cluster.
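As an illustration of the kind of throughput workload mentioned above, a minimal SGE array job script might look like the following sketch.  The `#$ -N`, `#$ -cwd`, and `#$ -t` directives and the SGE_TASK_ID variable are standard SGE features; the job name, the task range, the helper input_for_task, and the input file naming scheme are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Job name (hypothetical).
#$ -N sweep
# Run each task in the directory the job was submitted from.
#$ -cwd
# Array job with task IDs 1 through 10 (range chosen for illustration).
#$ -t 1-10

# Hypothetical naming scheme: map a task ID to its own input file.
input_for_task() {
    echo "input.$1.dat"
}

# SGE sets SGE_TASK_ID for each array task; default to 1 so the script
# can also be run by hand outside the scheduler.
task_id=${SGE_TASK_ID:-1}

echo "task ${task_id} would process $(input_for_task "${task_id}")"
```

Each array task runs the same script with a different SGE_TASK_ID, so one submission covers many independent serial runs.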

How it works

When the user executes vpc_start, a job is submitted to the standard system scheduler (Moab).  The user can specify the requested time and number of cores using the normal Moab syntax (e.g. -l mppwidth=240,mppnppn=24,walltime=1:00:00).  When the system job is scheduled, MySGE launches an SGE scheduler and uses a single aprun command to start the SGE execution daemons on the allocated compute nodes.  The user can then source a setup file created by MySGE to configure the shell environment.  Once this is done, the user can use the usual SGE queue commands to submit jobs to the personal SGE scheduler.  The user can stop the private cluster by running vpc_stop.
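As a rough illustration of the resource request above, the number of compute nodes occupied follows from dividing mppwidth (total cores) by mppnppn (cores used per node) and rounding up.  The helper name nodes_needed is hypothetical and is not part of MySGE.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySGE): estimate how many compute nodes
# a request of mppwidth cores at mppnppn cores per node occupies.
nodes_needed() {
    width=$1   # total cores requested (mppwidth)
    nppn=$2    # cores used per node (mppnppn)
    # Integer ceiling division: round up to a whole number of nodes.
    echo $(( (width + nppn - 1) / nppn ))
}

nodes_needed 240 24   # prints 10
```

With the example request of mppwidth=240 and mppnppn=24, the SGE execution daemons would be spread over 10 nodes.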

Instructions

  • Load the mysge module
module load mysge
  • Initialize MySGE for your account (you only need to do this once).  This takes roughly a minute to complete, so do not interrupt it with Ctrl-C unless it hangs for several minutes.
mysge_init

(The defaults should be fine, so just press Enter at each prompt.)

  • Source the VPC setup file.  You will need to do this at each login, or add it to your shell startup (dot) file.  Note that qstat will now apply to your VPC, not the normal system batch system.
. ~/.vpc.sh
or
source ~/.vpc.csh
  • Start the VPC.  Use the debug queue for quicker testing.  The default size is 240 cores.  Use the normal batch options to request a different number (e.g. -l mppwidth=480).
vpc_start -q debug
  • Wait for the VPC to start.  You can use vpc_status to monitor the request.
canon@hopper06:~> vpc_status
265639.sdb           canon    debug    MySGE               --    --   --    --  00:30 Q   --

canon@hopper06:~> vpc_status
265639.sdb           canon    debug    MySGE             23058   --   --    --  00:30 R 00:00
  • Submit jobs to your VPC.
canon@hopper06:~> qsub ./job.q
canon@hopper06:~> qstat
job-ID  prior   name       user         state submit/start at     queue                     slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     36 0.00000 job.q      canon        qw    04/08/2011 11:46:24                               1        
  • To shut down the VPC, run vpc_stop.
vpc_stop
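The wait-then-submit part of the steps above can be sketched as a small polling loop.  This is only a sketch: it assumes vpc_status prints lines in the layout shown in the sample output (job name MySGE, with the state Q or R as the second-to-last field), and the function names vpc_state and wait_for_vpc and the 30-second interval are hypothetical choices, not part of MySGE.

```shell
#!/bin/sh
# Extract the job state (e.g. Q or R) from vpc_status output read on stdin.
# Assumes the column layout of the sample output: the state is the
# second-to-last field on the line containing the job name "MySGE".
vpc_state() {
    awk '/MySGE/ { print $(NF-1); exit }'
}

# Poll until the VPC job reports state R (running).
# Note: this loops forever if the request never starts or disappears,
# so interrupt it by hand if the request has been cancelled.
wait_for_vpc() {
    while [ "$(vpc_status | vpc_state)" != "R" ]; do
        sleep 30   # arbitrary polling interval
    done
}

# Typical use once the VPC has been requested with vpc_start:
#   wait_for_vpc && qsub ./job.q
```

Keeping the parsing in its own function makes it easy to adjust if the vpc_status output on a given system differs from the sample.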

Important Considerations

Please be aware that while the virtual private cluster (VPC) is running, the user is charged for all of the allocated cores, regardless of whether a MySGE job is running on them.  This is because the cores are dedicated to the MySGE cluster while the VPC is running and cannot be used by other users on the system.  Once finished with the cluster, the user should run vpc_stop to stop the cluster and return the cores to the standard scheduler.
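To get a feel for what an idle VPC costs, the raw core-hours consumed can be estimated as allocated cores multiplied by elapsed time.  The helper below is a hypothetical sketch; it computes raw core-hours only and does not account for any machine or queue charge factors that the actual accounting may apply.

```shell
#!/bin/sh
# Hypothetical helper: raw core-hours consumed by a VPC that holds
# a given number of cores for a given number of minutes.
# Charge factors, if any, are NOT included.
vpc_core_hours() {
    cores=$1
    minutes=$2
    awk -v c="$cores" -v m="$minutes" 'BEGIN { printf "%.1f\n", c * m / 60 }'
}

vpc_core_hours 240 30   # prints 120.0
```

A default-size VPC left running for 30 minutes therefore consumes 120 raw core-hours whether or not any SGE jobs ran in it.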