Best Practices - and Practices to Avoid
Best Practices - Dos and Don'ts on the cluster
Please keep the following in mind when running/submitting jobs on Genepool:
| Run qstat and qmod interactively only. DO NOT use these commands in scripts or tight loops, and DO NOT clear job errors from scripts. To determine whether a job has finished, use job dependencies, empty files (flags), logs, signal traps, etc. You can also set up your script so that you receive an email when the job starts running and when it completes. See the examples below. |
| Checkpoint your pipelines! If there are breaks in your workflow, write the intermediate output to a checkpoint file. This will save you from having to restart the calculation from the beginning if the job is killed or dies before completion. This also applies to pipeline testing: do not expect the debug queue to allow enough run time for a complete pipeline run to finish. |
| Run 'ulimit -c 0' in the job scripts you submit with qsub to disable core dumps. Otherwise, the fileserver may become overwhelmed when hundreds of core dump files are written to the same location. We are currently trying to make this the default behavior. |
| Use array jobs instead of many individual jobs whenever possible. This reduces load on the scheduler and reduces the number of scripts that you need to maintain. See examples below. |
| Don't run short jobs on the cluster! If the jobs require less than a minute to complete, consider combining them into longer jobs or running them outside the cluster. Scheduling such short jobs will cost more than the jobs themselves! |
| Do not request much more run time than a job needs. Jobs with shorter requested run times can be scheduled and started sooner. It might end up being faster to request the lower bound on the run time for most of the jobs and re-run any that don't complete. This is also where checkpointing your pipelines can be helpful: if a job runs out of time and has to be re-run, it is far more efficient not to start it over from the beginning. |
| For consumable resources, request soft limits that are slightly below the hard limits, and trap the resulting signals in your script so that you can tell why a job was killed. |
| Use the '-w e' option in qsub. This will prevent unschedulable jobs from entering the queue. This may be the default option in the future. |
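Several of the recommendations above can be combined in the header of a job script. The following is only a sketch, assuming a Grid Engine style qsub that reads embedded options from lines beginning with "#$"; the email address and the time limits are placeholders, not site recommendations.

```shell
#!/bin/bash
# Sketch of a job script header. Lines starting with "#$" are meant to be
# read by qsub as embedded options; a plain shell treats them as comments.

# Reject the job at submission time if it can never be scheduled.
#$ -w e
# Send mail when the job begins (b) and when it ends (e).
#$ -m be
# Hypothetical address for those mails.
#$ -M user@example.com
# Hard run-time limit, with a soft limit slightly below it (placeholder values).
#$ -l h_rt=01:00:00
#$ -l s_rt=00:55:00

# Disable core dumps so a failing job does not flood the fileserver.
ulimit -c 0

echo "core file size limit: $(ulimit -c)"
```

Because the "#$" lines are ordinary comments to the shell, the script can also be run locally to check the ulimit setting before it is submitted.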

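The checkpointing advice above can be sketched with flag files: each step leaves an empty file behind when it succeeds, and a re-run skips any step whose flag already exists. The function name run_step, the CHECKPOINT_DIR variable and the ".done" suffix are hypothetical choices for illustration.

```shell
#!/bin/bash
# Minimal checkpointing sketch: one flag file per completed pipeline step.
# CHECKPOINT_DIR is a hypothetical location; adjust it to your project space.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-./checkpoints}"

run_step() {
    # Usage: run_step <step-name> <command> [args...]
    local name="$1"
    shift
    local flag="$CHECKPOINT_DIR/$name.done"
    mkdir -p "$CHECKPOINT_DIR"
    if [ -e "$flag" ]; then
        echo "skip $name (checkpoint found)"
        return 0
    fi
    # Run the step and record the checkpoint only if it succeeded.
    "$@" && touch "$flag"
}
```

A pipeline would then call, for example, `run_step align ./align.sh reads.fq` followed by `run_step sort ./sort.sh aligned.bam` (both commands are placeholders). If the job is killed during the second step, resubmitting the same script skips the first one.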

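For the array job recommendation above, a single script can serve every task by selecting its input from the task index. This sketch assumes Grid Engine sets SGE_TASK_ID for each task of a job submitted with `qsub -t`; the file name inputs.txt and the helper pick_input are hypothetical.

```shell
#!/bin/bash
# Sketch of an array-job task script: task N processes line N of a list file.
# INPUT_LIST is a hypothetical file with one input path per line.
INPUT_LIST="${INPUT_LIST:-inputs.txt}"

pick_input() {
    # Print line number $2 of file $1.
    sed -n "${2}p" "$1"
}

# SGE_TASK_ID is expected to be set by the scheduler for array tasks;
# the default of 1 is only there so the script can be tried locally.
task_id="${SGE_TASK_ID:-1}"

if [ -r "$INPUT_LIST" ]; then
    input="$(pick_input "$INPUT_LIST" "$task_id")"
    echo "task $task_id processing $input"
fi
```

Submitting this once with a task range that matches the number of lines in the list replaces that many separate submissions and separate scripts.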

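The soft limit advice above relies on the job script catching a warning signal before the hard limit kills the job. The sketch below assumes the usual Grid Engine convention that exceeding a soft run-time limit delivers SIGUSR1 and exceeding a soft CPU or memory limit delivers SIGXCPU; check the local scheduler documentation before relying on this. The log file name and the handler name are hypothetical.

```shell
#!/bin/bash
# Sketch: record why the job is being stopped when a soft limit is reached.
# REASON_LOG is a hypothetical file name.
REASON_LOG="${REASON_LOG:-job_exit_reason.log}"

on_soft_limit() {
    # $1 = signal name, $2 = human-readable reason
    echo "$(date) received $1: $2" >> "$REASON_LOG"
    # Intermediate results could be written to a checkpoint here.
    exit 1
}

# Assumed signal mapping; see the note above.
trap 'on_soft_limit USR1 "soft run-time limit reached"' USR1
trap 'on_soft_limit XCPU "soft CPU or memory limit reached"' XCPU
```

The traps should be installed near the top of the job script, before the long-running commands start. Note that bash runs a trap handler only after the current foreground command returns, so a long-running step may delay the log entry.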