NERSC: Powering Scientific Discovery Since 1974

Resolved -- Job cannot be executed

January 3, 2012 by Helen He

Symptom:

The problem mostly affects interactive batch jobs, but it affects regular batch jobs as well. It usually occurs when the user presses Ctrl-C at the "qsub -I" command, but it can also occur without any user action.

The job simply fails. showq cannot locate the job (although qstat can), and the user then receives an email similar to:

PBS Job Id: 1095290.sdb
Job Name: STDIN
Exec host: nid03934/9
Aborted by PBS Server
Job cannot be executed
See Administrator for help

This is a bug related to the connection timeout between MOM nodes and compute nodes with the Moab scheduler. NERSC is working with Cray and Adaptive Computing to resolve the issue.

Workaround:

The problem is transient. Resubmitting the job usually works.
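
For users who script their submissions, one way to decide when a resubmit is warranted is to recognize the abort notice shown above. The following is only a minimal sketch: the function name parse_abort_notice and the idea of scanning the email text are illustrative assumptions, not a NERSC-provided tool. It relies solely on the lines that appear in the sample email.

```python
import re

# Marker lines copied from the sample abort email in this notice.
ABORT_MARKERS = ("Aborted by PBS Server", "Job cannot be executed")


def parse_abort_notice(email_text):
    """Return the PBS job id if the text looks like the abort notice, else None.

    Hypothetical helper: it only checks for the two marker lines and the
    "PBS Job Id:" header seen in the sample email.
    """
    if not all(marker in email_text for marker in ABORT_MARKERS):
        return None
    match = re.search(r"^PBS Job Id:\s*(\S+)", email_text, re.MULTILINE)
    return match.group(1) if match else None


sample = """PBS Job Id: 1095290.sdb
Job Name: STDIN
Exec host: nid03934/9
Aborted by PBS Server
Job cannot be executed
See Administrator for help
"""

job_id = parse_abort_notice(sample)
if job_id is not None:
    # The failure is transient, so the same job can simply be submitted again.
    print(f"Job {job_id} was aborted by the server; resubmit it with qsub.")
```

The sketch only identifies the failed job. Resubmission itself is left to the user's usual qsub command.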

Status:

The problem was resolved by increasing the connection timeout setting in the MOM node configuration.