Resolved -- Job cannot be executed
January 3, 2012 by Helen He
The problem affects mostly interactive batch jobs, but it occurs with regular batch jobs as well. It usually happens when the user presses Ctrl-C at the "qsub -I" command, but it can also happen when the user does nothing.
The job simply fails: showq cannot locate the job (although qstat can), and the user then receives an email similar to:
PBS Job Id: 1095290.sdb
Job Name: STDIN
Exec host: nid03934/9
Aborted by PBS Server
Job cannot be executed
See Administrator for help
This is a bug related to the connection timeout between MOM nodes and compute nodes with the Moab scheduler. NERSC is working with Cray and Adaptive Computing to resolve the issue.
The problem is transient. Resubmitting the job usually works.
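Because resubmission usually works, it can help to recognize this particular abort notice automatically. The following is a minimal sketch in Python, assuming the notification body has the same layout as the email shown above. The helper name `aborted_job_id` is hypothetical and is not part of PBS or Moab. It only inspects the email text and does not resubmit anything.

```python
import re

# Hypothetical helper, not part of PBS or Moab: it only inspects the text
# of a notification email laid out like the example in this post.
ABORT_MARKER = "Job cannot be executed"
JOB_ID_PATTERN = re.compile(r"^PBS Job Id:\s*(\S+)", re.MULTILINE)


def aborted_job_id(email_body):
    """Return the job ID if the email reports this abort, else None."""
    if ABORT_MARKER not in email_body:
        return None
    match = JOB_ID_PATTERN.search(email_body)
    return match.group(1) if match else None


sample = """PBS Job Id: 1095290.sdb
Job Name: STDIN
Exec host: nid03934/9
Aborted by PBS Server
Job cannot be executed
See Administrator for help
"""

print(aborted_job_id(sample))
```

A job ID returned by this check identifies a job that is a candidate for manual resubmission.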
The problem was resolved by increasing the value of the connection timeout setting in the MOM node configuration.