NERSC: Powering Scientific Discovery Since 1974

Resolved -- Job cannot be executed

January 3, 2012 by Helen He

Symptom:

The problem affects mostly interactive batch jobs, but regular batch jobs can hit it as well. It usually occurs when the user presses Ctrl-C at the "qsub -I" command, but it can also occur without any user action.

The job simply fails: showq cannot locate the job (although qstat can), and the user then receives an email similar to:

PBS Job Id: 1095290.sdb
Job Name: STDIN
Exec host: nid03934/9
Aborted by PBS Server
Job cannot be executed
See Administrator for help

This is a bug related to the connection timeout between MOM nodes and compute nodes with the Moab scheduler. NERSC is working with Cray and Adaptive Computing to resolve the issue.

Workaround:

The problem is transient. Resubmitting the job usually works.
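
For scripted submissions, the resubmit workaround can be sketched roughly as follows. This is only an illustration, not a NERSC-provided tool: the helper names (is_abort_notice, qsub_submit, resubmit_until_accepted), the retry count, and the delay are all assumptions, and the check for whether a job was aborted is left to a caller-supplied function, since the abort is reported by email after qsub has already returned.

```python
import subprocess
import time

# Text taken from the notification email quoted above.
ABORT_MARKER = "Job cannot be executed"


def is_abort_notice(mail_body):
    """Return True if a notification email body reports this abort."""
    return ABORT_MARKER in mail_body


def qsub_submit(script_path):
    """Submit a batch script with qsub and return the job id it prints."""
    result = subprocess.run(
        ["qsub", script_path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def resubmit_until_accepted(submit, was_aborted, max_attempts=3,
                            delay_seconds=60.0, sleep=time.sleep):
    """Call submit() until was_aborted(job_id) is False or attempts run out.

    Both submit and was_aborted are caller-supplied (hypothetical) callables;
    was_aborted might, for example, scan a mailbox with is_abort_notice.
    """
    for attempt in range(1, max_attempts + 1):
        job_id = submit()
        if not was_aborted(job_id):
            return job_id
        # Wait before resubmitting, since the failure is transient.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise RuntimeError(f"job aborted on all {max_attempts} attempts")
```

A call might look like resubmit_until_accepted(lambda: qsub_submit("job.pbs"), my_check), where job.pbs and my_check are placeholders for the user's own batch script and abort check. The number of attempts is capped so that a persistent failure is reported rather than retried indefinitely.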

Status:

The problem was resolved by increasing the connection timeout setting in the MOM node configuration.

