NERSC Powering Scientific Discovery Since 1974

Troubleshooting and Error Messages

Error Messages

| Message or Symptom | Fault | Recommendation |
| --- | --- | --- |
| job hit wallclock time limit | user or system | Submit the job with a longer time limit, or restart from the last checkpoint and resubmit. If your job hung and produced no output, contact the consultants. |
| received node failed or halted event for nid xxxx | system | One of the compute nodes assigned to the job failed. Resubmit the job. |
| error while loading shared libraries: cannot open shared object file: No such file or directory | mostly user, sometimes system | Make sure the environment variable CRAY_ROOTFS is set to DSL and that the modules loaded when building the dynamic executable are also loaded at run time. Report to the consultants if the problem persists. |
| a critical file could not be located | user | Verify that input and output files have the correct paths. |
| segmentation violation | user | Most likely a crash in the code. Try a debugger or contact the consultants. |
| node count exceeds reservation claim | user | The number of cores requested on the "aprun -n [numProcs] ..." line is larger than the number requested in the batch script with the #PBS -l mppwidth directive. Modify the script and resubmit. |
| application called MPI_Abort | user or system | User code often calls MPI_Abort itself. It could also indicate a system problem if an MPI call times out. |
| compute node initiated termination | user or system | Look in your standard output file; it usually gives some indication of the problem. If you get this error repeatedly, there could be an undetected bad node in the system. Please contact the consultants. |
| ROMIO-IO level error | user | Most likely an I/O error in user code. Contact the consultants for help. |
| OOM killer terminated this process | user | The application used more memory than is available on a compute node. Examine the memory usage of the code. See Memory Considerations on Edison. |
| application exited with non-zero exit code | user | May not be a problem. Check the exit code of your application. |
| error obtaining user credentials | system | Resubmit. Contact the consultants if the problem recurs. |
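
The "node count exceeds reservation claim" error comes from a mismatch between the aprun -n value and the #PBS -l mppwidth directive, so it can be caught before the job is submitted. The following is a minimal shell sketch, not a NERSC-provided tool. The function name check_width is hypothetical, and the sketch assumes the batch script contains a directive of the form "#PBS -l mppwidth=N" and passes a literal number to "aprun -n" (no shell variables); only the first match of each is inspected.

```shell
# check_width: hypothetical helper, not part of any NERSC or Cray tooling.
# Compares the core count requested with "#PBS -l mppwidth=N" against the
# count passed to "aprun -n M" in the given batch script.
check_width() {
    script="$1"

    # Extract N from the first "#PBS -l mppwidth=N" directive.
    width=$(sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*mppwidth=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)

    # Extract M from the first uncommented "aprun ... -n M" command line.
    procs=$(sed -n 's/^[[:space:]]*aprun.*[[:space:]]-n[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)

    # Give up if either value is missing (for example, -n takes a variable).
    if [ -z "$width" ] || [ -z "$procs" ]; then
        echo "could not find mppwidth or aprun -n in $script"
        return 2
    fi

    # The error in the table above occurs when aprun asks for more cores
    # than the batch reservation provides.
    if [ "$procs" -gt "$width" ]; then
        echo "aprun -n $procs exceeds mppwidth=$width"
        return 1
    fi

    echo "ok: aprun -n $procs fits within mppwidth=$width"
}
```

After sourcing the function, it could be run as "check_width job.pbs" before submitting, where job.pbs stands for your own batch script. A return status of 1 means the script should be modified before resubmission.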