NERSC: Powering Scientific Discovery Since 1974

Resolved: Running jobs error: "inet_arp_address_lookup"

September 22, 2013 by Helen He

Symptom:

After the Hopper maintenance on August 14, users occasionally reported error messages similar to the following:

[PE_456]:inet_arp_address_lookup:Failed to read output of /sbin/arp -a -i ipogif0 command. Try rerunning job with CRAY_ROOTFS environment variable set to DSL.
[PE_456]:inet_arp_address_lookup:Failed to read output of arp - No such file or directory
[PE_456]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname nid05021
[PE_456]:_pmi_init:_pmi_inet_setup returned -1
[Mon Aug 26 16:16:28 2013] [c14-2c2s6n1] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(428):
MPID_Init(184).......: channel initialization failed
MPID_Init(538).......: PMI2 init failed: 1
[NID 02963] 2013-08-26 09:16:50 Apid 20043232: initiated application termination
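
To check whether a job's output contains this failure and which node it names, the log can be scanned for the "inet_arp_address_lookup failed for hostname" line. The following is a minimal sketch in Python, assuming the output has the same format as the excerpt above; the names FAILED_LOOKUP and find_failed_lookups are hypothetical and not part of any NERSC or Cray tool.

```python
import re

# Pattern derived from the log excerpt above (an assumption about the format):
#   [PE_<rank>]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname <node>
FAILED_LOOKUP = re.compile(
    r"\[PE_(\d+)\]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname (\S+)"
)


def find_failed_lookups(log_text):
    """Return sorted, de-duplicated (rank, hostname) pairs for failed ARP lookups."""
    hits = set()
    for line in log_text.splitlines():
        match = FAILED_LOOKUP.search(line)
        if match:
            hits.add((int(match.group(1)), match.group(2)))
    return sorted(hits)


# Two lines copied from the excerpt above, used as sample input.
sample = (
    "[PE_456]:inet_arp_address_lookup:Failed to read output of arp - No such file or directory\n"
    "[PE_456]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname nid05021\n"
)

print(find_failed_lookups(sample))
```

Node names collected this way can be included in a problem report so that the affected nodes are easier to identify.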

Status:

Eleven nodes were identified as having problems accessing the shared system root file systems. After these nodes were taken out of service on Sept 27 (and later rebooted), the issue was cleared.