Resolved: Running jobs error: "inet_arp_address_lookup"
September 22, 2013 by Helen He
After the Hopper maintenance on August 14, users occasionally reported error messages similar to the following:
[PE_456]:inet_arp_address_lookup:Failed to read output of /sbin/arp -a -i ipogif0 command. Try rerunning job with CRAY_ROOTFS environment variable set to DSL.
[PE_456]:inet_arp_address_lookup:Failed to read output of arp - No such file or directory
[PE_456]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname nid05021
[PE_456]:_pmi_init:_pmi_inet_setup returned -1
[Mon Aug 26 16:16:28 2013] [c14-2c2s6n1] Fatal error in MPI_Init: Other MPI error, error stack:
MPID_Init(184).......: channel initialization failed
MPID_Init(538).......: PMI2 init failed: 1
[NID 02963] 2013-08-26 09:16:50 Apid 20043232: initiated application termination
Eleven nodes were identified as having problems accessing the shared system root file systems. After these nodes were taken out of service on Sept 27 (and later rebooted), the issue was cleared.
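
For anyone checking older job output for this failure, the node named in each `_pmi_inet_setup` line can be extracted with a short script. The following is a minimal sketch in Python that assumes the message format shown in the log above. The function name `failed_hostnames` is hypothetical and is not part of any NERSC or Cray tool.

```python
import re

# Matches lines such as:
# [PE_456]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname nid05021
# (format assumed from the log excerpt above)
PATTERN = re.compile(
    r"^\[PE_\d+\]:_pmi_inet_setup:inet_arp_address_lookup failed for hostname (\S+)"
)


def failed_hostnames(lines):
    """Return the sorted, de-duplicated hostnames named in lookup failures."""
    hosts = set()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            hosts.add(match.group(1))
    return sorted(hosts)
```

The function accepts any iterable of lines, so it can be passed an open file object for a saved job output file. Lines that do not match the assumed format are ignored.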