Reminder: NERSC Maintenance Beginning Friday; All Systems Shut Down
October 2, 2017 by Rebecca Hartman-Baker
All systems at NERSC will be unavailable for more than 4 days while our EPO (Emergency Power Off) system is upgraded and an additional pump is installed in support of water-cooling the large HPC systems.
NERSC will begin powering down all systems at 3:00 pm this Friday, October 6. When the facilities work is complete, we will power up the machines and restore service to users by the end of the business day on Tuesday, October 10. During the outage, everything must be powered down, so no machines or filesystems will be available. The batch systems will retain any jobs submitted before the outage, and scheduling will resume when service is restored.
Additionally, the unavailability of some non-compute infrastructure and services will impact users directly:
- Our web server will be powered down. Traffic will be redirected to a backup placeholder website that displays the expected outage schedule, but the regular content of the NERSC website will be unavailable.
- Our web login system will be powered down. This will impact even systems that are not on-site at NERSC, such as our online help desk and ERCAP application system. We have extended the ERCAP deadline an additional week as a result (please see below). Tickets can be submitted via email, but logging in to help.nersc.gov or ercap.nersc.gov will not work until the service is restored.
- Our email services will be powered down as well. The weekly email on Monday, October 9, which will include an update on the outage, will not be sent until these services are restored.
We hope to have these services restored on Monday, October 9, and we expect to send an email update at 9:30 am Pacific Time that day. In the event of further delays, we'll update our placeholder website until email services are restored.
The deadline for ERCAP has been extended by one week to 11:59 pm (Pacific), Monday, October 16, to account for the impact of the outage.
We appreciate your patience and understanding as we build a more robust infrastructure for you!