NERSC: Powering Scientific Discovery Since 1974

Outage Log

This table lists events reported, ongoing, and resolved on NERSC systems. It is a historical record and may not be updated while a system event is in progress.

Event Date/Time | Up Date/Time | System | Comment
02/09/16 11:50 PST | 02/09/16 17:10 PST | Cori | System in degraded mode. Batch issues caused loss of jobs submitted between 11:50AM and 5:10PM, and termination of all running jobs at 6:02PM.
02/09/16 9:30 PST | 02/09/16 16:30 PST | Science Database Services | Scheduled maintenance. MySQL databases were unavailable for version updates, and all databases were unavailable for short periods due to reboots for OS security patches.
02/09/16 9:00 PST | 02/09/16 11:17 PST | NIM | Scheduled maintenance.
02/09/16 8:00 PST | 02/11/16 7:27 PST | Data Transfer Nodes | Scheduled maintenance. The relocation of DTN02 and DTN04 is complete. Systems are available.
02/09/16 1:32 PST | - | Cori | Lustre filesystem degraded. Cray engineers are working on the issue. Next update 03:00 PST.
02/08/16 7:00 PST | 02/08/16 13:35 PST | MongoDB | Scheduled maintenance. MongoDB will be unavailable while the servers are moved from OSF to CRT.
02/04/16 18:50 PST | 02/04/16 23:34 PST | Cori | System in degraded mode. The /global/cscratch1 filesystem was in degraded mode. Our engineers have fixed the issue.
02/04/16 11:00 PST | 02/04/16 11:29 PST | NX Services | Scheduled maintenance. All connected sessions will be terminated. Please close your sessions before the maintenance and reconnect afterward.
02/03/16 8:00 PST | 02/03/16 23:40 PST | Edison | Scheduled maintenance.
01/28/16 9:00 PST | 01/28/16 16:00 | nerscca2.nersc.gov | Scheduled maintenance. NERSC Certificate Services will be down for system upgrades. Globus Online / NEWT, Grid Services and GSISSH collaboration accounts may be unavailable during this period.
01/28/16 7:30 PST | 01/28/16 23:05 PST | Edison | Unavailable. Emergency maintenance due to a power issue.
01/27/16 14:00 PST | 02/03/16 18:00 PST | SeqFS | Scheduled maintenance. /global/seqfs is under maintenance for its move from OSF to CRT. The expected end time is 5 PM on 2/5. /global/seqfs is available during the maintenance, with some degraded IO performance.
01/27/16 10:13 PST | 01/27/16 10:31 PST | Edison | System in degraded mode. Edison was degraded. Users can submit jobs and currently running jobs are OK. No new jobs will start until the degradation has been resolved.
01/27/16 9:00 PST | 01/27/16 13:00 PST | HPSS User | Scheduled maintenance.
01/22/16 2:25 PST | 01/23/16 11:59 PST | Edison | Unavailable. Edison is down and not running jobs. Logins are accessible and new jobs are accepted for queueing. Our engineers are currently working on this issue.
01/21/16 8:00 PST | 01/21/16 16:50 PST | Science Database Services | Scheduled maintenance. Postgres and MySQL database services will be unavailable because the host machines are being moved from OSF to CRT.
01/20/16 10:00 PST | 01/20/16 12:15 PST | NIM | Scheduled maintenance. Any updates (including password changes) will not propagate until after the maintenance.
01/20/16 9:00 PST | 01/20/16 11:05 PST | HPSS User | Scheduled maintenance.
01/20/16 5:30 PST | 01/20/16 20:35 PST | Cori | Scheduled maintenance. Dedicated time from 14:30 - 18:00. Extended time until 20:35.
01/17/16 13:30 PST | 01/22/16 10:10 PST | ProjectB | Scheduled maintenance. /global/projectb is under maintenance for its move to CRT. During the maintenance, /global/projectb is available for access, but users may experience degraded IO performance.
01/14/16 15:00 PST | 01/14/16 16:06 PST | HPSS User | Unavailable.
01/13/16 9:00 PST | 01/13/16 11:44 PST | Science Gateway Services | Scheduled maintenance. All Science Gateway Services will be affected.
01/11/16 22:38 PST | - | DNA | The /global/dna file system is under maintenance for its move to CRT from 21:00 01/11/16 to 21:00 01/16/16. The /global/dna file system is available during the maintenance, but users may experience degraded I/O performance.
01/07/16 8:00 PST | 01/08/16 0:00 PST | Edison | Scheduled maintenance.
01/06/16 13:57 PST | 01/06/16 16:55 PST | Science Gateway Services | Unavailable. Portal-auth (SGN02) was unavailable.
01/06/16 9:00 PST | 01/08/16 16:39 PST | Data Transfer Nodes | Scheduled maintenance. DTN01 and DTN03 were unavailable due to scheduled maintenance. However, DTN02 and DTN04 remained available.
01/06/16 9:00 PST | 01/06/16 12:40 PST | HPSS User | Scheduled maintenance.
01/05/16 15:00 PST | 01/05/16 15:35 PST | PDSF | Unavailable. PDSF was unavailable. Engineers resolved the outage.
01/05/16 15:00 PST | 01/05/16 16:30 PST | Genepool | Unavailable. Genepool was unavailable. Engineers resolved the outage.
01/05/16 15:00 PST | 01/05/16 16:10 PST | ProjectB | Unavailable. ProjectB was unavailable. Engineers resolved the outage.
01/05/16 15:00 PST | 01/05/16 16:10 PST | DNA | Unavailable. DNA was unavailable. Engineers resolved the outage.
01/04/16 2:18 PST | 01/04/16 9:20 PST | Science Gateway Services | Unavailable. sgn02 was down. Engineers resolved the outage.
01/01/16 6:30 PST | 01/01/16 9:00 PST | Cori | System in degraded mode. The /global/cscratch1 filesystem was degraded. Engineers resolved the issue.
12/30/15 9:00 PST | 12/30/15 13:00 PST | HPSS Backup | Scheduled maintenance.
12/21/15 8:36 PST | - | Global Common | Filesystems /global/common, /global/u1, /global/u2 and all /usr/syscom have been successfully moved to CRT and are in full production.
12/18/15 15:38 PST | 12/18/15 20:10 PST | Science Gateway Services | Unavailable. Some Science Gateway Services were down; ipython and rstudio were down due to a docker issue.
12/17/15 20:27 PST | - | Global Homes | Filesystems /global/common, /global/u1, /global/u2 and all /usr/syscom have been successfully moved to CRT and are in full production.
12/17/15 16:28 PST | - | Hopper | Hopper was retired on Dec 15, 2015. Login nodes and scratch file systems are available until noon Dec 22, 2015 (no guarantee; only if there are no system issues).
12/17/15 8:00 PST | 12/17/15 12:12 PST | Cori | Scheduled maintenance.
12/16/15 5:33 PST | 12/16/15 13:50 PST | Cori | Unavailable. Cori was unavailable due to filesystem problems.
12/15/15 20:05 PST | - | Global Homes | Filesystems /global/common, /global/u1, and /global/u2 are being moved to CRT starting from 21:00 today (Tuesday, 12/15). The work is expected to finish by 12:00 on Friday (12/18). During the move, the file systems will remain available, but users may occasionally experience short IO delays.
12/15/15 20:04 PST | - | Global Common | Filesystems /global/common, /global/u1, and /global/u2 are being moved to CRT starting from 21:00 today (Tuesday, 12/15). The work is expected to finish by 12:00 on Friday (12/18). During the move, the file systems will remain available, but users may occasionally experience short IO delays.
12/15/15 7:00 PST | 12/15/15 18:00 PST | Cori | Unavailable. Dedicated test time. Logins will remain available.
12/14/15 1:17 PST | 12/14/15 8:20 PST | Cori | Logins available, batch jobs not running. The scratch filesystem was unavailable.
12/13/15 8:30 PST | 12/14/15 17:00 PST | NX Services | System in degraded mode. NX service was in degraded mode due to a hardware failure.
12/11/15 9:40 PST | 12/11/15 10:45 PST | Hopper | System in degraded mode. Engineers are investigating /scratch access issues. Logins/jobs may hang.
12/10/15 9:00 PST | 12/10/15 14:55 PST | Website: www.nersc.gov | Scheduled maintenance. Sites will be read-only during the maintenance window and may be unavailable for short periods of time.
12/09/15 9:00 PST | 12/09/15 11:15 PST | Science Gateway Services | Scheduled maintenance. Maintenance on sgnworker. Some services (including iPython and R-Studio) may be down for a GPFS upgrade.
12/07/15 8:03 PST | - | Cori | Lustre filesystem was degraded. Slurm was also degraded; debug is still available. The Lustre filesystem recovered at 17:30 on 12/7/15.
12/04/15 21:34 PST | 12/04/15 22:56 PST | Hopper | System in degraded mode. Hopper /scratch2 was degraded. Logins were available and jobs were running. However, jobs that were using /scratch2 may have been lost.
12/03/15 16:03 PST | 12/03/15 17:28 PST | Hopper | System in degraded mode. Logins were sluggish.
12/03/15 8:20 PST | - | Cori | Dedicated benchmark testing, 08:00-14:00.
12/03/15 8:00 PST | 12/03/15 14:20 PST | Cori | Scheduled maintenance. Dedicated benchmark testing. Login nodes will not be available.
12/03/15 1:20 PST | 12/03/15 3:55 PST | Cori | System in degraded mode. Users are able to log in and jobs are running.
12/02/15 8:00 PST | 12/02/15 18:42 PST | Cori | Scheduled maintenance. Software maintenance. **Updated end time**
12/01/15 8:59 PST | - | Cori | Compiles done using the Intel environment are currently experiencing delays or hangs. We have identified the cause and are working to resolve the problem. Please use an older version of the Intel compilers by typing
12/01/15 0:31 PST | - | Project | /project is under maintenance for our move from OSF to CRT from 18:00 PST, November 30, 2015 to 18:00 PST, December 13, 2015. /project is available during the maintenance; however, users may experience degraded IO performance.
11/30/15 7:15 PST | 01/04/16 14:05 PST | Edison | Unavailable. Edison is currently powered down for its move from OSF to CRT. Edison is expected to be offline for up to six weeks.
11/29/15 23:45 PST | 11/30/15 10:18 PST | Science Gateway Services | Unavailable. Science Gateways are currently unavailable. Engineers are investigating the cause of the outage.
11/25/15 9:00 PST | 11/25/15 12:25 PST | HPSS Backup | Scheduled maintenance.
11/20/15 18:10 PST | 11/20/15 19:50 PST | PDSF | Logins unavailable, batch jobs running. The PDSF loadbalancer was unavailable. Direct logins to pdsf6,7,8 are available. Jobs ran okay. Engineers fixed the problem.
11/20/15 16:53 PST | 11/20/15 19:50 PST | Genepool | System in degraded mode. Genepool logins via the loadbalancer were unavailable. Direct logins to Genepool10,11,12 login nodes are available. Jobs ran okay.
11/20/15 10:35 PST | 11/20/15 12:54 PST | Edison | System in degraded mode. Due to a problem with the batch scheduler, the system was degraded.
11/19/15 17:09 PST | 11/19/15 17:46 PST | Hopper | System in degraded mode. Jobs were paused while engineers investigated possible scheduling issues.
11/18/15 9:00 PST | 11/18/15 12:52 PST | HPSS User | Scheduled maintenance.
11/16/15 7:32 PST | - | Project | /project migration completed at 04:30 PST.
11/15/15 9:32 PST | - | Hopper | All jobs on Hopper were lost due to an unplanned power outage event on 11/14/15. Jobs (and user environment) may have been impacted due to GPFS filesystems being unavailable.
11/15/15 9:13 PST | - | Edison | All jobs on Edison were lost due to an unplanned power outage event on 11/14/15.
11/15/15 2:14 PST | - | Hopper | Hopper is currently scheduled to retire on December 15, 2015 at noon. No software changes will be made, except for security-related issues. A more detailed timeline will be announced as the date gets closer. All files on the /scratch and /scratch2 file systems will not be accessible after Hopper's retirement. Please back up your files to HPSS. If you plan to back up many small files to HPSS, be sure to use the htar utility to concatenate those small files into a larger archive.
11/14/15 23:30 PST | 11/15/15 11:30 PST | Genepool | System in degraded mode. System is available, but in a degraded state as more compute nodes are being brought back from down status.
11/14/15 16:59 PST | 11/15/15 0:55 PST | Cori | System in degraded mode. Cori was degraded due to filesystem issues caused by an OSF site power outage.
11/14/15 15:09 PST | 11/14/15 19:45 PST | Science Gateway Services | Unavailable. Science Gateway Services were down as a result of the power outage.
11/14/15 13:47 PST | 11/14/15 15:30 PST | Data Center Power Outage | Unavailable. NERSC experienced a power outage at 13:47. Power was restored at 14:49 and the environment was stable at 15:30.
11/14/15 13:47 PST | 11/14/15 17:26 PST | Cori | System in degraded mode.
11/14/15 13:47 PST | 11/15/15 3:04 PST | Edison | Unavailable. Edison was unavailable due to the OSF site power outage.
