NERSC: Powering Scientific Discovery Since 1974

Outage Log

This table lists events reported, ongoing, and resolved on NERSC systems. It is a historical record and may not be updated while a system event is in progress.

Event Date/Time | Up Date/Time | System | Comment
04/25/17 15:48 PDT | - | Cori | BurstBuffer users should read: www.nersc.gov/users/computational-systems/cori/burst-buffer/known-issues
04/25/17 14:54 PDT | 04/25/17 15:05 PDT | Cori | Unavailable. The batch system was unavailable. Logins were available.
04/25/17 1:00 PDT | 04/25/17 14:29 PDT | Edison | Unavailable. Edison is currently unavailable and engineers are currently investigating the issue.
04/24/17 11:00 PDT | 04/24/17 17:25 PDT | Edison | Unavailable. Edison is currently unavailable. Engineers are actively working on the situation.
04/21/17 16:53 PDT | 04/25/17 15:45 PDT | Cori | System in degraded mode. BurstBuffer users should read: www.nersc.gov/users/computational-systems/cori/burst-buffer/known-issues
04/20/17 21:15 PDT | 04/21/17 7:35 PDT | Cori | Dedicated runs. Cori will be undergoing dedicated test runs following maintenance. Logins and job submission will be available.
04/19/17 9:00 PDT | 04/19/17 12:30 PDT | HPSS Backup | Scheduled maintenance. Regent will be unavailable while engineers work on the system.
04/19/17 9:00 PDT | 04/19/17 14:00 PDT | HPSS User | Scheduled maintenance. Archive will be unavailable while engineers work on the system.
04/19/17 7:00 PDT | 04/20/17 21:15 PDT | Cori | Scheduled maintenance. Cori will be down for hardware upgrades.
04/18/17 6:00 PDT | 04/19/17 7:00 PDT | Cori | Scheduled maintenance. A subset of KNL resources will be unavailable to prepare for the hardware upgrade on 04/19. No other resources will be affected.
04/15/17 9:00 PDT | 04/15/17 12:35 PDT | Cori | Dedicated runs. System is scheduled for KNL full-scale runs. Haswell nodes will remain available. Job submission will still be available.
04/14/17 9:00 PDT | 04/14/17 21:00 PDT | Cori | Dedicated runs. System is scheduled for KNL full-scale runs. Haswell nodes will remain available. Job submission will still be available.
04/13/17 19:07 PDT | 04/14/17 5:04 PDT | Cori | Unavailable. Scratch filesystems unavailable. No new jobs starting. Engineers are investigating.
04/13/17 19:07 PDT | 04/14/17 3:58 PDT | Edison | Unavailable. Scratch filesystems unavailable. No new jobs starting. Engineers are investigating.
04/13/17 9:00 PDT | 04/13/17 22:00 PDT | Cori | Dedicated runs. System is scheduled for KNL full-scale runs. Haswell nodes will remain available. Job submission will still be available.
04/12/17 5:50 PDT | 04/12/17 12:00 PDT | NERSC Website | System in degraded mode. www.nersc.gov is currently degraded. Engineers are working to resolve the issue.
04/10/17 13:40 PDT | 04/10/17 14:00 PDT | NERSC Website | Unavailable. www.nersc.gov, my.nersc.gov, cs.lbl.gov, and crd.lbl.gov are currently unavailable. Engineers are actively working to resolve this issue.
04/10/17 11:00 PDT | 04/10/17 13:00 PDT | NERSC Website | Scheduled maintenance. Maintenance will be performed on: www.nersc.gov, my.nersc.gov, crd.lbl.gov, cs.lbl.gov.
04/10/17 8:43 PDT | 04/10/17 16:13 PDT | Edison | Unavailable. Edison is currently down due to failure of a node cabinet. Engineers are actively working to resolve the issue.
04/10/17 8:00 PDT | 04/10/17 18:00 PDT | Cori | Dedicated runs. System is scheduled for KNL full-scale runs. Haswell nodes will remain available. Job submission will still be available.
04/07/17 15:45 PDT | 04/07/17 21:45 PDT | Cori | Dedicated runs. System is scheduled for KNL full-scale runs. Haswell nodes will remain available. Job submission will still be available.
04/07/17 10:00 PDT | 04/07/17 10:45 PDT | NX Services | Scheduled maintenance. NX license is being swapped. User sessions may be interrupted.
04/07/17 8:00 PDT | 04/07/17 15:42 PDT | Cori | Scheduled maintenance. Engineers will be performing system maintenance to improve stability and reliability of KNL compute nodes.
04/05/17 11:11 PDT | 04/05/17 11:27 PDT | Cori | System in degraded mode. Job submission and SLURM commands were unavailable during this event. Engineers are working to determine the root cause.
04/05/17 9:00 PDT | 04/05/17 10:45 PDT | HPSS Backup | Scheduled maintenance. Regent will be unavailable while engineers work on the system.
03/31/17 0:34 PDT | 03/31/17 1:57 PDT | Cori | Unavailable. Cori was unavailable due to problems with the batch scheduler.
03/30/17 9:13 PDT | - | Cori | All KNL nodes are reserved for dedicated science runs until 3/31 08:00 PDT. Haswell nodes are available.
03/29/17 12:06 PDT | 03/29/17 15:15 PDT | Cori | System in degraded mode. Cori is experiencing a filesystem issue that is under investigation. Some SLURM partitions were marked down.
03/29/17 10:00 PDT | 03/29/17 11:22 PDT | NIM | Scheduled maintenance.
03/29/17 9:00 PDT | 03/29/17 13:49 PDT | HPSS User | Scheduled maintenance. Correction: Archive Maintenance is complete.
03/28/17 7:00 PDT | 03/28/17 17:20 PDT | Cori | Scheduled maintenance. Cori will be down for firmware and software updates.
03/24/17 10:00 PDT | 03/24/17 12:30 PDT | Edison | Scheduled maintenance. Edison is unavailable. SLURM is being upgraded.
03/23/17 18:00 PDT | 03/23/17 21:00 PDT | Cori | Unavailable. The system is down for a short time while engineers resolve remaining issues from scheduled maintenance.
03/23/17 5:49 PDT | 03/23/17 6:13 PDT | Edison | System in degraded mode. The Edison scheduler was down, and new jobs could not be submitted. Running jobs were not affected. Engineers are currently investigating.
03/22/17 11:00 PDT | 03/22/17 11:30 PDT | NIM | Scheduled maintenance. Engineers are performing planned maintenance. NIM will be unavailable.
03/21/17 7:00 PDT | 03/23/17 18:00 PDT | Cori | Scheduled maintenance.
03/19/17 12:00 PDT | 03/19/17 21:30 PDT | Cori | System in degraded mode. KNL jobs could not be submitted, and new KNL jobs would not start.
03/16/17 16:48 PDT | 03/21/17 7:00 PDT | Cori | System in degraded mode. Cori is in degraded mode at this time. Only knl,quad,cache and haswell jobs will run. This will be fixed after the upgrade tomorrow. Logins are still available.
03/16/17 14:17 PDT | 03/16/17 14:36 PDT | Genepool | System in degraded mode. Genepool Web Servers are degraded. Engineers are working to resolve the issue ASAP.
03/16/17 7:07 PDT | 03/16/17 7:15 PDT | Cori | System in degraded mode. SLURM was briefly unavailable; no jobs could be submitted, and some running jobs may have been affected.
03/15/17 9:02 PDT | 03/15/17 12:59 PDT | HPSS User | Scheduled maintenance. Archive (User) Scheduled Maintenance. The system is down.
03/15/17 8:06 PDT | 03/15/17 9:55 PDT | Science Database Services | Unavailable. mysql and postgres services hosted on scidb1 were unavailable. Engineers have resolved the issue.
03/12/17 8:00 PDT | 03/12/17 16:05 PDT | Cori | Unavailable. Jobs are running. New jobs will not start. Engineers are working to solve the issue.
03/10/17 14:16 PST | 03/10/17 15:47 PST | Edison | System in degraded mode. Jobs requiring /scratch3 will not run. Engineers are actively working on the situation.
03/07/17 19:15 PST | 03/08/17 9:30 PST | Cori | System in degraded mode. Datawarp is unavailable. Engineers are investigating the issue.
03/07/17 13:56 PST | 03/07/17 15:00 PST | NX Services | Unavailable.
03/07/17 7:30 PST | 03/07/17 9:30 PST | NIM | Scheduled maintenance. NIM will be unavailable.
03/03/17 11:50 PST | 03/03/17 15:40 PST | Science Gateway Services | System in degraded mode. Engineers are currently investigating a possible network issue with the Science Gateway nodes. Portal sites may be affected.
03/01/17 9:00 PST | 03/01/17 10:23 PST | Data Transfer Nodes | Scheduled maintenance. DTN cluster will be down for maintenance.
03/01/17 9:00 PST | 03/01/17 12:53 PST | HPSS Backup | Scheduled maintenance.
03/01/17 9:00 PST | 03/01/17 9:19 PST | Matgen | Scheduled maintenance. Logins will be unavailable. Running jobs should not be affected.
03/01/17 6:00 PST | 03/03/17 19:34 PST | Cori | Scheduled maintenance. Cori was down for cabinet additions and HSN (high-speed network) maintenance. Logins will not be available.
02/28/17 6:00 PST | 03/01/17 6:00 PST | Cori | Scheduled maintenance. Cori will be degraded due to cabinet additions. Datawarp nodes will be reduced during this time.
02/28/17 1:35 PST | 02/28/17 4:25 PST | Cori | System in degraded mode. The /global/cscratch1 filesystem was in degraded mode, preventing new jobs from running.
02/25/17 11:10 PST | 02/25/17 17:18 PST | Edison | Unavailable. Engineers are actively working to resolve a rectifier issue with Edison row 2. Further updates will be provided as soon as possible. Logins are available; however, jobs are impacted.
02/24/17 16:15 PST | 02/24/17 16:38 PST | Cori | Logins available, batch jobs not running. Engineers were working on an issue with the batch system.
02/24/17 13:04 PST | - | Cori | The BurstBuffer DW alternate pool 'sm_pool' is temporarily unavailable until March 1, 2017.
02/23/17 8:00 PST | 02/23/17 15:50 PST | Edison | Scheduled maintenance. Engineers will be performing hardware maintenance. Login nodes will be available.
02/21/17 21:15 PST | 02/21/17 22:15 PST | Cori | System in degraded mode. The majority of the system's compute nodes are currently unavailable. Engineers are investigating the issue.
02/21/17 8:00 PST | 02/21/17 21:15 PST | Cori | Scheduled maintenance. Cori will be unavailable while updates are applied. Logins will be available; however, no jobs will run.
02/21/17 0:05 PST | 02/21/17 9:24 PST | Edison | System in degraded mode. Edison is running in a degraded mode without /global/cscratch1. Engineers are investigating the issue.
02/18/17 12:22 PST | 02/18/17 13:07 PST | Cori | System in degraded mode. Cori "cscratch1" is currently unavailable due to issues with its metadata servers. Engineers are actively working on this issue, and expect to have the filesystem up shortly.
02/17/17 22:37 PST | 02/17/17 22:45 PST | Cori | System in degraded mode. /cscratch1 was briefly unavailable due to filesystem issues.
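
For readers who want to work out how long an event in this log lasted, the following is a minimal sketch, not an official NERSC tool. It assumes each row begins with an event timestamp of the form MM/DD/YY H:MM followed by PDT or PST, optionally followed by an up timestamp in the same form. The names `outage_duration`, `ROW`, and `OFFSETS` are hypothetical, and the time zones are treated as fixed offsets (PDT as UTC-7, PST as UTC-8).

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for the two zone labels that appear in the log.
OFFSETS = {
    "PDT": timezone(timedelta(hours=-7)),
    "PST": timezone(timedelta(hours=-8)),
}

# A row starts with the event time; the up time may follow directly or
# after spaces and "|" separators. An ongoing event has "-" instead.
ROW = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}) (PDT|PST)"
    r"[\s|]*"
    r"(?:(\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}) (PDT|PST))?"
)


def _parse(stamp, zone):
    """Convert one log timestamp and zone label to an aware datetime."""
    return datetime.strptime(stamp, "%m/%d/%y %H:%M").replace(tzinfo=OFFSETS[zone])


def outage_duration(row):
    """Return up time minus event time for one row, or None if the row
    has no up time or does not start with a timestamp."""
    match = ROW.match(row)
    if match is None or match.group(3) is None:
        return None
    start = _parse(match.group(1), match.group(2))
    end = _parse(match.group(3), match.group(4))
    return end - start


if __name__ == "__main__":
    row = "04/25/17 14:54 PDT04/25/17 15:05 PDTCoriUnavailable."
    print(outage_duration(row))  # prints 0:11:00
```

Only the leading timestamps are matched, so dates mentioned inside a comment are ignored. Rows without an up time return `None` rather than a duration.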
