NERSC: Powering Scientific Discovery Since 1974

Move to CRT

Beginning in fall 2015, NERSC is moving from the Oakland Scientific Facility (OSF) in downtown Oakland to a brand-new building: the Computational Research and Theory (CRT) facility, located on the main Lawrence Berkeley National Laboratory campus.

Impact on NERSC users

NERSC is making every effort to minimize the move's impact on its users. We do not expect to shut down the center completely at any point during the move, and NERSC will always have at least one computational system available. However, fewer computing resources were available while Edison was being moved. There will also be times when the I/O bandwidth to various global file systems is reduced for extended periods.

The move is being performed in phases spread over several months. During the move there will be extended periods when some resources are at OSF and others are at CRT. Communication between the two sites will be provided by a primary 400 Gb/s network connection and a backup 100 Gb/s connection. Thus, if you are running on a compute platform located at OSF but performing I/O on a file system located at CRT, I/O bandwidth may be reduced.
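As a rough illustration of what the inter-site link rates mean in practice, the sketch below converts the 400 Gb/s and 100 Gb/s figures to bytes per second and estimates a best-case transfer time. The 100 TB data volume is a hypothetical value chosen for the example, not a NERSC figure, and the function names are illustrative only. The estimate ignores protocol overhead, file system limits, and sharing of the link with other traffic, so real transfers will be slower.

```python
def link_gigabytes_per_second(gigabits_per_second: float) -> float:
    """Convert a link rate from Gb/s to GB/s (8 bits per byte, decimal units)."""
    return gigabits_per_second / 8.0


def ideal_transfer_seconds(terabytes: float, gigabits_per_second: float) -> float:
    """Best-case time to move `terabytes` of data over the link.

    Ignores protocol overhead, file system limits, and competing traffic,
    so this is a lower bound rather than a prediction.
    """
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_second * 1e9)


PRIMARY_GBPS = 400.0  # primary OSF-CRT link rate, from the text
BACKUP_GBPS = 100.0   # backup link rate, from the text
EXAMPLE_TB = 100.0    # hypothetical data volume (assumption for illustration)

if __name__ == "__main__":
    for name, rate in (("primary", PRIMARY_GBPS), ("backup", BACKUP_GBPS)):
        gb_per_s = link_gigabytes_per_second(rate)
        minutes = ideal_transfer_seconds(EXAMPLE_TB, rate) / 60.0
        print(f"{name}: {gb_per_s:.1f} GB/s peak, "
              f"{EXAMPLE_TB:.0f} TB in at least {minutes:.1f} minutes")
```

Because the link is shared by every system and file system split across the two sites, an individual job should expect only a fraction of these peak rates.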

Summary

Compute Systems

Edison began its move to CRT on Nov 30, 2015 and was brought back online on January 4, 2016. Carver and Hopper were retired at OSF on September 30 and December 15, 2015, respectively. 

There will be no outage for Genepool and PDSF.

Storage Systems

Global scratch was retired on Oct 14, 2015. All other global file systems (homes, common, project, projectb, dna, seqfs) are being moved to CRT in phases starting in early November. Communication between the two sites is provided by a primary 400 Gb/s network connection and a backup 100 Gb/s connection.

Each global file system is being migrated to CRT over the high-speed network. The migration is I/O-intensive and will consume a sizable fraction of the available file system bandwidth, so you may notice reduced file system performance while it is in progress. After the migration completes, some compute systems at OSF will have decreased file system bandwidth. We do not expect any outage of the global file systems during this process.

Both HPSS systems (user archive and system backup) will remain at OSF until other moves complete, accessed via the 400 Gb/s network link.

Key Dates

Event | Date | Status
Carver retires | September 30, 2015 (at noon) | Retired
Jesup testbed retires | September 30, 2015 (at noon) | Retired
Global Scratch retires | October 14, 2015 (at noon) | Retired
Cori Phase 1 becomes available to all users | November 11, 2015 | All users enabled
Edison offline for move to new facility | November 30, 2015 (all files on scratch file systems will be deleted) | Returned to service on January 4, 2016
Hopper retires | December 15, 2015 (at noon) | Retired

Detailed Schedule

September 2015

Cori Phase 1 and associated file systems were installed at CRT.

Carver was retired on Sept 30, 2015 at noon. On the same day (and same time), the Global Scratch file system became read-only and the Jesup testbed system was retired.

October 2015

Global Scratch was retired on Oct 14, 2015 at noon.

Cori Phase 1 became available to early users.

November 2015

Cori Phase 1 became available to all users on Nov 11, 2015.

The Global Projecta file system was migrated to CRT (Nov 13-16). During the migration, available bandwidth was reduced. After the migration, I/O performance from systems at OSF accessing /global/projecta at CRT may be reduced. There was no outage of /global/projecta during this process.

The Global Project file system was migrated to CRT (Nov 30 to Dec 6). During the migration, available bandwidth was reduced. After the migration, I/O performance from systems at OSF accessing /global/project at CRT may be reduced. There was no outage of /global/project during this process.

Edison was powered off at 7:00am PST on Nov 30, 2015. The Edison scratch file systems were reformatted and all data was removed: ALL files on the /scratch1, /scratch2, and /scratch3 file systems have been deleted.

Edison queues were turned off and all running jobs were killed at 00:01 PST (midnight) on November 30, 2015. All queued jobs were deleted. Edison login nodes were available until 7am PST on Nov 30 for users to retrieve files. 

December 2015

Edison began the move to CRT; the relocation was expected to result in up to six weeks of downtime.

Hopper was retired on December 15 at noon.

Global homes were migrated to CRT December 15-21.

January 2016

Edison returned to service in its new home in CRT, brought online January 4 with SLURM as the batch scheduler. 

/global/projectb and /global/dna (JGI file systems) will be migrated to CRT beginning January 11. /global/seqfs (JGI file system) will be migrated to CRT after January 30. During the migration, available bandwidth will be reduced. After the migration completes, I/O performance from systems at OSF accessing these file systems at CRT may be reduced. We do not expect any outage of these file systems during this process.

February 2016

Some PDSF file systems will be relocated to CRT. This will result in an outage of about two weeks for those file systems.

A cluster providing resources to PDSF, JGI, the Materials Project, and Babbage will be moved (currently scheduled to start the week of February 8). New compute components for PDSF and JGI will already be in place at CRT, so there will be no outage for them; however, the total amount of available compute resources will be reduced during the move. The Materials Project will have a three-week outage while it is moved. Babbage will be offline for up to a month while it is moved.

Summer 2016

Cori Phase 2 will be delivered and installed at CRT.

Fall 2016

Cori Phase 2 will be available to all users.