
Perlmutter

NERSC-9 Perlmutter cabinet art

NERSC's next supercomputer will be an HPE Cray system named “Perlmutter” in honor of Saul Perlmutter, an astrophysicist at Berkeley Lab who shared the 2011 Nobel Prize in Physics for his contributions to research showing that the expansion of the universe is accelerating. Dr. Perlmutter has been a NERSC user for many years, and part of his Nobel Prize-winning work was carried out on NERSC machines. The system's name reflects and highlights NERSC's commitment to advancing scientific research.

Perlmutter, based on the HPE Cray “Shasta” platform, will be a heterogeneous system comprising both CPU-only and GPU-accelerated nodes, with three to four times the performance of Cori, NERSC's current platform. It will include a number of innovations designed to meet the diverse computational and data-analysis needs of NERSC's user base and speed their scientific productivity. The new system derives its performance from advances in both hardware and software, including a new Cray system interconnect, code-named Slingshot, that is designed for data-centric computing. Slingshot's Ethernet compatibility, advanced adaptive routing, first-of-a-kind congestion control, and sophisticated quality-of-service capabilities improve system utilization as well as the performance and scalability of supercomputing and AI applications and workflows. The system will also feature NVIDIA A100 GPUs with new Tensor Core technology and direct liquid cooling, and it will be NERSC's first supercomputer with an all-flash scratch filesystem. The 35-petabyte Lustre filesystem will move data at a rate of more than 5 terabytes per second.

The system is scheduled to be delivered in two phases: Phase 1, with 12 GPU-accelerated cabinets housing over 1,500 nodes and 35 PB of all-flash storage, will be delivered in early 2021; Phase 2, with 12 CPU cabinets, will be delivered later in 2021.

Perlmutter's two delivery phases

Each of Phase 1's GPU-accelerated nodes will have four NVIDIA A100 Tensor Core GPUs based on the NVIDIA Ampere GPU architecture, along with 256 GB of memory, for a total of more than 6,000 GPUs across the system. In addition, each Phase 1 node will have a single AMD Milan CPU. The Phase 1 system also includes Non-Compute Nodes (NCNs): 20 User Access Nodes (NCN-UANs, i.e., login nodes) and service nodes. Some NCN-UANs can be used to deploy containerized user environments using Kubernetes for orchestration.
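
As a concrete illustration of this node layout, the short CUDA sketch below (a generic example assuming a standard CUDA runtime, not NERSC-provided code) enumerates the GPUs visible from a single process; on a Phase 1 GPU node it would be expected to list four A100 devices.

    // gpu_count.cu -- minimal sketch for listing the GPUs visible on a node.
    // Illustrative only; not NERSC-provided code.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // On a Perlmutter Phase 1 GPU node this loop would be expected to
        // report four NVIDIA A100 devices with their names and memory sizes.
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            std::printf("GPU %d: %s, %.1f GB global memory\n",
                        i, prop.name, prop.totalGlobalMem / 1.073741824e9);
        }
        return 0;
    }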

Each of Phase 2's CPU nodes will have two AMD Milan CPUs and 512 GB of memory per node. The system will contain more than 3,000 CPU-only nodes. The Phase 2 system will also include 20 more login nodes and four large-memory nodes.

The programming environment will feature the NVIDIA HPC SDK (Software Development Kit) in addition to the familiar CCE, GNU, and LLVM compilers, supporting diverse parallel programming models such as MPI, OpenMP, CUDA, and OpenACC for C, C++, and Fortran codes.
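
For reference, the kind of GPU code these toolchains are meant to build can be as simple as the following CUDA C++ vector-addition sketch (a generic example, not taken from NERSC documentation; it assumes a CUDA compiler such as the nvcc shipped with the NVIDIA HPC SDK):

    // vec_add.cu -- generic CUDA C++ vector addition, shown only to
    // illustrate the programming model; not NERSC-specific code.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Each GPU thread adds one element of a and b into c.
    __global__ void vec_add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

        // Allocate device buffers and copy the inputs to the GPU.
        float *da, *db, *dc;
        cudaMalloc(&da, n * sizeof(float));
        cudaMalloc(&db, n * sizeof(float));
        cudaMalloc(&dc, n * sizeof(float));
        cudaMemcpy(da, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(db, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(da, db, dc, n);
        cudaDeviceSynchronize();

        cudaMemcpy(c.data(), dc, n * sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("c[0] = %.1f (expected 3.0)\n", c[0]);

        cudaFree(da); cudaFree(db); cudaFree(dc);
        return 0;
    }

Equivalent loops could instead be expressed with OpenMP target offload or OpenACC directives and built with the compilers in the same SDK.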

To ensure that our users are able to utilize the new technology in Perlmutter, NERSC has implemented a robust application readiness plan for simulation, data, and learning applications through the NERSC Exascale Science Applications Program (NESAP). Support for complex workflows through new scheduling techniques, and support for Exascale Computing Project (ECP) software, is also planned for the new system.

The new system will be located in Wang Hall at Berkeley Lab, the site of NERSC's current supercomputers. NERSC is the DOE Office of Science’s (SC’s) mission high performance computing facility, supporting more than 7,000 scientists and 700 projects annually. The Perlmutter system represents SC’s ongoing commitment to extreme-scale science, developing new energy sources, improving energy efficiency, discovering new materials and analyzing massive data sets from scientific experimental facilities.

Timeline

  • November 2020 – March 2021: Cabinets containing GPU compute nodes and service nodes for the Phase 1 system arrived on site, and the system is being built.

    Perlmutter Phase 1 installation

  • Second quarter of 2021: When system installation is complete, NESAP teams will get first access to the system.
