
Perlmutter


The mural that appears on NERSC's next major high-performance computing system, nicknamed Perlmutter, pays tribute to Saul Perlmutter and the team he led to the Nobel-Prize-winning discovery of an accelerating universe. (Credits: Simulation by Zarija Lukic, Visualization by Andrew Myer; Collage by Susan Brand, Berkeley Lab)

 

NERSC's next supercomputer is now (July 2021) being installed at the center's facility in Shyh Wang Hall at Berkeley Lab. The system is named in honor of Saul Perlmutter, an astrophysicist at Berkeley Lab who shared the 2011 Nobel Prize in Physics for the groundbreaking discovery that the rate at which the universe expands is accelerating. Dr. Perlmutter has been a NERSC user for many years, and part of his Nobel Prize-winning work was carried out on NERSC machines. The system name reflects NERSC's commitment to advancing scientific research.

Perlmutter, based on the HPE Cray “Shasta” platform, is a heterogeneous system with both GPU-accelerated and CPU-only nodes. Its projected performance is three to four times that of NERSC's current flagship system, Cori. The system is being installed in two phases: Phase 1, which includes the system's GPU-accelerated nodes and scratch file system, is expected to be available for early science campaigns starting in the summer of 2021; Phase 2 will add CPU-only nodes later in 2021.


Innovations to Support the Diverse Needs of Science

Perlmutter includes a number of innovations designed to meet the diverse computational and data analysis needs of NERSC’s users and to speed their scientific productivity.

The new system derives performance from advances in hardware and software, including a new Cray system interconnect, code-named Slingshot. Designed for data-centric computing, Slingshot’s Ethernet compatibility, advanced adaptive routing, first-of-a-kind congestion control, and sophisticated quality of service capabilities improve system utilization and performance, as well as scalability of supercomputing and AI applications and workflows.

The system will also feature NVIDIA A100 GPUs with new Tensor Core technology and direct liquid cooling. Perlmutter will be NERSC’s first supercomputer with an all-flash scratch filesystem. The 35-petabyte Lustre filesystem will move data at a rate of more than 5 terabytes/sec, making it the fastest storage system of its kind.

Two-Phase Installation

Phase 1 is made up of 12 GPU-accelerated cabinets housing over 1,500 nodes and 35 petabytes of all-flash storage. Phase 2 adds 12 CPU cabinets with more than 3,000 nodes.


Each of Phase 1's GPU-accelerated nodes has four NVIDIA A100 Tensor Core GPUs based on the NVIDIA Ampere GPU architecture and 256GB of memory, for a total of over 6,000 GPUs across the system. Each Phase 1 node also has a single AMD "Milan" CPU. The Phase 1 system also includes Non-Compute Nodes (NCNs): 20 User Access Nodes (NCN-UANs, which serve as login nodes) and service nodes. Some NCN-UANs can be used to deploy containerized user environments, using Kubernetes for orchestration.
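As a rough illustration of how an application process might bind to one of the four GPUs on a Phase 1 node, the sketch below enumerates the visible devices with the CUDA runtime and selects one based on the process's local rank. The one-process-per-GPU mapping and the use of Slurm's SLURM_LOCALID environment variable are assumptions made for the example, not NERSC's prescribed configuration.

    // gpu_bind.cu - illustrative sketch only; assumes one process per GPU and
    // that the launcher exports SLURM_LOCALID (a standard Slurm variable).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    int main() {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);               // a Phase 1 node exposes four A100s
        if (ngpus == 0) {
            std::fprintf(stderr, "no GPUs visible to this process\n");
            return 1;
        }

        const char* local = std::getenv("SLURM_LOCALID");
        int local_rank = local ? std::atoi(local) : 0;
        int device = local_rank % ngpus;          // map each local process to one GPU
        cudaSetDevice(device);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        std::printf("local rank %d bound to GPU %d of %d: %s\n",
                    local_rank, device, ngpus, prop.name);
        return 0;
    }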

The Phase 1 system achieved 64.6 Pflop/s, putting it at No. 5 on the June 2021 Top500 list.

Each of Phase 2's CPU nodes will have two AMD Milan CPUs with 512GB of memory per node.  The Phase 2 system also adds 20 more login nodes and four large memory nodes.

The programming environment will feature the NVIDIA HPC SDK (Software Development Kit) in addition to the familiar CCE, GNU, and LLVM compilers to support diverse parallel programming models such as MPI, OpenMP, CUDA, and OpenACC for C, C++, and Fortran codes.
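For readers unfamiliar with the CUDA model named above, the short sketch below shows what a hand-written GPU kernel looks like. It is a generic SAXPY example rather than code specific to Perlmutter, and it would typically be compiled with nvcc from the CUDA toolkit bundled in the NVIDIA HPC SDK.

    // saxpy.cu - minimal CUDA example of the kind of kernel the A100 nodes target.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));        // unified memory keeps the sketch short
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // 256 threads per block
        cudaDeviceSynchronize();                         // wait before reading results on the host

        std::printf("y[0] = %f (expected 5.0)\n", y[0]);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }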

Preparing for Perlmutter

To ensure that users are able to take advantage of the new technology in Perlmutter, NERSC has implemented a robust application readiness plan for simulation, data, and learning applications through the NERSC Exascale Science Applications Program (NESAP). One outcome of these efforts is the Transitioning Applications to Perlmutter webpage, which provides recommendations for application developers and users who are preparing for the new system. Support for complex workflows through new scheduling techniques and support for Exascale Computing Project (ECP) software is also planned on the new system.

NERSC is the DOE Office of Science’s (SC’s) mission high-performance computing facility, supporting more than 8,000 scientists and 1,000 projects annually. The Perlmutter system represents SC’s ongoing commitment to extreme-scale science, developing new energy sources, improving energy efficiency, discovering new materials, and analyzing massive data sets from scientific experimental facilities.

Building Software and Running Applications

There are several HPE-Cray-provided base compilers available on Perlmutter, with varying levels of support for GPU code generation: HPE Cray, GNU, AOCC (AMD Optimizing C/C++ Compiler), and NVIDIA. All suites provide compilers for C, C++, and Fortran. Additionally, NERSC plans to provide the LLVM compilers on the system. Information on how to build software can be found in the technical documentation on compilers.
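As an illustration of the directive-based alternative to hand-written CUDA, the sketch below expresses the same kind of loop with OpenMP target offload in C++. It is not a tested recipe for any particular suite: whether the loop actually runs on the GPU depends on which of the compilers above is used and on its offload flags, which is where the varying levels of GPU support show up in practice.

    // offload.cpp - directive-based GPU offload sketch; falls back to the host
    // if the chosen compiler does not generate GPU code for the target region.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<float> x(n, 1.0f), y(n, 2.0f);
        const float a = 3.0f;
        float* xp = x.data();
        float* yp = y.data();

        // Map the arrays to the device, run the loop there, and copy y back.
        #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
        for (int i = 0; i < n; ++i)
            yp[i] = a * xp[i] + yp[i];

        std::printf("y[0] = %f (expected 5.0)\n", yp[0]);
        return 0;
    }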

Information on how to launch parallel jobs on GPU-accelerated compute nodes can be found in NERSC's technical documentation on running jobs.

Timeline

  • November 2020 - July 2021: Cabinets containing GPU compute nodes and service nodes for the Phase 1 system arrived on-site and are being configured and tested.
  • Summer 2021: When the Phase 1 system installation completes, NESAP teams will get first access to the system.
  • June 2021: The Phase 1 system is ranked at No. 5 in the Top500 list.

Perlmutter's Phase 1 cabinets without the doors attached reveal the blue and red lines of its direct liquid cooling system.

