
NERSC History

Overview

In 2024, the National Energy Research Scientific Computing Center (NERSC) celebrates 50 years of accelerating scientific discovery for the DOE Office of Science. Launched in 1974 as a computing resource for fusion energy research at Lawrence Livermore National Laboratory, NERSC quickly expanded its role to include users from all SC program offices. In 1996, the center moved to Lawrence Berkeley National Laboratory. Today, its almost 10,000 users make NERSC one of the most scientifically productive high performance computing (HPC) centers in the world: NERSC users produce more than 2,500 refereed publications annually. In addition, over the years, the center has been associated with six Nobel Prize-winning scientists or their teams.

Wang Hall

NERSC is housed in Shyh Wang Hall on the campus of Lawrence Berkeley National Laboratory. (Credit: Roy Kaltschmidt, Berkeley Lab)


NERSC places a premium on delivering cutting-edge technology at scale and making it highly usable and productive in its role as the mission HPC center for the DOE Office of Science. As a unique resource, the center has been a leader in fielding next-generation supercomputing systems such as the Cray T3E-900, the IBM SP (POWER3), the IBM POWER5+, and the Cray XT4. NERSC's recent supercomputers were the first of their kind: Edison, a Cray XC30, included the first two cabinets of that line, which Cray developed with DARPA; and Cori, a Cray XC40, was the first and largest system with Intel Xeon Phi "Knights Landing" processors, debuting as the fifth most powerful supercomputer in the world in 2016. In addition to its computing capabilities, Cori was the first large system to deploy an all-flash "burst buffer," which provided a world-best 1.7 TB/sec of file system bandwidth when it was introduced. Perlmutter, a Cray Shasta system featuring both CPU-only and GPU-accelerated nodes, was installed in two phases starting in 2020 and debuted at No. 5 on the TOP500 list of the world's fastest supercomputers.

NERSC is also a large-scale data analysis and storage center: there is a net flow of data into the center. Its tape archive, currently based on High Performance Storage System (HPSS) technology, holds 200 PB of data, some of which dates back to the first days of the center in the 1980s. NERSC is also unique among DOE National Laboratory computing centers in that it has never deleted scientific data from its archive. The center currently hosts datasets of more than 5 PB each in climate science, genomics, and nuclear physics, and from light source facilities, cosmic microwave background studies, and a number of neutrino and high-energy physics experiments. To help scientists address their growing data needs, NERSC created a Data Department in 2015 and has developed a vigorous program supporting deep learning for science. The center maintains state-of-the-art external networking capabilities, provided by ESnet, to help scientists move data to the center for analysis and archiving. NERSC has a long history of supporting data-driven science; many high-energy and nuclear physics teams have used NERSC data systems and the PDSF cluster for their analyses. NERSC also maintains a close collaboration with DOE's Joint Genome Institute, which conducts high-throughput DNA sequencing, synthesis, and analysis in support of BER's bioenergy and environmental missions.

To ensure that scientists, whose HPC expertise ranges from world-class to first-year graduate student, are as productive as possible, NERSC invests heavily in creating systems that are highly available while offering expert consulting and performance optimization support from a team largely drawn from the scientific community itself. Among its innovations were using web technologies to expose job-level detail to users in the early 2000s, pioneering science gateways, developing the "Shifter" container technology for HPC, and, more recently, enabling machine learning, data analytics software, and workflow frameworks at scale.

In 1978, NERSC developed CTSS, the Cray Time Sharing System, to provide a remote user interface to its Cray-1 supercomputer. The center was the first to checkpoint a full distributed-memory supercomputer (its Cray T3E) in 1997, and it launched the DOE INCITE program in 2003.

50 Years of Supercomputing for Science

The founding of the world's first unclassified supercomputing center began in 1973, when Dr. Alvin Trivelpiece, then deputy director of the Controlled Thermonuclear Research (CTR) program of the Atomic Energy Commission, solicited proposals for a computing center that would aid in the pursuit of fusion power, giving the magnetic fusion program under CTR access to computing power similar to that of the defense programs. Lawrence Livermore National Laboratory was chosen as the site for the new center, which was called the CTR Computer Center (CTRCC), later renamed the National Magnetic Fusion Energy Computer Center (NMFECC), and eventually NERSC. The center started with a cast-off CDC 6600; within a year of its inception, it added a new CDC 7600 and provided, for the first time, a remote access system that allowed fusion energy scientists at Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL), as well as the General Atomics research center in southern California, to communicate with the centralized computers.

The center continued to deploy leading-edge systems, and in 1978 the NMFECC developed the Cray Time Sharing System (CTSS), which allowed remote users to interface with its Cray-1 supercomputer. At the time, computers were essentially custom machines delivered without software, leaving centers to develop their own. Due to its success, CTSS was eventually adopted by multiple computing centers, including the National Science Foundation (NSF) centers established in the mid-1980s in Chicago, Illinois, and San Diego, California. In 1985, when ORNL deployed a Cray X-MP vector processing system, that system also ran CTSS. NERSC next deployed the first four-processor system, the 1.9-gigaflop Cray-2, which replaced the Cray X-MP as the fastest in the world. Because CTSS had already been adapted for multitasking, users were able to run on the Cray-2 just one month after delivery.

In 1983, the NMFECC opened its systems to users in other science disciplines, allocating five percent of system time to the other science offices in DOE’s Office of Energy Research, paving the way for a broader role of computation across the research community. By 1990, the center was allocating computer time to such a wide range of projects from all of the Office of Energy Research offices that the name was changed to NERSC.

The growing number of users and increased demand for computing resources led Trivelpiece, then head of DOE’s Office of Energy Research, to make another decision that mapped out a path for making those resources more widely accessible. He recommended that DOE’s Magnetic Fusion Energy Network (MFEnet) be combined with the High Energy Physics network (HEPnet) to become ESnet (Energy Sciences Network) in 1986. ESnet’s roots stretch back to the mid-1970s, when staff at the CTRCC installed four acoustic modems on the center’s CDC 6600 computer.

As part of the High Performance Parallel Processing project with LANL, NERSC deployed a 128-processor Cray T3D, the first large-scale parallel system from Cray Research, in 1994. The machine was used in a national laboratory-industry partnership to advance the development of parallel codes and was upgraded to 256 processors within a year.

In 1996, NERSC moved from Livermore to Lawrence Berkeley National Laboratory and acquired the Cray T3E-600, its first massively parallel processing system, which was upgraded to a T3E-900 the following year. The system brought with it a fundamental change in the computing environment, making it possible for scientists to perform larger and more accurate simulations. It also had the largest I/O system built to date, with 1.5 terabytes of disk storage and a read/write capability of 800 megabytes per second. Ranked No. 5 on the TOP500 list, this system, named MCurie, was the most powerful computer for open science in the U.S. NERSC's upgraded T3E-900 provided the training platform for a materials science project led by ORNL's Malcolm Stocks, whose code was the first application to reach a sustained performance of 1 teraflop.

By 2003, NERSC was supporting more than 4,000 users from all the Office of Science program offices, and requests for time on its systems were three times what was available. At the direction of Office of Science Director Raymond Orbach, NERSC launched the INCITE (Innovative & Novel Computational Impact on Theory & Experiment) program, which created a system for scientists to apply for and receive large allocations of time on NERSC computing resources. INCITE was expanded to include the leadership computing facilities (LCFs) in 2006, and the program is now supported by the ANL and ORNL facilities.

In November 2015, Berkeley Lab opened Shyh Wang Hall, a 149,000-square-foot facility housing NERSC, ESnet, and researchers in the laboratory's Computational Research Division. One of the most energy-efficient computing centers anywhere, the facility taps the San Francisco Bay's mild climate to cool NERSC's supercomputers, eliminating the need for mechanical cooling, and earned a LEED (Leadership in Energy and Environmental Design) Gold certification from the U.S. Green Building Council.
The facility soon became home to NERSC's next system, Cori, a 30-petaflop/s Cray system with Intel Xeon Phi (Knights Landing) processors. With 68 low-power cores and 272 hardware threads per node, Cori was the first system to deliver an energy-efficient, pre-exascale architecture for the entire Office of Science HPC workload. In 2019, NERSC began preparing for the installation of its next-generation, pre-exascale Perlmutter system, a heterogeneous Cray Shasta machine comprising both CPU-only and GPU-accelerated cabinets.

Text in this section was derived from ASCR@40.

Advancing High Performance Computing and Data

As a leading HPC center, NERSC develops technologies that advance the state of the art.

In the emerging field of deep learning for science, NERSC enabled deep learning training at 15 petaflops on the Cori supercomputer in 2017, giving climate scientists the ability to use machine learning to identify extreme weather events in the output of huge climate simulations. Because analyzing these datasets is challenging, researchers from NERSC, Intel, the Montreal Institute for Learning Algorithms, and Microsoft Research teamed up to create a novel, semi-supervised convolutional deep neural network (DNN). Predictive accuracies ranging from 89.4% to as high as 99.1% showed that DNNs can identify weather fronts, tropical cyclones, and atmospheric rivers.
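A minimal sketch may help make the general approach concrete. The following PyTorch snippet trains a small convolutional classifier on synthetic "climate patches"; it is not the semi-supervised network from that work, and the channel count, patch size, layer sizes, and three-class labeling are all illustrative assumptions.

```python
# Illustrative CNN for classifying patches of gridded climate data.
# NOT the architecture from the paper; sizes and classes are assumptions.
import torch
import torch.nn as nn

N_VARS = 4      # assumed input channels, e.g. pressure, humidity, wind u/v
PATCH = 64      # assumed 64x64-gridpoint patches
N_CLASSES = 3   # assumed: weather front, tropical cyclone, atmospheric river

class ClimateCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(N_VARS, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                 # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                 # 32 -> 16
        )
        self.classifier = nn.Linear(32 * 16 * 16, N_CLASSES)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One training step on random tensors standing in for simulation output.
model = ClimateCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, N_VARS, PATCH, PATCH)    # batch of synthetic patches
y = torch.randint(0, N_CLASSES, (8,))       # synthetic event labels
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(f"example loss: {loss.item():.3f}")
```

In a production workflow, the random tensors would be replaced with patches cut from climate simulation output, with labels derived from existing detection heuristics.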

In 2015, NERSC developed and released "Shifter," a container technology that allows users to bring their custom compute environments to NERSC's supercomputers. Shifter is based on the Docker container technology, extending its use to HPC systems. Shifter was originally inspired by the need to improve the flexibility and usability of HPC systems for data-intensive workloads, but its use cases have expanded to include general HPC workloads. Soon after its initial deployment, numerous experimental facilities and academic institutions found that Shifter made it much easier for them to run their data-centric workloads in an HPC environment. In 2016, the supercomputing company Cray adopted Shifter as an official product, and NERSC demonstrated that Shifter could be used to run complex, scientific, Python-based codes in parallel on more than 9,000 nodes on Cori using its Intel Xeon Phi processors. Shifter was a finalist for a 2018 R&D 100 Award.
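As a rough illustration of the workflow Shifter enables, the sketch below shows the kind of MPI-parallel Python code that can run inside a container across many nodes at once. The image name and launch commands in the comments are assumptions for demonstration, following the general pattern NERSC documents for Shifter; the script itself requires only that mpi4py be available inside the container image.

```python
# hello_shifter.py -- illustrative MPI Python script of the kind run
# inside a Shifter container on an HPC system. The commands below are
# assumptions for demonstration (image name and flags vary by site):
#
#   shifterimg pull docker:myrepo/my-python-env:latest   # stage the image
#   srun -n 64 shifter python3 hello_shifter.py          # run in parallel
#
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes to a collective; a stand-in for a real analysis kernel.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print(f"{size} ranks running in the container; sum of ranks = {total}")
```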

"The supercomputing community continues to evolve in our shared quest for discovery and scientific breakthroughs," said Ryan Waite, Cray's senior vice president of products. "We are seeing an increasing number of developers using new technologies to solve their problems. We are delighted to have partnered with NERSC in the development of this important technology."

When a Cray system based on power-efficient Intel Xeon Phi "Knights Landing" (KNL) processors was selected for its "NERSC-8" (aka Cori) system, NERSC knew its large user base would need help porting their codes to run efficiently on that architecture. In 2014, it started the NERSC Exascale Science Application Program (NESAP) to enable leading teams and their codes to run efficiently at scale on Cori. NERSC worked collaboratively with 20 application teams, Cray, and Intel to prepare key applications for Cori. By the time the system went into production in 2017, NESAP applications had improved their performance on KNL by 350%. At the SC14 conference in New Orleans, NERSC received HPCwire's 2014 Editors' Choice Award for Best HPC Collaboration Between Government & Industry, recognizing NERSC's partnership with Intel and Cray in preparation for Cori.

NERSC introduced the world's first HPC all-flash file system, or "burst buffer," on Cori in 2015. Based on Cray's DataWarp technology, Cori's burst buffer achieved a world-best 1.7 TB/sec of peak I/O performance with 28 million I/O operations per second and about 1.8 PB of storage. The burst buffer greatly improves I/O performance, particularly for codes that are I/O-heavy but cannot use streaming, large-block techniques. Many data analytics applications fall into this category, as their data can often be highly complex and unstructured. The paper "Accelerating Science with the NERSC Burst Buffer" won Best Paper at the 2016 Cray User Group meeting.
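As a sketch of how an application might use such a burst buffer, the following assumes a DataWarp-style per-job allocation whose mount point is exposed through an environment variable (DW_JOB_STRIPED here, with a local fallback so the example runs anywhere); the job directive shown in the comment and the checkpoint file name are illustrative assumptions.

```python
# Sketch: writing checkpoint data to a DataWarp burst-buffer allocation.
# Assumption: the batch job requested a burst buffer with a directive like
#   #DW jobdw capacity=1TB access_mode=striped type=scratch
# and the allocation's mount point is exposed in $DW_JOB_STRIPED.
import os
import numpy as np

# Fall back to a local temp directory so the sketch runs on any machine.
bb_path = os.environ.get("DW_JOB_STRIPED", "/tmp")

# Large sequential write -- the access pattern burst buffers accelerate.
data = np.random.rand(1024, 1024)           # stand-in for simulation state
fname = os.path.join(bb_path, "checkpoint_0001.npy")
np.save(fname, data)
print(f"wrote {data.nbytes / 1e6:.1f} MB to {fname}")
```

After the job completes, data staged to the burst buffer would typically be copied out to the parallel file system for long-term storage.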
