Global Reach: NERSC Helps Manage and Analyze LHC Data
September 1, 2008
Over 15 million gigabytes of data per year will need to be stored, processed, backed up, and distributed to researchers across the world when the Large Hadron Collider (LHC) begins smashing together beams of protons to search for new particles and forces, and beams of heavy nuclei to study new states of matter. Managing this mountain of data requires an international effort.
Large amounts of data from two of the LHC’s detectors, ATLAS and ALICE, will be sent to the Department of Energy’s National Energy Research Scientific Computing Center (NERSC), where two systems will be utilized to manage it. The High Performance Storage System (HPSS) will archive the raw and processed data. Meanwhile, the Parallel Distributed Systems Facility (PDSF) will process and distribute results to thousands of scientists across the globe.
“You can build a world class particle accelerator like the LHC, but if you don’t have a way to store and analyze the data, it’s all for nothing,” says Peter Jacobs, of the Lawrence Berkeley National Laboratory’s Nuclear Science Division, who contributed to the construction of the ALICE detector, one of the four large experiments at the LHC.
The LHC particle accelerator is located in Geneva, Switzerland, and is managed by the European Organization for Nuclear Research (CERN).
PDSF Comes Full-Circle
When the LHC begins full operation next year, it will be the most powerful particle accelerator in the world. Capable of smashing together protons at an unprecedented 14 tera-electron volts of energy, it opens up a vast new landscape called the “terascale” for exploration. One exciting possibility is the discovery of the Higgs boson, a fundamental particle predicted to give mass, or weight, to all matter in the Universe; another is the discovery of a new family of “supersymmetric” particles, which are predicted to exist by theories that unify all the forces of nature. The LHC will also be the world’s highest energy collider of heavy nuclei, generating matter under the extreme conditions that existed a few microseconds after the Big Bang.
Terascale physics may be the new frontier, but the LHC is not the first collider designed to explore it. The U.S. had ambitions to investigate this realm of science in the 1980s, when its scientists began constructing the Superconducting Super Collider (SSC) in Waxahachie, Texas. In fact, the original PDSF was built to analyze SSC data. However, the project was cancelled in the mid-90s, and PDSF was transferred to NERSC, where it underwent multiple upgrades and expansions. Now it is one of the most flexible computing facilities in the U.S., and has supported the majority of large nuclear and high energy physics projects undertaken by the country’s leading scientists.
“PDSF is designed from the beginning to be able to support a wide range of nuclear science and high energy physics, from terascale physics accelerators to experiments in the wastes of Antarctica to space experiments, in different ways. With the LHC, it comes full-circle to support ultra-large collider experiments again,” says Jay Srinivasan, PDSF Systems Lead.
According to Jacobs, the PDSF architecture is ideal for processing high energy physics data because the different nodes in the cluster do not really need to communicate with each other. Each particle collision is taken as a single event, and only one node is required to process that event. “All we really need is a large set of processors that can access tremendous amounts of data,” he says.
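The independence Jacobs describes can be illustrated with a minimal Python sketch. It is not PDSF or experiment software: the event layout, the process_event and process_all names, and the use of local threads as a stand-in for cluster nodes are all assumptions made for illustration. What it shows is that each event is handled using only its own data, so adding workers requires no communication between them.

```python
from concurrent.futures import ThreadPoolExecutor


def process_event(event):
    """Process one collision event using only that event's own data.

    Hypothetical stand-in for real reconstruction: it sums the
    energies (in GeV) of the particles recorded for the event.
    """
    return {
        "event_id": event["event_id"],
        "total_energy": sum(event["energies"]),
    }


def process_all(events, workers=4):
    """Process events independently; workers never exchange data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each event is picked up by whichever worker is free;
        # map() returns the results in the original event order.
        return list(pool.map(process_event, events))


# Made-up events, purely for illustration.
events = [
    {"event_id": 1, "energies": [10.0, 20.5]},
    {"event_id": 2, "energies": [5.0]},
    {"event_id": 3, "energies": [1.5, 2.5, 3.0]},
]
results = process_all(events)
```

Because no event depends on another, the result is the same whether one worker or many process the list; only the elapsed time changes.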
PDSF will also provide hundreds of terabytes of disk storage for the LHC experiments, allowing users to leverage NERSC’s expertise in deploying high-performance parallel filesystems.
“At NERSC we will provide scientists the processing capabilities of PDSF, along with user support from consultants who are experts in computational science and performance tuning, visualization assistance, training, customized support, and other services,” says Srinivasan. “And, because PDSF uses commodity technology, or hardware that is in the marketplace, we offer a very cost efficient service.”
Alongside PDSF, HPSS will archive data from the LHC. The system has been used for archival storage since 1998 and holds over 6 petabytes of data in more than 70 million files.
“Both PDSF and HPSS have processed and stored data from experiments similar to the LHC, so we know what to expect,” says Srinivasan. “However, we are all really excited to be a part of a major project that will advance our knowledge of the world around us.”
PDSF is funded by the Computing Sciences, Physics, and Nuclear Sciences divisions at the Lawrence Berkeley National Laboratory.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.