Global Scratch Gets an Upgrade
Improvements Will Include Higher Data Output Rates, Connection to PDSF
October 29, 2013
The most-used file system at the National Energy Research Scientific Computing Center (NERSC), global scratch, just got an upgrade. As a result, some users may see their data output to global scratch reach up to 80 gigabytes per second. Although users will probably not see their 20-terabyte storage quotas increase, the upgrade ensures that global scratch remains flexible and paves the way for PDSF to eventually use the file system.
Because of the upgrade, users will also be better able to access their temporary data files, or “scratch data,” from any NERSC system, not just the one that generated them. Prior to the upgrade, global scratch typically operated at over 90 percent capacity, with data input and output rates around 15 gigabytes per second.
On average, about a petabyte of scientific data flows into NERSC every month.
As new projects came on board at an accelerated pace, the Center’s scratch storage was inundated. Systems were consistently 90 percent full.
“Utilization of the existing global scratch file system is extremely high,” says Jason Hick, who leads NERSC’s Storage Systems Group. “The consolidated file system is popular because users can store, analyze, and refactor data at high bandwidth from a variety of different systems at the facility without the hassle of transferring it between systems.”
The global scratch file system provides unique abilities to facility users. Upgrading it provides better access and capacity to support users’ needs.
What are the benefits?
Because NERSC’s Storage Systems Group built global scratch to meet the performance needs of its fastest cluster, the facility is now positioned to meet the performance demands of all its clusters. With the consolidated file system upgraded, users will be able to continue taking advantage of an efficient and scalable resource for their scientific storage needs.
“We can slice data differently for various purposes without concern over bandwidth or latency,” Hick says. “Efficiency was our key metric with adopting a site-wide storage architecture. By optimizing storage for different requirements, such as large- and small-scale simulations, visualizations or analytics, we could offer our community the most efficient, scalable storage resources possible.”
By choosing an embedded storage solution, NERSC eliminated the need for additional servers, cabling, network switches, and adapters, reducing administrative overhead by hundreds of thousands of dollars.
“NERSC was a pioneer in moving to a site-wide file system architecture and recently for moving toward a consolidated storage architecture,” Hick says. “We recognized that centralization could yield substantial storage and network performance improvements while offering us a much simpler, cost-effective approach to deploying HPC resources.” The consolidated architecture further simplifies the infrastructure required to provide scalable storage to users.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is the primary high-performance computing facility for scientific research sponsored by the U.S. Department of Energy's Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 4,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science.