As data volumes and complexity have grown at DOE-funded experimental and observational facilities, partnerships between NERSC and these facilities have increased. This page highlights recent examples where NERSC's scalable computing and data-analytic capabilities have been leveraged in coordination with large-scale experimental and observational science.
NERSC seeks opportunities where HPC can accelerate discovery and/or expand the scope of experimental and observational science. It also invites input from facility staff and users through a special interest group (SIG) as part of NUG, the NERSC Users Group.
What if it were faster?
History shows that profound scientific discoveries are often made not within disciplinary boundaries but at their interfaces. In astronomy, for example, the use of supernovae as "standard candles" to map the cosmos came about in part by treating supernovae as "events" and automating the related scientific workflows via HPC. This is where the combined capabilities of NERSC and ESnet have much to offer scientists rethinking time-to-solution in their workflows. NERSC partners with ESnet (also a DOE user facility) to bridge the geography between detector and computer (report). Bandwidth matters because detectors and the computers needed to process their data are not always in the same place. NERSC's capabilities in modeling, simulation, and data analysis can provide prompt answers to guide experimental science teams.
NERSC's capabilities are at the leading edge of computing technology. Expanding what is computationally feasible can change what is imaginable within the scope of a research project, and leveraging inter-facility science can require such imagination. NERSC's HPC can speed up facility workflows through automation of data logistics, data processing, and concurrent simulation. Providing the right computational intensity at the right time is a project-dependent objective, and a moving target, but we encourage your input via the form below regarding inter-facility science agendas. We also conduct requirements reviews, with more details given here.
What resources does NERSC provide for inter-facility science?
- High Performance Computing (HPC): NERSC allocates billions of CPU hours (in units of NERSC hours) in reliable and scalable ways to provide computational intensities that shrink processing times by orders of magnitude. A startup allocation at NERSC is 50K NERSC hours. Multiple allocation programs provide access to NERSC computing resources.
- Data Storage: NERSC provides disk and tape storage to its users. A default storage quota at NERSC accommodates millions of files and data volumes from gigabytes to terabytes. Projects requiring larger quotas or long-term data planning can arrange sponsored storage for long-term data campaigns.
- Data Transfer Nodes: An array of DTNs at NERSC provides high-bandwidth data migration options. With help from ESnet, NERSC makes it easy to move data into and out of NERSC storage.
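To make the bandwidth point concrete, here is a back-of-the-envelope sketch (a hypothetical helper, not a NERSC tool) of how transfer time scales with dataset size and link speed:

```python
def transfer_time_hours(dataset_tb, link_gbps, efficiency=0.8):
    """Estimate wall-clock hours to move a dataset over a network link.

    dataset_tb: dataset size in terabytes (1 TB = 8e12 bits)
    link_gbps: nominal link bandwidth in gigabits per second
    efficiency: fraction of nominal bandwidth actually achieved
    """
    bits = dataset_tb * 8e12
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0
```

At 80% link efficiency, moving 100 TB takes roughly 28 hours over a 10 Gb/s link but under 3 hours at 100 Gb/s; dedicated transfer nodes and ESnet bandwidth are what make the faster end of that range achievable in practice.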
- Advanced IO Technologies: NERSC’s burst buffer provides an NVRAM IO solution with performance between that of memory and disk. The burst buffer can accelerate IO in a variety of ways described here.
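As an illustrative sketch (a config fragment, not a runnable example), a Slurm batch script can request burst buffer space with DataWarp-style directives; the capacity and application name below are hypothetical, so consult NERSC's burst buffer documentation for current syntax:

```shell
#!/bin/bash
#SBATCH -N 4
#SBATCH -t 01:00:00
# Request 1 TB of striped burst buffer scratch space for this job
#DW jobdw capacity=1TB access_mode=striped type=scratch
# DataWarp exposes the burst buffer mount point in $DW_JOB_STRIPED
srun ./my_io_heavy_app --scratch-dir "$DW_JOB_STRIPED"
```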
- Science Gateway Nodes: NERSC provides services to make data on disk or tape available via the web. Science gateways provide access to data for scientific collaboration as well as programmatic methods to orchestrate workflows through web portals.
- Software Containers: NERSC has developed Shifter, which allows flexible control of software environments through Docker-like containers for software portability.
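A minimal sketch of the Shifter workflow (the image and script names are illustrative, and this is a site-specific config fragment rather than a runnable example): pull a Docker image into NERSC's image gateway, then reference it from a Slurm batch script:

```shell
# One-time: pull the image from Docker Hub into NERSC's image gateway
shifterimg pull docker:myteam/analysis:latest

# Then, in a Slurm batch script (separate file):
#SBATCH --image=docker:myteam/analysis:latest
#SBATCH -N 1
# Each task runs inside the container's software environment
srun shifter python process_events.py
```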
- Software Defined Networking (SDN): NERSC has upgraded the WAN connectivity of Cori to include software controls on how data is moved in and out of batch nodes over the WAN.
- Scalable software performance and interactivity: NERSC has worked to make software such as Python perform well on HPC systems. Jupyter notebooks provide a modern interactive interface to HPC-based Python.
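As a generic illustration of the pattern involved (not NERSC-specific code), the standard library's concurrent.futures can fan a per-frame analysis function out across cores; at NERSC scale the same pattern is typically expressed with libraries such as mpi4py or Dask:

```python
from concurrent.futures import ProcessPoolExecutor

def reduce_frame(frame):
    # Stand-in for a per-image reduction step (e.g., background subtraction)
    return sum(frame) / len(frame)

def reduce_all(frames, workers=4):
    # Apply the reduction to every frame in parallel across worker processes
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reduce_frame, frames))

if __name__ == "__main__":
    frames = [[1, 2, 3], [4, 5, 6]]
    print(reduce_all(frames))  # [2.0, 5.0]
```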
- Resources for Scientific Workflows: NERSC provides software tools and dedicated nodes to support scientific workflows. These include real-time and high-throughput queuing, database support, and new analytics libraries, as well as two new groups, Data Analytics Services (DAS) and the Data Science Engagement Group (DSEG), whose staff support the needs of data-intensive science.
What are the challenges and opportunities for facilities?
At NERSC we find users with a variety of scaling challenges in inter-facility data science:
- Detectors that outpace available computing, storage, and networking capabilities.
- File formats, workflows, and analysis with efficiencies that drop as data and data rates increase.
- Needs for more efficient or fully new algorithms and mathematical methods.
- Predicting time to solution for data analysis to improve workflow efficiency.
- What are yours? Use the form below if you would like to provide input or start a discussion about inter-facility science.
Getting started running at NERSC from an experimental-facility perspective involves identifying opportunities. Your facility may already have a NERSC allocation. When upgrading detectors and recognizing new bandwidth and data-analysis bottlenecks, consider HPC as part of the solution. Timely HPC simulation can also be a tool to quickly guide your next experiment. Getting started at the facility level involves summing up the computing, storage, and networking requirements for what is proposed at your facility. Feel free to contact NERSC's Data Science Engagement Group (DSEG).