As data volumes and data complexity have grown at experimental and observational facilities funded by DOE, partnerships between NERSC and these facilities have increased. This page highlights recent examples where NERSC's scalable computing and data analytics capabilities have been leveraged in coordination with large-scale experimental and observational science. It also invites input from facility staff and users on new work in this area through a form at the bottom of the page.
NERSC seeks opportunities where HPC can accelerate discovery and/or expand the scope of experimental and observational science. Inter-facility collaborations are numerous; the table below records some notable recent examples. HPC-enabled data analysis takes different forms, but it generally involves orchestrating bandwidth, concurrency, and automation to solve scaling challenges in experimental and observational data science.
| Facility | Scientific Workload | HPC Value Proposition | URL |
|---|---|---|---|
| Advanced Light Source (ALS) | Varied tomographic and scattering data processing methods | User automation and beamline data portals | http://spot.nersc.gov/ |
| Joint Genome Institute (JGI) | Sequencing and annotation of genomic data | Scalable HPC operations and algorithms | https://docs.nersc.gov/science-partners/jgi/ |
| Linac Coherent Light Source (LCLS) | Real-time analysis of nanoscopic systems | Time to solution in processing data bursts | https://cs.lbl.gov/news-media/news/2014/photon-speedway-puts-big-data-in-the-fast-lane/ |
| Bay Area Cryo-EM (BACEM) | Visualizing macromolecular machines | Scalable workflow automation | http://www.nersc.gov/news-publications/nersc-news/science-news/2016/new-technologies-are-fueling-cryo-ems-renaissance/ |
What if it were faster?
History shows that profound scientific discoveries are often made not at the center of a single discipline but at the interfaces between disciplines. In astronomy, the use of supernovae as "standard candles" to map the cosmos came about in part by treating supernovae as "events" and automating the related scientific workflows via HPC. This is where the combined capabilities of NERSC and ESnet have much to offer, allowing scientists to rethink time-to-solution in their workflows. NERSC partners with ESnet (also a user facility) to bridge the geographic distance between detector and computer (report). Bandwidth matters because detectors and the computers needed to process their data are not always in the same place.
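A rough estimate shows why bandwidth matters: transfer time is data volume divided by the bandwidth actually achieved. The sketch below assumes a hypothetical 1 TB data burst and an assumed 80% link efficiency; neither is a measured NERSC or ESnet figure, and the function name is illustrative.

```python
"""Back-of-the-envelope estimate of wide-area transfer time.

All numbers below are hypothetical placeholders, not measured NERSC or
ESnet figures.
"""


def transfer_time_seconds(data_bytes: float, link_gbps: float,
                          efficiency: float = 0.8) -> float:
    """Return the time in seconds to move data_bytes over a link_gbps link.

    efficiency is an assumed fraction of nominal bandwidth actually
    achieved (protocol overhead, contention, storage speed at each end).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and 0 < efficiency <= 1")
    bits = data_bytes * 8
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps


if __name__ == "__main__":
    one_tb = 1e12  # hypothetical 1 TB burst from a detector (decimal units)
    for gbps in (1, 10, 100):
        minutes = transfer_time_seconds(one_tb, gbps) / 60
        print(f"{gbps:>4} Gb/s link: {minutes:8.1f} min")
```

Each tenfold increase in link speed cuts the wait tenfold, turning an afternoon-long transfer into a couple of minutes, which is the difference between analyzing data after an experiment and analyzing it during one.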
NERSC's capabilities are at the leading edge of computing technology. Expanding what is computationally feasible changes what is imaginable within the scope of a research project, and inter-facility science is all about such imagination. NERSC's HPC can speed up facility workflows through automation in data logistics, data processing, and concurrent simulation. Providing the right computational intensity at the right time is a project-dependent objective and a moving target, so we encourage your input on inter-facility science agendas via the form below. We also conduct requirements reviews; more details are given here.
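One small piece of automated data logistics is confirming that files arrived intact after a transfer. The sketch below assumes a hypothetical manifest mapping each file to its expected SHA-256 checksum and verifies a batch concurrently using only Python's standard library; the function names are illustrative, and this is not a NERSC tool or API.

```python
"""Verify a batch of transferred files concurrently (sketch).

The manifest format and function names are hypothetical.
"""
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import hashlib
import tempfile


def sha256_of(path: Path, chunk_bytes: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_batch(manifest: dict[Path, str], workers: int = 8) -> list[Path]:
    """Return the files whose checksum does not match the manifest.

    Threads suit this step: it is dominated by file I/O, and CPython's
    hashlib releases the GIL while hashing large buffers.
    """
    paths = list(manifest)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(sha256_of, paths))  # map preserves input order
    return [p for p, d in zip(paths, digests) if d != manifest[p]]


if __name__ == "__main__":
    # Demo with throwaway files standing in for a transferred dataset.
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for i in range(5):
            path = Path(tmp) / f"frame_{i:03d}.dat"
            path.write_bytes(bytes([i]) * 4096)
            files.append(path)
        manifest = {p: sha256_of(p) for p in files}
        files[2].write_bytes(b"corrupted")  # simulate a damaged transfer
        print("mismatched:", [p.name for p in verify_batch(manifest)])
```

A check like this can run automatically at the end of every transfer, so damaged files are caught and re-sent before any analysis is scheduled on them.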
What does NERSC provide?
- High Performance Computing (HPC): NERSC allocates billions of CPU hours (in units of NERSC hours) in reliable and scalable ways to provide computational intensities that shrink processing times by orders of magnitude. A startup allocation at NERSC is 50K NERSC hours. Multiple allocation programs provide access to NERSC computing resources.
- Data Storage: NERSC provides disk and tape storage to its users. A default storage quota at NERSC allows for millions of files totaling gigabytes to terabytes. Projects requiring larger quotas or long-term data planning can use sponsored storage to support long-term data campaigns.
- Data Transfer Nodes: An array of data transfer nodes (DTNs) at NERSC provides high-bandwidth data migration options. With help from ESnet, NERSC makes it easy to move data into and out of NERSC storage.
- Advanced IO Technologies: NERSC’s burst buffer provides an NVRAM-based IO solution whose performance lies between that of memory and disk. The burst buffer can accelerate IO in a variety of ways described here.
- Science Gateway Nodes: NERSC provides services to make data on disk or tape available via the web. Science gateways provide access to data for scientific collaboration as well as programmatic methods to orchestrate workflows through web portals.
- Software Containers: NERSC has developed Shifter, which provides flexible control of the software environment through Docker-like containers, making software portable.
- Software Defined Networking (SDN): NERSC has upgraded the WAN connectivity of Cori to include software controls on how data is moved in and out of batch nodes over the WAN.
- Scalable software performance and interactivity: NERSC has worked to make software such as Python perform well on HPC systems. Jupyter notebooks provide a modern interactive interface to HPC-based Python.
- Resources for Scientific Workflows: NERSC provides software tools and dedicated nodes to support scientific workflows. These include real-time and high-throughput queuing, database support, and new analytics libraries, as well as two new groups, Data Analytics Services (DAS) and the Data Science Engagement Group (DSEG), whose staff support the needs of data-intensive science.
What are the challenges and opportunities?
At NERSC we find users with a variety of scaling challenges in data science:
- Detectors that outpace available computing, storage, and networking capabilities.
- File formats, workflows, and analyses whose efficiency drops as data volumes and data rates increase.
- Needs for more efficient or fully new algorithms and mathematical methods.
- Predicting time to solution for data analysis to improve workflow efficiency.
- What are yours? Use the form below if you would like to provide input or start a discussion about inter-facility science.
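Predicting time to solution can start from a simple model. The sketch below assumes that total time is transfer time plus queue wait plus compute time, with compute time following Amdahl's law (only a fraction of the work parallelizes). The model, the function names, and every input value are illustrative assumptions rather than a NERSC method or measurement.

```python
"""Rough time-to-solution model for a facility analysis job.

The model (transfer + queue wait + Amdahl's-law compute) and all input
values are illustrative assumptions, not NERSC measurements.
"""


def amdahl_runtime(serial_seconds: float, parallel_fraction: float,
                   workers: int) -> float:
    """Runtime on `workers` when only parallel_fraction of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and workers >= 1")
    return serial_seconds * ((1.0 - parallel_fraction) + parallel_fraction / workers)


def time_to_solution(transfer_s: float, queue_wait_s: float,
                     serial_seconds: float, parallel_fraction: float,
                     workers: int) -> float:
    """Total wall-clock time from data leaving the detector to a result."""
    compute_s = amdahl_runtime(serial_seconds, parallel_fraction, workers)
    return transfer_s + queue_wait_s + compute_s


if __name__ == "__main__":
    # Hypothetical job: 10 h of single-core analysis, 95% parallelizable,
    # 10 min of transfer and 5 min of queue wait.
    for n in (1, 32, 1024):
        total = time_to_solution(600, 300, 36_000, 0.95, n)
        print(f"{n:>5} workers: {total / 60:7.1f} min")
```

In this hypothetical case, going from 32 to 1024 workers helps far less than going from 1 to 32, because the fixed transfer and queue times and the serial 5% of the work come to dominate. That is one reason workflow efficiency depends on data logistics and algorithms as well as raw compute.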
For an experimental facility, getting started at NERSC involves identifying opportunities. Your facility may already have a NERSC allocation; check with your program manager. To identify opportunities, ask: What part of your work could HPC potentially speed up? What problems could HPC solve? What new opportunities could it open?
If you work at a facility that is upgrading its detectors and encountering new bandwidth and data processing bottlenecks, if you want to use HPC simulation as a tool to guide your next experiment, or if you have a software solution for related HPC workflows, NERSC welcomes your input on how to make HPC more impactful. Getting started at the facility level involves summing up the computing, storage, and networking requirements of what is proposed for your facility. Feel free to contact NERSC's Data Science Engagement Group (DSEG).
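Summing up facility-level requirements can be as simple as rolling up per-experiment estimates. The sketch below assumes hypothetical fields (annual compute hours, retained storage, and WAN transfer volume per experiment) and also converts the yearly transfer volume into an average network rate; the class, field names, and numbers are all placeholders to be replaced with your own estimates.

```python
"""Roll up per-experiment estimates into facility-level requirements.

Field names and every number are hypothetical placeholders.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str
    compute_hours: float  # estimated NERSC hours per year
    storage_tb: float     # data retained on disk/tape, TB
    transfer_tb: float    # data moved over the WAN per year, TB


def facility_totals(reqs: list[Requirement]) -> dict[str, float]:
    """Sum each resource and estimate the average WAN rate in Gb/s."""
    transfer_tb = sum(r.transfer_tb for r in reqs)
    seconds_per_year = 365 * 24 * 3600
    return {
        "compute_hours": sum(r.compute_hours for r in reqs),
        "storage_tb": sum(r.storage_tb for r in reqs),
        "transfer_tb": transfer_tb,
        # 1 TB = 8e12 bits in decimal units. This is a yearly average;
        # bursty detectors need peak bandwidth well above it.
        "avg_wan_gbps": transfer_tb * 8e12 / seconds_per_year / 1e9,
    }


if __name__ == "__main__":
    beamlines = [
        Requirement("tomography", 2_000_000, 500, 800),
        Requirement("scattering", 500_000, 200, 300),
    ]
    for key, value in facility_totals(beamlines).items():
        print(f"{key:>14}: {value:,.2f}")
```

Because the average rate hides bursts, a real requirements estimate would also record the peak rate each experiment needs during a run.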