Big Data Center
The Big Data Center (BDC) within the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (Berkeley Lab) is focused on developing a production-level big data software stack that can be used to solve leading scientific challenges at the full scale of NERSC’s largest supercomputer, Cori. The BDC will bring together existing open-source big data analytics and scientific data management software into a single software distribution. Researchers will consider all levels of the stack, from real capability science applications through algorithms, key computational and structural motifs, runtimes, and optimized libraries.
The BDC software distribution will fill gaps in the performance and functionality of its component packages in order to support exemplar scientific applications that process ~100 TB datasets on ~100,000 cores on Cori, with a few applications targeting 1 PB datasets at the full scale of the system. The BDC software distribution is developed, tested, and packaged both at NERSC and at collaborating institutions, such as the University of California, Berkeley, the University of Oxford, the University of Montreal, and the HDF Group. The BDC will operate for 3 years, producing an improved software distribution at the end of each year, and will reach final production-ready status at the end of the center’s lifetime.
Quincey Koziol, NERSC
Victor Lee, Intel
Mike Ringenburg and Ted Slater, Cray
Intel: Nalini Kumar, Amrita Mathuriya, Lei Shao
Cray: Kristyn Maschhoff, Peter Mendrygal, Aaron Vose
IPCCs (Intel Parallel Computing Centers)
Frank Wood, Gunes Baydin (University of Oxford)
Jeffrey Regier, Jon McAuliffe (University of California, Berkeley)
Adam Rupe, Ryan James, Jim Crutchfield (University of California, Davis)
Nick Choma, Joan Bruna (New York University)
Grzegorz Muszynski, Vitaliy Kurlin (University of Liverpool)