NUG 2013 User Day: Trends, Discovery, and Innovation in High Performance Computing

Wednesday, Feb. 13

Berkeley Lab Building 50 Auditorium

Live streaming: http://hosting.epresence.tv/LBL/1.aspx

8:45 - Welcome: Kathy Yelick, Berkeley Lab Associate Director for Computing Sciences

Trends

9:00 - The Future of High Performance Scientific Computing, Kathy Yelick, Berkeley Lab Associate Director for Computing Sciences

9:45 - NERSC Today and over the Next Ten Years, Sudip Dosanjh, NERSC Director

10:30 - The 2013 NERSC Achievement Awards

10:45 - Break

Discovery

11:00 - Discovery of the Higgs Boson and the role of LBNL and World-Wide Computing, Ian Hinchliffe, Berkeley Lab

11:30 - Discovery of the θ₁₃ Weak Mixing Angle at Daya Bay using NERSC & ESnet, Craig Tull, Berkeley Lab

12:00 - Big Bang, Big Data, Big Iron - Planck Satellite Data Analysis At NERSC, Julian Borrill, Berkeley Lab

12:30 - Lunch (on your own)

Innovation

1:30 - The Materials Project: Combining density functional theory calculations with supercomputing centers for new materials discovery, Anubhav Jain, Berkeley Lab

2:00 - Python in a Parallel Environment, David Grote, Berkeley Lab

2:30 - OpenMSI: A Mass Spectrometry Imaging Science Gateway, Ben Bowen, Berkeley Lab

3:00 - Break

3:15 - Towards an ALS Data Pipeline and Analysis Framework at NERSC, Jack Deslippe & Craig Tull, Berkeley Lab

3:45 - Large Sparse Matrix Problems in Ab-initio Nuclear Structure Calculations, Pieter Maris, Iowa State University

4:15 - IPython Notebook, Fernando Perez, UC Berkeley

Abstracts

The Materials Project: Combining density functional theory calculations with supercomputing centers for new materials discovery

Anubhav Jain
Berkeley Lab

New materials can potentially reduce the cost and improve the efficiency of solar photovoltaics, batteries, and catalysts, leading to broad societal impact. This talk describes a computational approach to materials design in which density functional theory (DFT) calculations are performed over very large computing resources. Because DFT calculations accurately predict many properties of new materials, this approach can screen tens of thousands of potential materials in short time frames.

We present some major software development efforts that generated over 10 million CPU-hours' worth of materials information in the span of a few months using NERSC clusters. For this effort, we designed custom workflow software using Python and MongoDB. This represents one of the largest materials data sets ever computed, and the results are compiled on a public web site (The Materials Project) with over 3,000 registered users who are designing new materials with the computed information.

Finally, we describe future efforts in which algorithms might "self-learn" which chemical spaces are the most promising for investigation based on the results of previous computations, with application to solar water splitting materials.

Python in a parallel environment

David Grote
Berkeley Lab

Python provides a high-level interface that is both interactive and interpreted. This allows for rapid code development and a highly flexible interface, both significant advantages. However, these advantages come with some drawbacks, for instance the dependency on the sizable Python executable and standard scripts, and a more complicated build scheme. In a serial environment, these drawbacks are minimal: Python installs on all available machines, has tools for simplifying the build, and starts up rapidly. In a parallel environment, however, the drawbacks can be more onerous. For example, during start-up, Python needs to access a significant number of files, such as standard Python scripts and any user input files. Without special handling, a large number of processors accessing the same files on disk could cause significant contention, leading to long start-up times. I will discuss how we have handled this issue. Additionally, I will discuss our general experience using Python in a parallel environment, including special needs for building and maintaining interactive capability.
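One general way to reduce the number of files each process touches at start-up is to bundle modules into a single zip archive, which Python's standard zipimport mechanism can import from directly. The sketch below illustrates that idea only; the abstract does not say which approach the speaker's team used, and the module names and contents here are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def bundle_modules(sources, archive_path):
    """Write each {module_name: source_text} entry into one zip archive."""
    with zipfile.ZipFile(archive_path, "w") as archive:
        for name, text in sources.items():
            archive.writestr(name + ".py", text)
    return archive_path


def import_from_bundle(archive_path, module_name):
    """Import module_name from the archive through the standard zip import hook."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Hypothetical modules, for illustration only.
workdir = tempfile.mkdtemp()
archive = bundle_modules(
    {
        "nug_demo_constants": "ANSWER = 42\n",
        "nug_demo_helpers": "def double(x):\n    return 2 * x\n",
    },
    os.path.join(workdir, "bundle.zip"),
)

# Both modules come out of the same archive file rather than separate files.
constants = import_from_bundle(archive, "nug_demo_constants")
helpers = import_from_bundle(archive, "nug_demo_helpers")
result = helpers.double(constants.ANSWER)
```

With this layout, every process opens one archive instead of many small files. Whether that actually shortens start-up on a given parallel file system is something to measure rather than assume.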

OpenMSI: a Mass Spectrometry Imaging Science Gateway

Ben Bowen
Berkeley Lab

Metabolite and protein analysis is vital to understanding the phenotype of a biological sample. Specifically, metabolite levels dynamically vary in response to energy demands, diet, disease, and environment. Typical analysis of metabolite levels begins with homogenization of a sample, so the spatial relationships of the biological material are lost. Mass spectrometry imaging of metabolite and protein levels overcomes this limitation by directly measuring the relative abundance of biomolecules and mapping their position. An "image" constitutes a relative abundance map for a given biomolecule, and large numbers of molecules can be imaged simultaneously. While this technique is certainly revolutionary, the in-depth analysis of these datasets often presents a barrier to many researchers. OpenMSI provides a gateway for the management and storage of these data files (where each file is the size of a typical hard drive), the visualization of the hyper-dimensional contents of the data, and the statistical analysis of the data.

The ALS Data Pipeline and Analysis Framework at NERSC

Jack Deslippe
Berkeley Lab

The Advanced Light Source at Lawrence Berkeley National Lab is poised to meet unprecedented computational challenges in the next several years. The data rate at the facility is doubling each year and is expected to reach 1 PB in 2013. The computational challenges come in the form of the storage, analysis and simulation of x-ray data. An effort is underway to transition these operations from small local machines to the NERSC facility using NERSC's global file systems, mass tape storage systems, and HPC resources. In this talk we will guide you through the pipeline under development for data as well as discuss the targeted simulation tools currently under testing. We'll discuss the development of the framework that enables beamline scientists across the ALS to independently (and in parallel) create custom NERSC-powered tools for their beamline.

Large Sparse Matrix Problems in Ab-initio Nuclear Structure Calculations

Pieter Maris
Iowa State University

The atomic nucleus is a self-bound system of strongly interacting nucleons. In Configuration Interaction (CI) calculations, the nuclear wavefunction is expanded in Slater determinants of single-nucleon wavefunctions, and the many-body Schrödinger equation becomes a large sparse matrix eigenvalue problem. The challenge is to reach numerical convergence to within quantified numerical uncertainties for physical observables using finite truncations of the infinite-dimensional basis space. In practice that means that we have to obtain the lowest eigenvalues of sparse matrices with dimensions of the order of a billion or more, and with trillions of nonzero matrix elements. I discuss strategies for constructing and solving these large sparse matrices on current multicore computer architectures. Several of these strategies have been implemented in the code MFDn, a hybrid MPI/OpenMP Fortran code for ab-initio nuclear structure calculations. In particular, I will show how a topology-aware mapping of the matrix on the available hardware can drastically reduce the communication overhead time. Communication overheads are further reduced by fully overlapping expensive communication operations with symmetric SpMV computations using a novel technique. With these recent improvements, MFDn now scales well on Hopper and other Cray platforms. Finally, I will present selected new results for ground state energies, excitation energies, and other observables for light nuclei using realistic 2- and 3-body forces, and demonstrate that collective phenomena such as clustering and rotational band structures can emerge from these microscopic calculations.
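The symmetric sparse matrix-vector product (SpMV) mentioned above can be illustrated with a small serial sketch. It is not taken from MFDn, which is a hybrid MPI/OpenMP Fortran code. It only shows the basic idea of storing one triangle of a symmetric matrix and applying each off-diagonal entry twice. The coordinate-list storage and the 3x3 example matrix are assumptions chosen for readability.

```python
def symmetric_spmv(n, upper_entries, x):
    """Return y = A x for a symmetric n x n matrix A.

    upper_entries holds (i, j, value) triples with i <= j, so each
    off-diagonal element is stored once and used for both A[i][j]
    and A[j][i].
    """
    y = [0.0] * n
    for i, j, value in upper_entries:
        y[i] += value * x[j]
        if i != j:
            # Mirror contribution from the lower triangle, which is not stored.
            y[j] += value * x[i]
    return y


# Upper triangle of the illustrative tridiagonal matrix
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]]
upper = [
    (0, 0, 2.0), (0, 1, -1.0),
    (1, 1, 2.0), (1, 2, -1.0),
    (2, 2, 2.0),
]
y = symmetric_spmv(3, upper, [1.0, 2.0, 3.0])
```

Storing one triangle roughly halves the memory needed for the matrix. An iterative eigensolver would call a product like this repeatedly to approach the lowest eigenvalues.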

IPython: tools for the entire lifecycle of research computing

Fernando Perez
UC Berkeley

IPython started as a better interactive Python interpreter in 2001, but over the last decade it has grown into a rich and powerful set of interlocking tools aimed at enabling an efficient, fluid and productive workflow in the typical use cases encountered by scientists in everyday research.

Today, IPython consists of a kernel executing user code and capable of communicating with a variety of clients, using ZeroMQ for networking via a well-documented protocol. This enables IPython to support, from a single codebase, a rich variety of usage scenarios through user-facing applications and an API for embedding.
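As a rough illustration of what a client might send to a kernel, the sketch below builds a dictionary shaped like an execute request in the documented messaging protocol, with header, parent header, metadata, and content parts. It is a simplified assumption-laden sketch: the real protocol carries additional fields and sends the parts as separate, signed ZeroMQ frames, none of which is shown here. The helper name make_execute_request is hypothetical.

```python
import json
import uuid


def make_execute_request(code, session_id, username="demo"):
    """Build a dict shaped like an execute_request message (simplified)."""
    return {
        "header": {
            "msg_id": str(uuid.uuid4()),
            "session": session_id,
            "username": username,
            "msg_type": "execute_request",
        },
        "parent_header": {},  # empty because this message starts a new exchange
        "metadata": {},
        "content": {"code": code, "silent": False},
    }


session = str(uuid.uuid4())
request = make_execute_request("1 + 1", session)

# JSON round trip stands in for serialization; no networking is done here.
wire_text = json.dumps(request)
decoded = json.loads(wire_text)
```

Because the message is plain structured data, different front ends can talk to the same kernel as long as they produce and consume messages of this shape.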

In this talk we will show how IPython supports all stages in the lifecycle of a scientific idea: individual exploration, collaborative development, large-scale production using parallel resources, publication and education. In particular, the IPython Notebook supports multiuser collaboration and allows scientists to share their work in an open document format that is a true "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching. We will demonstrate the key features of the system, including recent examples of scientific publications made with the notebook.

Videos

Introduction and The Future of High Performance Computing, Kathy Yelick, Berkeley Lab Associate Director for Computing Sciences
NERSC Today and Over the Next Ten Years, Sudip Dosanjh, NERSC Director
Discovery of the Higgs Boson and the Role of Berkeley Lab and World-Wide Computing, Ian Hinchliffe, Berkeley Lab
Discovery of the Theta-13 Neutrino Mixing Angle Using NERSC and ESnet, Craig Tull, Berkeley Lab, US Lead of Daya Bay Offline Computing
Big Bang, Big Data, and Big Iron: Planck Satellite Data Analysis at NERSC, Julian Borrill, Berkeley Lab

Downloads

Yelick-NUG13.pdf | Adobe Acrobat PDF file
The Future of High Performance Scientific Computing, Kathy Yelick, Berkeley Lab Associate Director for Computing Sciences
NERSC-NUG.pdf | Adobe Acrobat PDF file
NERSC Today and over the next Ten Years, Sudip Dosanjh, NERSC Director
cet20130213-nugdayabay.pdf | Adobe Acrobat PDF file
Discovery of θ13 at Daya Bay using NERSC & ESnet, Craig Tull
GroteNUG2013.pdf | Adobe Acrobat PDF file
Python in a Parallel Environment, David Grote
NERSCuserday-jain.pdf | Adobe Acrobat PDF file
The Materials Project: An application of high-throughput computing, Anubhav Jain
1302ipythoninteractivewebnersc.pdf | Adobe Acrobat PDF file
IPython: modern tools for interactive & web-enabled scientific computing, Fernando Perez, UC Berkeley
borrill-nug-2013.pdf | Adobe Acrobat PDF file
Big Bang, Big Data, Big Iron -- Planck Satellite Data Analysis at NERSC