As an Application Performance Specialist at NERSC, I work with scientists to optimize their software applications for high performance on NERSC supercomputers.
Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, WooSun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Concurrency and Computation: Practice and Experience, August 2017, 30, doi: 10.1002/cpe.4291
The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon‐Phi “Knights Landing” (KNL) nodes. Compared to the Xeon‐based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine‐grain parallelization; vectorization; and use of the high‐bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented.
Friesen, B., Baron, E., Parrent, J. T., Thomas, R. C., Branch, D., Nugent, P., Hauschildt, P. H., Foley, R. J., Wright, D. E., Pan, Y.-C., Filippenko, A. V., Clubb, K. I., Silverman, J. M., Maeda, K., Shivvers, I., Kelly, P. L., Cohen, D. P., Rest, A., Kasen, D., "Optical and ultraviolet spectroscopic analysis of SN 2011fe at late times", Monthly Notices of the Royal Astronomical Society, February 27, 2017, 467:2392-2411, doi: 10.1093/mnras/stx241
We present optical spectra of the nearby Type Ia supernova SN 2011fe at 100, 205, 311, 349 and 578 d post-maximum light, as well as an ultraviolet (UV) spectrum obtained with the Hubble Space Telescope at 360 d post-maximum light. We compare these observations with synthetic spectra produced with the radiative transfer code phoenix. The day +100 spectrum can be well fitted with models that neglect collisional and radiative data for forbidden lines. Curiously, including these data and recomputing the fit yields a quite similar spectrum, but with different combinations of lines forming some of the stronger features. At day +205 and later epochs, forbidden lines dominate much of the optical spectrum formation; however, our results indicate that recombination, not collisional excitation, is the most influential physical process driving spectrum formation at these late times. Consequently, our synthetic optical and UV spectra at all epochs presented here are formed almost exclusively through recombination-driven fluorescence. Furthermore, our models suggest that the UV spectrum even as late as day +360 is optically thick and consists of permitted lines from several iron-peak species. These results indicate that the transition to the ‘nebular’ phase in Type Ia supernovae is complex and highly wavelength dependent.
Friesen, B., Almgren, A., Lukić, Z., Weber, G., Morozov, D., Day, M., "In situ and in-transit analysis of cosmological simulations", Computational Astrophysics and Cosmology, edited by Simon Portegies Zwart, August 26, 2016, 3:1-18, LBNL LBNL-1006104, doi: 10.1186/s40668-016-0017-2
Modern cosmological simulations have reached the trillion-element scale, rendering data storage and subsequent analysis formidable tasks. To address this circumstance, we present a new MPI-parallel approach for analysis of simulation data while the simulation runs, as an alternative to the traditional workflow consisting of periodically saving large data sets to disk for subsequent 'offline' analysis. We demonstrate this approach in the compressible gasdynamics/N-body code Nyx, a hybrid MPI+OpenMP code based on the BoxLib framework, used for large-scale cosmological simulations. We have enabled on-the-fly workflows in two different ways: one is a straightforward approach consisting of all MPI processes periodically halting the main simulation and analyzing each component of data that they own ('in situ'). The other consists of partitioning processes into disjoint MPI groups, with one performing the simulation and periodically sending data to the other 'sidecar' group, which post-processes it while the simulation continues ('in-transit'). The two groups execute their tasks asynchronously, stopping only to synchronize when a new set of simulation data needs to be analyzed. For both the in situ and in-transit approaches, we experiment with two different analysis suites with distinct performance behavior: one which finds dark matter halos in the simulation using merge trees to calculate the mass contained within iso-density contours, and another which calculates probability distribution functions and power spectra of various fields in the simulation. Both are common analysis tasks for cosmology, and both result in summary statistics significantly smaller than the original data set. We study the behavior of each type of analysis in each workflow in order to determine the optimal configuration for the different data analysis algorithms.
Parrent, J. T., Howell, D. A., Fesen, R. A., Parker, S., Bianco, F. B., Dilday, B., Sand, D., Valenti, S., Vinkó, J., Berlind, P., Challis, P., Milisavljevic, D., Sanders, N., Marion, G. H., Wheeler, J. C., Brown, P., Calkins, M. L., Friesen, B., Kirshner, R., Pritchard, T., Quimby, R., Roming, P., "Comparative analysis of SN 2012dn optical spectra: days -14 to +114", Monthly Notices of the Royal Astronomical Society, January 29, 2016, 457:3702-3723, doi: 10.1093/mnras/stw239
SN 2012dn is a super-Chandrasekhar mass candidate in a purportedly normal spiral (SAcd) galaxy, and poses a challenge for theories of type Ia supernova diversity. Here we utilize the fast and highly parametrized spectrum synthesis tool, SYNAPPS, to estimate relative expansion velocities of species inferred from optical spectra obtained with six facilities. As with previous studies of normal SN Ia, we find that both unburned carbon and intermediate-mass elements are spatially coincident within the ejecta near and below 14 000 km s⁻¹. Although the upper limit on SN 2012dn's peak luminosity is comparable to some of the most luminous normal SN Ia, we find a progenitor mass exceeding ∼1.6 M⊙ is not strongly favoured by leading merger models since these models do not accurately predict spectroscopic observations of SN 2012dn and more normal events. In addition, a comparison of light curves and host-galaxy masses for a sample of literature and Palomar Transient Factory SN Ia reveals a diverse distribution of SN Ia subtypes where carbon-rich material remains unburned in some instances. Such events include SN 1991T, 1997br, and 1999aa where trace signatures of C III at optical wavelengths are presumably detected.
Baron, E., Hoeflich, P., Friesen, B., Sullivan, M., Hsiao, E., Ellis, R. S., Gal-Yam, A., Howell, D. A., Nugent, P. E., Dominguez, I., Krisciunas, K., Phillips, M. M., Suntzeff, N., Wang, L., and Thomas, R. C., "Spectral models for early time SN 2011fe observations", Monthly Notices of the Royal Astronomical Society, 2015, 454:2549, doi: 10.1093/mnras/stv1951
We use observed UV through near-IR spectra to examine whether SN 2011fe can be understood in the framework of Branch-normal Type Ia supernovae (SNe Ia) and to examine its individual peculiarities. As a benchmark, we use a delayed-detonation model with a progenitor metallicity of Z⊙/20. We study the sensitivity of features to variations in progenitor metallicity, the outer density profile, and the distribution of radioactive nickel. Metallicity variations in the progenitor have a relatively small effect on the synthetic spectra. We also find that the abundance stratification of SN 2011fe resembles closely that of a delayed-detonation model with a transition density that has been fit to other Branch-normal SNe Ia. At early times, the model photosphere is formed in material with velocities that are too high, indicating that the photosphere recedes too slowly or that SN 2011fe has a lower specific energy in the outer ≈0.1 M⊙ than does the model. We discuss several explanations for the discrepancies. Finally, we examine variations in both the spectral energy distribution and in the colours due to variations in the progenitor metallicity, which suggests that colours are only weak indicators for the progenitor metallicity, in the particular explosion model that we have studied. We do find that the flux in the U band is significantly higher at maximum light in the solar metallicity model than in the lower metallicity model, and the lower metallicity model much better matches the observed spectrum.
Friesen, B., Baron, E., Wisniewski, J. P., Parrent, J. T., Thomas, R. C., Miller, Timothy R., and Marion, G. H., "Near-infrared Line Identification in Type Ia Supernovae during the Transitional Phase", The Astrophysical Journal, 2014, 792:120, doi: 10.1088/0004-637X/792/2/120
We present near-infrared synthetic spectra of a delayed-detonation hydrodynamical model and compare them to observed spectra of four normal Type Ia supernovae ranging from day +56.5 to day +85. This is the epoch during which supernovae are believed to be undergoing the transition from the photospheric phase, where spectra are characterized by line scattering above an optically thick photosphere, to the nebular phase, where spectra consist of optically thin emission from forbidden lines. We find that most spectral features in the near-infrared can be accounted for by permitted lines of Fe II and Co II. In addition, we find that [Ni II] fits the emission feature near 1.98 μm, suggesting that a substantial mass of ⁵⁸Ni exists near the center of the ejecta in these objects, arising from nuclear burning at high density.
Parrent, J. T., Friesen, B., Parthasarathy, M., "A Review of Type Ia Supernova Spectra", Astrophysics and Space Science, 2014, 351:1-52, doi: 10.1007/s10509-014-1830-1
SN 2011fe was the nearest and best-observed type Ia supernova in a generation, and brought previous incomplete datasets into sharp contrast with the detailed new data. In retrospect, documenting spectroscopic behaviors of type Ia supernovae has been more often limited by sparse and incomplete temporal sampling than by consequences of signal-to-noise ratios, telluric features, or small sample sizes. As a result, type Ia supernovae have been primarily studied insofar as parameters discretized by relative epochs and incomplete temporal snapshots near maximum light. Here we discuss a necessary next step toward consistently modeling and directly measuring spectroscopic observables of type Ia supernova spectra. In addition, we analyze current spectroscopic data in the parameter space defined by empirical metrics, which will be relevant even after progenitors are observed and detailed models are refined.
Friesen, B., Baron, E., Branch, D., Chen, B., Parrent, J., Thomas, R. C., "Supernova Resonance-scattering Line Profiles in the Absence of a Photosphere", The Astrophysical Journal Supplement Series, 2012, 203:1, doi: 10.1088/0067-0049/203/1/12
In supernova (SN) spectroscopy relatively little attention has been given to the properties of optically thick spectral lines in epochs following the photosphere's recession. Most treatments and analyses of post-photospheric optical spectra of SNe assume that forbidden-line emission comprises most if not all spectral features. However, evidence exists that suggests that some spectra exhibit line profiles formed via optically thick resonance-scattering even months or years after the SN explosion. To explore this possibility, we present a geometrical approach to SN spectrum formation based on the "Elementary Supernova" model, wherein we investigate the characteristics of resonance-scattering in optically thick lines while replacing the photosphere with a transparent central core emitting non-blackbody continuum radiation, akin to the optical continuum provided by decaying ⁵⁶Co formed during the explosion. We develop the mathematical framework necessary for solving the radiative transfer equation under these conditions and calculate spectra for both isolated and blended lines. Our comparisons with analogous results from the Elementary Supernova code SYNOW reveal several marked differences in line formation. Most notably, resonance lines in these conditions form P Cygni-like profiles, but the emission peaks and absorption troughs shift redward and blueward, respectively, from the line's rest wavelength by a significant amount, despite the spherically symmetric distribution of the line optical depth in the ejecta. These properties and others that we find in this work could lead to misidentification of lines or misattribution of properties of line-forming material at post-photospheric times in SN optical spectra.
Parrent, J. T., Howell, D. A., Friesen, B., Thomas, R. C., Fesen, R. A., Milisavljevic, D., Bianco, F. B., Dilday, B., Nugent, P., Baron, E., Arcavi, I., Ben-Ami, S., Bersier, D., Bildsten, L., Bloom, J., Cao, Y., Cenko, S. B., Filippenko, A. V., Gal-Yam, A., Kasliwal, M. M., Konidaris, N., Kulkarni, S. R., Law, N. M., Levitan, D., Maguire, K., Mazzali, P. A., Ofek, E. O., Pan, Y., Polishook, D., Poznanski, D., Quimby, R. M., Silverman, J. M., Sternberg, A., Sullivan, M., Walker, E. S., Xu, Dong, Buton, C., Pereira, R., "Analysis of the Early-time Optical Spectra of SN 2011fe in M101", The Astrophysical Journal Letters, 2012, 752, doi: 10.1088/2041-8205/752/2/L26
The nearby Type Ia supernova (SN Ia) SN 2011fe in M101 (cz = 241 km s⁻¹) provides a unique opportunity to study the early evolution of a "normal" SN Ia, its compositional structure, and its elusive progenitor system. We present 18 high signal-to-noise spectra of SN 2011fe during its first month beginning 1.2 days post-explosion and with an average cadence of 1.8 days. This gives a clear picture of how various line-forming species are distributed within the outer layers of the ejecta, including that of unburned material (C+O). We follow the evolution of C II absorption features until they diminish near maximum light, showing overlapping regions of burned and unburned material between ejection velocities of 10,000 and 16,000 km s⁻¹. This supports the notion that incomplete burning, in addition to progenitor scenarios, is a relevant source of spectroscopic diversity among SNe Ia. The observed evolution of the highly Doppler-shifted O I λ7774 absorption features detected within 5 days post-explosion indicates the presence of O I with expansion velocities from 11,500 to 21,000 km s⁻¹. The fact that some O I is present above C II suggests that SN 2011fe may have had an appreciable amount of unburned oxygen within the outer layers of the ejecta.
C. Yang, R. Gayatri, T. Kurth, P. Basu, Z. Ronaghi, A. Adetokunbo, B. Friesen, B. Cook, D. Doerfler, L. Oliker, J. Deslippe, and S. Williams, "An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability", IEEE International Workshop on Performance, Portability and Productivity in HPC (P3HPC'18)
B. Austin, C. Daley, D. Doerfler, J. Deslippe, B. Cook, B. Friesen, T. Kurth, C. Yang, and N. Wright, "A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity", 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS'18)
C. Yang, B. Friesen, T. Kurth, B. Cook, S. Williams, "Toward Automated Application Profiling on Cray Systems", Cray User Group conference (CUG'18), May 2018
Zingale M, Almgren AS, Barrios Sazo MG, Beckner VE, Bell JB, Friesen B, Jacobs AM, Katz MP, Malone CM, Nonaka AJ, Willcox DE, Zhang W, "Meeting the Challenges of Modeling Astrophysical Thermonuclear Explosions: Castro, Maestro, and the AMReX Astrophysics Suite", 2018, doi: 10.1088/1742-6596/1031/1/012024
We describe the AMReX suite of astrophysics codes and their application to modeling problems in stellar astrophysics. Maestro is tuned to efficiently model subsonic convective flows while Castro models the highly compressible flows associated with stellar explosions. Both are built on the block-structured adaptive mesh refinement library AMReX. Together, these codes enable a thorough investigation of stellar phenomena, including Type Ia supernovae and X-ray bursts. We describe these science applications and the approach we are taking to make these codes performant on current and future many-core and GPU-based architectures.
Wahid Bhimji, Debbie Bard, Kaylan Burleigh, Chris Daley, Steve Farrell, Markus Fasel, Brian Friesen, Lisa Gerhardt, Jialin Liu, Peter Nugent, Dave Paul, Jeff Porter, Vakho Tsulaia, "Extreme I/O on HPC for HEP using the Burst Buffer at NERSC", Journal of Physics: Conference Series, December 1, 2017, 898:082015
Damian Rouson, Ethan D Gutmann, Alessandro Fanfarillo, Brian Friesen, "Performance portability of an intermediate-complexity atmospheric research model in coarray Fortran", November 2017, doi: 10.1145/3144779.3169104
We examine the scalability and performance of an open-source, coarray Fortran (CAF) mini-application (mini-app) that implements the parallel, numerical algorithms that dominate the execution of the Intermediate Complexity Atmospheric Research (ICAR) model developed at the National Center for Atmospheric Research (NCAR). The Fortran 2008 mini-app includes one Fortran 2008 implementation of a collective subroutine defined in the Committee Draft of the upcoming Fortran 2018 standard. The ability of CAF to run atop various communication layers and the increasing CAF compiler availability facilitated evaluating several compilers, runtime libraries and hardware platforms. Results are presented for the GNU and Cray compilers, each of which offers different parallel runtime libraries employing one or more communication layers, including MPI, OpenSHMEM, and proprietary alternatives. We study performance on multi- and many-core processors in distributed memory. The results show promising scaling across a range of hardware, compiler, and runtime choices on up to ~100,000 cores.
B Friesen, MMA Patwary, B Austin, N Satish, Z Slepian, N Sundaram, D Bard, DJ Eisenstein, J Deslippe, P Dubey, Prabhat, "Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies", November 2017, doi: 10.1145/3126908.3126927
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel, O(N²) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8 PF (peak) and 5.06 PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.
Thorsten Kurth, William Arndt, Taylor Barnes, Brandon Cook, Jack Deslippe, Doug Doerfler, Brian Friesen, Yun (Helen) He, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Samuel Williams, Woo-Sun Yang, Zhengji Zhao, "Analyzing Performance of Selected NESAP Applications on the Cori HPC System", High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, Volume 10524, June 22, 2017
Yun (Helen) He, Brandon Cook, Jack Deslippe, Brian Friesen, Richard Gerber, Rebecca Hartman-Baker, Alice Koniges, Thorsten Kurth, Stephen Leak, WooSun Yang, Zhengji Zhao, Eddie Baron, Peter Hauschildt, "Preparing NERSC users for Cori, a Cray XC40 system with Intel Many Integrated Cores", Cray User Group 2017, Redmond, WA, May 12, 2017. Best Paper First Runner-Up.
Koskela TS, Deslippe J, Friesen B, Raman K, "Fusion PIC code performance analysis on the Cori KNL system", May 2017
We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high AI and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path with theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
T. Barnes, B. Cook, J. Deslippe, D. Doerfler, B. Friesen, Y.H. He, T. Kurth, T. Koskela, M. Lobet, T. Malas, L. Oliker, A. Ovsyannikov, A. Sarje, J.-L. Vay, H. Vincenti, S. Williams, P. Carrier, N. Wichmann, M. Wagner, P. Kent, C. Kerr, J. Dennis, "Evaluating and Optimizing the NERSC Workload on Knights Landing", PMBS 2016: 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Supercomputing Conference, Salt Lake City, UT, USA, IEEE, November 13, 2016, LBNL LBNL-1006681, doi: 10.1109/PMBS.2016.010
W. Bhimji, D. Bard, M. Romanus, D. Paul, A. Ovsyannikov, B. Friesen, M. Bryson, J. Correa, G. K. Lockwood, V. Tsulaia, S. Byna, S. Farrell, D. Gursoy, C. Daley, V. Beckner, B. Van Straalen, D. Trebotich, C. Tull, G. Weber, N. J. Wright, K. Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group, May 11, 2016, LBNL LBNL-1005736
NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base comprised of over 6500 users in 700 different projects spanning a wide variety of scientific computing applications. The use-cases of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads and therefore these lessons will be of considerable benefit to shaping the developing use of Burst Buffers at HPC centers.