Nicholas Wright

Nicholas (Nick) James Wright , Ph.D.

Group Lead

Advanced Technologies Group

NJWright@lbl.gov

Phone: (510) 486-5730

Fax: (510) 486-6459

1 Cyclotron Road

M/S 59R3103

Berkeley, CA 94720 us

Biographical Sketch

Nick Wright is the Advanced Technologies Group lead and the NERSC chief architect. He focuses on evaluating future technologies for potential application in scientific computing. He led the effort to optimize the architecture of the Perlmutter machine, the first NERSC platform designed to meet the needs of both large-scale simulation and data analysis from experimental facilities. Before moving to NERSC, he was a member of the Performance Modeling and Characterization (PMaC) group at the San Diego Supercomputing Center. He earned both his undergraduate and doctoral degrees in chemistry at the University of Durham in England.

Journal Articles

Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams, "Evaluating the potential of disaggregated memory systems for HPC applications", Concurrency and Computation: Practice and Experience, May 2024,

C. S. Daley, D. Ghoshal, G. K. Lockwood, S. Dosanjh, L. Ramakrishnan, N. J. Wright, "Performance characterization of scientific workflows for the optimal use of Burst Buffers", Future Generation Computer Systems, December 28, 2017, doi: 10.1016/j.future.2017.12.022

Scientific discoveries are increasingly dependent upon the analysis of large volumes of data from observations and simulations of complex phenomena. Scientists compose the complex analyses as workflows and execute them on large-scale HPC systems. The workflow structures are in contrast with monolithic single simulations that have often been the primary use case on HPC systems. Simultaneously, new storage paradigms such as Burst Buffers are becoming available on HPC platforms. In this paper, we analyze the performance characteristics of a Burst Buffer and two representative scientific workflows with the aim of optimizing the usage of a Burst Buffer, extending our previous analyses (Daley et al., 2016). Our key contributions are a). developing a performance analysis methodology pertinent to Burst Buffers, b). improving the use of a Burst Buffer in workflows with bandwidth-sensitive and metadata-sensitive I/O workloads, c). highlighting the key data management challenges when incorporating a Burst Buffer in the studied scientific workflows.

Wahid Bhimji, Debbie Bard, Melissa Romanus, David Paul, Andrey Ovsyannikov, Brian Friesen, Matt Bryson, Joaquin Correa, Glenn K Lockwood, Vakho Tsulaia, others, "Accelerating science with the NERSC burst buffer early user program", Cray User Group, May 11, 2016, LBNL LBNL-1005736,

Download File: works-1.bib (bib: 9 KB)

NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base comprised of over 6500 users in 700 different projects spanning a wide variety of scientific computing applications. The use-cases of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads and therefore these lessons will be of considerable benefit to shaping the developing use of Burst Buffers at HPC centers.

Lavanya Ramakrishnan, Richard Canon, Muriki, Sakrejda, Nicholas J. Wright, "Evaluating Interconnect and Virtualization Performance forHigh Performance Computing", SIGMETRICS Performance Evaluation Review, 2012, 40:55-60,

Hongzhang Shan, J. Wright, Shalf, A. Yelick, Wagner, Nathan Wichmann, "A preliminary evaluation of the hardware acceleration of Cray Gemini interconnect for PGAS languages and comparison with MPI", SIGMETRICS Performance Evaluation Review, 2012, 40:92-98,

K. Fuerlinger, N.J. Wright, D. Skinner, C. Klausecker, D. Kranzlmueller, "Effective Holistic Performance Measurement at Petascale Using IPM", Competence in High Performance Computing 2010, January 1, 2010, 15--26,

K. Fuerlinger, N.J. Wright, D. Skinner, "Performance analysis and workload characterization with ipm", Tools for High Performance Computing 2009, January 1, 2010, 31--38,

B. R. de Supinski, S. Alam, D. H. Bailey, L., C. Daley, A. Dubey, T., D. Gunter, P. D. Hovland, H., K. Karavanic, G. Marin, J., S. Moore, B. Norris, L., C. Olschanowsky, P. C. Roth, M., S. Shende, A. Snavely, Spear, M. Tikir, J. Vetter, P. Worley, N. Wright, "Modeling the Office of Science ten year facilities plan: The PERI Architecture Tiger Team", Journal of Physics: Conference Series, 2009, 180:012039,

Conference Papers

Ermal Rrapaj, Sridutt Bhalachandra, Zhengji Zhao, Brian Austin, Hai Ah Nam, Nicholas J. Wright, "Power Consumption Trends in Supercomputers: A Study of NERSC's Cori and Perlmutter Machines", ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pp.1-10, 2024., May 12, 2024,

Zhengji Zhao, Ermal Rrapaj, Sridutt Bhalachandra, Brian Austin, Hai Ah Nam, Nicholas Wright, "Power Analysis of NERSC Production Workloads", In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), New York, NY, USA, Association for Computing Machinery, November 2023, 1279-1287, doi: 10.1145/3624062.3624200

Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", 2022 IEEE/ACM International Workshop on Resource Disaggregation in High-Performance Computing (REDIS), November 2022, doi: 10.1109/RESDIS56595.2022.00006

Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", 2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2022, doi: 10.1109/PMBS56514.2022.00012

K. Z. Ibrahim, T. Nguyen, H. Nam, W. Bhimji, S. Farrell, L. Oliker, M. Rowan, N. J. Wright, Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), IEEE, November 2021, doi: 10.1109/PMBS54543.2021.00007

Glenn K Lockwood, Alberto Chiusole, Lisa Gerhardt, Kirill Lozinskiy, David Paul, Nicholas J Wright, "Architecture and Performance of Perlmutter’s 35 PB ClusterStor E1000 All-Flash File System", Proceedings of the 2021 Cray User Group, May 3, 2021,

NERSC's newest system, Perlmutter, features a 35 PB all-flash Lustre file system built on HPE Cray ClusterStor E1000. We present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end-to-end performance for both traditional dimensions of I/O performance (peak bulk-synchronous bandwidth) and non-optimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system. Finally, we discuss performance considerations unique to all-flash Lustre and present ways in which users and HPC facilities can adjust their I/O patterns and operations to make optimal use of such architectures.

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), Denver, CO, USA, IEEE, 2019, 74--83, doi: 10.1109/PDSW49588.2019.00012

The utilization and performance of storage, compute, and network resources within HPC data centers have been studied extensively, but much less work has gone toward characterizing how these resources are used in conjunction to solve larger scientific challenges. To address this gap, we present our work in characterizing workloads and workflows at a data-center-wide level by examining all data transfers that occurred between storage, compute, and the external network at the National Energy Research Scientific Computing Center over a three-month period in 2019. Using a simple abstract representation of data transfers, we analyze over 100 million transfer logs from Darshan, HPSS user interfaces, and Globus to quantify the load on data paths between compute, storage, and the wide-area network based on transfer direction, user, transfer tool, source, destination, and time. We show that parallel I/O from user jobs, while undeniably important, is only one of several major I/O workloads that occurs throughout the execution of scientific workflows. We also show that this approach can be used to connect anomalous data traffic to specific users and file access patterns, and we construct time-resolved user transfer traces to demonstrate that one can systematically identify coupled data motion for individual workflows.

Sudheer Chunduri, Taylor Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas Wright, "GPCNeT: Designing a Benchmark Suite for Inducing and Measuring Contention in HPC Networks", International Conference on High Performance Computing, Networking, Storage and Analysis (SC'19), November 16, 2019,

Download File: GPCNeTFinal.pdf (pdf: 2.4 MB)

Network congestion is one of the biggest problems facing HPC systems today, affecting system throughput, performance, user experience and reproducibility. Congestion manifests as run-to-run variability due to contention for shared resources like filesystems or routes between compute endpoints. Despite its significance, current network benchmarks fail to proxy the real-world network utilization seen on congested systems. We propose a new open-source benchmark suite called the Global Performance and Congestion Network Tests (GPCNeT) to advance the state of the practice in this area. The guiding principles used in designing GPCNeT are described and the methodology employed to maximize its utility is presented. The capabilities of GPCNeT evaluated by analyzing results from several world’s largest HPC systems, including an evaluation of congestion management on a next-generation network. The results show that systems of all technologies and scales are susceptible to congestion and this work motivates the need for congestion control in next-generation networks.

Glenn K. Lockwood, Kirill Lozinskiy, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, "Designing an All-Flash Lustre File System for the 2020 NERSC Perlmutter System", Proceedings of the 2019 Cray User Group, Montreal, January 1, 2019,

New experimental and AI-driven workloads are moving into the realm of extreme-scale HPC systems at the same time that high-performance flash is becoming cost-effective to deploy at scale. This confluence poses a number of new technical and economic challenges and opportunities in designing the next generation of HPC storage and I/O subsystems to achieve the right balance of bandwidth, latency, endurance, and cost. In this paper, we present the quantitative approach to requirements definition that resulted in the 30 PB all-flash Lustre file system that will be deployed with NERSC's upcoming Perlmutter system in 2020. By integrating analysis of current workloads and projections of future performance and throughput, we were able to constrain many critical design space parameters and quantitatively demonstrate that Perlmutter will not only deliver optimal performance, but effectively balance cost with capacity, endurance, and many modern features of Lustre.

B. Austin, C. Daley, D. Doerfler, J. Deslippe, B. Cook, B. Friesen, T. Kurth, C. Yang,
and N. Wright, "A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity", 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS'18), November 2018,

Glenn K. Lockwood, Shane Snyder, Teng Wang, Suren Byna, Philip Carns, Nicholas J. Wright, "A Year in the Life of a Parallel File System", Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, TX, IEEE Press, November 11, 2018, 71:1--74:1,

I/O performance is a critical aspect of data-intensive scientific computing. We seek to advance the state of the practice in understanding and diagnosing I/O performance issues through investigation of a comprehensive I/O performance data set that captures a full year of production storage activity at two leadership-scale computing facilities. We demonstrate techniques to identify regions of interest, perform focused investigations of both long-term trends and transient anomalies, and uncover the contributing factors that lead to performance fluctuation.

We find that a year in the life of a parallel file system is comprised of distinct regions of long-term performance variation in addition to short-term performance transients. We demonstrate how systematic identification of these performance regions, combined with comprehensive analysis, allows us to isolate the factors contributing to different performance maladies at different time scales. From this, we present specific lessons learned and important considerations for HPC storage practitioners.

Teng Wang, Shane Snyder, Glenn K. Lockwood, Philip Carns, Nicholas Wright, Suren Byna, "IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs", 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK, IEEE, 2018, 466--476, doi: 10.1109/CLUSTER.2018.00062

Modern HPC systems are collecting large amounts of I/O performance data. The massive volume and heterogeneity of this data, however, have made timely performance of in-depth integrated analysis difficult. To overcome this difficulty and to allow users to identify the root causes of poor application I/O performance, we present IOMiner, an I/O log analytics framework. IOMiner provides an easy-to-use interface for analyzing instrumentation data, a unified storage schema that hides the heterogeneity of the raw instrumentation data, and a sweep-line-based algorithm for root cause analysis of poor application I/O performance. IOMiner is implemented atop Spark to facilitate efficient, interactive, parallel analysis. We demonstrate the capabilities of IOMiner by using it to analyze logs collected on a large-scale production HPC system. Our analysis techniques not only uncover the root cause of poor I/O performance in key application case studies but also provide new insight into HPC I/O workload characterization.

Glenn K. Lockwood, Nicholas J. Wright, Shane Snyder, Philip Carns, George Brown, Kevin Harms, "TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis", Proceedings of the 2018 Cray User Group, Stockholm, SE, May 24, 2018,

At present, I/O performance analysis requires different tools to characterize individual components of the I/O subsystem, and institutional I/O expertise is relied upon to translate these disparate data into an integrated view of application performance. This process is labor-intensive and not sustainable as the storage hierarchy deepens and system complexity increases. To address this growing disparity, we have developed the Total Knowledge of I/O (TOKIO) framework to combine the insights from existing component-level monitoring tools and provide a holistic view of performance across the entire I/O stack.

A reference implementation of TOKIO, pytokio, is presented here. Using monitoring tools included with Cray XC and ClusterStor systems alongside commonly deployed community-supported tools, we demonstrate how pytokio provides a lightweight foundation for holistic I/O performance analyses on two Cray XC systems deployed at different HPC centers. We present results from integrated analyses that allow users to quantify the degree of I/O contention that affected their jobs and probabilistically identify unhealthy storage devices that impacted their performance.We also apply pytokio to inspect the utilization of NERSC’s DataWarp burst buffer and demonstrate how pytokio can be used to identify users and applications who may stand to benefit most from migrating their workloads from Lustre to the burst buffer.

Tyler Allen, Christopher S. Daley, Douglas Doerfler, Brian Austin, Nicholas J. Wright, "Performance and Energy Usage of Workloads on KNL and Haswell Architectures", High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science, Volume 10724., December 23, 2017,

Glenn K. Lockwood, Wucherl Yoo, Suren Byna, Nicholas J. Wright, Shane Snyder, Kevin Harms, Zachary Nault, Philip Carns, "UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis", Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS'17), Denver, CO, ACM, November 2017, 55-60, doi: 10.1145/3149393.3149395

I/O efficiency is essential to productivity in scientific computing, especially as many scientific domains become more data-intensive. Many characterization tools have been used to elucidate specific aspects of parallel I/O performance, but analyzing components of complex I/O subsystems in isolation fails to provide insight into critical questions: how do the I/O components interact, what are reasonable expectations for application performance, and what are the underlying causes of I/O performance problems? To address these questions while capitalizing on existing component-level characterization tools, we propose an approach that combines on-demand, modular synthesis of I/O characterization data into a unified monitoring and metrics interface (UMAMI) to provide a normalized, holistic view of I/O behavior.

We evaluate the feasibility of this approach by applying it to a month-long benchmarking study on two distinct large-scale computing platforms. We present three case studies that highlight the importance of analyzing application I/O performance in context with both contemporaneous and historical component metrics, and we provide new insights into the factors affecting I/O performance. By demonstrating the generality of our approach, we lay the groundwork for a production-grade framework for holistic I/O analysis.

Taylor Groves, Yizi Gu, Nicholas J. Wright, "Understanding Performance Variability on the Aries Dragonfly Network", HPCMASPA in association with IEEE Cluster, September 1, 2017,

C.S. Daley, D. Ghoshal, G.K. Lockwood, S. Dosanjh, L. Ramakrishnan, N.J. Wright, "Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers", Workflows in Support of Large-Scale Science (WORKS-2016), CEUR-WS.org, 2016, 1800:69-73,

Shane Snyder, Philip Carns, Kevin Harms, Robert Ross, Glenn K. Lockwood, Nicholas J. Wright, "Modular HPC I/O characterization with Darshan", Proceedings of the 5th Workshop on Extreme-Scale Programming Tools (ESPT'16), Salt Lake City, UT, November 13, 2016, 9-17, doi: 10.1109/ESPT.2016.9

Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem that provide a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization.

In this work, we consider how the I/O profiling tool Darshan can be improved to allow for more flexible, comprehensive instru- mentation of current and future HPC I/O workloads.We evaluate the performance and scalability of our design to ensure that it is lightweight enough for full-time deployment on production HPC systems. We also present two case studies illustrating how a more comprehensive instrumentation of application I/O workloads can enable insights into I/O behavior that were not previously possible. Our results indicate that Darshan’s modu- lar instrumentation methods can provide valuable feedback to both users and system administrators, while imposing negligible overheads on user applications.

Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, "Cori - A System to Support Data-Intensive Computing", Cray User Group Meeting 2016, London, England, May 2016,

Download File: Cori-CUG2016.pdf (pdf: 4.4 MB)

C.S. Daley, L. Ramakrishnan, S. Dosanjh, N.J. Wright, "Analyses of Scientific Workflows for Effective Use of Future Architectures", The 6th International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC-15), 2015,

Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki,Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, Leonid Oliker, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", SC'14, November 16, 2014,

Brian Austin, Nicholas Wright, "Measurement and interpretation of microbenchmark and application energy use on the Cray XC30", Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, November 2014,

M. J. Cordery, B. Austin, H. J. Wasserman, C. S. Daley, N. J. Wright, S. D. Hammond, D. Doerfler, "Analysis of Cray XC30 Performance using Trinity-NERSC-8 benchmarks and comparison with Cray XE6 and IBM BG/Q", High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation (PMBS 2013). Lecture Notes in Computer Science, Volume 8551, October 1, 2014,

Hongzhang Shan, Brian Austin, Wibe De Jong, Leonid Oliker, Nicholas Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms", SC'13, November 11, 2013,

Brian Austin, Matthew Cordery, Harvey Wasserman, Nicholas J. Wright, "Performance Measurements of the NERSC Cray Cascade System", 2013 Cray User Group Meeting, May 9, 2013,

Download File: baustincug10May2013.pdf (pdf: 475 KB)

Zhengji Zhao, Katie Antypas, Nicholas J Wright, "Effects of Hyper-Threading on the NERSC workload on Edison", 2013 Cray User Group Meeting, May 9, 2013,

Download File: CUG13HTpaper.pdf (pdf: 2.3 MB)

Andrew Uselton, Nicholas J. Wright, "A file system utilization metric for I/O characterization", 2013 Cray User Group Conference, Napa, CA, 2013,

Download File: pap111.pdf (pdf: 982 KB)

Hongzhang Shan, Brian Austin, Nicholas Wright, Erich Strohmaier, John Shalf, Katherine Yelick, "Accelerating Applications at Scale Using One-Sided Communication", The 6th Conference on Partitioned Global Address Programming Models, Santa Barbara, CA, October 10, 2012,

Lavanya Ramakrishnan, Richard Shane Canon, Krishna Muriki, Iwona Sakrejda, and Nicholas J. Wright., "Evaluating Interconnect and Virtualization Performance for High Performance Computing", Proceedings of 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS11), 2011,

Download File: pmbs11.pdf (pdf: 441 KB)

In this paper we detail benchmarking results that characterize the virtualization overhead and its impact on performance. We also examine the performance of various interconnect technologies with a view to understanding the performance impacts of various choices. Our results show that virtualization can have a significant impact upon performance, with at least a 60% performance penalty. We also show that less capable interconnect technologies can have a significant impact upon performance of typical HPC applications. We also evaluate the performance of the Amazon Cluster compute instance and show that it performs approximately equivalently to a 10G Ethernet cluster at low core counts.

Zhengji Zhao and Nick Wright, "Performance of Density Functional Theory codes on Cray XE6", A paper presented in the Cray User Group meeting, May 23-26, 2011, Fairbanks, Alaska., May 24, 2011,

K. Furlinger, N.J. Wright, D. Skinner, "Comprehensive Performance Monitoring for GPU Cluster Systems", Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, 2011, 1377--1386,

Praveen Narayanan, Alice Koniges, Leonid Oliker, Robert Preissl, Samuel Williams, Nicholas J Wright, Maxim Umansky, Xueqiao Xu, Benjamin Dudson, Stephane Ethier, Weixing Wang, Jeff Candy, John R. Cary, "Performance Characterization for Fusion Co-design Applications", Proceedings of CUG, 2011,

Download File: cug2011-praveen.pdf (pdf: 376 KB)

Neal Master, Matthew Andrews, Jason Hick, Shane Canon, Nicholas J. Wright, "Performance Analysis of Commodity and Enterprise Class Flash Devices", Petascale Data Storage Workshop (PDSW), November 2010,

Andrew Uselton, Howison, J. Wright, Skinner, Keen, Shalf, L. Karavanic, Leonid Oliker, "Parallel I/O performance: From events to ensembles", IPDPS, 2010, 1-11,

Hongzhang Shan, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J Wright, "Analyzing the effect of different programming models upon performance and memory usage on cray xt5 platforms", CUG2010, Edinburgh, Scotland, 2010,

Download File: Cug2010Shan.pdf (pdf: 288 KB)

Keith R. Jackson, Ramakrishnan, Muriki, Canon, Cholia, Shalf, J. Wasserman, Nicholas J. Wright, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud", CloudCom, January 1, 2010, 159-168,

Karl F\ urlinger, J. Wright, David Skinner, "Effective Performance Measurement at Petascale Using IPM", ICPADS, January 1, 2010, 373-380,

N.J. Wright, S. Smallen, C.M. Olschanowsky, J. Hayes, A. Snavely, "Measuring and Understanding Variation in Benchmark Performance", DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC), 2009, 2009, 438 -443,

Wayne Pfeiffer, Nicholas J. Wright, "Modeling and predicting application performance on parallel computers using HPC challenge benchmarks", IPDPS, 2008, 1-12,

John Michalakes, Hacker, Loft, O. McCracken, Snavely, J. Wright, E. Spelce, C. Gorda, Robert Walkup, "WRF nature run", SC, 2007, 59,

Book Chapters

Glenn K. Lockwood, Kirill Lozinskiy, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, "A Quantitative Approach to Architecting All-Flash Lustre File Systems", ISC High Performance 2019: High Performance Computing, edited by Michele Weiland, Guido Juckeland, Sadaf Alam, Heike Jagode, (Springer International Publishing: 2019) Pages: 183--197 doi: 10.1007/978-3-030-34356-9_16

New experimental and AI-driven workloads are moving into the realm of extreme-scale HPC systems at the same time that high-performance flash is becoming cost-effective to deploy at scale. This confluence poses a number of new technical and economic challenges and opportunities in designing the next generation of HPC storage and I/O subsystems to achieve the right balance of bandwidth, latency, endurance, and cost. In this work, we present quantitative models that use workload data from existing, disk-based file systems to project the architectural requirements of all-flash Lustre file systems. Using data from NERSC’s Cori I/O subsystem, we then demonstrate the minimum required capacity for data, capacity for metadata and data-on-MDT, and SSD endurance for a future all-flash Lustre file system.

N.J. Wright, S. S. Dosanjh, A. K. Andrews, K. Antypas, B. Draney, R.S Canon, S. Cholia, C.S. Daley, K. M. Fagnan, R.A. Gerber, L. Gerhardt, L. Pezzaglia, Prabhat, K.H. Schafer, J. Srinivasan, "Cori: A Pre-Exascale Computer for Big Data and HPC Applications", Big Data and High Performance Computing 26 (2015): 82., ( June 2015) doi: 10.3233/978-1-61499-583-8-82

Extreme data science is becoming increasingly important at the U.S. Department of Energy's National Energy Research Scientific Computing Center (NERSC). Many petabytes of data are transferred from experimental facilities to NERSC each year. Applications of importance include high-energy physics, materials science, genomics, and climate modeling, with an increasing emphasis on large-scale simulations and data analysis. In response to the emerging data-intensive workloads of its users, NERSC made a number of critical design choices to enhance the usability of its pre-exascale supercomputer, Cori, which is scheduled to be delivered in 2016. These data enhancements include a data partition, a layer of NVRAM for accelerating I/O, user defined images and a customizable gateway for accelerating connections to remote experimental facilities.

Sudip Dosanjh, Shane Canon, Jack Deslippe, Kjiersten Fagnan, Richard Gerber, Lisa Gerhardt, Jason Hick, Douglas Jacobsen, David Skinner, Nicholas J. Wright, "Extreme Data Science at the National Energy Research Scientific Computing (NERSC) Center", Proceedings of International Conference on Parallel Programming – ParCo 2013, ( March 26, 2014)

Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Susan Coghlan, Shane Canon, Anping Liu, Devarshi Ghoshal, Krishna Muriki and Nicholas J. Wright, "CAMP", On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing, (Chapman & Hall/CRC Press: January 1, 2013)

Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Susan Coghlan, Shane Canon, Anping Liu, Devarshi Ghoshal, Krishna Muriki, Nicholas J. Wright, "Magellan - A Testbed to Explore Cloud Computing for Science", On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing, (Chapman & Hall/CRC Press: 2013)

Presentation/Talks

Zhengji Zhao, Ermal Rrapaj, Sridutt Bhalachandra, Brian Austin, Hai Ah Nam, Nicholas Wright, Power Analysis of NERSC Production Workloads, A talk presented at the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), November 12, 2023,

Kirill Lozinskiy, Glenn K. Lockwood, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, A Quantitative Approach to Architecting All‐Flash Lustre File Systems, Lustre User Group (LUG) 2019, May 15, 2019,

Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, Cori - A System to Support Data-Intensive Computing, Cray User Group Meeting 2016, London, England, May 12, 2016,

Nick Wright, NERSC Initiative: Preparing Applications for Exascale, February 12, 2013,

Download File: pmodelsappreadinessNUGfeb2013.pdf (pdf: 2.2 MB)

Zhengji Zhao and Nick Wright, Performance of Density Functional Theory codes on Cray XE6, A talk in Cray User Group meeting 2011, May 23-26, 2011, Fairbanks, Alaska., May 23, 2011,

Reports

GK Lockwood, D Hazen, Q Koziol, RS Canon, K Antypas, J Balewski, N Balthaser, W Bhimji, J Botts, J Broughton, TL Butler, GF Butler, R Cheema, C Daley, T Declerck, L Gerhardt, WE Hurlbert, KA Kallback-Rose, S Leak, J Lee, R Lee, J Liu, K Lozinskiy, D Paul, Prabhat, C Snavely, J Srinivasan, T Stone Gibbins, NJ Wright, "Storage 2020: A Vision for the Future of HPC Storage", October 20, 2017, LBNL LBNL-2001072,

As the DOE Office of Science's mission computing facility, NERSC will follow this roadmap and deploy these new storage technologies to continue delivering storage resources that meet the needs of its broad user community. NERSC's diversity of workflows encompass significant portions of open science workloads as well, and the findings presented in this report are also intended to be a blueprint for how the evolving storage landscape can be best utilized by the greater HPC community. Executing the strategy presented here will ensure that emerging I/O technologies will be both applicable to and effective in enabling scientific discovery through extreme-scale simulation and data analysis in the coming decade.

K. Antypas, B.A Austin, T.L. Butler, R.A. Gerber, C.L Whitney, N.J. Wright, W. Yang, Z Zhao, "NERSC Workload Analysis on Hopper", Report, October 17, 2014, LBNL 6804E,