Nick Wright is the advanced technologies group lead and the NERSC chief architect. He focuses on evaluating future technologies for potential application in scientific computing. He led the effort to optimize the architecture of the Perlmutter machine, the first NERSC platform designed to meet the needs of both large-scale simulation and data analysis from experimental facilities. Before moving to NERSC, he was a member of the Performance Modeling and Characterization (PMaC) group at the San Diego Supercomputer Center. He earned both his undergraduate and doctoral degrees in chemistry at the University of Durham in England.
C. S. Daley, D. Ghoshal, G. K. Lockwood, S. Dosanjh, L. Ramakrishnan, N. J. Wright, "Performance characterization of scientific workflows for the optimal use of Burst Buffers", Future Generation Computer Systems, December 28, 2017, doi: 10.1016/j.future.2017.12.022
Scientific discoveries are increasingly dependent upon the analysis of large volumes of data from observations and simulations of complex phenomena. Scientists compose these complex analyses as workflows and execute them on large-scale HPC systems. Such workflows contrast with the monolithic single simulations that have often been the primary use case on HPC systems. Simultaneously, new storage paradigms such as Burst Buffers are becoming available on HPC platforms. In this paper, we analyze the performance characteristics of a Burst Buffer and two representative scientific workflows with the aim of optimizing the usage of a Burst Buffer, extending our previous analyses (Daley et al., 2016). Our key contributions are (a) developing a performance analysis methodology pertinent to Burst Buffers, (b) improving the use of a Burst Buffer in workflows with bandwidth-sensitive and metadata-sensitive I/O workloads, and (c) highlighting the key data management challenges when incorporating a Burst Buffer in the studied scientific workflows.
Hongzhang Shan, Nicholas J. Wright, John Shalf, Katherine A. Yelick, Wagner, Nathan Wichmann, "A preliminary evaluation of the hardware acceleration of Cray Gemini interconnect for PGAS languages and comparison with MPI", SIGMETRICS Performance Evaluation Review, 2012, 40:92-98,
Lavanya Ramakrishnan, Richard Shane Canon, Krishna Muriki, Iwona Sakrejda, Nicholas J. Wright, "Evaluating Interconnect and Virtualization Performance for High Performance Computing", SIGMETRICS Performance Evaluation Review, 2012, 40:55-60,
K. Fuerlinger, N.J. Wright, D. Skinner, "Performance analysis and workload characterization with IPM", Tools for High Performance Computing 2009, January 1, 2010, 31-38,
K. Fuerlinger, N.J. Wright, D. Skinner, C. Klausecker, D. Kranzlmueller, "Effective Holistic Performance Measurement at Petascale Using IPM", Competence in High Performance Computing 2010, January 1, 2010, 15-26,
B. R. de Supinski, S. Alam, D. H. Bailey, L., C. Daley, A. Dubey, T., D. Gunter, P. D. Hovland, H., K. Karavanic, G. Marin, J., S. Moore, B. Norris, L., C. Olschanowsky, P. C. Roth, M., S. Shende, A. Snavely, Spear, M. Tikir, J. Vetter, P. Worley, N. Wright, "Modeling the Office of Science ten year facilities plan: The PERI Architecture Tiger Team", Journal of Physics: Conference Series, 2009, 180:012039,
Brian Austin, Chris Daley, Douglas Doerfler, Jack Deslippe, Brandon Cook, Brian Friesen, Thorsten Kurth, Charlene Yang, Nicholas J. Wright, "A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity", 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS18), November 12, 2018,
Glenn K. Lockwood, Shane Snyder, Teng Wang, Suren Byna, Philip Carns, Nicholas J. Wright, "A Year in the Life of a Parallel File System", Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, TX, IEEE Press, November 11, 2018, 71:1--74:1,
I/O performance is a critical aspect of data-intensive scientific computing. We seek to advance the state of the practice in understanding and diagnosing I/O performance issues through investigation of a comprehensive I/O performance data set that captures a full year of production storage activity at two leadership-scale computing facilities. We demonstrate techniques to identify regions of interest, perform focused investigations of both long-term trends and transient anomalies, and uncover the contributing factors that lead to performance fluctuation.
We find that a year in the life of a parallel file system consists of distinct regions of long-term performance variation in addition to short-term performance transients. We demonstrate how systematic identification of these performance regions, combined with comprehensive analysis, allows us to isolate the factors contributing to different performance maladies at different time scales. From this, we present specific lessons learned and important considerations for HPC storage practitioners.
Glenn K. Lockwood, Nicholas J. Wright, Shane Snyder, Philip Carns, George Brown, Kevin Harms, "TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis", Proceedings of the 2018 Cray User Group, Stockholm, SE, May 24, 2018,
At present, I/O performance analysis requires different tools to characterize individual components of the I/O subsystem, and institutional I/O expertise is relied upon to translate these disparate data into an integrated view of application performance. This process is labor-intensive and not sustainable as the storage hierarchy deepens and system complexity increases. To address this growing disparity, we have developed the Total Knowledge of I/O (TOKIO) framework to combine the insights from existing component-level monitoring tools and provide a holistic view of performance across the entire I/O stack.
A reference implementation of TOKIO, pytokio, is presented here. Using monitoring tools included with Cray XC and ClusterStor systems alongside commonly deployed community-supported tools, we demonstrate how pytokio provides a lightweight foundation for holistic I/O performance analyses on two Cray XC systems deployed at different HPC centers. We present results from integrated analyses that allow users to quantify the degree of I/O contention that affected their jobs and probabilistically identify unhealthy storage devices that impacted their performance. We also apply pytokio to inspect the utilization of NERSC’s DataWarp burst buffer and demonstrate how pytokio can be used to identify users and applications who may stand to benefit most from migrating their workloads from Lustre to the burst buffer.
Tyler Allen, Christopher S. Daley, Douglas Doerfler, Brian Austin, Nicholas J. Wright, "Performance and Energy Usage of Workloads on KNL and Haswell Architectures", High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science, Volume 10724, December 23, 2017,
Glenn K. Lockwood, Wucherl Yoo, Suren Byna, Nicholas J. Wright, Shane Snyder, Kevin Harms, Zachary Nault, Philip Carns, "UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis", Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS'17), Denver, CO, ACM, November 2017, 55-60, doi: 10.1145/3149393.3149395
I/O efficiency is essential to productivity in scientific computing, especially as many scientific domains become more data-intensive. Many characterization tools have been used to elucidate specific aspects of parallel I/O performance, but analyzing components of complex I/O subsystems in isolation fails to provide insight into critical questions: how do the I/O components interact, what are reasonable expectations for application performance, and what are the underlying causes of I/O performance problems? To address these questions while capitalizing on existing component-level characterization tools, we propose an approach that combines on-demand, modular synthesis of I/O characterization data into a unified monitoring and metrics interface (UMAMI) to provide a normalized, holistic view of I/O behavior.
We evaluate the feasibility of this approach by applying it to a month-long benchmarking study on two distinct large-scale computing platforms. We present three case studies that highlight the importance of analyzing application I/O performance in context with both contemporaneous and historical component metrics, and we provide new insights into the factors affecting I/O performance. By demonstrating the generality of our approach, we lay the groundwork for a production-grade framework for holistic I/O analysis.
Taylor Groves, Yizi Gu, Nicholas J. Wright, "Understanding Performance Variability on the Aries Dragonfly Network", HPCMASPA in association with IEEE Cluster, September 1, 2017,
C.S. Daley, D. Ghoshal, G.K. Lockwood, S. Dosanjh, L. Ramakrishnan, N.J. Wright, "Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers", Workflows in Support of Large-Scale Science (WORKS-2016), CEUR-WS.org, 2016, 1800:69-73,
Shane Snyder, Philip Carns, Kevin Harms, Robert Ross, Glenn K. Lockwood, Nicholas J. Wright, "Modular HPC I/O characterization with Darshan", Proceedings of the 5th Workshop on Extreme-Scale Programming Tools (ESPT'16), Salt Lake City, UT, November 13, 2016, 9-17, doi: 10.1109/ESPT.2016.9
Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem that provide a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization.
In this work, we consider how the I/O profiling tool Darshan can be improved to allow for more flexible, comprehensive instrumentation of current and future HPC I/O workloads. We evaluate the performance and scalability of our design to ensure that it is lightweight enough for full-time deployment on production HPC systems. We also present two case studies illustrating how a more comprehensive instrumentation of application I/O workloads can enable insights into I/O behavior that were not previously possible. Our results indicate that Darshan’s modular instrumentation methods can provide valuable feedback to both users and system administrators, while imposing negligible overheads on user applications.
Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, "Cori - A System to Support Data-Intensive Computing", Cray User Group Meeting 2016, London, England, May 2016,
- Download File: Cori-CUG2016.pdf (pdf: 4.4 MB)
W. Bhimji, D. Bard, M. Romanus, D. Paul, A. Ovsyannikov, B. Friesen, M. Bryson, J. Correa, G. K. Lockwood, V. Tsulaia, S. Byna, S. Farrell, D. Gursoy, C. Daley, V. Beckner, B. Van Straalen, D. Trebotich, C. Tull, G. Weber, N. J. Wright, K. Antypas, Prabhat, "Accelerating Science with the NERSC Burst Buffer Early User Program", Cray User Group, May 11, 2016, LBNL LBNL-1005736,
NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base of over 6,500 users in 700 different projects spanning a wide variety of scientific computing applications, so the use cases for the Burst Buffer at NERSC are correspondingly numerous and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge, this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads, and therefore these lessons will be of considerable benefit in shaping the developing use of Burst Buffers at HPC centers.
C.S. Daley, L. Ramakrishnan, S. Dosanjh, N.J. Wright, "Analyses of Scientific Workflows for Effective Use of Future Architectures", The 6th International Workshop on Big Data Analytics: Challenges and Opportunities (BDAC-15), 2015,
Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, Leonid Oliker, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", SC'14, November 16, 2014,
Brian Austin, Nicholas Wright, "Measurement and interpretation of microbenchmark and application energy use on the Cray XC30", Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, November 2014,
M. J. Cordery, B. Austin, H. J. Wasserman, C. S. Daley, N. J. Wright, S. D. Hammond, D. Doerfler, "Analysis of Cray XC30 Performance using Trinity-NERSC-8 benchmarks and comparison with Cray XE6 and IBM BG/Q", High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation (PMBS 2013). Lecture Notes in Computer Science, Volume 8551, October 1, 2014,
Hongzhang Shan, Brian Austin, Wibe De Jong, Leonid Oliker, Nicholas Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms", SC'13, November 11, 2013,
Zhengji Zhao, Katie Antypas, Nicholas J Wright, "Effects of Hyper-Threading on the NERSC workload on Edison", 2013 Cray User Group Meeting, May 9, 2013,
- Download File: CUG13HTpaper.pdf (pdf: 2.3 MB)
Brian Austin, Matthew Cordery, Harvey Wasserman, Nicholas J. Wright, "Performance Measurements of the NERSC Cray Cascade System", 2013 Cray User Group Meeting, May 9, 2013,
- Download File: baustincug10May2013.pdf (pdf: 475 KB)
Andrew Uselton, Nicholas J. Wright, "A file system utilization metric for I/O characterization", 2013 Cray User Group Conference, Napa, CA, 2013,
- Download File: pap111.pdf (pdf: 982 KB)
Hongzhang Shan, Brian Austin, Nicholas Wright, Erich Strohmaier, John Shalf, Katherine Yelick, "Accelerating Applications at Scale Using One-Sided Communication", The 6th Conference on Partitioned Global Address Space Programming Models, Santa Barbara, CA, October 10, 2012,
Lavanya Ramakrishnan, Richard Shane Canon, Krishna Muriki, Iwona Sakrejda, Nicholas J. Wright, "Evaluating Interconnect and Virtualization Performance for High Performance Computing", Proceedings of 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS11), 2011,
- Download File: pmbs11.pdf (pdf: 441 KB)
In this paper we detail benchmarking results that characterize the virtualization overhead and its impact on performance. We also examine the performance of various interconnect technologies to understand how the choice of interconnect affects performance. Our results show that virtualization can have a significant impact on performance, with a penalty of at least 60%. Less capable interconnect technologies can also significantly degrade the performance of typical HPC applications. Finally, we evaluate the performance of the Amazon Cluster Compute instance and show that it performs approximately equivalently to a 10G Ethernet cluster at low core counts.
Zhengji Zhao and Nick Wright, "Performance of Density Functional Theory codes on Cray XE6", A paper presented at the Cray User Group meeting, May 23-26, 2011, Fairbanks, Alaska, May 24, 2011,
K. Fuerlinger, N.J. Wright, D. Skinner, "Comprehensive Performance Monitoring for GPU Cluster Systems", Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, 2011, 1377-1386,
Praveen Narayanan, Alice Koniges, Leonid Oliker, Robert Preissl, Samuel Williams, Nicholas J Wright, Maxim Umansky, Xueqiao Xu, Benjamin Dudson, Stephane Ethier, Weixing Wang, Jeff Candy, John R. Cary, "Performance Characterization for Fusion Co-design Applications", Proceedings of CUG, 2011,
- Download File: cug2011-praveen.pdf (pdf: 376 KB)
Neal Master, Matthew Andrews, Jason Hick, Shane Canon, Nicholas J. Wright, "Performance Analysis of Commodity and Enterprise Class Flash Devices", Petascale Data Storage Workshop (PDSW), November 2010,
Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, John Shalf, Harvey J. Wasserman, Nicholas J. Wright, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud", CloudCom, January 1, 2010, 159-168,
Hongzhang Shan, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J Wright, "Analyzing the effect of different programming models upon performance and memory usage on Cray XT5 platforms", CUG2010, Edinburgh, Scotland, 2010,
- Download File: Cug2010Shan.pdf (pdf: 288 KB)
Andrew Uselton, Howison, Nicholas J. Wright, David Skinner, Keen, John Shalf, L. Karavanic, Leonid Oliker, "Parallel I/O performance: From events to ensembles", IPDPS, 2010, 1-11,
Karl Fuerlinger, Nicholas J. Wright, David Skinner, "Effective Performance Measurement at Petascale Using IPM", ICPADS, January 1, 2010, 373-380,
N.J. Wright, S. Smallen, C.M. Olschanowsky, J. Hayes, A. Snavely, "Measuring and Understanding Variation in Benchmark Performance", DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC) 2009, 2009, 438-443,
Wayne Pfeiffer, Nicholas J. Wright, "Modeling and predicting application performance on parallel computers using HPC challenge benchmarks", IPDPS, 2008, 1-12,
John Michalakes, Hacker, Loft, O. McCracken, A. Snavely, Nicholas J. Wright, E. Spelce, C. Gorda, Robert Walkup, "WRF nature run", SC, 2007, 59,
Sudip Dosanjh, Shane Canon, Jack Deslippe, Kjiersten Fagnan, Richard Gerber, Lisa Gerhardt, Jason Hick, Douglas Jacobsen, David Skinner, Nicholas J. Wright, "Extreme Data Science at the National Energy Research Scientific Computing (NERSC) Center", Proceedings of International Conference on Parallel Programming – ParCo 2013, (March 26, 2014)
Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Susan Coghlan, Shane Canon, Anping Liu, Devarshi Ghoshal, Krishna Muriki, Nicholas J. Wright, "Magellan - A Testbed to Explore Cloud Computing for Science", On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing, (Chapman & Hall/CRC Press: 2013)
Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Susan Coghlan, Shane Canon, Anping Liu, Devarshi Ghoshal, Krishna Muriki and Nicholas J. Wright, "CAMP", On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing, (Chapman & Hall/CRC Press: January 1, 2013)
Kirill Lozinskiy, Glenn K. Lockwood, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, A Quantitative Approach to Architecting All-Flash Lustre File Systems, Lustre User Group (LUG) 2019, May 15, 2019,
Kirill Lozinskiy, Glenn K. Lockwood, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, Designing an All-Flash Lustre File System for the 2020 NERSC Perlmutter System, Cray User Group (CUG) 2019, May 7, 2019,
New experimental and AI-driven workloads are moving into the realm of extreme-scale HPC systems at the same time that high-performance flash is becoming cost-effective to deploy at scale. This confluence poses a number of new technical and economic challenges and opportunities in designing the next generation of HPC storage and I/O subsystems to achieve the right balance of bandwidth, latency, endurance, and cost. In this paper, we present the quantitative approach to requirements definition that resulted in the 30 PB all-flash Lustre file system that will be deployed with NERSC's upcoming Perlmutter system in 2020. By integrating analysis of current workloads and projections of future performance and throughput, we were able to constrain many critical design space parameters and quantitatively demonstrate that Perlmutter will not only deliver optimal performance, but also effectively balance cost with capacity, endurance, and many modern features of Lustre.
Tina Declerck, Katie Antypas, Deborah Bard, Wahid Bhimji, Shane Canon, Shreyas Cholia, Helen (Yun) He, Douglas Jacobsen, Prabhat, Nicholas J. Wright, Cori - A System to Support Data-Intensive Computing, Cray User Group Meeting 2016, London, England, May 12, 2016,
Nick Wright, NERSC Initiative: Preparing Applications for Exascale, February 12, 2013,
- Download File: pmodelsappreadinessNUGfeb2013.pdf (pdf: 2.2 MB)
Zhengji Zhao and Nick Wright, Performance of Density Functional Theory codes on Cray XE6, A talk at the Cray User Group meeting 2011, May 23-26, 2011, Fairbanks, Alaska, May 23, 2011,
Glenn K. Lockwood, Damian Hazen, Quincey Koziol, Shane Canon, Katie Antypas, Jan Balewski, Nicholas Balthaser, Wahid Bhimji, James Botts, Jeff Broughton, Tina L. Butler, Gregory F. Butler, Ravi Cheema, Christopher Daley, Tina Declerck, Lisa Gerhardt, Wayne E. Hurlbert, Kristy A. Kallback-Rose, Stephen Leak, Jason Lee, Rei Lee, Jialin Liu, Kirill Lozinskiy, David Paul, Prabhat, Cory Snavely, Jay Srinivasan, Tavia Stone Gibbins, Nicholas J. Wright, "Storage 2020: A Vision for the Future of HPC Storage", October 20, 2017,
- Download File: Storage-2020-A-Vision-for-the-Future-of-HPC-Storage.pdf (pdf: 3.6 MB)