
Kirill Lozinskiy

Computer Systems Engineer
Storage Systems Group
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory
1 Cyclotron Road
Mailstop: 59R4010A
Berkeley, CA 94720 US

Kirill Lozinskiy is a computer systems engineer at NERSC. He has been a member of the Storage Systems Group since 2016, where he shares responsibility for maintaining the HPSS archive and several global file systems that hold hundreds of petabytes of data spanning decades of scientific research.

Before coming to NERSC, Kirill was at the Broad Institute, a biomedical and genomic research center, where he maintained over 30 PB of file and object storage and worked on HPC cloud computing initiatives with a focus on DevOps and automation.

Conference Papers

Glenn K Lockwood, Alberto Chiusole, Lisa Gerhardt, Kirill Lozinskiy, David Paul, Nicholas J Wright, "Architecture and Performance of Perlmutter’s 35 PB ClusterStor E1000 All-Flash File System", Proceedings of the 2021 Cray User Group, May 3, 2021,

NERSC's newest system, Perlmutter, features a 35 PB all-flash Lustre file system built on HPE Cray ClusterStor E1000. We present its architecture, early performance figures, and performance considerations unique to this architecture. We demonstrate the performance of E1000 OSSes through low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs at the OST and LNet levels. We also show end-to-end performance for both traditional dimensions of I/O performance (peak bulk-synchronous bandwidth) and non-optimal workloads endemic to production computing (small, incoherent I/Os at random offsets) and compare them to NERSC's previous system, Cori, to illustrate that Perlmutter achieves the performance of a burst buffer and the resilience of a scratch file system. Finally, we discuss performance considerations unique to all-flash Lustre and present ways in which users and HPC facilities can adjust their I/O patterns and operations to make optimal use of such architectures.
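The closing point above, that users can adjust their I/O patterns to suit an all-flash Lustre file system, is easiest to see in a small, concrete illustration. The Python sketch below is not taken from the paper: it simply pre-creates an output file with an explicit stripe count and stripe size using the standard lfs setstripe command, and the scratch path and layout values are placeholder assumptions rather than tuning recommendations.

    import subprocess
    from pathlib import Path

    def create_striped_file(path: str, stripe_count: int = 8, stripe_size: str = "16M") -> None:
        """Pre-create a new file with an explicit Lustre layout via `lfs setstripe`.

        Spreading a large shared file across several OSTs is one common way to
        aggregate bandwidth across servers; the defaults here are placeholders,
        not advice from the paper.
        """
        target = Path(path)
        if target.exists():
            # setstripe only applies a layout to files it creates itself
            raise FileExistsError(f"{target} already exists")
        subprocess.run(
            ["lfs", "setstripe", "-c", str(stripe_count), "-S", stripe_size, str(target)],
            check=True,
        )

    if __name__ == "__main__":
        # Hypothetical scratch path; adjust for the file system at hand.
        create_striped_file("/pscratch/example/output.dat", stripe_count=8, stripe_size="16M")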

Glenn K. Lockwood, Kirill Lozinskiy, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, "Designing an All-Flash Lustre File System for the 2020 NERSC Perlmutter System", Proceedings of the 2019 Cray User Group, Montreal, January 1, 2019,

New experimental and AI-driven workloads are moving into the realm of extreme-scale HPC systems at the same time that high-performance flash is becoming cost-effective to deploy at scale. This confluence poses a number of new technical and economic challenges and opportunities in designing the next generation of HPC storage and I/O subsystems to achieve the right balance of bandwidth, latency, endurance, and cost. In this paper, we present the quantitative approach to requirements definition that resulted in the 30 PB all-flash Lustre file system that will be deployed with NERSC's upcoming Perlmutter system in 2020. By integrating analysis of current workloads and projections of future performance and throughput, we were able to constrain many critical design space parameters and quantitatively demonstrate that Perlmutter will not only deliver optimal performance, but effectively balance cost with capacity, endurance, and many modern features of Lustre.
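As a rough illustration of the kind of quantitative requirements analysis described above, the sketch below computes the SSD endurance (drive writes per day) implied by an assumed daily write volume, write amplification factor, and usable capacity. The inputs are placeholders, not the workload projections from the paper.

    def required_dwpd(daily_writes_tb: float, usable_capacity_tb: float,
                      write_amplification: float = 1.0) -> float:
        """Drive writes per day implied by a projected daily write volume.

        required DWPD = daily writes x write amplification / usable capacity;
        a drive rated for N DWPD over its warranty period needs this to be <= N.
        """
        return daily_writes_tb * write_amplification / usable_capacity_tb

    if __name__ == "__main__":
        # Illustrative inputs only -- not the measurements from the paper.
        usable_capacity_tb = 30_000.0   # ~30 PB of usable flash
        daily_writes_tb = 1_000.0       # assumed 1 PB written per day
        dwpd = required_dwpd(daily_writes_tb, usable_capacity_tb, write_amplification=1.5)
        print(f"required endurance: {dwpd:.3f} DWPD")
        # A read-intensive NVMe drive is commonly rated near 1 DWPD, so this
        # hypothetical workload would leave substantial endurance headroom.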

Book Chapters

Glenn K. Lockwood, Kirill Lozinskiy, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, "A Quantitative Approach to Architecting All-Flash Lustre File Systems", ISC High Performance 2019: High Performance Computing, edited by Michele Weiland, Guido Juckeland, Sadaf Alam, Heike Jagode (Springer International Publishing, 2019), pp. 183–197, doi: 10.1007/978-3-030-34356-9_16

New experimental and AI-driven workloads are moving into the realm of extreme-scale HPC systems at the same time that high-performance flash is becoming cost-effective to deploy at scale. This confluence poses a number of new technical and economic challenges and opportunities in designing the next generation of HPC storage and I/O subsystems to achieve the right balance of bandwidth, latency, endurance, and cost. In this work, we present quantitative models that use workload data from existing, disk-based file systems to project the architectural requirements of all-flash Lustre file systems. Using data from NERSC’s Cori I/O subsystem, we then demonstrate the minimum required capacity for data, capacity for metadata and data-on-MDT, and SSD endurance for a future all-flash Lustre file system.
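The sketch below illustrates, in simplified form, the flavor of capacity model described above: a lower bound on data capacity derived from daily write volume and file residence time, and a lower bound on MDT capacity derived from inode count, per-inode metadata, and an average Data-on-MDT allotment for small files. All inputs are assumed placeholders rather than the Cori-derived measurements used in the chapter.

    def min_data_capacity_pb(daily_writes_pb: float, residence_days: float,
                             headroom: float = 1.25) -> float:
        """Lower bound on scratch data capacity: daily write volume times the
        number of days files must stay resident before purge, plus headroom."""
        return daily_writes_pb * residence_days * headroom

    def min_mdt_capacity_tb(inode_count: float, bytes_per_inode: float = 4096,
                            dom_bytes_per_inode: float = 64 * 1024) -> float:
        """Lower bound on MDT capacity: per-inode metadata plus an average
        Data-on-MDT allotment for small files stored directly on the MDT."""
        return inode_count * (bytes_per_inode + dom_bytes_per_inode) / 1e12

    if __name__ == "__main__":
        # Placeholder inputs, not the figures from the chapter.
        print(f"data capacity >= {min_data_capacity_pb(1.0, 21):.1f} PB")
        print(f"MDT capacity  >= {min_mdt_capacity_tb(4.6e9):.1f} TB")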

Presentation/Talks

Nicholas Balthaser, Francis Dequenne, Melinda Jacobsen, Owen James, Kristy Kallback-Rose, Kirill Lozinskiy, NERSC HPSS Site Update, 2021 HPSS User Forum, October 20, 2021,

Nicholas Balthaser, Wayne Hurlbert, Melinda Jacobsen, Owen James, Kristy Kallback-Rose, Kirill Lozinskiy, NERSC HPSS Site Update, 2020 HPSS User Forum, October 9, 2020,

Report on recent projects and challenges running HPSS at NERSC, including recent air-quality (AQI) issues and the upcoming HPSS upgrade.

Nicholas Balthaser, Damian Hazen, Wayne Hurlbert, Owen James, Kristy Kallback-Rose, Kirill Lozinskiy, Moving the NERSC Archive to a Green Data Center, Storage Technology Showcase 2020, March 3, 2020,

Description of methods used and challenges involved in moving the NERSC tape archive to a new data center with environmental cooling.

Nicholas Balthaser, Wayne Hurlbert, Kirill Lozinskiy, Owen James, Regent System Move Update, NERSC All-to-All Meeting, December 16, 2019,

Update on moving the NERSC center backup system from the Oakland Scientific Facility to LBL Building 59.

Glenn K. Lockwood, Kirill Lozinskiy, Kristy Kallback-Rose, NERSC's Perlmutter System: Deploying 30 PB of all-NVMe Lustre at scale, Lustre BoF at SC19, November 19, 2019,

Update at SC19 Lustre BoF on collaborative work with Cray on deploying an all-flash Lustre tier for NERSC's Perlmutter Shasta system.

Nicholas Balthaser, Kirill Lozinskiy, Melinda Jacobsen, Kristy Kallback-Rose, NERSC Migration from Oracle Tape Libraries and GPFS-HPSS-Integration Proof of Concept, October 16, 2019,

NERSC updates on Storage 2020 strategy and progress, GHI testing, the tape library, and future plans.

Kirill Lozinskiy, Glenn K. Lockwood, Lisa Gerhardt, Ravi Cheema, Damian Hazen, Nicholas J. Wright, A Quantitative Approach to Architecting All-Flash Lustre File Systems, Lustre User Group (LUG) 2019, May 15, 2019,

Kirill Lozinskiy, Lisa Gerhardt, Annette Greiner, Ravi Cheema, Damian Hazen, Kristy Kallback-Rose, Rei Lee, User-Friendly Data Management for Scientific Computing Users, Cray User Group (CUG) 2019, May 9, 2019,

Wrangling data at a scientific computing center can be a major challenge for users, particularly when quotas may impact their ability to utilize resources. In such an environment, a task as simple as listing space usage for one’s files can take hours. The National Energy Research Scientific Computing Center (NERSC) has roughly 50 PB of shared storage utilizing more than 4.6B inodes, and a 146 PB high-performance tape archive, all accessible from two supercomputers. As data volumes increase exponentially, managing data is becoming a larger burden on scientists. To ease the pain, we have designed and built a “Data Dashboard”. Here, in a web-enabled visual application, our 7,000 users can easily review their usage against quotas, discover patterns, and identify candidate files for archiving or deletion. We describe this system, the framework supporting it, and the challenges for such a framework moving into the exascale age.
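As a toy illustration of the kind of quota reporting such a dashboard surfaces (not the Data Dashboard’s actual code or data model), the sketch below flags hypothetical users whose space or inode usage has crossed a fraction of their quota.

    from dataclasses import dataclass

    @dataclass
    class UsageRecord:
        user: str
        space_used_tb: float
        space_quota_tb: float
        inodes_used: int
        inode_quota: int

    def near_quota(records, frac=0.9):
        """Return the records whose space or inode usage has reached `frac` of
        quota -- the users a dashboard would highlight for cleanup or archiving."""
        return [r for r in records
                if r.space_used_tb >= frac * r.space_quota_tb
                or r.inodes_used >= frac * r.inode_quota]

    if __name__ == "__main__":
        sample = [  # entirely made-up users and numbers
            UsageRecord("alice", 18.5, 20.0, 3_200_000, 5_000_000),
            UsageRecord("bob", 2.1, 20.0, 4_900_000, 5_000_000),
        ]
        for r in near_quota(sample):
            print(f"{r.user}: {r.space_used_tb}/{r.space_quota_tb} TB, "
                  f"{r.inodes_used:,}/{r.inode_quota:,} inodes")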

NERSC site update focusing on plans to implement new tape technology at the Berkeley Data Center. 

NERSC Site Report focusing on plans for migration of tape-based system to new location and new technology, and collection of metrics for GPFS.

Kirill Lozinskiy, GPFS & HPSS Interface (GHI), Spectrum Scale User Group 2017, April 5, 2017,

This presentation gives a brief overview of integration between the High Performance Storage System (HPSS) and the General Parallel File System (GPFS).

Reports

GK Lockwood, D Hazen, Q Koziol, RS Canon, K Antypas, J Balewski, N Balthaser, W Bhimji, J Botts, J Broughton, TL Butler, GF Butler, R Cheema, C Daley, T Declerck, L Gerhardt, WE Hurlbert, KA Kallback-Rose, S Leak, J Lee, R Lee, J Liu, K Lozinskiy, D Paul, Prabhat, C Snavely, J Srinivasan, T Stone Gibbins, NJ Wright, "Storage 2020: A Vision for the Future of HPC Storage", October 20, 2017, LBNL report LBNL-2001072,

As the DOE Office of Science's mission computing facility, NERSC will follow the roadmap presented in this report and deploy these new storage technologies to continue delivering storage resources that meet the needs of its broad user community. NERSC's diversity of workflows encompasses significant portions of open science workloads as well, and the findings presented in this report are also intended to be a blueprint for how the evolving storage landscape can be best utilized by the greater HPC community. Executing the strategy presented here will ensure that emerging I/O technologies will be both applicable to and effective in enabling scientific discovery through extreme-scale simulation and data analysis in the coming decade.