Jialin is a computer engineer in the Data Analytics Service group. He received his Ph.D. in computer science from Texas Tech University in 2015. He spent the summer of 2013 with the SDM group in Building 50, working on the scientific data service (SDS). He is interested in scientific data management/analytics and parallel I/O.
He works with domain scientists and users to design optimal I/O solutions for exascale computing. During the past few years at NERSC, he collaborated with UCB and helped design H5Spark for running Spark across 52,000 cores on Cori, which was the largest scale in the world. He also developed the Collective I/O function in H5py, which has been released to all parallel H5py users worldwide. In 2018, he led and completed the Object Store Evaluation project together with the ATG/SSG and SDM groups at LBL, as well as Intel and the HDF Group. The work was presented at the PDSW-DISCS workshop at SC'18. Recently, he participated in the 'etalumis' project (i.e., simulate[::-1]), a deep learning project based on PyTorch and PyProb, in which he and other team members resolved the I/O bottleneck and scaled MPI PyTorch to 1024 CPU nodes. The work was a Best Paper Finalist at SC'19.
Jialin Liu, Yu Zhuang, Yong Chen, "Hierarchical Collective I/O Scheduling for High-Performance Computing", Big Data Research, September 1, 2015,
Jialin Liu, Yong Chen, "Segmented In-Advance Computing for Fast Scientific Discovery", Transactions on Cloud Computing, 2015,
Wahid Bhimji, Debbie Bard, Kaylan Burleigh, Chris Daley, Steve Farrell, Markus Fasel, Brian Friesen, Lisa Gerhardt, Jialin Liu, Peter Nugent, Dave Paul, Jeff Porter, Vakho Tsulaia, "Extreme I/O on HPC for HEP using the Burst Buffer at NERSC", Journal of Physics: Conference Series, December 1, 2017, 898:082015,
Alex Gittens et al., "Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies", 2016 IEEE International Conference on Big Data, July 1, 2017,
Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn K. Lockwood, Jack Deslippe, Prabhat, "Understanding the IO Performance Gap Between Cori KNL and Haswell", Proceedings of the 2017 Cray User Group, Redmond, WA, May 10, 2017,
The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004-node Haswell partition and a 9,688-node KNL partition. The system ranked as the 5th most powerful supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the IO performance gap between them is important not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we comprehensively analyze single-core and single-node IO performance on the Haswell and KNL partitions and identify the major bottlenecks, which include CPU frequencies and memory copy performance. We also extend our performance tests to multi-node IO and reveal the IO cost differences caused by network latency, buffer size, and communication cost. Overall, we have developed a strong understanding of the IO gap between Haswell and KNL nodes, and the lessons learned from this exploration will guide us in designing optimal IO solutions in the many-core era.
Jialin Liu, Evan Racah, Quincey Koziol, Richard Shane Canon, Alex Gittens, Lisa Gerhardt, Suren Byna, Mike F. Ringenburg, Prabhat, "H5Spark: Bridging the I/O Gap between Spark and Scientific Data Formats on HPC Systems", Cray User Group, May 13, 2016,
Mostofa Patwary, Nadathur Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", IPDPS 2016, April 5, 2016,
Jialin Liu, Yong Chen, Surendra Byna, "Collective Computing for Scientific Big Data Analysis", 44th International Conference on Parallel Processing Workshops (ICPPW), September 1, 2015,
Glenn K. Lockwood, Damian Hazen, Quincey Koziol, Shane Canon, Katie Antypas, Jan Balewski, Nicholas Balthaser, Wahid Bhimji, James Botts, Jeff Broughton, Tina L. Butler, Gregory F. Butler, Ravi Cheema, Christopher Daley, Tina Declerck, Lisa Gerhardt, Wayne E. Hurlbert, Kristy A. Kallback-Rose, Stephen Leak, Jason Lee, Rei Lee, Jialin Liu, Kirill Lozinskiy, David Paul, Prabhat, Cory Snavely, Jay Srinivasan, Tavia Stone Gibbins, Nicholas J. Wright, "Storage 2020: A Vision for the Future of HPC Storage", October 20, 2017,
Annette Greiner, Evan Racah, Shane Canon, Jialin Liu, Yunjie Liu, Debbie Bard, Lisa Gerhardt, Rollin Thomas, Shreyas Cholia, Jeff Porter, Wahid Bhimji, Quincey Koziol, Prabhat, "Data-Intensive Supercomputing for Science", Berkeley Institute for Data Science (BIDS) Data Science Faire, May 3, 2016,
Review of current DAS activities for a non-NERSC audience.