Jialin is a computer engineer in the Data Analytics Service group. He earned his Ph.D. in computer science from Texas Tech University with the dissertation 'Fast Data Analysis Framework for Scientific Big Data Applications'. His interests include data management/analytics and parallel I/O.
He works with domain scientists to understand their science problems and design optimal I/O solutions for exascale computing. He designed H5Spark for scaling Spark on 52,000 cores at NERSC, and he developed collective I/O in H5py, which is used by parallel H5py users. He recently participated in the Object Store Evaluation project and developed the first HDF5 VOL plugin for OpenStack Swift, Sci-Swift, which allows users to connect any file format (FITS, ROOT, CSV, etc.) to HDF5 from Python.
Jialin Liu, Yu Zhuang, Yong Chen, "Hierarchical Collective I/O Scheduling for High-Performance Computing", Big Data Research, September 1, 2015,
Jialin Liu, Yong Chen, "Segmented In-Advance Computing for Fast Scientific Discovery", Transactions on Cloud Computing, 2015,
Wahid Bhimji, Debbie Bard, Kaylan Burleigh, Chris Daley, Steve Farrell, Markus Fasel, Brian Friesen, Lisa Gerhardt, Jialin Liu, Peter Nugent, Dave Paul, Jeff Porter, Vakho Tsulaia, "Extreme I/O on HPC for HEP using the Burst Buffer at NERSC", Journal of Physics: Conference Series, December 1, 2017, 898:082015,
Alex Gittens et al., "Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies", 2016 IEEE International Conference on Big Data, July 1, 2017,
Jialin Liu, Quincey Koziol, Houjun Tang, François Tessier, Wahid Bhimji, Brandon Cook, Brian Austin, Suren Byna, Bhupender Thakur, Glenn K. Lockwood, Jack Deslippe, Prabhat, "Understanding the IO Performance Gap Between Cori KNL and Haswell", Proceedings of the 2017 Cray User Group, Redmond, WA, May 10, 2017,
The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004-node Haswell partition and a 9,688-node KNL partition, which ranked as the 5th most powerful supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the IO performance gap between them is important not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we have comprehensively analyzed the performance of single-core and single-node IO on the Haswell and KNL partitions, and have discovered the major bottlenecks, which include CPU frequencies and memory copy performance. We have also extended our performance tests to multi-node IO and revealed the IO cost difference caused by network latency, buffer size, and communication cost. Overall, we have developed a strong understanding of the IO gap between Haswell and KNL nodes, and the lessons learned from this exploration will guide us in designing optimal IO solutions in the many-core era.
Jialin Liu, Evan Racah, Quincey Koziol, Richard Shane Canon, Alex Gittens, Lisa Gerhardt, Suren Byna, Mike F. Ringenburg, Prabhat, "H5Spark: Bridging the I/O Gap between Spark and Scientific Data Formats on HPC Systems", Cray User Group, May 13, 2016,
Mostofa Patwary, Nadathur Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey, "PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures", IPDPS 2016, April 5, 2016,
Jialin Liu, Yong Chen, Surendra Byna, "Collective Computing for Scientific Big Data Analysis", 44th International Conference on Parallel Processing Workshops (ICPPW), September 1, 2015,
Glenn K. Lockwood, Damian Hazen, Quincey Koziol, Shane Canon, Katie Antypas, Jan Balewski, Nicholas Balthaser, Wahid Bhimji, James Botts, Jeff Broughton, Tina L. Butler, Gregory F. Butler, Ravi Cheema, Christopher Daley, Tina Declerck, Lisa Gerhardt, Wayne E. Hurlbert, Kristy A. Kallback-Rose, Stephen Leak, Jason Lee, Rei Lee, Jialin Liu, Kirill Lozinskiy, David Paul, Prabhat, Cory Snavely, Jay Srinivasan, Tavia Stone Gibbins, Nicholas J. Wright, "Storage 2020: A Vision for the Future of HPC Storage", October 20, 2017,
- Download File: Storage-2020-A-Vision-for-the-Future-of-HPC-Storage.pdf (pdf: 3.6 MB)
Annette Greiner, Evan Racah, Shane Canon, Jialin Liu, Yunjie Liu, Debbie Bard, Lisa Gerhardt, Rollin Thomas, Shreyas Cholia, Jeff Porter, Wahid Bhimji, Quincey Koziol, Prabhat, "Data-Intensive Supercomputing for Science", Berkeley Institute for Data Science (BIDS) Data Science Faire, May 3, 2016,
Review of current DAS activities for a non-NERSC audience.