TOKIO: Total Knowledge of I/O
The Total Knowledge of I/O (TOKIO) project is developing algorithms and a software framework that collects and correlates I/O workload data from production HPC resources at multiple system levels. The goal is to give application scientists, facility operators, and computer science researchers a much clearer view of system behavior and of its causes. TOKIO is a collaboration between the Lawrence Berkeley and Argonne National Laboratories and is funded by the DOE Office of Science through the Office of Advanced Scientific Computing Research.
The framework combines a multitude of component-level I/O characterization utilities with a scalable collection framework to continuously monitor I/O at various levels, including application profiling with Darshan and back-end storage server monitoring with file system-specific tools.
Once collected, data is retained on disk, and views are created that serve as queryable indices of salient measurements across the different component-level monitoring outputs.
These views are then used by analysis modules that present the correlated data in a meaningful way through standard query interfaces for users and applications.
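The collect, index, and correlate flow described above can be illustrated with a minimal sketch. It is not the project's implementation: the record types `JobRecord` and `ServerSample`, their fields, and the functions `build_view` and `correlate` are all hypothetical names chosen for illustration. The sketch assumes application-level records carry a job's time window and bytes written, and that server-side monitoring yields timestamped byte counts.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical application-level record (e.g. summarized from a profiling log)."""
    job_id: str
    start: int          # job start, seconds since epoch
    end: int            # job end, seconds since epoch
    bytes_written: int  # bytes the application reports writing


@dataclass
class ServerSample:
    """Hypothetical storage-server sample from a file system-specific monitor."""
    timestamp: int      # sample time, seconds since epoch
    bytes_written: int  # bytes written across all servers in this sample interval


def build_view(samples):
    """Build a time-sorted index (a "view") over the server-side samples."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    times = [s.timestamp for s in ordered]
    return times, ordered


def correlate(job, view):
    """Return the job's share of server-side write traffic during its run.

    Returns None when no server-side traffic was recorded in the job's window.
    """
    times, ordered = view
    lo = bisect_left(times, job.start)   # first sample at or after job start
    hi = bisect_right(times, job.end)    # one past the last sample at or before job end
    server_bytes = sum(s.bytes_written for s in ordered[lo:hi])
    if server_bytes == 0:
        return None
    return job.bytes_written / server_bytes


# Made-up data for illustration only.
samples = [ServerSample(t, b) for t, b in [(40, 500), (0, 100), (20, 300), (10, 200), (30, 400)]]
view = build_view(samples)
job = JobRecord("example-job", start=10, end=30, bytes_written=450)
share = correlate(job, view)
```

A share well below 1.0 would suggest that other workloads were writing to the file system at the same time as the job, which is the kind of cross-level question that correlating the two data sources is meant to answer.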
- Nicholas J. Wright (LBNL) - Lead Principal Investigator
- Philip Carns (ANL) - Institutional Principal Investigator
- Suren Byna (LBNL) - Co-investigator
- Rob Ross (ANL) - External collaborator
- Prabhat (LBNL) - External collaborator
- Glenn K. Lockwood (LBNL)
- Shane Snyder (ANL)
- Wucherl (William) Yoo (LBNL)
Publications and Presentations
- Philip Carns. "Characterizing data-intensive scientific applications with Darshan." CS/NERSC Data Seminar, National Energy Research Scientific Computing Center. June 2017.
- Philip Carns. "Characterizing HPC I/O: from Applications to Systems." ZIH Colloquium at Technische Universität Dresden, Dresden. April 2017.
- Philip Carns. "TOKIO: Using Lightweight Holistic Characterization to Understand, Model, and Improve HPC I/O Performance." SIAM Conference on Computational Science and Engineering, Atlanta GA. March 2017.
- Shane Snyder. "Leveraging Holistic Characterization for Insights into HPC I/O Behavior." 2017 Understanding I/O Performance Behavior (UIOP) Workshop, DKRZ, Hamburg. March 2017.
- Cong Xu, Suren Byna, Vishwanath Venkatesan, Robert Sisneros, Omkar Kulkarni, Mohamad Chaarawi, and Kalyana Chadalavada. "LIOProf: Exposing Lustre File System Behavior for I/O Middleware." 2016 Cray User Group, London. May 2016.
- Glenn K. Lockwood, Nicholas J. Wright. "Understanding I/O performance on burst buffers through holistic I/O characterization." MCS Seminar, Argonne National Laboratory. May 2016.
- Glenn K. Lockwood. "Developing a holistic understanding of I/O workloads on future architectures." 2016 SIAM Conference on Parallel Processing for Scientific Computing, Paris. April 2016.
- Julian Kunkel, Philip Carns, Shane Snyder, Huong Luu, Matthieu Dorier, Wolfgang Frings, and Glenn K. Lockwood. "Analyzing Parallel I/O." Birds of a Feather session, International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin. November 2015.
- Wahid Bhimji, Debbie Bard, Melissa Romanus, et al. "Accelerating science with the NERSC burst buffer early user program." 2016 Cray User Group, London. May 2016.