A Year in the Life of a Parallel File System

The TOKIO team presented a paper titled "A Year in the Life of a Parallel File System" at the 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18).  The paper demonstrates new techniques for classifying the sources of performance variation over time.  These techniques were then applied to a year-long dataset documenting I/O performance variation on file systems at NERSC and the Argonne Leadership Computing Facility (ALCF), both to demonstrate their efficacy and to quantify the sources of performance variation observed in production.

This paper was published with all of the code and data required to reproduce the analysis it presents.  Specifically, the following packages have been published and can be freely downloaded:

  • pytokio v0.10.0 (doi: 10.5281/zenodo.1345790), the version of the pytokio Python library used to generate the 11,986 feature vectors on which the study was based
  • tokio-abcutils v1.1.0 (doi: 10.5281/zenodo.1345786) (TOKIO Automated Benchmark Collection Utilities), an additional Python package that implements the statistical analyses used in the paper.  This source distribution also includes all of the Jupyter Notebooks required to recreate the figures in the paper and the feature vector data from the TOKIO Year-long Dataset (see below).
  • TOKIO-ABC v1.0.0 (doi: 10.5281/zenodo.1345784) (TOKIO Automated Benchmark Collection), a metapackage of benchmarks, build scripts, and job submission scripts used to launch the benchmark jobs that resulted in the 11,986 feature vectors analyzed in the study
  • TOKIO Year-long Dataset (doi: 10.5281/zenodo.1345780), a dataset that contains
    • the 11,986 feature vectors in CSV format
    • the unmodified Darshan logs from every benchmark job that contributed to the 11,986 feature vectors

The source code and DOI information above are provided for strict reproducibility; if you wish to use this software and data in your own work, we highly recommend that you download the latest version of each package.
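
For readers who do want the exact archived versions, the DOIs listed above can be turned into resolvable links by prefixing them with the standard https://doi.org/ resolver.  The following is a minimal sketch of that step only; the names ARCHIVED_RELEASES and doi_to_url are hypothetical helpers introduced for illustration and are not part of pytokio or any other TOKIO package.

```python
# DOIs copied from the package list above; each one identifies the exact
# archived version used in the paper, not the latest release.
# ARCHIVED_RELEASES and doi_to_url are illustrative names (assumptions),
# not part of any TOKIO package.
ARCHIVED_RELEASES = {
    "pytokio v0.10.0": "10.5281/zenodo.1345790",
    "tokio-abcutils v1.1.0": "10.5281/zenodo.1345786",
    "TOKIO-ABC v1.0.0": "10.5281/zenodo.1345784",
    "TOKIO Year-long Dataset": "10.5281/zenodo.1345780",
}


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    return "https://doi.org/" + doi.strip()


if __name__ == "__main__":
    # Print one resolvable link per archived release.
    for name, doi in ARCHIVED_RELEASES.items():
        print(f"{name}: {doi_to_url(doi)}")
```

Each printed link resolves to the archive record for that specific version, which is what strict reproduction of the paper requires.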