NERSC: Powering Scientific Discovery for 50 Years

HDF5 Workshop, Aug 31, 2022

August 31, 2022

Introduction

This workshop, presented by ALCF and open to NERSC users, is geared toward achieving good HDF5 performance on the ALCF Polaris system, whose architecture is similar to that of NERSC's Perlmutter.

Date and Time

August 31, 2022, 8:00 am-1:00 pm PDT

Abstract

HDF5 is a data model, file format, and I/O library that has become a de facto standard for HPC applications that need scalable I/O and for storing and managing the large volumes of data produced by computer modeling. This workshop will be geared toward the ALCF Polaris system. It will give an overview of the Polaris parallel file system and its possible effects on HDF5 performance, and will summarize tools useful for performance investigations. Hands-on demonstrations will use examples from well-known codes and use cases from HPC science applications. The workshop will discuss HDF5 tuning techniques such as collective metadata I/O, data aggregation, and parallel compression, along with other HDF5 tuning parameters and features. Finally, it will allow time to review attendees' HDF5 I/O implementations targeting the Polaris system.

The workshop targets intermediate and advanced users of HDF5: it covers best practices, supporting analysis tools, and tuning techniques for those already using parallel HDF5. Users new to HDF5 are encouraged to go over the slides and video from the precursor talk.
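
As a rough illustration of why layout choices matter for the tuning topics listed above (this sketch is not taken from the workshop materials): in a chunked dataset, HDF5 stores data chunk by chunk, and compressed chunks must be read or written whole, so the number of chunks a selection overlaps is a useful first estimate of I/O cost. The helper `chunks_touched` below is a hypothetical name, not part of the HDF5 API; it only performs the index arithmetic for a contiguous hyperslab.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Return how many chunks a contiguous hyperslab overlaps.

    start, count, and chunk_shape are per-dimension tuples.
    Hypothetical helper for illustration; not an HDF5 library call.
    """
    per_dim = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # index of the first chunk overlapped
        last = (s + c - 1) // ch   # index of the last chunk overlapped
        per_dim.append(last - first + 1)
    return prod(per_dim)


# One 1000-element row selected from a 2-D dataset: (start, count)
row = ((0, 0), (1, 1000))
print(chunks_touched(*row, chunk_shape=(100, 100)))  # square chunks
print(chunks_touched(*row, chunk_shape=(1, 1000)))   # row-shaped chunks
```

Here the same 1000-element row overlaps 10 chunks when the chunk shape is 100 x 100 but only 1 chunk when it is 1 x 1000, which is the kind of effect the storage-layout portion of the agenda examines.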

Agenda

1. Intro to HDF5

  • 1.1. What is HDF5
  • 1.2. Abstract Data Model
    • 1.2.1. Files, Groups, Links, Datasets, Attributes, Dataspace
  • 1.3. Software
    • 1.3.1. Tools
    • 1.3.2. API schema, Language bindings
    • 1.3.3. HDF5 Library
  • 1.4. HDF5 Binary File Format
    • 1.4.1. Bit-level organization of HDF5 file
    • 1.4.2. Defined by HDF5 File Format Specification

2. Overview of Parallel HDF5 design

  • 2.1. Design requirements
  • 2.2. PHDF5 implementation layers, relationship to MPI-IO

3. Parallel Consistency Semantics

4. PHDF5 Programming Model

  • 4.1. Selection, Collective, Independent, etc.
  • 4.2. Examples

5. Performance Analysis

  • 5.1. General Parallel Performance
  • 5.2. Effects of Storage layout (contiguous, chunked, compact)
  • 5.3. Effects of metadata cache

6. Parallel Performance Analysis Tools

  • 6.1. Darshan, TAU, etc.

7. Cutting-edge HDF5 features

  • 7.1. Multi-dataset
  • 7.2. VOLs
    • 7.2.1. DAOS
    • 7.2.2. Log-based
    • 7.2.3. ASYNC
  • 7.3. VFD
    • 7.3.1. Core
    • 7.3.2. Subfiling
    • 7.3.3. Split

8. System Specific Considerations

9. Hands-on and/or walkthroughs

  • 9.1. Switching between independent and collective, mixing data layouts, etc.
  • 9.2. Effects of using a VOL
  • 9.3. Effects of using a VFD

Registration

For more information, the full agenda, and registration, please see the ALCF training event page.

Presentation Materials