May 21, 2019 by Elizabeth Bautista

The Operations Monitoring and Notification Infrastructure (OMNI) is an operational cluster at NERSC that performs the following functions:

  • monitors the large systems (HPC and storage), the supporting infrastructure, and the building, and sends a notification when a specific threshold is crossed
  • collects time series data from a variety of sources, including the HPC systems at NERSC, other supporting computational infrastructure, environmental sensors, mechanical systems, and more
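
As an illustration of the first function, threshold-based notification over collected readings can be sketched as follows. This is a minimal sketch and not OMNI's actual implementation: the `Reading` and `Threshold` types, the `find_alerts` function, and the metric and source names are all hypothetical, and a real deployment would deliver alerts through a notification system rather than return strings.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Reading:
    """One time series sample (hypothetical schema, not OMNI's)."""
    source: str       # where the sample came from, e.g. a rack or sensor
    metric: str       # name of the measured quantity
    value: float      # measured value
    timestamp: float  # seconds since the Unix epoch


@dataclass(frozen=True)
class Threshold:
    """Upper limit for a metric; readings above it trigger an alert."""
    metric: str
    limit: float


def find_alerts(readings: Iterable[Reading],
                thresholds: Iterable[Threshold]) -> List[str]:
    """Return one alert message per reading that exceeds its metric's limit."""
    limits = {t.metric: t.limit for t in thresholds}
    alerts = []
    for r in readings:
        limit = limits.get(r.metric)
        # Metrics without a configured threshold are collected but never alert.
        if limit is not None and r.value > limit:
            alerts.append(f"{r.source}: {r.metric}={r.value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    sample = [
        Reading("rack-a", "inlet_temp_c", 31.5, 0.0),
        Reading("rack-b", "inlet_temp_c", 24.0, 0.0),
        Reading("rack-a", "humidity_pct", 40.0, 0.0),
    ]
    for message in find_alerts(sample, [Threshold("inlet_temp_c", 30.0)]):
        print(message)
```

Keeping the check separate from collection means every reading can be stored as time series data regardless of whether it triggers a notification.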

OMNI is built using open-source technologies, such as the Elastic Stack, and currently contains over two years of online operational data, totaling 550 billion records (125 TB of data).

  1. HEPIX 2016 (April) - A Slice of the NERSC Data Collect System (slides)
  2. CUG 2016 (May) - The NERSC Data Collect Environment (full text)
  3. Elastic Conference 2017 - OMNI overview (slide deck)
  4. ICPP 2019 Workshops (August) - Collecting, Monitoring, and Analyzing Facility and Systems Data at the National Energy Research Scientific Computing Center, paper from the 48th International Conference on Parallel Processing: Workshops, August 5–8, 2019, Kyoto, Japan. ACM, New York, NY, USA.


The following graphic lists the data sources stored in OMNI:

[Figure: data collect final]