May 21, 2019 by Elizabeth Bautista
The Operations Monitoring and Notification Infrastructure (OMNI) is an operational cluster at NERSC that performs the following functions:
- monitors the large systems (HPC and storage), the supporting infrastructure, and the building, and sends a notification when a specific threshold is reached
- collects time series data from a variety of sources including the HPC systems at NERSC, other supporting computational infrastructure, environmental sensors, mechanical systems, and more.
OMNI is built using open-source technologies, such as the Elastic Stack, and currently contains over two years of online operational data, totaling 550 billion records (125 TB of data).
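To make the threshold-notification idea concrete, here is a minimal sketch in Python. It is not OMNI's actual implementation: the `Reading` record, the metric names, the limits in `THRESHOLDS`, and the `find_breaches` function are all hypothetical and chosen only for illustration. It assumes each collected data point carries a timestamp, a source, a metric name, and a numeric value, and that a notification is warranted when a value reaches a configured upper limit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One time series data point (hypothetical record layout)."""
    timestamp: datetime
    source: str
    metric: str
    value: float


# Hypothetical upper limits per metric; real limits would come from operations staff.
THRESHOLDS = {"inlet_temp_c": 30.0, "filesystem_used_pct": 90.0}


def find_breaches(readings, thresholds=THRESHOLDS):
    """Return one notification message for each reading at or above its limit."""
    messages = []
    for r in readings:
        limit = thresholds.get(r.metric)
        # Metrics without a configured limit are collected but never trigger a notification.
        if limit is not None and r.value >= limit:
            messages.append(
                f"{r.timestamp.isoformat()} {r.source}: "
                f"{r.metric}={r.value} reached limit {limit}"
            )
    return messages


if __name__ == "__main__":
    now = datetime(2019, 5, 21, 12, 0, tzinfo=timezone.utc)
    sample = [
        Reading(now, "cooling-loop-1", "inlet_temp_c", 31.5),
        Reading(now, "scratch-fs", "filesystem_used_pct", 72.0),
    ]
    for message in find_breaches(sample):
        print(message)
```

In a deployment built on the Elastic Stack, the readings would more plausibly be queried from an index than passed in as a Python list; the list keeps the sketch self-contained.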
- HEPIX 2016 (April) - A Slice of the NERSC Data Collect System (slides)
- CUG 2016 (May) - The NERSC Data Collect Environment (full text)
- Elastic Conference 2017 - OMNI overview (slide deck)
- Collecting, Monitoring, and Analyzing Facility and Systems Data at the National Energy Research Scientific Computing Center, paper from the 48th International Conference on Parallel Processing: Workshops (ICPP 2019), August 5–8, 2019, Kyoto, Japan. ACM, New York, NY, USA. https://doi.org/10.1145/3339186.3339213
The following graphic shows the data sources stored in OMNI: