Larry Pezzaglia is an HPC Systems Analyst in the NERSC Computational Systems Group. In addition to managing NERSC's production computational systems, he develops software to facilitate systems administration at scale. Larry is the founding developer of avs_image_mgr and minimond, and a contributing developer to CHOS.
Larry is the System Integration Lead for Cori, NERSC's next flagship supercomputer.
Larry Pezzaglia, "Supporting Multiple Workloads, Batch Systems, and Computing Environments on a Single Linux Cluster", Cray User Group 2013, May 9, 2013
Larry Pezzaglia, Cluster Consolidation at NERSC, A talk at the HEPiX Spring 2014 Workshop, Annecy-le-Vieux, France, May 22, 2014
- Download File: larry-pezzaglia-cluster-consolidation-at-nersc.pdf (pdf: 1.7 MB)
In 2012, NERSC began deployment of "Mendel", a 500+ node, InfiniBand-attached Linux "meta-cluster" that transparently expands NERSC production clusters and services in a scalable and maintainable fashion. The success of the software automation infrastructure behind the Mendel multi-clustering model encouraged investigation into even more aggressive consolidation efforts.
This talk will detail one such effort: under the constraints of a 24x7, disruption-sensitive environment, NERSC staff merged a 400-node legacy production cluster, consisting of multiple hardware generations and ad hoc software configurations, into Mendel's automation infrastructure. By leveraging the hierarchical management features of the xCAT software package in combination with other open-source and in-house tools, such as Cfengine and CHOS, NERSC abstracted away the unique characteristics of both clusters beneath a unified management interface. Consequently, both cluster components are now managed as a single, albeit complex, integrated system.
Additionally, this talk will provide an update on the PDSF system at NERSC, including improvements to trending data collection and ongoing CHOS development.
Larry Pezzaglia, CHOS in Production: Supporting Multiple Linux Environments on PDSF at NERSC, A talk at the HEPiX Spring 2012 Workshop, Prague, Czech Republic, April 25, 2012
- Download File: chos.pdf (pdf: 796 KB)
The CHOS software package combines a Linux kernel module, a PAM module, and batch system integration to provide a mechanism for concurrently supporting multiple Linux environments on a single Linux system. This presentation gives an introduction to CHOS and details how NERSC has deployed this utility on the PDSF HPC system to meet the complex, and often conflicting, software environment requirements of multiple applications. The CHOS utility has been in continuous use on PDSF for over 8 years, and has proven to be a robust and simple approach to ensure optimal software environments for HENP workloads.
Eric Hjort, Larry Pezzaglia, Iwona Sakrejda, PDSF at NERSC: Site Report, A talk at the HEPiX Spring 2012 Workshop, Prague, Czech Republic, April 24, 2012
- Download File: pdsfsitereport.pdf (pdf: 1.6 MB)
PDSF is a commodity Linux cluster at NERSC that has been in continuous operation since 1996. This talk will provide a status update on the PDSF system and summarize recent changes at NERSC. Highlighted PDSF changes include the conversion to xCAT-managed netboot node images, the ongoing deployment of Scientific Linux 6, and the introduction of XRootD for STAR.