National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

GUPFS Overview


The goal of the Global Unified Parallel File System (GUPFS) Project at NERSC is to provide a scalable, high-performance, high-bandwidth, shared-disk file system for use by all of NERSC's high-performance production computational systems. GUPFS will provide a unified file namespace for these systems and will be integrated with the High Performance Storage System (HPSS). An additional goal is to distribute GUPFS-based file systems to remote facilities as local file systems over the DOE Science Grid.

The typical state of many high performance computational environments is one in which each large computational and support system has its own large, independent disk store, with additional Network Attached Storage (NAS), such as NFS or DFS, and an archival storage server such as HPSS. This leads to wasteful replication of customer files on multiple systems and an increased, nonproductive workload on customers to move and manage these files. This, in turn, creates a burden on the infrastructure to support file transfers between the systems as well as to the storage server. In addition, the existing environment prevents the consolidation of storage between systems, thus limiting the amount of working storage available to each system to its local disk capacity.

The environment envisioned by the GUPFS project is one in which the large high-performance computational systems and support systems can access a consolidated disk store. This consolidated storage will be accessed directly through shared-disk file systems with a unified file namespace. A storage server, accessing the consolidated storage through the same shared-disk file systems, will provide hierarchical storage management (HSM), backup, and archival services. Additionally, it is envisioned that the GUPFS project will distribute this environment over the DOE Science Grid to geographically remote facilities as a native file system. This environment will eliminate unnecessary data replication, simplify the customer environment, provide better distribution of storage resources, and permit the management of storage as a separate entity while minimizing impacts on the computational systems.
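The contrast between the two environments can be illustrated with a small calculation. The following Python sketch is illustrative only: the system names, data set names, and sizes are hypothetical and are not NERSC figures. It compares the disk consumed when each system keeps its own copy of the files it needs with the disk consumed when a single shared copy is kept, and it shows how the working storage available to one system changes when capacity is pooled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A computational or support system (all values are hypothetical)."""
    name: str
    local_capacity_tb: int
    files: frozenset  # names of the data sets this system works on


def replicated_footprint(systems, sizes_tb):
    """Disk used when every system stores its own copy of each file."""
    return sum(sizes_tb[f] for s in systems for f in s.files)


def consolidated_footprint(systems, sizes_tb):
    """Disk used when one shared copy of each file serves all systems."""
    unique_files = set().union(*(s.files for s in systems))
    return sum(sizes_tb[f] for f in unique_files)


def max_working_storage(systems, consolidated):
    """Largest amount of storage a single system could draw on."""
    capacities = [s.local_capacity_tb for s in systems]
    # Pooled storage is shared; local storage is capped per system.
    return sum(capacities) if consolidated else max(capacities)


# Hypothetical data sets (sizes in TB) and systems.
sizes_tb = {"climate_run": 4, "mesh": 2, "checkpoint": 6}
systems = [
    System("compute_a", 10, frozenset({"climate_run", "mesh"})),
    System("compute_b", 20, frozenset({"climate_run", "checkpoint"})),
    System("viz", 5, frozenset({"climate_run", "mesh"})),
]

print("replicated TB:  ", replicated_footprint(systems, sizes_tb))
print("consolidated TB:", consolidated_footprint(systems, sizes_tb))
print("max working TB (local): ", max_working_storage(systems, False))
print("max working TB (pooled):", max_working_storage(systems, True))
```

With these made-up inputs, the replicated layout stores the same data set up to three times, while the consolidated layout stores each data set once. The sketch ignores performance, contention, and redundancy overhead, all of which a real evaluation would have to consider.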

The major enabling components of this envisioned environment are a high-performance shared-disk file system and a cost-effective, high-performance Storage Area Network (SAN). These emerging technologies, while evolving rapidly, are not targeted at the needs of high-performance scientific computing. The GUPFS project intends to steer their development toward HPC needs, both by collaborating with other institutions and vendors and by contributing to the development work directly.

It is anticipated that the GUPFS project will span five years. During the first three years of the project, NERSC plans to test, evaluate, and develop shared-disk file systems, SAN technology, and other components of the GUPFS environment. This investigation is expected to include open source and commercial shared-disk file systems, new SAN fabric technologies as they become available, SAN and file system distribution over the WAN, HPSS integration, and file system performance and scaling. During this time, NERSC also plans to form collaborations and become active in the shared-disk file system community. At the end of this period, NERSC will assess the feasibility of moving forward with a full implementation in the NERSC production environment.

Provided the assessment is favorable, the last two years of the GUPFS project will emphasize implementation. The first step in implementation is to choose a file system based on the preceding evaluation and testing. The next step is to build up the SAN infrastructure while simultaneously starting the full development efforts required for grid distribution and HSM integration. Subsequently, there will be a phased roll-out of the GUPFS file system on the various production systems, including HPSS. Once adequate SAN infrastructure, production systems, and required software are in place, DOE Science Grid distribution of the shared file systems over the WAN will be initiated.


Page last modified: Wed, 19 May 2004 19:42:25 GMT
Page URL: http://www.nersc.gov/projects/GUPFS/overview/
