Research Objective
The purpose of this project is to develop techniques and tools that will enable efficient access to the massive datasets of modern High-Energy and Nuclear Physics (HENP) experiments. The prototype development is being carried out in conjunction with the computing facility and experiments at the Relativistic Heavy Ion Collider (RHIC) beginning in late 1999. Access to these 100+ TByte datasets by hundreds of scientists, in conjunction with carrying out the large-scale computations necessary to refine and reduce them to the essential physical properties buried within, is one of the forefront problems of high-performance computing today. By capitalizing on recent advances in storage systems (HPSS), object database technology (Objectivity Inc.), high-performance scientific computing (at NERSC), and expertise from several institutions across the U.S., we are able to address this important problem affecting fundamental science.
Computational Approach
The principal approach to this problem is to cast it in the form of an extremely large hierarchical collection of objects (some persistent and some transient), a form that experience has shown is very well matched to this event-based experimental data. Modern object-database technology allows us to address the issue of physical storage layout separately from the logical relationships between the objects. The primary issue for physical storage is to organize the data so that it is stored according to how it is accessed rather than how it is generated.
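The difference between storing data by generation order and by access order can be illustrated with a small sketch. Everything in it is a hypothetical assumption made for illustration: the event records, the `multiplicity` property, the bin width and the file size are not the project's actual schema or clustering method. The sketch only counts how many files a selective query would have to touch under each layout.

```python
from collections import defaultdict

def files_touched(layout, wanted_ids):
    """Count how many files hold at least one wanted event."""
    return sum(1 for ids in layout if wanted_ids & set(ids))

def layout_by_generation(events, per_file):
    """Pack events into files in the order they were generated."""
    ids = [e["id"] for e in events]
    return [ids[i:i + per_file] for i in range(0, len(ids), per_file)]

def layout_by_access(events, per_file, bin_width):
    """Pack events into files grouped by a property that queries select on."""
    bins = defaultdict(list)
    for e in events:
        bins[e["multiplicity"] // bin_width].append(e["id"])
    layout = []
    for key in sorted(bins):
        ids = bins[key]
        layout.extend(ids[i:i + per_file] for i in range(0, len(ids), per_file))
    return layout

# Hypothetical events whose property value is scattered across generation order.
events = [{"id": i, "multiplicity": (i * 37) % 100} for i in range(1000)]
# A selective query: only the events with the highest property values.
wanted = {e["id"] for e in events if e["multiplicity"] >= 90}

by_gen = layout_by_generation(events, per_file=50)
by_acc = layout_by_access(events, per_file=50, bin_width=10)
print("files touched, generation order:", files_touched(by_gen, wanted))
print("files touched, access order:", files_touched(by_acc, wanted))
```

In this toy case the same query touches every file in the generation-ordered layout but only a small fraction of the files in the access-ordered one, which is the effect that matters when each file must first be staged from tape.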
Coupling large robotic tape storage systems to object databases is a very recent development, and a number of issues must be addressed before the required amount of high-performance computing cycles can be applied to these massive object databases. Addressing these issues is a principal area of research and development in this project, enabled by use of the newly installed HPSS system at NERSC.
Accomplishments
A software architecture for access to these massive datasets has been developed and is currently in prototype development. It consists of storage manager components, which provide a central coordination service, and client-side components in the application code, which interface with the storage manager. The storage manager components consist of an index, query estimation, query execution, and cache management of the staging disk. The client-side components consist of a query interface and an order-optimized iterator. Collectively, these components permit many users simultaneously to query a massive dataset consisting of many files resident on tape and to process the data in parallel, a file (or set of files) at a time, as it is staged to disk.
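The division of labor among these components can be sketched in a few lines. This is a toy model under stated assumptions, not the prototype's design: the class `StorageManager`, the function `ordered_events`, the least-recently-used eviction policy and the in-memory index are all hypothetical names and choices introduced here for illustration. The sketch shows an index lookup, a query estimate, a bounded staging-disk cache, and an iterator that delivers events file by file, starting with files already on disk.

```python
from collections import OrderedDict

class StorageManager:
    """Toy stand-in for the storage manager: index, estimation, disk cache."""

    def __init__(self, index, cache_slots):
        self.index = index            # file name -> list of (event_id, value)
        self.cache = OrderedDict()    # files on the staging disk, oldest first
        self.cache_slots = cache_slots
        self.stage_count = 0          # number of simulated tape-to-disk stages

    def select(self, predicate):
        """Index lookup: map each file to the matching event ids it holds."""
        hits = {}
        for fname, events in self.index.items():
            ids = [eid for eid, value in events if predicate(value)]
            if ids:
                hits[fname] = ids
        return hits

    def estimate(self, predicate):
        """Query estimation: number of files and events a query would touch."""
        hits = self.select(predicate)
        return len(hits), sum(len(ids) for ids in hits.values())

    def stage(self, fname):
        """Cache management: bring a file to disk, evicting the least recent."""
        if fname in self.cache:
            self.cache.move_to_end(fname)
            return
        if len(self.cache) >= self.cache_slots:
            self.cache.popitem(last=False)
        self.cache[fname] = True
        self.stage_count += 1

def ordered_events(manager, predicate):
    """Order-optimized iteration: cached files first, then one file at a time."""
    hits = manager.select(predicate)
    order = sorted(hits, key=lambda f: (f not in manager.cache, f))
    for fname in order:
        manager.stage(fname)
        for eid in hits[fname]:
            yield fname, eid

# Hypothetical index: three files on tape, two events each.
index = {
    "f1": [(1, 5), (2, 50)],
    "f2": [(3, 60), (4, 70)],
    "f3": [(5, 1), (6, 2)],
}
mgr = StorageManager(index, cache_slots=2)
mgr.stage("f2")  # pretend another user's query has already staged f2
print(mgr.estimate(lambda v: v >= 50))
print(list(ordered_events(mgr, lambda v: v >= 50)))
```

The iterator returns events in the order that is cheapest for the storage system rather than in event-id order, so client code written against it must not depend on a particular event sequence.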
In order to produce a simulated dataset the NERSC T3E is used to calculate theoretical predictions of what may occur in the relativistic nuclear matter collisions at RHIC. An example of one such study is described below.
Anisotropy in Relativistic Heavy-Ion Collisions
We have learned from current experiments studying the collision of nuclei at high energy that there is a large amount of re-scattering of produced secondaries. The Relativistic Quantum Molecular Dynamics (RQMD) cascade model shows that re-scattering reduces the pion multiplicity by 8% at CERN SPS energy. An interesting consequence of this is that a non-spherical, high-energy-density domain, which can be formed at an early stage of a heavy-ion collision due to fluctuations in energy deposition, may produce an anisotropy in pion emission at freeze-out.
To investigate this possibility, we used the RQMD model to simulate head-on Pb+Pb collisions at SPS energy (158 GeV/nucleon). The resulting freeze-out pion distribution is azimuthally asymmetric (cf. figure 4): fewer pions are emitted along the major axis of the ellipse, in agreement with the picture that the pion yield is reduced by re-scattering.
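An azimuthal asymmetry of this kind can be summarized by a single number. The text does not say which measure was used in the study; one common choice, assumed here purely for illustration, is the second Fourier coefficient, the mean of cos 2(phi - psi) over the emitted pions, where psi is the direction of the major axis. The function name and the synthetic angle lists below are hypothetical and are not model output.

```python
import math

def second_harmonic(phis, axis_angle=0.0):
    """Mean of cos(2 * (phi - axis_angle)) over the emitted pions."""
    if not phis:
        raise ValueError("need at least one azimuthal angle")
    return sum(math.cos(2.0 * (phi - axis_angle)) for phi in phis) / len(phis)

# Synthetic example 1: angles spread evenly around the full circle.
isotropic = [2.0 * math.pi * k / 360 for k in range(360)]

# Synthetic example 2: twice as many pions along the minor axis
# (+/- 90 degrees) as along the major axis (0 and 180 degrees).
suppressed = [0.0, math.pi] + [math.pi / 2, -math.pi / 2] * 2

print(second_harmonic(isotropic))
print(second_harmonic(suppressed))
```

With this convention an isotropic distribution gives a value near zero, and a negative value indicates that emission along the major axis is suppressed relative to the minor axis.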
The RQMD model, like any typical heavy-ion collision model, is
CPU-intensive. Running an RQMD central Pb+Pb event at SPS energy takes
about 15 minutes on the NERSC CRAY T3E-900 (peak 0.9 GFlops/cpu, or 70
SPECint95), and longer for special events such as those studied in the
present work. A typical study of special events needs on the order of
10K events (roughly 100 days of processing on a single processor).
Therefore many parallel processors are needed for the calculation to
have a reasonable turn-around. Moreover, the current study deals with
only one configuration, i.e., a single high-density domain with fixed
size and shape at the center in every event. It remains a future task
to study the dependence of the anisotropy on the location, size and
shape of the domain.
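The processing estimate above follows from simple arithmetic, which the sketch below reproduces. The 15 minutes per event and the 10K events come from the text; the processor counts and the assumption that events are independent and can be spread evenly over processors are illustrative assumptions. Since special events take longer than 15 minutes, the results should be read as lower bounds.

```python
MINUTES_PER_EVENT = 15   # quoted time for one central Pb+Pb event
N_EVENTS = 10_000        # "on the order of 10K events"

def turnaround_days(n_events, minutes_per_event, n_processors):
    """Wall-clock days if events are spread evenly over independent processors."""
    events_per_processor = -(-n_events // n_processors)  # ceiling division
    return events_per_processor * minutes_per_event / (60 * 24)

# Hypothetical processor counts, chosen only to show the scaling.
for n in (1, 64, 512):
    print(n, "processors:", round(turnaround_days(N_EVENTS, MINUTES_PER_EVENT, n), 2), "days")
```

A single processor needs a little over 100 days, while a few hundred processors bring the turn-around down to well under a day.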
Significance
The direct significance of this work is to enhance the capabilities of
large-scale experimental high-energy and nuclear physics to achieve
their full scientific potential in exploring new regions of the
physical world.