NERSC 3 Greenbook
For some classes of scientific endeavor in Energy Research that are
very data intensive, the capability to select both large and small
samples from very large data sets is as critical to the science as
applying compute cycles to those samples once they are selected.
Making this capability effective will involve developing both the
software and the necessary networking hardware.
Selecting and staging files from shell scripts via ftp is simply
inadequate for large, complex data sets. A true object database
capability is required so that the scientist can select the
appropriate data sets, or "data objects", that have meaning for the
scientific study. Two issues involved in this capability are
optimizing data layout so that selecting small samples is efficient,
and providing adequate network bandwidth so that large samples are
retrieved in a timely way.
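
As a rough illustration of the data layout point, the following Python
sketch selects a small sample by a scientific attribute instead of
staging a whole file. It is not a description of any NERSC system. The
record format, the detector-id attribute, and the function names
write_dataset and select are all hypothetical, and an in-memory buffer
stands in for a large file on mass storage.

```python
import io
import struct

# Hypothetical fixed-size record: (run number, detector id, energy reading).
# "<" means little-endian with no padding, so each record is 16 bytes.
RECORD = struct.Struct("<iid")


def write_dataset(stream, records):
    """Write records back to back; return an index of detector id -> byte offsets."""
    index = {}
    for run, detector, energy in records:
        index.setdefault(detector, []).append(stream.tell())
        stream.write(RECORD.pack(run, detector, energy))
    return index


def select(stream, index, detector):
    """Read only the records for one detector by seeking directly to each one."""
    selected = []
    for offset in index.get(detector, []):
        stream.seek(offset)
        selected.append(RECORD.unpack(stream.read(RECORD.size)))
    return selected


records = [(1, 7, 0.5), (1, 9, 1.25), (2, 7, 2.0), (2, 3, 4.75)]
dataset = io.BytesIO()  # assumption: stands in for a large file on mass storage
index = write_dataset(dataset, records)
sample = select(dataset, index, detector=7)
print(sample)
```

Because the index records where each data object lives, a small
selection touches only the bytes it needs. A large selection would
still be limited by how fast those bytes can be moved, which is the
network bandwidth issue noted above.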
Rick A Kendall
7/13/1998