Centralized Logging for Grid Troubleshooting
September 13, 2007
Tracking failures across a widely distributed system of resources has proven challenging for many DOE applications. The problem affects not only Grid computing but anyone performing large-scale data transfers to remote machines. A single action, such as reliably transferring a directory of files, can involve coordinating a wide range of loosely coupled software tools, including security software, delegation services, and file transfer tools. The Open Science Grid (OSG) project, for example, currently experiences a 15% job failure rate.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is the primary high-performance computing facility for scientific research sponsored by the U.S. Department of Energy's Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 4,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. For more information about computing sciences at Berkeley Lab, please visit www.lbl.gov/cs.


