NERSC’s Nick Cardo Helps IBM Refine System Software Testing

June 1, 2005

Nick Cardo, NERSC’s IBM SP project lead, was invited earlier this year to give a customer perspective to staff at IBM’s test lab in Poughkeepsie, NY. Cardo spent two days at the facility, demonstrating how he regularly runs various system tests on Seaborg, NERSC’s IBM supercomputer.

Cardo worked side by side with IBM staff on their test SP, showing them how he runs tests on a daily basis. As a result, the IBM staff were able to see firsthand what a user encounters.

“By sitting down with the testers at their internal test machine, I was able to give them a customer’s perspective of a production environment, running the checkouts I would normally run during the course of the day,” Cardo said. “This effort, which was unique, is a reflection of the working partnership we have developed with IBM over the years.”

Curtis Vinson, Cardo’s contact at IBM, summarized the results of the testing at the SP-XXL meeting held recently in Edinburgh, Scotland. The SP-XXL user group focuses on large-scale scientific and technical computing on IBM hardware.

For his part, Cardo produced a seven-page report describing some of the problems he encountered during the March 29–30 testing stint and outlining ways to fix them.

The overall objective, Cardo said, was to help IBM find ways to prevent “field escapes,” the term for software bugs that make it out of the testing lab and into the user community.

“Our concern is that sometimes when we do system updates, we hit problems that should have been caught in the test lab,” Cardo said. “By showing IBM how we use the system, we were able to help them refine their testing procedures and take steps to eliminate the bugs before they become field escapes.”

As part of his responsibilities at NERSC, Cardo runs certain tests twice a day on Seaborg. This helps the Computational Systems Group find and fix problems quickly, before they become major hindrances to running users’ jobs.

Cardo and the IBM testers realized that although each software component may have been well tested individually at the lab, the components were not always tested together for overall compatibility.

“The benefit of all this is that the software upgrades produced by IBM will be more stable right out of the box,” Cardo said. “Users of all IBM systems will benefit from this work.”

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.