One of the keynotes at Europe's main supercomputer event, the International
Supercomputing Conference in Heidelberg (ISC2003),
24-27 June, will be delivered by Horst Simon. Dr. Simon is one of the most
renowned HPC experts in the world.
He graduated in Mathematics from the Technical University of Berlin in 1978
and obtained his Ph.D. at the University of California at
Berkeley in 1982. Since 1996 he has been the director of the U.S. Department of Energy's
National Energy Research Scientific Computing
Center (NERSC), which is located at the Lawrence Berkeley National Laboratory
in Berkeley (California).
This interview for Supercomputing Online about HPC Strategies in the US was
conducted by Christoph Pöppe, editor of Spektrum
der Wissenschaft, the German version of Scientific American.
Supercomputing Online: In April of last year, Japan started up the "Earth
Simulator", which is five times faster than ASCI White, the
fastest computer up to that date. It was said that the Americans were hit by a
"Computenik shock" comparable to the Sputnik
shock of 1957. Was the Earth Simulator such a surprise for the experts?
Horst Simon: It was not the computer itself. Japanese colleagues had already talked
with us about the project several years before
it was officially announced in April 2002. The big surprise was that the manufacturer,
NEC, not only delivered and installed the Earth
Simulator on time but that it also delivered results so quickly.
To see why that was a surprise, one has to know that a culture of delays has
unfortunately established itself among U.S. manufacturers:
long-announced innovations almost predictably arrive three to
six months after the announced date. Given the U.S.
experience with vendors, it was a surprise that the machine arrived on time,
as contracted with the vendor NEC several years
earlier.
The second surprise: After visiting Japan in 2000 and 2001 and based on discussions
there, I had expected the Earth Simulator to
reach a sustained performance of about 10 teraflop/s on scientific applications.
However, one of the first results from Japan was
that it succeeded in reaching 27 out of 40 possible teraflop/s running a climate
application. This is outstanding performance, both
in absolute terms and in the sustained-to-peak performance ratio, and
it was awarded the Gordon Bell Prize last fall at
SC2002.
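The sustained-to-peak ratio behind these figures is simple arithmetic. The following minimal sketch uses only the numbers quoted in the answer above (27 and 40 teraflop/s, and the expected 10 teraflop/s); the function name is hypothetical and chosen for illustration.

```python
def sustained_to_peak(sustained_tflops: float, peak_tflops: float) -> float:
    """Return sustained performance as a fraction of peak performance."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tflops / peak_tflops


# Figures quoted in the interview for the Earth Simulator climate run.
achieved = sustained_to_peak(27.0, 40.0)
# The sustained performance Simon says he had expected beforehand.
expected = sustained_to_peak(10.0, 40.0)

print(f"achieved: {achieved:.1%} of peak")
print(f"expected: {expected:.1%} of peak")
```

Under these figures, the climate run reached roughly two thirds of peak, against an expectation of one quarter.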
Supercomputing Online: The Earth Simulator consists of individual vector computers,
an architecture that outside of Japan is regarded
as mostly outdated. Is the Earth Simulator so successful because of or in
spite of its vector architecture?
Simon: Neither. The important architectural feature is that the Earth Simulator
is designed around bandwidth. It provides large
processor-to-memory bandwidth and, with a custom-designed crossbar switch, also
a huge bisection bandwidth. In the case of
climate simulations, for example, it is typical that the ratio of floating point
operations to memory references is about one. For this
example, the architecture of the Earth Simulator is very well balanced by providing
the extra memory bandwidth and thus is able to
obtain a high sustained performance.
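Why memory bandwidth matters at a ratio of about one floating point operation per memory reference can be sketched with a simple bandwidth-bound model. This is a generic back-of-the-envelope estimate, not a method described in the interview. The function name, the 8-byte word size, and both sets of processor figures are assumptions for illustration only and are not specifications of any real machine.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gbytes_per_s: float,
                      flops_per_reference: float,
                      bytes_per_reference: float = 8.0) -> float:
    """Upper bound on sustained Gflop/s when each memory reference
    moves one word and supports a fixed number of floating point operations."""
    references_per_s = bandwidth_gbytes_per_s / bytes_per_reference  # in units of 1e9/s
    bandwidth_bound = flops_per_reference * references_per_s
    return min(peak_gflops, bandwidth_bound)


# Two hypothetical processors with the same peak but different memory bandwidth.
PEAK = 10.0  # Gflop/s, assumed
balanced = attainable_gflops(PEAK, bandwidth_gbytes_per_s=40.0, flops_per_reference=1.0)
starved = attainable_gflops(PEAK, bandwidth_gbytes_per_s=5.0, flops_per_reference=1.0)

print(f"high-bandwidth design: {balanced / PEAK:.2%} of peak")
print(f"low-bandwidth design:  {starved / PEAK:.2%} of peak")
```

In this model, an application with one operation per reference cannot run faster than the memory system delivers operands, so the design with more bandwidth sustains a much larger fraction of the same peak.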
Supercomputing Online: Is there a single identifiable reason for this success?
Simon: The willingness to spend a lot of money! Bandwidth for memory access
is simply very expensive. If the U.S. government is
also ready to pay $400 million for a supercomputer, then we can do the same.
For example, the design of the Cray X-1 also has
very good bandwidth, but building machines like the X-1 is very expensive. They
are custom designed and targeted to a relatively
small market in scientific applications, so they cannot leverage the engineering
cost over a large customer base, like machines built
for commercial applications.
An important question that we have to ask ourselves now is, "Is it worth
the extra expense to pay for bandwidth for scientific
applications?" Obviously, it does not make sense to pay five times the
price for doubling the sustained-to-peak ratio, but it would
be well worth the investment to pay twice the price for a five times higher
sustained-to-peak ratio. The sustained-to-peak
performance ratio is a very poorly determined quantity because it strongly depends
on the kind of application. For example, at
NERSC we have done an analysis of the SX-6 performance and found that the percent
of peak ranged from 2.1 percent for
molecular dynamics to 55 percent for an astrophysics application. We find the
same wide range of performance ratios on the IBM
Power 3 and Power 4, with very different applications at the extreme ends of
the performance scale.
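The price argument above can be made concrete by comparing sustained performance per dollar. The sketch below assumes, as a simplification not stated in the interview, that the machines being compared have the same peak performance, so that sustained performance scales with the sustained-to-peak ratio. The function name is hypothetical.

```python
def relative_value(price_factor: float, ratio_factor: float) -> float:
    """Sustained performance per dollar relative to a baseline machine
    with the same peak performance (values above 1.0 are an improvement)."""
    if price_factor <= 0:
        raise ValueError("price factor must be positive")
    return ratio_factor / price_factor


# Five times the price for double the sustained-to-peak ratio.
poor_deal = relative_value(price_factor=5.0, ratio_factor=2.0)
# Twice the price for a five times higher sustained-to-peak ratio.
good_deal = relative_value(price_factor=2.0, ratio_factor=5.0)

print(poor_deal)  # 0.4
print(good_deal)  # 2.5
```

The first trade delivers less than half the sustained performance per dollar of the baseline, while the second delivers two and a half times as much. The wide spread of measured ratios (2.1 to 55 percent on the SX-6) is the reason the outcome depends so strongly on the application mix.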
At NERSC we are very much looking forward to having a Cray X-1 available at
ORNL. We are planning to collaborate very closely
with ORNL in the evaluation of this new machine. It will be important to understand
how much sustained performance can be
achieved on this platform.
Supercomputing Online: What happens next in the U.S.?
Simon: Our colleagues from the Lawrence Livermore National Laboratory (LLNL)
have already announced ASCI Purple. This will be
an IBM computer that is equipped with Power 5 processors. The delivery is planned
for the end of 2004; peak performance will be
100 teraflop/s.
Supercomputing Online: Are there any other new developments in the US?
Simon: Yes, I think we are gradually departing from the strict either-or attitude,
which roughly says: Either a supercomputer
consists entirely of inexpensive commodity technology, e.g. all types of
cluster architectures; or all parts of the computer are
special purpose and custom designed, e.g. the Cray X1. I find the most interesting
development to be that new architectures are
appearing that are geared specifically for scientific simulations and combine
mass-produced and customized components. Hopefully
this will produce the best of both worlds: lower cost because of the use of
commodity parts, and higher performance for science
because of custom integration.
The Red Storm project is such a combination, consisting of approximately
80 percent commodity parts and a decisive 20
percent of specially developed hardware. Sandia National Laboratories in Albuquerque
(New Mexico) did a careful analysis of
their application requirements, developed the concept of this architecture
themselves, and then gave Cray the contract for
production. The processors are commodity Opteron processors from AMD. Opteron
enables 64-bit technology on the x86 platform
for a very high processor performance at a mass market price. Red Storm will
feature a custom designed interconnect. What I find
fascinating is that Sandia succeeded in taking concepts from two of the most
successful MPPs in the 1990s, the Cray T3E and the
Intel ASCI Red platform, and evolving them into a highly balanced new machine
with 2004 technology.
The Blue Planet project, which we at NERSC are developing in collaboration
with IBM and others, could potentially deliver twice
the sustained performance of the Earth Simulator at half the price. IBM had
already decided what their commercial machines in
2005/2006 should look like, and NERSC, in collaboration with Argonne National
Laboratory (Illinois) and the IBM systems developers,
has suggested a number of changes that should significantly enhance the performance
for scientific purposes. One of the most
important concepts is the use of ViVA, the virtual vector architecture. What
we have done here is examine a commercially
developed architecture and suggest improvements that will significantly enhance
the sustained performance of scientific
applications.
Supercomputing Online: Will Blue Planet be installed in your center?
Simon: We are very confident about the viability of our project, but even
at half the cost of the Earth Simulator, this still means
$200 million or more. So far, this level of investment has never been made for
a civilian basic research center in the U.S. But I think
that the situation is now right for a project like this to happen. Congress
seems to be very favorable to science in general and
basic research in particular. The Office of Science at DOE has made a computing
initiative its highest priority. We have also started
a close collaboration with LLNL, which is interested in some of the Blue Planet
architecture features for ASCI Purple. IBM has received a
lot of interest in the Blue Planet concept and has given presentations to more
than 40 sites.
Supercomputing Online: So, there won't be an answer to the Earth Simulator from the U.S.?
Simon: Not this year and probably not next year. The plans for the FY2004
budget do not allow for an appreciable increase in high
performance computing. But the most important fact is that the Earth Simulator
has been a wake-up call for both scientists and
political officials.
More than ten years ago, the HPCC (high performance computing and communications)
initiative was launched and successfully
extended the dominance of the U.S. in supercomputing hardware and software.
But for about the past six years, this field has not
gotten any new boost. The petaflop/s-computing efforts did not yield any results
because there were no new grants that followed.
We have been coasting along, leveraging what we could from the commercial market.
But we have reached the point where
commercial and scientific interests in high performance computing are diverging.
The problem of divergence is now being broadly addressed. The National Academy
of Sciences started a study about the future of
supercomputing. The new High-End Computing Revitalization Task Force (HECRTF) was
established in March 2003 to coordinate the
strategy of several agencies. This level of interest and excitement has not
been there for the last eight years, and it was sparked
by the Earth Simulator.
Supercomputing Online: In recent years, a lot of the HPC community has focused
on grid computing: Obtaining high effective
computing power by combining computers at different locations and having them
work in parallel on the same problem. Can a grid
satisfy the needs of the users of your center? And what do you think of Thomas
Sterling's pronouncement that his own child, the
Beowulf cluster, is dead?
Simon: Thomas Sterling has two children. The Beowulf PC cluster is one of
them. The other is the HTMT (hybrid technology
multithreaded system) and this was the only serious architecture study that
could have reached a performance in the petaflop/s
range before 2008. To achieve such a goal with PC clusters is unrealistic. That
was what Thomas Sterling meant when he
pronounced Beowulf dead in Baltimore last year. In the meantime, he is pursuing
a new project called Cascade, collaborating with
Burton Smith at Cray. The first machines of this type are expected in 2009.
This is one of DARPA's HPCS projects.
PC clusters and grid computing are often mentioned together. They are, however,
fundamentally different things. A cluster is best suited
for the needs of a single department or research group. At LBNL we have (or
have planned) about a dozen PC clusters, ranging from
32 to 512 processors, which are separate from NERSC's high-performance computing
platform. The clusters are very useful tools for
scientists. Some of these cluster users may eventually move their research onto
NERSC's 6,656-processor IBM, and some will
continue to compute on increasingly powerful clusters. I don't see any competition
with Earth Simulator class machines; they are
different tools for different purposes.
Grid computing lies sort of orthogonal to high performance computing. The
grid is often compared with the power supply network.
Tony Hey, one of the leading scientists of the UK e-science initiative, gave
me a great quote. He likes to scoff at people who think
that the grid would make high performance computers dispensable: "Why do
we need power plants? We get our electricity from the
wall outlets." High performance computers are like the power plants for
the grid. We need the grid and associated middleware to
connect resources such as data archives and unique experimental facilities with
the users. And we need supercomputers to power
the grid.
Supercomputing Online: And the little PC user who puts his computing power into the grid?
Simon: This is not what I understand as grid computing. Connecting a thousand
PCs via the Internet may be useful for a few
applications; but for tightly coupled high performance computing applications
it is not interesting. A thousand times nothing is still
nothing.
---------------------------
Web links:
Homepage of Horst Simon: http://www.nersc.gov/~simon/
International Supercomputer Conference ISC2003 in Heidelberg, 24-27 June 2003,
with the presentation of the TOP500 list
of the 500 fastest supercomputers in the world: http://www.isc2003.org/home.php
The U.S. web site related to the Earth Simulator: http://www.ultrasim.info/
Presentation of the project "Blue Planet": http://www.nersc.gov/news/blueplanet.html
Press release about signing the contracts for "Red Storm": http://www.sandia.gov/LabNews/LN11-01-02/key11-01-02_stories.html
Press release about signing the contracts for "ASCI Purple":
http://www-1.ibm.com/servers/eserver/pseries/news/pressreleases/2002/nov/asci_purple.html
Performance of the SX-6 compared to Power 3 and Power 4 on a suite of applications:
http://www.nersc.gov/~oliker/drafts/ICS_submit.pdf