Remembering NERSC's Roots
NERSC veteran Kirby Fong shares some fond and fascinating memories from the center's early days
April 14, 2014
By Kirby Fong
I was not an original member of the Controlled Thermonuclear Research Computer Center, as NERSC was initially named. I arrived in June 1976. My job was to provide mathematical library software to users and to help them find and use it. In those days scientific programming was done in Fortran, and scalar computing was the norm because vector and parallel processors were not yet common. Of course I didn't write the software; I acquired commercial libraries like IMSL, NAG and PORT. In reserve I also had Fortran libraries from Los Alamos, Sandia, Chalk River Nuclear Laboratory, CERN and NBS. LLL (now LLNL) was a later participant in the Sandia, Los Alamos, Air Force Weapons Lab Technical Exchange Committee (SLATEC), a committee of computer center directors from the major DOE and DOD laboratories that started in New Mexico.
One of the SLATEC subcommittees on which I served was the math library subcommittee. We collected a lot of public domain software, wrote machine-readable documentation, wrote minimal test codes and packaged everything into a complete library. It became one of the most frequently distributed pieces of software from DOE's National Energy Software Center at Argonne. It's a rare example of a committee producing something useful, and also an example of computer centers allowing people like me to work on something for the common good rather than for the immediate needs of the center. With the current emphasis on massively parallel computing, this type of software is probably no longer of much interest to NERSC users. I don't know if the SLATEC Common Math Library is used much anymore.
One of my next jobs at CTRCC (or its next name, National Magnetic Fusion Energy Computer Center) was to organize and lead the consulting group. These were the people who answered questions from users who were having problems getting their codes to work. One of my tasks was explaining to management that six was the smallest number of consultants that could provide stable, consistent, quality service to our customers. That stable staffing level enabled the consulting group to keep providing excellent service to users under the group's subsequent leaders.
One of the ways we kept our users informed was through The Buffer, our monthly newsletter. If you've seen the earliest issues, you've seen the logo of arcs radiating from Lake Tahoe, created by LLNL's Technical Information Department. Here's a picture of a 2-inch self-adhesive sticker using that logo:
These were the stickers we put on our output tape reels to distinguish them from Livermore Computer Center tapes. I thought that logo was inaccurate, so I redrew it using only lines and filled polygons. We didn't have drawing applications like Illustrator in those days. Here is a plot of the logo on the resin-coated paper for the FR80 hard copy camera:
The FR80 was a cathode ray tube plotting device, and the hard copy camera was a giant camera loaded with 12-inch-wide resin-coated paper. You can see the sprocket holes at the edges of the paper. Of course this was in the days before JPEG files and layout applications like InDesign, so we could use the high-quality logo only in two applications that understood the line and polygon descriptions and could plot them on output devices.
The First Computers
NMFECC was very clearly providing a necessary supercomputer service in its earliest years with a borrowed Control Data 6600 and then its own CDC 7600 computer. The need for additional computing capacity was obvious. There were proposals to move an underutilized 7600 from LBL, EG&G Idaho or Brookhaven to Livermore, but all three labs explained why they needed to keep their 7600s. The short-term solution was the famous MFE blocktime at Berkeley Lab, where the Livermore Time Sharing System and batch jobs were run at night with output brought back to Livermore in the morning. The long-term solution was to buy a Cray-1 computer for NMFECC, a stunning development considering that the very first Cray-1 had only recently been delivered (in April 1976) to the Los Alamos Scientific Laboratory (now LANL), and the classified Livermore Computer Center was busy integrating its Control Data STAR-100 computers. If I remember the date correctly, our machine came in 1978.
Los Alamos was writing its DEMOS operating system for the Cray-1, while Cray itself offered a multi-programming batch operating system similar to SCOPE 2.1. What should NMFECC do? It was a daring decision, probably Hans Bruijnes' decision, to port LTSS, rechristened the Cray Time Sharing System, onto the Cray-1 in time for its arrival. It was finished just in time, and we should give credit to the hardworking programmers who made it happen. Dave Storch, Bruce Griffing and Dave Fisher worked on the operating system. Larry Berdahl produced the LRLTRAN cross-compiler and later ported the compiler to the Cray-1. Rick Johnson wrote a cross-loader that was subsequently ported to the Cray-1. Tim Lundeen wrote the FORTLIB Fortran I/O library and the BASELIB system library. Clement Luk wrote the batch job system. These people spent nights and weekends to produce a working system and programming environment for the Cray-1 that was similar enough to the 7600 environment that codes could be migrated easily to the new machine. Of all the early recipients of Cray-1 computers, NMFECC was the fastest at migrating existing codes onto its machine.
There's probably little documentation left about CTSS, but if you want an overview of it, you can read the article I wrote titled "The NMFECC Cray Time-Sharing System" in Software Practice and Experience, Volume 15, Number 1, pages 87-103 (1985).
Expanding at LLNL
CTRCC started with a shared machine room in the classified area of LLNL and offices in one of the World War II buildings. As a major computer center in its own right, it deserved its own facility in an unclassified area of LLNL where users could visit. Its new home was Building 45, near the center of the laboratory site, where there were not yet many buildings. The construction displaced many field mice, which insisted on invading the new building until mousetraps finally persuaded them to stay away.
The most notable event in the construction of the new computer center was an earthquake. The contractor had fallen behind schedule, so we were able to move the computers but not the staff initially, and we had therefore not taken beneficial occupancy of the building. When the construction was finally complete, the laboratory and contractor conducted the final walkthrough, during which an earthquake set the ceiling tiles in the machine room swinging and crashing into the walls. The quake pulled apart some of the joints where sheetrock panels were taped. The laboratory told the contractor it could not take beneficial occupancy of the building in this condition, so the contractor was responsible for repairing the earthquake damage. No equipment or people were harmed in the quake, but the laboratory had been within minutes of becoming responsible for the repairs. By the way, the laboratory learned to tie down suspended ceilings to keep them from swinging during earthquakes.
Scientists in other fields cited the success of the NMFECC as evidence that the National Science Foundation could and should support supercomputer centers for other disciplines. While the Department of Energy was proud of its supercomputing facilities, it wanted a share of the supercomputing publicity that NSF was now drawing, without spending a whole lot more money. The result was the first Supercomputing Honors program, which funded the travel and lodging of one high school student from each state to spend two weeks at NMFECC. The program was regarded as so successful that it continued for several more years until tight budgets forced an end. Many of the computer center's staff pitched in to help the program, but much of the credit goes to Sue Wiebe for organizing the travel, housing and activities. One of the activities was a talk to the students by Edward Teller. Similar programs in other disciplines at other DOE laboratories exemplified DOE's educational outreach during that period.
I’ve forgotten the dates when we got a second Cray-1 and various multi-processor models of Crays, but it was the arrival of the first multiprocessor Cray that spurred work on enabling users to make full use of multiple processors. This meant all software libraries had to be made re-entrant or be made safe by enforcing single threading. This caused a big realignment of responsibilities where math libraries, graphics libraries, system libraries and, for good measure, third-party applications were put into a single group for maintenance. I was privileged to lead that group through all the effort to make our software run correctly in multi-threaded applications.
A Team to Remember
Particular credit should go to Tim Pierce who, with a little help from me, designed and implemented software and hardware interrupt procedures in the system library. This is not a simple feature to implement, and when we described it at a Cray User Group meeting, one of the Cray programmers complimented us, saying it was a complex issue they had so far hesitated to tackle in their UNICOS operating system. In another area, Bruce Curtis and I discovered that gang scheduling was not necessary to make parallel codes run efficiently; sophisticated thread switching enables a CPU blocked on one task to switch to another. It required only one simple, new system call that our operating system programmers were able to provide. That made it possible for multiprocessing codes to run even if the operating system didn't allocate all the processors to them. After that, the glory days of CTSS started coming to an end as DOE pressured its major labs to use vendor operating systems rather than pursue innovation.
Another person we shouldn't forget is John Fitzgerald, the NERSC program manager. A program manager is the person who handles budgets and accounting. NERSC held short, daily meetings of the entire department, primarily for the benefit of the programmers and operators. John would attend the meetings to keep informed of what we were doing. He played a dual role as program manager and father confessor to whom people could turn with their concerns. Less obvious to the programmers was John's brilliance as a program manager. Year after year, he ended the fiscal year close to budget. Other departments with trouble managing their expenses asked the finance department for advice and were told to ask John how he did it. Another little-remembered fact is that John pioneered techniques in buying supercomputers. Taking advantage of the fact that interest paid by the lab was tax exempt, he was able to borrow money at low interest rates to buy the machines and then pay back the loans over several years as a lease-to-ownership arrangement using operating rather than capital money. So now you know who figured out how to buy supercomputers without requiring a big lump of capital money.
I feel fortunate to have been involved in supercomputing during the time it captured the public imagination, and I also feel fortunate to have been a participant in creating the services and capabilities of that era. Supercomputing is just as important now as it was then, and it has evolved to emphasize massive parallelism. This glimpse into NERSC's past won't provide any guide to the future, but I hope it gives you a sense of how NERSC survived its first few years, when it was most vulnerable, to thrive and become the powerhouse that it is today.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science.