
ERSUG Meeting July 11 - 12, 1994 Minutes

Summary of ERSUG Meeting

July 11 - 12, 1994, Rockville, Maryland

The latest Energy Research Supercomputer Users Group (ERSUG) meeting was held in Rockville, Maryland, on July 11 - 12. The focus of the meeting was on realized and projected improvements in the NERSC computing environment. Minutes of the ERSUG meeting are published by Brian Hingerty.


The View from Washington (Tom Kitchens)

It's Washington, so let's talk about science support, policies, and organization. Most Department of Energy (DOE) budgets are expected to be down 5 to 10% next year; a flat budget is suddenly a fat budget for FY95 and FY96. The signals have changed: High Performance Computing (HPC) and Grand Challenges are not out, but the Administration is more interested in the National Information Infrastructure (NII) and National Challenges. High Performance Computing, Communications, and Information Technology (HPCCIT) is now a subcommittee of the Committee on Information and Communications (CIC), which is also responsible for NII. It is going to be much harder to demonstrate that HPC needs even flat support.

The Administration is interested in inter- and intra-agency collaborations. HPCCIT and the Global Climate Initiative are being used as pilot "virtual agencies," where components of agencies, guided by the President's Science Advisor's Office (Office of Science and Technology Policy), attack a common problem. The message is: working with people from other agencies--or other parts of your own agency--is good.

Such a collaboration aids the Domestic Natural Gas and Oil Industry through a program called the Advanced Computational Technology Initiative (ACTI), which is managed and supported by the Office of Fossil Energy, Defense Programs, and Energy Research. The program is based on DOE Laboratory collaboration with the domestic gas and oil companies; its budget is set between $23 and $40 million. The lesson is that if you don't initiate the crosscutting collaborations, someone else will, and you probably won't like the way they do it!

The ERSUG requirements document (the "green book") is three years old and must be updated soon. If we want the report to be read, we need to support the ERSUG requirements with persuasive statements on the societal impact of the work, descriptions of valuable accomplishments, and expected milestones. I hope the EXERSUG (the executive branch of ERSUG) appoints a strong group for this task. ERSUG needs to be stronger to maintain even a constant budget for computational resources. ERSUG needs to publicize more of its members' work and to interact with scientists and technicians inside and outside the DOE. This meeting was videotaped for the Mbone (Multicast-backbone) and scheduled next to the Energy Research Power Users Symposium (ERPUS) and the Office of Program Analysis Peer Review of Energy Research's Computational Science Projects to make it easier for more users to attend and to increase visibility.

Some streamlining is being proposed within the DOE and Energy Research (ER). The Director of Energy Research, Martha Krebs, is proposing two new divisions in ER, one to include the Office of Scientific Computing and ER's Technology Transfer Office, Small Business Innovative Research, and Basic Energy Science/Advanced Energy Projects program office. (This has now happened.)

Again, I want to stress the importance of updating and strengthening the ERSUG computational requirements document.


Production System Plans (Moe Jette)

A reconfiguration of CRAY disks was completed in May. The CRAY/A now has a 53-gigabyte /usr/tmp file system available with a 12-hour purge time. This additional storage has reduced the load on the /tmp/wk# file systems substantially. The additional disk space has allowed the disk limit for all high-priority queues to be increased to 6 gigabytes. Other storage is available in NERSC's Andrew File System (AFS) server, which is being expanded from 30 gigabytes to 95 gigabytes, providing support to all of NERSC's clients.

The Centralized User Bank (CUB) now has a complete X-window interface available, Common File System (CFS) quotas by user, reserve controls for Supercomputer Access Committee (SAC) members, and monthly accounting report generation. Soon CUB will support the Supercomputing Auxiliary Service (SAS) computers, report recent database modifications, permit the change of login names and passwords throughout NERSC, and support single-use passwords. Numerous other CUB enhancements are planned.

CFS tape drives have been updated to 36-track technology, doubling the capacity of each cartridge as it is rewritten. We have drastically increased the portion of data stored in automated libraries, from 50 to 67% of the total. Automated libraries can mount cartridges in a matter of seconds, while in the past it sometimes took hours to fetch cartridges from distant buildings (our "shelf" operation). [UPDATE: In September, all CFS data was available on disk or automated cartridge libraries.]

ESnet Plans (Jim Leighton)

Energy Sciences Network (ESnet) is a backbone network, that is, a network that connects other networks. Upgrades and services include the following:


  • Traffic carried by ESnet is now over 1.6 terabytes/month. The recent exponential growth is most likely attributable to World Wide Web and Mbone packet-video traffic.


  • ESnet recently installed four T3 (45 megabits per second) dedicated circuits, linking Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Fermi National Accelerator Laboratory, Princeton Plasma Physics Laboratory, and Oak Ridge National Laboratory. These are installed on an interim basis until the fast-packet services can be installed.


  • ESnet has international circuits to Japan, Germany, Italy, and Brazil. Other international links are established via interconnections with federal agency networks.


  • Additional interconnectivity includes connections to other federal agencies, to most of the National Science Foundation regional networks, and to selected commercial networks.


  • ESnet is supporting meeting-room level video conferencing. A typical meeting room can be equipped for approximately $75K. ESnet provides an online reservation system for conferences as well as a Multi-way Conferencing Unit (MCU), which allows conferences with more than two participants. Communications use dial-up ISDN lines at speeds of 128 to 384 Kbps and higher. More than 1000 conferences have been scheduled.


  • ESnet has established a Multicast-backbone to support packet-based videoconferencing. Multicast allows traffic intended for multiple destinations to be sent as a single traffic stream over the network, reducing overall traffic load. The ESnet Mbone currently uses workstations as routers until this capability is developed on commercial routers.


  • A competitive procurement is under way for high-speed communications based on ATM (asynchronous transfer mode) services. Initial access speed will be T3 (45 Mbps), with upgrades to OC3 (155 Mbps) and OC12 (622 Mbps) planned for 1995 and 1997, respectively. A number of objections delayed this project for nearly two years.
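As a rough sanity check on the figures in this list, the sketch below converts the monthly traffic volume into an average bit rate and compares it with the nominal circuit speeds quoted above. It assumes decimal units (1 terabyte = 10^12 bytes) and a 30-day month, neither of which the minutes specify, and the function names are illustrative only.

```python
# Back-of-the-envelope conversion.
# Assumptions (not from the minutes): decimal units and a 30-day month.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Nominal line rates quoted in the minutes, in megabits per second.
LINE_RATES_MBPS = {"T3": 45, "OC3": 155, "OC12": 622}


def average_mbps(terabytes_per_month: float) -> float:
    """Average rate in Mbps for a monthly volume given in terabytes."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6


def fraction_of_line(terabytes_per_month: float, line: str) -> float:
    """Average rate expressed as a fraction of one nominal circuit."""
    return average_mbps(terabytes_per_month) / LINE_RATES_MBPS[line]


if __name__ == "__main__":
    rate = average_mbps(1.6)
    print(f"1.6 TB/month is about {rate:.1f} Mbps on average")
    for name in LINE_RATES_MBPS:
        print(f"  {fraction_of_line(1.6, name):.1%} of one {name}")
```

This is an average over the whole backbone and the whole month. It says nothing about peak load on any single circuit, which is what drives the need for upgrades.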


X Window System: How to Get from Here to There (Barry Howard)

X Window System (X) is a network-based graphical windowing system developed at MIT in 1984 and now accepted as an industry standard. Using the client-server model, a user can display output from multiple clients on multiple machines in separate windows of the user's computer screen. X provides the bare bones of a window system upon which any style of graphical user interface (GUI) can be built. The look and feel of the GUI is provided by the window manager, for example OSF/Motif.

Our task at NERSC is to provide production supercomputing services based on high-performance hardware and software. Many new commercial and public domain software releases come with only a graphical interface. In addition, NERSC is in the process of building a Unified Production Environment that will offer a single GUI to most NERSC services. Here are some facts:


  • The Unified Production Environment depends on NERSC customers having X capability on their desktops.


  • All supported graphics packages and utilities on CRAYs are capable of sending output directly to an X window.


  • All visualization systems except PV-Wave on SAS rely solely on X.


  • Massively parallel processing (MPP) tools will all require X capability.


  • About 40% of the principal investigators who ran the ERDP program this year used the character mode interface instead of the X interface!

Part of the problem is that we don't have an accurate count of how many NERSC customers are without an X-capable desktop, but results from the ERDP logs and informal surveys indicate the number is significant.

NERSC has developed a transitional roadmap for customers without access to X. To guide you, we are providing the following:


  • Buffer articles. A series of articles began in April 1994 describing the motivation for adding X to your desktop, the possible hardware platforms available, and the comparative advantages of each configuration.


  • Town meetings. We discuss the issues highlighted in the Buffer series in NERSC users' town meetings. Visiting the users on their home turf gives us an opportunity to see how users get their work done and what equipment they have.


  • Demonstrations. At town meetings, scientific meetings, and networking meetings we demonstrate new technologies (most often, over X) and, most importantly, show how researchers are applying the power of the new technologies to their work.


  • Staff Training. Most NERSC staff members use X and many are at least acquainted with X internals. The consulting staff has Xperts on call who can provide information about the options available for adding X capabilities to any desktop configuration.


MPP Procurement Status (Michel McCoy)

For some time, NERSC has been developing a Request for Proposals (RFP) for the procurement of a massively parallel computer. The process is lengthy not only because of its complexity but because of the review procedures required by DOE and the University. It is anticipated that the document will be presented to vendors in early November. Draft versions of the RFP went out in March and again in August, and the vendors have had the benchmark codes since March. Because the vendor community has had the opportunity to work on the benchmarks for many months, we do not expect a lengthy response time after the final version of the RFP goes out. If all goes well, an award will be made in spring or early summer of 1995 and delivery of the first system component will occur in the second half of 1995.

The actual system is expected to be delivered in two phases. The first component--the Pilot Early Production (PEP) system--will consist of at least 128 processors. On this system, the vendor must demonstrate the ability to provide a production environment through meeting a series of milestones. Approximately one year later, the vendor will deliver the second phase system, called Fully Configured Machine (FCM), consisting of at least 512 processors. This could represent a simple augmentation of the original machine, or it could represent a technological upgrade in which the vendor takes back the original system, substituting a higher performance machine in its place. This upgrade will not occur if the milestones are not met to NERSC's satisfaction. The milestones (called production status requirements, or PSRs) will be comprehensive and reflect the capabilities promised by the vendor in describing the virtues of the FCM in the response to the RFP.

The benchmark codes come from the energy research community that uses NERSC as a resource. We have tried to make the codes reasonably representative of the spectrum of science and numerical techniques used by the community. Some codes are written in Fortran 90; others use message passing. NERSC staff adapted these codes over a six-month code conversion/verification/benchmark period. NERSC is grateful to the researchers who gave us the use of these codes, all the more so because we know that in some cases this represents some personal sacrifice by the application owner.



Capacity and Capability Solutions for 1996 and Beyond (Tammy Welcome)

Currently the CRAY C90 is fully utilized. Capability users compete with capacity users for the C90's resources. Capability users use all the processing power and memory of the system to solve a single Grand Challenge-scale application. Capacity users develop applications interactively, debug them to ensure correct execution and performance, and analyze results. An analysis of accounting records for a 64-day period beginning April 5 revealed that 15 applications were using 30% of the cycles on the C90.
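The kind of accounting analysis mentioned above can be sketched as follows. The record layout (an application name and a cycle count per record) is a hypothetical stand-in, since the minutes do not describe the actual accounting format.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def top_n_share(records: Iterable[Tuple[str, float]], n: int) -> float:
    """Fraction of total cycles consumed by the n heaviest applications.

    Each record is a hypothetical (application_name, cycles) pair.
    """
    totals = defaultdict(float)
    for app, cycles in records:
        totals[app] += cycles
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    heaviest = sorted(totals.values(), reverse=True)[:n]
    return sum(heaviest) / grand_total


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [("a", 50.0), ("b", 30.0), ("a", 10.0), ("c", 10.0)]
    print(top_n_share(sample, 1))
```

With real records covering the 64-day period, a call such as top_n_share(records, 15) would yield the kind of figure quoted above.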

We propose to alleviate this problem by making effective use of all NERSC resources in two ways:


  1. Enable capability applications on the PEP massively parallel (MP) computer system (currently under competitive procurement with an anticipated delivery in the latter part of 1995).


  2. Enable capacity use on auxiliary servers.

To enable capability applications on the MP computer system, staff members are working with research scientists to parallelize capability applications. The portability of the resulting parallel code and adoption of the modified source code by the scientist are important issues that will be addressed. When the PEP system arrives, NERSC staff will ensure there is software to support the parallel applications and will offer training to research scientists on how to use the new environment. By using the parallel computer system in this manner, NERSC could potentially free a significant portion (perhaps 30%) of cycles on the C90. We anticipate that in a year the execution environment on the FCM (the upgrade to the PEP) will be able to support both capacity and capability use, further offloading the C90.

To enable capacity use on auxiliary servers, NERSC will upgrade SAS and AFS to accommodate more users. These auxiliary servers will have better system response; a rich software environment for preprocessing, post-processing, and developing applications; and a file system that permits the easy sharing of files between NERSC platforms. By shifting some of the interactive capacity workload to the auxiliary servers, we can potentially provide better interactive service to the users, and free cycles on the C90 at the same time.


Access to the HPCRCs and the LLNL T3D (Tammy Welcome)

MPP Access is described in the article "The Massively Parallel Processing Access Program," starting on page 1 of the October 1994 Buffer.

Storage: A New View (Steve Louis)

Storage technology is undergoing a paradigm shift. NERSC must adjust to this change by moving to architectural approaches that use new hardware and software storage technologies. Failure to change will result in a storage environment at NERSC that lags behind the higher demands of new processing and communications technologies.

As NERSC users know, the old storage model is exemplified by our CFS storage environment. The CFS environment centers around expensive IBM mainframe storage servers, uses a proprietary MVS/XA operating system and relatively slow-speed block multiplexor channels; it must be accessed through unfamiliar and non-standard data transfer interfaces. The newer storage paradigm, as prototyped at the National Storage Laboratory (NSL) and elsewhere, centers around less-expensive workstation servers, open distributed systems, commonly used (or even transparent) data-transfer interfaces, and new ways to use high-speed scalable and parallel I/O.

At NERSC, plans are under way to improve the storage environment in phases over the next three years. The first phase, which introduces a new base-level storage system modeled after proven NSL hardware and software technologies, is nearing completion. This NSL-Technology Base System comprises a powerful IBM RISC System/6000 workstation coupled with a large 50 - 100 gigabyte Fast/Wide SCSI-2 disk cache and an automated robotic archive with multiple terabyte capacity. This system will have HIPPI, FDDI, and Ethernet connectivity and will initially run a version of NSL-UniTree software. A 16 x 16 HIPPI crossbar switch will connect this system over HIPPI to the NERSC mainframes.

After successful deployment of the Base System, which will be used primarily by selected users with large storage needs, the Base System will be improved during a second phase enhancement. During this phase, additional production-level capabilities will be added to the Base System. These extensions are planned to be compatible with the initial Base System hardware and software, and may take the form of capacity extensions to existing disk and robotics or new technology upgrades to existing Base System disk and robotics.

The Extended Base System paves the way for a probable FY96 installation of a fully configured new storage system that will completely replace CFS. This Fully Configured Storage System is planned to be a full High-Performance Storage System (HPSS) environment, with parallel I/O capabilities and network-attached high-performance storage devices. HPSS is the current software development project of the NSL, and is being jointly developed by Lawrence Livermore, Los Alamos, Sandia, and Oak Ridge National Laboratories together with IBM-U.S. Federal, Cornell, NASA Lewis, and NASA Langley Research Centers. An HPSS software environment, together with powerful new high-performance disk and archival devices, is expected to meet the high-end storage needs of the new massively parallel machines to be installed at NERSC over the next few years.


Morning Session (July 11, 1994)


really upset the system of networks. DDK)
- Jack Byers will graduate to emeritus status at the next meeting and Brian
Hingerty will become chairperson.
- Fusion was less predominant and chemistry (PNL and ANL) was strong.
- One petabyte = 42,500,000 trees of paper.
- Day 2, the ERPUS contribution, will be put up as a MOSAIC page at NERSC
so it will not be included here.

Welcoming Remarks - Jack Byers


It was pointed out that the SPP (Special Parallel Processing) program
has had an impact on the usage of the Crays. Some large single
processor jobs have had difficulty running. Not all has been positive
for the users.

JACK BYERS: Chairman's greeting.
- Pitch for EXERSUG - ERSUG - SAC communication and cooperation.
- Noted importance of memory as a part of the SPP program.
(Yea!!!! - DDK)

Welcome from Washington - John Cavallini


Series of reviews of large computational projects in progress (June 1994)
Budget problems - do more with less
Transition to Massively Parallel - software problems
vendors, conversion of code
performance issues
FY96 budget not yet ready till Aug 94
FICCIT committee active:
Anita Jones (DOD) covers us - advisory panel being setup

JOHN CAVALLINI: Washington view 1.
- Network, computational projects, etc. reviews are completed.
- Budgets are "not looking good". No FY96 guidance possible yet.
Growth from initial $7M to $107M.
- Important concepts: virtual machine, scalable I/O, Mass store.
- "FIXIT" --> "National Science and Technology Committee" and under that
"Committee on Information Technology".

Washington View - Tom Kitchens


Meeting is now being recorded on M-BONE video tape for future reference.
Budgets now 6% down in most DOE Offices for FY95.
National Information Infrastructure - important to administration
Communications Committee active
New projects - less money
Personnel changes at DOE
New initiative - advanced computing initiative (ACTI)
money from various offices
FY 95 $23-40M
ERSUG requirements document needs update

TOM KITCHENS: Washington Report 2.
- Flat is the best you can assume.
- "National Information Infrastructure" is in focus (DDK comment: Gore)
Something that was introduced that I don't understand is "Prototype Virtual
Agencies" in the discussion of developing this.
- Proposed DOE/ER reorganization would combine a number of parts: OSC/
Technology Transfer / Basic Energy Sciences / Advanced Energy Projects
(wow! DDK)
- People Changes: OSC has a lot of turnover in their detailees. New Federal
Employee for "Information Technology"
- New initiative in "Advanced Computational Technology". It is to deal
with petroleum problems. Funding to come from many sources all of which
have been significantly reduced. FY95 funding will be $23-40M.
- MAJOR REMINDER: Need a "needs" document update by next meeting. May be
able to clip major portions from the blue book.


Additional notes supplied by Tom Kitchens (DOE-OSC)

Washington View - Tom Kitchens

It's Washington so let's talk about science support, policies and
organization - what else is there? Most budgets are expected to be
5-10% down next year in the DOE; many other agencies will also have
decreases in FY'95. A flat budget is suddenly a fat budget for FY'95
and FY'96. The signals have been changed - High Performance Computing
and Grand Challenges are not completely out but the Administration is
more interested in the National Information Infrastructure (NII) and
National Challenges. The Administration has its new National Science
and Technology Council (NSTC) responsible for more facets of technology
than the old FCCSET: The High Performance Computing, Communications,
and Information Technology (HPCCIT) organization is now a subcommittee
of an NSTC committee, the Committee on Information and Communications
(CIC) which is also responsible for NII. This means that HPCCIT has
dropped in priority and has substantial competition for attention in
its superior committee. It is going to be far harder to convince this
organization that High Performance Computing (HPC) needs even flat
support. The ERSUG "green book" is now three years old and must
be updated soon with regard to the needs and what you have learned in
the last few years: the ERSUG requirements need to be well supported by
statements on the societal impact of the work being done, good
accomplishments and expected milestones if the report is to be read
with any interest. I hope to see the EXERSUG appoint a strong group
to undertake this task.

The Administration is also interested in inter-agency and
intra-agency collaboration; in fact, the HPCCIT and the Global
Climate Initiative are being used as pilot "Virtual Agencies" where
components of several agencies are being guided by the President's
Science Advisor's Office, OSTP, to attack a common problem. This has
plusses and minuses; we must do all the internal budget preparation
reports to DOE as well as to the virtual agency, but we do reap some
additional visibility in the OSTP. This virtual plus has not yet
improved our budget but maybe there will be an effect in FY'97! The
message is: working with people from other agencies - or other parts
of your own agency - is good, especially if you can show some
leverage of support. ERSUG needs to tell more about its members'
work and interact with scientists and technicians both inside and
outside the DOE. ERSUG needs to be strengthened: having this meeting
videotaped for the Mbone and putting it next to the Energy Research
Power Users Symposium (ERPUS) and the Office of Program Analysis Peer
Review of Energy Research's Computational Science Projects was to make
it easier for more users to attend. A strong ERSUG will be essential
to maintain even a constant budget for computational resources.

Another example of a collaboration inside the DOE is a program to
aid the Domestic Natural Gas and Oil Industry through a program called
the Advanced Computational Technology Initiative (ACTI) which is
managed and supported by the Office of Fossil Energy, Defense
Programs, and Energy Research. This program is based on DOE
Laboratory collaborating with the domestic gas and oil companies and
the appropriation committees have pinned its budget somewhere between
$23 and $40 million. The plans are quite flawed from our perspective at
this time but there are several months to work them out. The lesson
here is that if you don't initiate the cross-cutting work, someone else
will - and you probably won't like the way they want to do it!

Inside the Department and inside Energy Research some streamlining
is being proposed. An early-out program has been in effect at DOE
Headquarters. The Director of Energy Research, Martha Krebs, is
proposing to form two new divisions in ER, one that would include the
OSC with ER's Technology Transfer Office, SBIR, and BES/Advanced
Energy Projects program office. (This has now happened.)

In OSC we have had some changes in personnel: George Seweryniak has
joined us to manage much of the ESNET effort, Steve Elbert has
returned to Ames and Wally Ermler has joined us from Stevens Institute.
We expect to add another Federal Employee to handle Information
Technology and other issues. (It can now be told that this person is
Mary Anne Scott who has been in the Office of Fusion Energy and has
often participated in the ERSUG meetings.)

I want to remind you not to forget the importance of updating and
strengthening the ERSUG computational requirements document (the
'Green book'). I hope you have a good meeting.


Jack Byers' comments on need for a new green book for DOE


EXERSUG members:
Note Kitchens' appeal for stronger ERSUG.
We all need to work on this. E-mail between ourselves and Kitchens
clearly isn't enough. We need plans, suggestions, and mechanisms we can use
that we now don't have or don't use.

Also following is more push for us to get going on the green book.
I will need help. Am starting to work with McCoy and Mirin (from NERSC)
on division of work between NERSC and EXERSUG.
I am presently struggling with my version of a statement, from the users'
point of view, of needs and requirements, and trying to see that it
fits in with a statement of NERSC vision by Mike McCoy.
When he and I get to some partial agreement I will send this to you
for editing, modification, etc. My present idea is that the users'
statement ought to be independent of NERSC or OSC or anybody.
If that makes sense, the NERSC vision would naturally stand as a response
to the users' statement of needs.

It might make sense to plan to have the ERSUG users' statement targeted
elsewhere also, i.e., not to use it only for the green book. This
might serve as an initial action in making ERSUG stronger. Ideas for targets?

I will need help from all of you at least on the science accomplishments
sections in your areas. If you can't do this yourselves, please at least
take responsibility for farming it out to people in your discipline.
Potter has agreed to do the climate section.

I have a lot of good material (3 ERPUS papers) on QCD. I will take a first cut
at pulling out a statement of needs and accomplishments from those papers.
But I will need a high energy physics person to at least edit that and perhaps
even rewrite what I can do.

There is some more material from ERPUS that you might use as starting
points, tho the most complete ERPUS seemed to be the QCD papers and
the ocean modeling paper by Malone. Contact me for a list of what
I have. I haven't got anything from the ERPUS papers of Leboeuf, Hammett,
Colella, Kendall and others I think.

You also should look at the previous green book to see what is involved.
If you don't have copies, E-mail Kitchens for them.

There is a possibility that NERSC will hold a workshop to bring the
green book together, early next year. This is NOT to suggest that we
are off the hook, but rather to point out that all of the rough drafting must
be complete by then, and probably we should try to have each individual
science section completed in final form, so that the meeting could then fill in
the holes, stitch together the pieces, and make coherent summary statements.


PRODUCTION SYSTEMS PLANS - Moe Jette and Rick Kendall


--Centralized User Bank (CUB)
--Portable Batch System (PBS)
--Preparing for arrival of Cray T3D
--Preparing for installation of UNICOS 8.0
--Major file system reconfiguration
--CFS converting to 36-track tapes


-Complete X window interface
In /usr/local/new/xcub on Crays and SAS for beta testing.
Will be moved to /usr/local/bin/xcub (i.e., production) this summer.
-CFS quotas by user available
Each user can be allocated some percentage of the repository's
total CFS allocation. The CFS allocation, like the CRU allocation,
can be oversubscribed
-Resource accounting report generation
Monthly accounting reports based upon data from CUB
The UNICOS timecards are no longer used
-CRU reserve controls for SAC
SAC members can manage their CRU reserves with CUB


-CFS reserve controls for SAC
SAC members will be able to manage their CFS reserves with CUB
To be completed by September
-Port to SAS and T3D
We will be able to account for SAS resource use by December.
Port to T3D completed in July (for account management) and
August (for accounting). ER users will have access to about
7.5 percent of the T3D. We plan to allocate T3D resources
completely separately from other NERSC resources.
-History report
This can be used to review recent changes to the database
(records of move, infuse, modify, and other commands).
-Many improvements for security and reliability
Update to Oracle 7 for database management.
Better encryption of message traffic.
Messages recorded and can be played back.
-Change login name
Each client will be able to change his login name throughout
NERSC with a single CUB command. To be available this winter.
-Change password NERSC-wide
Each client will be able to change his password throughout
NERSC with a single CUB command. To be available this winter.
-Users in multiple repositories
Each client will be able to access multiple repositories from
a single login name and password. He will be able to specify the
repository to use at login time. To be available this fall.
-One-time password (SecurID cards)
Since each password will be used only one time, its capture
by a hacker poses little security threat. We are currently
working with SecurID cards since we already have the software
and some cards. We are investigating other one-time password
schemes including Enigma Logic cards, S/Key (no cards required)
and others. NERSC would purchase cards for some of its staff
(those with special privileges). Cards purchased by client
sites (about $50 per card) could be supported by NERSC without
additional cost. Send E-mail to [email protected] for details.
-Restrict access of foreign nationals
Foreign nationals from specific countries will have
restricted access to sensitive data.
-Modification of address, phone number, etc.
Clients will be able to modify this information as needed with
a CUB command. To be available this fall.
-Support for Kerberos
There are several incompatible versions of Kerberos. Our current
intention is to support Kerberos version 4, as used by AFS.
-Identification of resource use by process
We intend to record resource use by process for long running
processes. While the volume of data involved precludes us from
recording in CUB each process executed, a small number of long
running processes account for most resources consumed.
-Real time monitoring of resource use by process
Users will be able to monitor the resource use (CRUs) by a single
process in real time.
-More information about CUB is available online
For a brief description type : man setcub
For a complete description type: document view setcub
-Help prioritize these tasks
We welcome your comments to help us prioritize these tasks.
Direct your comments to [email protected]
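To illustrate why a captured one-time password is of little use to an attacker, here is a minimal sketch of a hash-chain scheme in the general style of S/Key, one of the options mentioned above. It is not the S/Key algorithm itself: it uses SHA-256 from Python's hashlib for convenience, omits S/Key's encoding of passwords as short words, and the class and function names are hypothetical. SecurID cards work on a different principle and are not modeled here.

```python
import hashlib
import hmac


def hash_chain(secret: bytes, count: int) -> bytes:
    """Apply SHA-256 to the secret `count` times."""
    value = secret
    for _ in range(count):
        value = hashlib.sha256(value).digest()
    return value


class Verifier:
    """Server side: stores only the most recently accepted chain link."""

    def __init__(self, initial: bytes) -> None:
        self.stored = initial

    def check(self, candidate: bytes) -> bool:
        # A valid password is the value whose hash equals the stored link.
        hashed = hashlib.sha256(candidate).digest()
        if hmac.compare_digest(hashed, self.stored):
            # Step back one link: the same password will not work again.
            self.stored = candidate
            return True
        return False


if __name__ == "__main__":
    secret = b"example secret"  # illustrative value only
    server = Verifier(hash_chain(secret, 100))
    otp = hash_chain(secret, 99)
    print(server.check(otp))   # first use
    print(server.check(otp))   # replay of the same password
```

Because the server only ever holds a value that is the hash of the next acceptable password, neither a captured password nor a copy of the server's stored value lets an attacker compute the next login.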

-POSIX compliant batch system
This is the only batch system under development which
complies with the POSIX standards and is well suited
to satisfy supercomputing needs.
-Being developed with NASA Ames
NASA Ames developed NQS, the current batch standard.
PBS fixes many of the shortcomings of NQS.
-In beta test mode
Many computer vendors are interested in using PBS
We expect PBS to provide a consistent batch system
across all NERSC platforms.


--Available from Cray in March 1994
--AFS available from Pittsburgh Supercomputer Center in July 1994
--Preliminary installations schedule
August Cray T3D (after machine's acceptance)
September Cray/F
October Cray/C
November Cray/A


--Multi-threaded kernel reduces system overhead and improves interactive
use of the machine
--Unified Resource Manager (URM) - improves control of resources
--POSIX compliant shell and utilities - improves interoperability
--Kerberized clients and utilities (klogin, krsh, kcp, kftp, etc.)


--Removed /tmp/wk3 and /workbig file systems from Cray/F
Sufficient disk space was available for Cray/F
without these file systems
--Removed /workbig file system from Cray/A
--Added 53 GB /usr/tmp file system to Cray/A
The disk space came from the three file systems listed above.
--Constructed 8 GB /usr/fast file system on Cray/A
We cache /usr/fast on SSD during SPP shots for added speed.
--Both /usr/tmp and /usr/fast have 12 hour purge times
This short lifetime permits us to keep adequate
disk space available.
--Neither /usr/tmp nor /usr/fast has data migrated to CFS
The load could severely impact CFS if data migration
were used.
--Reconfiguration completed on May 16, 1994.


--Big file queue (bfq) on Cray/A uses /usr/tmp/workbig directory
/usr/tmp/workbig is a directory in /usr/tmp file system
rather than an independent file system. This does not
ensure 16 GB of disk space for big file queue jobs, but
we will not start a job until at least 150 percent of
the requested disk space is available. Pre-allocating
files at the time the job starts should ensure sufficient
disk space.
--Disk limit for all NQS high priority queues increased to 6 GB
Several jobs with moderate storage requirements (2 to 6 GB)
can now execute simultaneously. Formerly jobs with storage
requirements in excess of 4 GB could only execute one at a time.
--Big file queue still used for jobs requiring over 6 GB of storage
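The queue rules above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (the 6 GB limit and the 150 percent start rule); the function names are hypothetical and are not part of NQS or any NERSC tool.

```python
BFQ_HEADROOM = 1.5           # the 150 percent rule quoted above
HIGH_PRIORITY_LIMIT_GB = 6   # disk limit for NQS high priority queues


def choose_queue(requested_gb):
    """Return the queue class a job's disk request would fall into."""
    if requested_gb > HIGH_PRIORITY_LIMIT_GB:
        return "bfq"
    return "high-priority"


def bfq_can_start(requested_gb, free_gb):
    """True when free space is at least 150 percent of the request."""
    return free_gb >= BFQ_HEADROOM * requested_gb


# A 16 GB big file job needs 24 GB free before it would be started.
print(choose_queue(16), bfq_can_start(16, 24))
```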


--No disk space shortage on Cray/F
--Utilization of /tmp/wk# substantially reduced on Cray/A
These file systems are no longer filling. The reduced
load also improves the I/O response time.
--Data migration reduced
-Thirty percent more data is typically kept on Cray disk
-Less data is kept on CFS
-The system overhead has been reduced by a few percent
--New /usr/tmp being used by a small number of NERSC clients
Although only a small number of NERSC clients use
/usr/tmp, the storage involved is substantial. Typically
26 GB of /usr/tmp is used.


--NERSC's AFS server is accessible from any AFS client
One can access data on NERSC's AFS server from anywhere
on the Internet
--Good security is provided with the AFS version of Kerberos
--The storage uses inexpensive (commodity) disks
--The bandwidth is significantly lower than Cray disks
Caching is used and can result in speeds approaching
Cray disk. Cache size is currently 858 MB.


--NERSC's AFS server being expanded from 30 GB to 95 GB
--User quotas may be increased beyond the current 30 MB limit
--Accounts on the AFS server will be available to all NERSC users
--Transarc offers AFS client-only licenses to ER sites for $600
Send E-mail to [email protected] for details.


--LASL CFS version 61 installed
This includes numerous performance enhancements.
--Storage Technology (STK) 36-track drive installed
This doubles the capacity of a 3480 cartridge when it
is copied from an 18-track tape drive. However, it will
take years to copy existing cartridges as most of CFS
capacity goes into directly servicing user requests.
--Number of cartridges on shelf reduced by 50 percent!
This has taken place over a four month period. A far
higher proportion of data is now in the STK silos with
a vastly better performance than operator retrieved and
mounted cartridges. Shelf tapes are stored in two vaults
(down from three) in various locations around LLNL.


Date      In Silo   On Shelf    Total
----      -------   --------    -----
Feb 24     31,002     30,315   61,317
June 7     29,029     14,551   43,580
                    --------    -----
Change                  -52%     -29%
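The percentages in the cartridge table can be checked with a short calculation. The sketch below uses only the counts listed above; the variable names are illustrative.

```python
def percent_change(before, after):
    """Signed percentage change from 'before' to 'after'."""
    return 100.0 * (after - before) / before


feb = {"in_silo": 31002, "on_shelf": 30315}
june = {"in_silo": 29029, "on_shelf": 14551}

feb_total = sum(feb.values())
june_total = sum(june.values())

shelf_change = percent_change(feb["on_shelf"], june["on_shelf"])
total_change = percent_change(feb_total, june_total)

print(f"on shelf: {shelf_change:.0f}%  total: {total_change:.0f}%")
```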


--The cartridge reclaim rate is up by over a factor of five
--CFS administrator added
--Increased operational support
--10,000 "defective" cartridges cleared and removed
This required re-installation of 18-track tape capability,
which STK supplied at no charge!
--Online tape category eliminated
Incoming data was sometimes written on tape then copied almost
right away to other tapes. Enhancements in CFS version 61
permit us to monitor tape position with a data compressing
tape controller and eliminate this extra step.
--Massive cleanup effort to delete migrated, orphaned and abandoned


--Switch to double length tapes
This potentially doubles the storage capacity.
--Release CFS gateway
This can increase CFS aggregate bandwidth due to its independent
data path. This will provide service to workstations.
--Spool up National Storage Laboratory (NSL) Unitree expertise
We hope to install a prototype system early next year.


File                    Total   User
Name        Lifetime    Space   Quota    Back-up
--------    --------    -----   -----    -------
/u (home)   Permanent    4 GB   3.2 MB   Daily
/tmp/wk#    30 days     57 GB   None     None
/usr/tmp    12 hours    53 GB   None     None
/usr/fast   12 hours     8 GB   None     None
/afs        Permanent   95 GB   30 MB    Daily
CFS         Permanent   11 TB   Varies   None

--The /tmp/wk1, /tmp/wk2 and /tmp/wk3 file systems each have 19 GB.
--AFS files are available from any computer with AFS client software
on the Internet. Currently, only SAS and the Cray C90 at NERSC
have AFS client software.

MOE JETTE: Storage.
- Moving disk from F to C90. (F doing OK on disk as it deals with smaller
- Established large file system to replace multiple smaller ones.
- Expanded AFS service. (Arranged $600 client license available.)
- Set up /usr/tmp to accommodate large temporary storage. (Not yet heavily
used but the large users are thrilled.)
= These changes are the result of a user committee formed at the last meeting.

MOE JETTE: CUB (Central User Bank)
- X window interface is in beta test.
- CFS quotas are now available by user.
- Resource account report generation is in operation (if I heard right).
This means the old funny tools are finally being replaced.
- SAS and the T3D have been included.
- Lots of system upgrades installed (not of major interest to the user).
- The option of the one-time password will be provided for those needing extra
security (The cards will cost about $50.). "KERBEROS will provide DCE
capability for the one-time password." <<-- questioned by the audience.


Description of Kerberos (supplied by Moe Jette)
Kerberos permits you to authenticate yourself to a collection of
computer systems one time. After Kerberos authentication, you
may securely access a variety of services throughout the collection
of computer systems without further authentication (e.g., klogin,
krsh, kcp, kftp, etc.).

A one-time (or single-use) password is only valid for a single
use. After that use, it can no longer be used to provide one
with authentication.

Kerberos itself does NOT provide the capability for single-use
passwords, although used together they provide very good security
and ease of use.
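To illustrate why a captured single-use password is of little value, here is a minimal sketch of a Lamport-style hash chain, the idea behind S/Key. It is not the SecurID scheme (which is time-based and proprietary), and the hash function (SHA-256), the seed, and the class name are assumptions made for the example.

```python
import hashlib


def make_chain(seed, n):
    """Return [H^1(seed), H^2(seed), ..., H^n(seed)] as a list of bytes."""
    chain = []
    value = seed
    for _ in range(n):
        value = hashlib.sha256(value).digest()
        chain.append(value)
    return chain


class OneTimeVerifier:
    """Server side: stores only the most recently accepted chain value."""

    def __init__(self, last_value):
        self.last_value = last_value

    def verify(self, candidate):
        # A valid password hashes to the stored value, then replaces it,
        # so the same password can never be accepted a second time.
        if hashlib.sha256(candidate).digest() == self.last_value:
            self.last_value = candidate
            return True
        return False


chain = make_chain(b"hypothetical secret seed", 5)
server = OneTimeVerifier(chain[-1])       # server starts out holding H^5
first_login = server.verify(chain[-2])    # user presents H^4
replay = server.verify(chain[-2])         # an eavesdropper replays H^4
second_login = server.verify(chain[-3])   # the next login uses H^3
print(first_login, replay, second_login)
```

The server never stores a value that can be used to log in, which is why the scheme needs no card: the user only has to be able to recompute the chain from the seed.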


- Six month password expiration is in force.
- Hacker activity is down but can be expected back. Reference to the book
"The Cuckoo's Egg" to illustrate the difficulty of tracking them down.
CFS (Common File System)
- Tape sizes increased.
- CFS Gateway "soon" -- i.e., direct access to information in CFS without
going through the CRAY-IBM bottleneck.
- Starting to stage up NSL's Unitree as a replacement.


For further information on any of these issues, contact:

Moe Jette E-mail: [email protected]
Group Leader, Large Scale Systems Group tel: 510-423-4856
National Energy Research Supercomputer Center fax: 510-422-0435

Rick Kendall followed Moe Jette with a discussion of storage problems
and possible disk charges to more accurately reflect costs.

RICK KENDALL: for the user committee.
- Committee chair was J.-N. Leboeuf.
- Many suggestions were made and weighed.
- Two major users are impacting the system: Jack Ritchie and Uzi Landman.
Ritchie responded to queries and significant relief actually came out of a
mutual accommodation. Landman has never responded.
- Much of the issue turns out to be a subset of the need for BALANCED
resource utilization.
- Need to improve NERSC auxiliary workstation load.
- /usr/tmp is working. This and previous bullet can be put in perspective
by noting that in the past TWO CPU's were needed just for data migration.
That's a big fraction of the CRAY resources.
- If disk charges are reinstated as a control mechanism, they need to be
staged, based on a good utilization model, and accompanied by a good
set of tools (for example the CFS interface is wholly inadequate).




ESNET Plans - Jim Leighton

Backbone Upgrade
Network Services
--ATM (Asynchronous Transfer Mode)

Jim Leighton: ESNET
- ESNET is a backbone, i.e., a network to connect networks.
- Carrying ~1.6 Terabytes/mo. today. It has had exponential-like growth
with peaks each October. MOSAIC probably produced the peak last October.
- Mostly T1 (1.5 Mb/s) links with a growing number of T3 (45 Mb/s) links.
Currently, PNL -- FNAL -- LANL -- LLNL -- LBL are T3.
- International: Japan, Germany, Italy, Brazil. Other links are made through
other agencies.
- National: two FIX (links to other agency networks) hubs, one on each coast
and lots of links to regional networks.
- Video Conferences: A meeting room can be used (outfitted for ~$60K).
ESNET provides a central hub needed when there are more than two
participating stations as well as an on-line reservation system. The
links are dial-up ISDN lines yielding transfer rates of 112-384 Kb/s. Over
1000 conferences have already been handled.
- Mbone does a multicast when wanting to broadcast to many receivers. It
is home grown using workstations as routing agents.
- The future? ATM = Asynchronous Transfer Mode. Multiplexing circuits to give
OC3c service at 155Mb/s and OC12c service at 622 Mb/s. ESNET proposal to
develop services with it received much interest and then many protests
(10!). All have now been dropped. So ATM experiment will start "soon"
(end of the month, maybe): LLNL-GAC-LANL & SNLA-FNAL & ANL-PPPL.
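For a sense of scale, the monthly traffic figure can be converted to an average bit rate and compared with the link speeds quoted above. This is a rough sketch: it assumes decimal terabytes and a 30-day month, and an average says nothing about peak load on any single link.

```python
BYTES_PER_MONTH = 1.6e12              # ~1.6 terabytes/month, from the notes
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month
T1_MBPS = 1.5                         # link speeds as quoted in the notes
T3_MBPS = 45.0

average_mbps = BYTES_PER_MONTH * 8 / SECONDS_PER_MONTH / 1e6
t1_equivalents = average_mbps / T1_MBPS
t3_fraction = average_mbps / T3_MBPS

print(f"average load: {average_mbps:.1f} Mb/s "
      f"({t1_equivalents:.1f} T1 links, {t3_fraction:.0%} of one T3)")
```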


X-windows: How to Get From Here To There - Barry Howard and Alice Koniges


-What is X Windows?
-What is the "X-Windows Problem" we are trying to solve?
-How can NERSC help solve the problem?

What is X Windows?
-The X Window System is a network-based graphical windowing system.
-Developed at MIT in 1984 and now adopted as industry standard.
-Client-server model: Display Server provides windows for multiple
Clients on multiple machines.
-Provides the bare bones of a window system upon which any style of
graphical user interface (GUI) can be built.
-Window manager provides "look and feel" for the GUI.
OSF/Motif, OLWM, TWM, BWM (to be announced)
-Latest release: Version 11 Release 5.

What About Security with X Windows?
-Authorization required to access a display
User enables any client on specified host(s) to display on server
Use xhost command: xhost +hostname
Opens opportunity for sniffing messages between client and server
Special code called "magic cookie" passed between client and server
Code stored in .Xauthority file in user's home directory
Usually set up automatically by display manager (XDM)
Easiest case is one .Xauthority file in shared home directory
-See "Safe X" article in August, 1993 edition of Buffer
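As an illustration of the "magic cookie" approach, the sketch below generates a random 128-bit key and builds the xauth command line that would store it for a display. The display name "myhost:0" and the helper names are made up for the example; in practice XDM normally does this automatically, as noted above.

```python
import secrets


def make_magic_cookie():
    """Return a random 128-bit key as 32 hexadecimal digits."""
    return secrets.token_hex(16)


def xauth_add_command(display, cookie):
    """Build the xauth command line that would register the cookie."""
    return f"xauth add {display} MIT-MAGIC-COOKIE-1 {cookie}"


cookie = make_magic_cookie()
command = xauth_add_command("myhost:0", cookie)  # hypothetical display name
print(command)
```

Only clients that can read the same key from the user's .Xauthority file can connect, which is why this is preferred over opening the display to every user on a host.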

What's the Problem?
There are several:
-Lack of control over
-GUIs distributed with commercial and public domain software
-Capabilities and configurations of users' desktops
-NERSC working toward a Unified Production Environment in 1996
-X windows will be an indispensable part of work space on MPP
Debugger, performance analysis tools by late 1995
-A common GUI will be used to access many of the services at NERSC
Built on X Windows
Selected application may, in fact, be run on another computer
Usage accounting for each application
Need a transition path for providing X Windows capacity for "almost all"
NERSC users which minimizes cost and frustration.

What can NERSC do to help solve this problem?
The four step method to Windows in every office.

First Step: Understand what desktop configurations are in use
-We don't know how many users lack X-capable desktops.
-Using surveys (ERDP)
-Site visits
-Usage statistics of utilities with both character and graphical
user interfaces
-Suggestions are welcome!
-We do know that implementations vary greatly among desktops
-Knowledge of which desktop configurations are most popular will help
NERSC concentrate on providing information on a large, but limited,
solution set.

Second Step: Provide Education and Motivation
-Buffer - series of articles on X Windows began in April, 1994
-Consulting - eXpert on call during working hours
Familiar with various hardware/software tested at NERSC
Help with PC, MAC, X-terminal and workstation configurations
Familiar with administrative burden attached to each option
-Site visits - town meetings
-Demonstrations - show power of new tools with graphical interfaces
-Mosaic, xcub, graphics, MPP tools, Unified Production Environment

Motivation - Graphics
-X will replace Tektronix as the lowest common denominator for graphics
-All supported graphics packages on Crays are capable of sending output
directly to an X window.
-Ditto for all Cray utilities used to view graphics files.
-All graphics utilities except PV-Wave on SAS rely SOLELY on X.

Third Step: Provide recommendations on tested desktop configurations
-Need transitional roadmap for "almost all" users

Transitional Roadmap - Which desktop to buy?

Advantages Disadvantages
---------- -------------
Workstations -local processing ability -expensive
-independent of other hosts -complex system administration

X Terminals -inexpensive -X server only
-simple systems administration -dependent on host
-X performance

PCs -availability -X performance
-can run PC applications -dependent on host
-X server only

Transitional Roadmap - Cheap X
-UNIX and X for PC - LINUX + X386
Full featured UNIX for 386 and 486 AT machines
Source available in BSD and System V flavors
Includes GNU utilities
Accommodates Ethernet board or modem
Serial IP options: SLIP, PPP, CSLIP, term
Modem speeds up to 14400 (38400 compressed)
Tested successfully at home with Mosaic, FTP, etc.
Software cost: FREE

-X on top of MS Windows
Winsock (Windows Sockets) is de facto standard network software
for MS Windows.
MicroX is one choice for X server
Software cost : $150
Effective bridge between MS Windows and X Windows

Transitional Roadmap - X Windows at Home

Two approaches:
-X remote
Proprietary X protocol for using X-terminal (or PC) at home
-Serial IP
Public domain protocol for using internet at home
Examples: SLIP, PPP
Need X server software plus user interface (MS Windows)
Available for PC, Mac, UNIX platforms

NERSC uses Cisco terminal server to provide 1-800 dialup service
-Published number for telnet access to NERSC hosts
-Unpublished number for Xremote and Serial IP access
Used by NERSC staff for using X from home
-Issues like security and cost prevent offering this as service now

Fourth Step: Explore possible new services
-XRemote server for NERSC customers
Control cost by cost-sharing and limiting access to a program
Control security by requiring one-time passwords
-Provide boot/configuration/font server for remote X-terminals
-Others to be suggested by customers

Barry Howard: DCE
- Is the base for a lot of DCE as the industry standard client-server model.
- The GUI is built on top of X. There are already several: OSF/MOTIF, OLWM,
TWM and shortly one more: BWM (Barry's window manager). This is a part of
the NERSC commitment (DDK observation).
- Authorizing use: discouraged use of host-based [xhost +hostname] in
preference to user-based "magic cookie".
- Problems: lack of control because of very different general GUI's and
very different desktop capability.
- NERSC wants to go to an X-based "Unified Production Environment" with
a common GUI used on various resources.
- Howard outlined what he called a 4 step solution. I found the steps fuzzy
but they did clearly involve a lot of user interaction.

Alice Koniges then described Graphical User Interfaces (GUIs).

- Outlined a common X interface. It included library interfaces.
- Pointed to their "Data Dimensions Interface" -- looks interesting but not
yet understood (by DDK).


EXERSUG (ERSUG Executive Committee) then adjourned for a closed lunch.


SPP Project Status and Plans for 1995 - Bruce Curtis and Tom Kitchens


SPP - Special Parallel Processing in 1994
-Soni (BNL) weak matrix elements of B-mesons
allocation: 12,900 CRUs. Used 5,644 (44%)
-Kogut (U. Ill) quenched QCD at finite density of large lattices
allocation: 8,050 CRUs. Used 5,393 (67%)
-Bell (LLNL) natural transition to turbulence in a waveguide mixing layer
allocation: 7,200 CRUs. Used 1,006 (14%)
-Cohen (LLNL) toroidal gyrokinetic PIC simulation using quasi-ballooning
allocation: 3,600 CRUs. Used 2,391 (66%)
-Dory (ORNL) High Resolution Plasma Fluid Turbulence Calculations
allocation: 3,500 CRUs. Used 1,965 (56%)
-Dunning (PNL) Ab Initio Molecular Dynamics
allocation: 3,000 CRUs. Used 2,350 (78%)
-Lee (PPPL) Gyrokinetic Simulation of Tokamak Plasmas
allocation: 2,500 CRUs. Used 877 (35%)
-Chen (U.C. Irvine) Alpha/Energetic-Particle Driven Instabilities
in Tokamak Plasmas
allocation: 2,500 CRUs.
-Aydemir (U. Texas) Nonlinear Gyrofluid Simulations of ITG Turbulence
in Tokamaks Using Field-Line Coordinates
allocation: 2,000 CRUs.
-Fu (PPPL) Gyrokinetic MHD Hybrid Simulation of MHD Modes Destabilized
by Energetic Particles
allocation: 1,500 CRUs.
-Hammett (PPPL) Gyrofluid Simulations of Tokamak Plasma Turbulence
allocation: 1,500 CRUs.
-Lee (LBL) High-Resolution Underground Imaging
allocation: 1,200 CRUs.
-Dawson (UCLA) Numerical Simulation of Plasma and Energy Transportation
in Fusion Devices Using 3D Toroidal Gyrokinetic PIC Models
allocation: 1,000 CRUs. Used 17 (1.7%)
-Stevens (ANL) Benchmarking Comparison of Computational Chemistry Codes
with MPPs
allocation: 500 CRUs.
-Zunger (NREL) Atomic Study of Step-flow Growth and Spontaneous Ordering
of Semiconductor Alloys for Photovoltaic Applications
allocation: 100 CRUs.
-Le Brun (U. Texas) Global Toroidal Gyrokinetic Simulation of eta_i-mode
Induced Transport in a Tokamak-like Plasma
allocation: 100 CRUs.
-Lester (U.C. Berkeley) quantum Monte Carlo for electronic structure
and vibrational eigenvalues
allocation: 75 CRUs.
-Chua (Continuum Dynamics) Computational Combustion Using Gridless
Particle Methods on Parallel Computers
allocation: 50 CRUs.
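The utilization percentages quoted with the allocations are simply CRUs used divided by CRUs allocated. A minimal sketch that reproduces them for the projects whose usage is reported above (the data structure is illustrative):

```python
# (PI, allocation in CRUs, CRUs used), copied from the entries that report usage
SPP_USAGE = [
    ("Soni", 12900, 5644),
    ("Kogut", 8050, 5393),
    ("Bell", 7200, 1006),
    ("Cohen", 3600, 2391),
    ("Dory", 3500, 1965),
    ("Dunning", 3000, 2350),
    ("Lee", 2500, 877),      # the gyrokinetic tokamak project
    ("Dawson", 1000, 17),
]


def utilization_percent(allocation, used):
    """Share of the allocation consumed so far, in percent."""
    return 100.0 * used / allocation


utilization = {pi: utilization_percent(alloc, used)
               for pi, alloc, used in SPP_USAGE}

for pi, pct in utilization.items():
    print(f"{pi:8s} {pct:5.1f}%")
```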

Poor utilization during first several months:
-Slow ramp-up. Users weren't ready
-Added unparallelized physics
-Basis bug where error in input file causes infinite loop
-Other bugs in codes

Lesser factors:
-NERSC bugs and cfs problems
-Interactive Interface (15% before Sherwood, 4% otherwise)

Consequently, 1/3 of the total allocation will not be used
-Total allocation remaining divided by wall time remaining = 20.75
-If average cpu/wall ratio for remainder is 13.0, then 66% of total
allocation will be consumed by end of October

Recent Runs

          CPU/Wall   MFlop/cpu-sec   Gflop/wall-sec

Bell        13.8          384             5.3
Cohen       13.4          355             4.7
Dunning     13.4           51             0.7
Soni        13.3          141             1.9
Dory        12.3          330             4.1
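The last column of the Recent Runs figures appears to be the product of the first two: CPU seconds delivered per wall-clock second times MFlop per CPU second. The sketch below assumes that relationship and compares the result with the listed values.

```python
# (user, CPU/wall ratio, MFlop per CPU second, listed Gflop per wall second)
RECENT_RUNS = [
    ("Bell", 13.8, 384, 5.3),
    ("Cohen", 13.4, 355, 4.7),
    ("Dunning", 13.4, 51, 0.7),
    ("Soni", 13.3, 141, 1.9),
    ("Dory", 12.3, 330, 4.1),
]


def gflops_per_wall_second(cpu_wall_ratio, mflops_per_cpu_second):
    """Sustained rate per wall-clock second, in Gflop."""
    return cpu_wall_ratio * mflops_per_cpu_second / 1000.0


computed = {name: gflops_per_wall_second(ratio, mflops)
            for name, ratio, mflops, _ in RECENT_RUNS}

for name, _, _, listed in RECENT_RUNS:
    print(f"{name:8s} computed {computed[name]:.2f}  listed {listed}")
```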

Tom Kitchens - comments on SPP
- cap of 50K CRUs
- 10% of total CRUs
- 5% of disk storage
- a few big users/slow startup/conversion problems
- proposal not to expand in FY 95

Byers comments on SPP discussions:
NERSC had a proposal not to expand at all. Kitchens mostly bought into
this, but others objected. It was left something like NERSC
would run a workshop to encourage new SPP users, and hopefully to get
them up to speed with less pain and time than experienced by present
users. No one new would be allowed on unless they first went to the
workshop. How hard and fast this rule is to be was not clear.
Some expansion would then be allowed, the precise number mentioned
by McCurdy.


TOM KITCHENS: SAC suggested SPP (Special Parallel Processing) program with
the expectation that "Special" would eventually become "Standard". They
dedicated about 10% of the available computer power at NERSC.

- SPP topics included Lattice Gauge/ Fusion / Fluid Dynamics / Ab Initio
Molecular Dynamics.
- SPP codes did not perform as well this year and interest was not as high.
- NERSC recommendations: (1) limit to 50 KCRU's (flat); (2) limit to 18
SPP users (also flat) [This got the greatest objections from the floor.
DDK for one believes it a mistake.]; (3) try not to have a large turnover.

JACK BYERS: primarily reviewed the E-MAIL on the subject [includes DDK opinions]

C. WILLIAM McCURDY: proposed a mandatory SPP workshop. Users thought it might
best be expanded to include users who want to propose for SPP. [DDK thinks
that it is OK but doesn't address primary problems of SPP: shots are once a
week so unless one needs huge memory/disk it is not cost effective to the
user even if he/she is already parallelized.]

TOM KITCHENS: pointed out that the SPP allocation time frame is 1 December
for the next round (~ year).

B. CURTIS: noted that we should expand test queues NOW.

[DDK NOTE]: Besides developing parallel code, the SPP is motivated by a need
for (1) large memory -- it's the only way to get 200Mw; (2) Scheduling
security -- if you want large memory, you will not get swapped in very often;
(3) Real Time Turnaround -- if you are asking for large memory or long times,
having a definite shot can be a lifesaver; (4) Allocation -- this is 10% of
the resources in a new allocation basket. Within the bounds of the SPP,
these considerations are more significant than parallelization so the SPP
should not be taken as an indicator of interest in MPP.


MPP Procurement Status - Michel McCoy


Award of competitive procurement expected in April 1995.

MIKE McCOY: MPP Acquisition
- Basic document Blue Book (ER-0587, Feb., 1993) called for MPP FY94.
- Current schedule calls for FCM (Fully Configured Machine) mid-FY96. It
should be capable of both multiple users and multiple programming models;
have Math Libraries and Programming Tools; and be balanced in CPU to
memory to I/O. It should be FAST!!!! It should run GCS (Grand Challenge
Scale) codes on 500-1000 nodes. It will be dedicated to that purpose
"at night".
- The PEP (Pilot Early Production) machine has to be available to demonstrate
capability by running benchmarks. It should be available to users as
a development machine April 1995. By preparing codes, it should give some
capacity relief.






Capability System - A capability system possesses sufficient processing
power and memory to accommodate a single Grand Challenge-scale
application utilizing all the resources of the system.

Capacity System - A capacity system can support, in a flexible manner,
a large load of users simultaneously developing, debugging, and
running a large mix of applications.

The Problem - Capacity versus Capability
-- "I'm being forced off the machine"
-- "I'm not getting enough time for my capability solutions"


Batch queue length - one measure of demand

Analysis of codes using cycles on the C90
-Snapshot based on accounting records from CUB
-Covers 64 day period beginning April 5
-Filtering of accounting data will enable us to store
accounting information for greater length of time



Code       PI and site                Usage      %   Cum. %
----       -----------                -----    ---   ------
li.x/      Soni Brookhaven             2056    8.4      8.4
sqed*.x                                1699    6.9     15.3
chem_1bc   Kogut U. of Illinois         457    1.9     17.2
spectrum                                423    1.7     18.9
xtreb      Kerbel, GA,LLNL              354    1.4     20.3
xden6      Sharpe/ LANL                 314    1.3     21.6
dtem       Spong/ ORNL                  295    1.2     22.8
prg*       Jansen UCSD                  257    1.0     23.8
nrqcd.ds   Sloan FSU                    242    1.0     24.8
lmbh       Klepeis LLNL                 224    0.9     25.7
xvel       Charlton ORNL                223    0.9     26.6
xg3e       Williams/ LLNL               218    0.9     27.5
amr        Crutchfield LLNL             215    0.9     28.4
wxex*      Park PPPL                    208    0.8     29.2
pies*      Merkel et. PPPL              206    0.8     30.0
cup        Strand NCAR                  195    0.8     30.8
gotsy      Turner et. PPPL              194    0.8     31.6
xvg04*     Atherton LLNL                185    0.8     32.4
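The last column of the code listing reads as a running total of the per-code percentages. A short sketch, assuming that interpretation, reproduces it from the listed shares:

```python
from itertools import accumulate

# Per-code share of C90 cycles (percent), in the order listed above
shares = [8.4, 6.9, 1.9, 1.7, 1.4, 1.3, 1.2, 1.0, 1.0,
          0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8]

# Running total, rounded to one decimal to match the listing
cumulative = [round(total, 1) for total in accumulate(shares)]

top_15_share = cumulative[14]   # share taken by the 15 largest codes
print(cumulative)
print(f"top 15 codes: {top_15_share}% of cycles")
```

On these figures the 15 largest codes account for about 30 percent of the cycles in the snapshot.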

Additional cycles available soon
-July 1994 - time available via MPP allocation program is limited
-Late 1995 - PEP system available for capability (and limited
capacity) computing
-Late 1996 - FCM system available for capacity and capability

Proposed Solution
-To solve both capacity and capability needs...
- Make use of PEP system to enable C90 capability codes on
the MPP, possibly freeing cycles on the C90 for capacity
- Use FCM to support both capability and capacity codes.
- Make effective use of all NERSC resources!

Why would the research scientist support this plan?

Significant performance gain...

System   LFK*         Efficiency   #PEs   System Performance

C90      250 MF       .75            16   3 GF
PEP      15-60 MF**   .30           256   1.15-4.6 GF
FCM      60 MF        .4            512   12.3 GF

 * LFK geometric mean
** depending on the technology
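The System Performance figures appear to be the per-processor LFK rate times the efficiency times the number of PEs. A minimal sketch under that assumption, with illustrative names:

```python
def system_gflops(mflops_per_pe, efficiency, n_pes):
    """Estimated sustained system rate in GF."""
    return mflops_per_pe * efficiency * n_pes / 1000.0


c90 = system_gflops(250, 0.75, 16)
pep_low = system_gflops(15, 0.30, 256)    # slower candidate technology
pep_high = system_gflops(60, 0.30, 256)   # faster candidate technology
fcm = system_gflops(60, 0.4, 512)

print(f"C90 {c90:.1f} GF, PEP {pep_low:.2f}-{pep_high:.1f} GF, FCM {fcm:.1f} GF")
```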

Access to more memory...

System   Memory        #PEs   Total Memory

C90                           2 GB
PEP      64-256 MB*     256   16-64 GB
FCM      256 MB         512   128 GB

* depending on the technology
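The total memory figures follow from the memory per PE times the number of PEs. The sketch below assumes the Memory column is per-PE and that 1 GB = 1024 MB, which reproduces the listed totals.

```python
def total_memory_gb(mb_per_pe, n_pes):
    """Aggregate memory across all PEs, in GB (1 GB = 1024 MB)."""
    return mb_per_pe * n_pes / 1024.0


pep_low_gb = total_memory_gb(64, 256)
pep_high_gb = total_memory_gb(256, 256)
fcm_gb = total_memory_gb(256, 512)
c90_gb = 2                                 # listed directly above

print(f"PEP {pep_low_gb:.0f}-{pep_high_gb:.0f} GB, FCM {fcm_gb:.0f} GB, "
      f"{fcm_gb / c90_gb:.0f}x the C90")
```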

Access to unsaturated PEP system...
-Improved wall clock turnaround time

Implementing the solution
1. Staff works with research scientists to parallelize capability codes.
Issues: - Portability of resulting code
- Parallel platforms on which to run code
- Single source
- adoption of modified source code by scientist
2. Staff enables codes to run on PEP on day 1.
Issues: - Existence of software to support the parallel code
3. Research scientists run capability codes on PEP on day 2.
Issues: - Training scientist to run in this new environment
4. Research scientists run capacity and capability codes on FCM.

Proposed Solution
To solve both capacity and capability needs...
Enable more users on SAS and AFS, providing an attractive platform
for preprocessing, post-processing, and developing codes.
Make effective use of all NERSC resources!

Why would the research scientists support this plan?
Better interactive service...
-Improved response for user doing code development
-Rich software environment
-Shared home directories with CRAYs via AFS

Implementing the solution
1. Staff upgrades SAS and AFS to accommodate more users.

-By parallelizing the top cycle-burning capability codes to
run on the PEP system, we potentially free up 30% of the C90
for capacity codes in 1996.
-One year later, we expect the execution environment on the FCM
to support both capability and capacity codes.
-By shifting some of the interactive workload over to the auxiliary
servers, we can potentially provide better interactive service
to the user.




- Develop user expertise in anticipation of delivery of NERSC machines
- Broaden base of user codes on parallel platforms
- Develop NERSC staff expertise


Limited access to HPCRCs and H4P
- ORNL Paragon XP/S 5 and XP/S 35 approx. 6%
'5 - 66 PEs, 16 MB, 9.6 GB disk
'35 - 512 PEs, 32 MB, 144 GB disk
64 PEs, 32 MB, 20 GB disk
- LANL CM-5 - approx. 5%
1024 PEs, 32 MB, 60 GB disk
- LLNL T3D - approx. 7.5%
128 PEs, 64 MB, 90 GB disk

Timeline for access
Early May - request for proposals mailed to PIs
Mid May - request for proposals appears in Mosaic
May-June - request for proposals appears in Buffer
(May 1994 issue)
June 17 - proposal deadline
?? - MPP access awards
July - access to Paragon '5, Paragon '35, KSR/1, CM-5
Sept - access to T3D

A total of 17 proposals were received
- 9 have previous parallel experience (3 via SPP)
8 have NO previous parallel experience
- 10 requested regular consulting support
5 requested help developing the parallel applications
1 undecided
1 unknown

More about the proposals

Parallel Platform...

7 - T3D (11)
2 - CM-5 (4)
1 - Paragon (6)
1 - KSR (2)
3 - T3D and/or Paragon
1 - Paragon and/or CM-5
1 - T3D and/or CM-5
1 - Paragon and/or KSR-1
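The numbers in parentheses appear to count every proposal that names a platform, including the "and/or" requests. The sketch below tallies the list under that assumption; "KSR" and "KSR-1" are treated as the same machine.

```python
from collections import Counter

# (number of proposals, platforms named), from the list above
PROPOSALS = [
    (7, ("T3D",)),
    (2, ("CM-5",)),
    (1, ("Paragon",)),
    (1, ("KSR",)),
    (3, ("T3D", "Paragon")),
    (1, ("Paragon", "CM-5")),
    (1, ("T3D", "CM-5")),
    (1, ("Paragon", "KSR")),
]

total_proposals = sum(count for count, _ in PROPOSALS)

mentions = Counter()
for count, platforms in PROPOSALS:
    for platform in platforms:
        mentions[platform] += count

print(total_proposals, dict(mentions))
```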

Programming Paradigm

9 - message passing
2 - data parallel
1 - shared memory
3 - undecided
2 - no comment

List of proposals for MPP access - available from Tammy Welcome

Access awards
- Funding decisions will be made by OSC
- Awards have a 6 month duration
- PI is responsible for short project status report

TAMMY WELCOME: Access Program (for development only)
- Time available at the two research centers and also on the LLNL T3D.
- Received 17 proposals, half with experience already.
- Five requested development help.
- T3D has been the most popular.
- Message passing preferred --> most likely to be available on other machines.
- Decisions on the proposals are being made by OSC. Awards are of six month
duration [too short a limit] and the PI is responsible for a short report.
- Capacity/Capability issue: Want to use the PEP to move some C90 codes
to free up the C90. The idea is that the top 15 codes use 30% of the C90
and are chomping at the bit to use more. To implement: (1) staff works with
scientist to convert code. (2) Staff enables those codes to run on the PEP
"day 1". (3) Research scientists run "day 2".




Storage - a new paradigm
- "How can you keep on movin' (unless you migrate me, too)"
... Ry Cooder, Into the Purple Valley
- S-curve model /Performance or value vs investment or time

Storage - Migration from old to new
Old Paradigm
- Expensive mainframe file servers
- Proprietary operating systems
- Non-standard device interfaces
- Slow-speed channel bottlenecks
- Mainframe-centered architectures
- Inefficient access mechanisms

New Paradigm
- Less expensive workstation servers
- Open systems
- Common interfaces
- High-speed scalable, parallel I/O
- Network-centered architectures
- Transparent access mechanisms

Storage - Hardware Technology Trends
- Disk seek, latency, and data transfer are all improving
- Disk arrays with higher performance and lower cost
- New advanced RAID levels now under development
- Magnetic recording technology appears almost limitless
- Helical and narrow-track longitudinal both improving
- Automated robotics are becoming standard equipment
- Fiber optics becoming transmission technology of choice
- Gigabit networks and applications are more common

Storage - Software Technology Trends
- Open systems, interoperability, and standards necessary
- Distributed client-server computing more widespread
- Seamless data interchange between applications
- Scalability of capacity and data transfer rates necessary
- Integrated storage system management capabilities
- Transaction and metadata management for storage systems
- More electronically-saved and machine-readable storage
(for the environmentally concerned: 1 petabyte = 42.5M trees)

NSL - Technology Base System
- IBM RISC System/6000 Model R24 (or 990)
- 512 MB memory
- 92 GB SCSI-2 Fast/Wide disk
- HIPPI, FDDI, and Ethernet connectivity
- 3490E (36-track and compression) robotic archive
- 4 drives
- 1,340 cartridges (1 TB uncompressed)
- 16x16 HIPPI crossbar switch
- connects Crays to Base System
- connects Crays to Crays
- can also isolate C-90 from NSL switch problems
- NSL-UniTree software environment

NSL - Technology Extended Base System
- Additional production-level capabilities for Base System
- compatibility with Base System hardware/software
- no vendor bias toward existing base configuration
- May take several forms
- extensions to existing Base System disk or robotics
- upgrades to existing Base System disk or robotics
- preliminary version of Fully Configured Storage
- Extended Base System paves way for
- HPSS conversion
- parallel I/O
- network-attached devices
- scale-down CFS

NSL - Technology Fully Configured Storage
- Fully Configured Storage is the last step:
- full HPSS environment
- full high-performance peripherals
- scalable, parallel I/O to FCM MPP
- dismantlement of CFS
- Procurements may be split among several vendors if:
- best fit is through selection of several technologies
- no single-vendor solution exists to meet requirements
- Funding issues:
- dependent on NERSC FY96 and FY97 funding
- may be coupled with Extended Base System as a
two-phased procurement similar to the PEP/FCM

STEVE LOUIS: storage
- Old CFS used (1) mainframe file servers; (2) proprietary operating systems;
(3) non-standard device interfaces; (4) slow channel speeds; and
(5) inefficient access mechanisms.
- New system will be based on workstations, UNIX, standard high-speed interfaces,
and multiple paths.




As soon as a larger system is provided, it is filled to capacity.

- To unite diversified users with collaboration and education.
- Workshops and Classes: Intro. MPP Computing (thought up at this meeting);
Adaptive Grid Methods (Dec.); PVM and Distributed Computing; and more...
- Expanded Visitors Program -- i.e., spend some time at NERSC.
- MOSAIC: Research Highlights Program.


Open Discussion

Rick Kendall - large memory jobs are having major problems
  - checkpoint failure
  - slow queue

DISCUSSION PERIOD: MPP was discussed as we went along. The problem of poor
treatment of large memory jobs was brought up by Rick Kendall -- NERSC people
appeared surprised.

BILL McCURDY: The fragile nature of the effort to put an MPP on the floor
in FY96.
- When creating the Bluebook of FY92, the question was asked whether it
made sense to pin one's hopes on a high end MPP. It was concluded that
the vector machines would not compete with the workstations.
- The C90 will be in its 4th year by then and so in its "mature latter stages"
- The C90 will be paid out by then, so what one must hope for is to switch the
funding to the MPP. There is no hope of an increase; what one hopes for is
that support is not taken away.
- The USERS will have to motivate the acquisition as they are the only
ones that can do it. They should remember that as they push for the
workstations (that they admittedly must have), they should push for
the high end too.
- The two T3Ds from the CRADA (~$50M over 3 years) are essentially sold out
(128 nodes at LANL and 128 + 128 (local LLNL) nodes at LLNL). These, like the
RCs at LANL and ORNL, are NOT general access.
- 30% of the C90 (15 codes) will be converted in advance. (The C90 is 75%
of the NERSC resources.)
- NERSC has a commitment to do precisely what the users tell us.
- $2B/yr is assigned to equipment in the DOE and there is a committee to
try to save some of this. (John Fitzgerald is on that committee and
pointed out that "lease - not to own" costs state sales tax. There is
an enormous saving possible there.) Savannah River has the biggest
computer budget in the DOE -- why?

ANS: record retrieval and maintenance for environmental cleanup. Many
of those records are on obsolete equipment and software but they MUST
be available. It is a massive EM problem which unfortunately makes it
(falsely) look like computation is getting large funding.

- SSC still has workstation farms there but they can't be moved.

End of meeting - adjourn