Minutes of the ERSUG/EXERSUG Meeting

Holiday Inn Crown Plaza
Rockville, MD
July 11-12, 1994

A grateful acknowledgement is made to Dale Koelling (DDK in the minutes) for supplying extensive notes from the meeting.

Brian Hingerty
Vice-Chair/Secretary
beh@ornl.gov

Preliminary Announcement (Barry Howard)

The Energy Research Supercomputer Users' Group (ERSUG) will meet at the Holiday Inn, Crown Plaza in Rockville, MD on July 11-12, 1994. In the past, this meeting has combined presentations describing work in progress at NERSC with lively user discussions in the areas of the services and capabilities provided by NERSC. For this particular meeting, the focus will change somewhat. First, more emphasis will be put on future services, and second, the ERSUG meeting will be combined with the Energy Research Power Users Symposium (ERPUS). The session dedicated to ERPUS will occur on the second day of the two-day ERSUG meeting and will be devoted to user presentations describing results obtained through the Special Parallel Processing (SPP) program at NERSC, as well as results obtained through use of other DOE high-performance systems.

The general theme of the first day of the ERSUG meeting will be realized and projected improvements in the NERSC computing environment, as well as some remaining bottlenecks that inhibit the most efficient access to system resources. A report from the committee to study options for controlling disk space usage will be part of a discussion of integrating host disk systems with AFS and archival storage at NERSC. There will be a progress report from ESNET and a detailed report on the transition to X-windows as a baseline standard for our users as we move into an era in which vendor software is driven by, and only compatible with, X-based standards.

Also on the first day, a new program initiated by the Office of Scientific Computing to arrange for Massively Parallel (MP) computing access will be described. NERSC Principal Investigators are now applying for access to parallel computers at the High Performance Research Supercomputer Centers at Los Alamos and Oak Ridge and to the Livermore T3D. Such access will allow scientists to develop applications in anticipation of the arrival of the NERSC MP system. Following this will be an update on the status of the NERSC procurement and a discussion of the anticipated impact that the transition will have on our users. This will include an approach for addressing the capacity-capability antithesis, not only during the transition period, but also beyond.

Since the meeting is dominated by what is about to happen rather than by what has happened, the NERSC staff will prepare a number of short documents describing recent progress and the status of critical areas. These will have to substitute for detailed presentations and will include topics such as the Energy Research Decision Package (ERDP), Centralized User Bank (CUB), etc.

We expect this meeting to be very stimulating. Its focus on the present will be on extracting the maximum out of existing resources, and the focus on the future will be on planning a feasible and realistic path to effective use of new technologies. In this journey, the major constraint will be to find a reasonable path for all of our users regardless of their computing requirements, since good science is not always the biggest science...



                               AGENDA
                               ------

                         Monday, July 11, 1994
                         ---------------------

8:30  Welcoming Remarks - Jack Byers

8:40  Welcome from Washington - John Cavallini

8:45  Washington View - Tom Kitchens

9:05  Production Systems Plans - Moe Jette and Rick Kendall
       New storage usage paradigms
       Disk reconfiguration committee report
       CUB plans

10:05 BREAK

10:15 ESNET Plans - Jim Leighton
       Backbone upgrade
       Network services

11:00 X-windows: How To Get From Here To There - Barry Howard and
                                                 Alice Koniges

12:00 LUNCH (EXERSUG meeting to be held in the Hideaway Room of the 
             Holiday Inn Restaurant)

1:30 SPP Project Status and Plans for 1995 - Bruce Curtis and 
                                             Tom Kitchens

2:15 MPP Procurement Status - Michel McCoy

2:45 Break

3:00 Transition to Parallel Computing - Tammy Welcome and Steve Louis
       Access to the HPCRCs and to the LLNL T3D
       Capacity and Capability solutions for 1996 and beyond
       Storage expands to meet demands

4:00 R and D: Tracking the Computational Explosion - Alice Koniges

4:20 Open Discussion

5:00 Adjourn

Tuesday, July 12, 1994, was used for presentations of the Energy Research
Power Users Symposium (ERPUS). The proceedings of this symposium will be
published elsewhere by DOE.

------------------------------------------------------------------------

                  NERSC NEWS-FLASH - July 1994

Fast NQS batch queues on C-90 allow 6 Gb file space in /usr/tmp

NQS BIGFILE batch queue on C-90 used for jobs declaring maximum file
space over 6 Gb, to 16 Gb

/usr/tmp file system on the C-90 has 53 Gb available

CFS gateway lets workstation users access CFS via FTP, giving them great 
storage capacity, bypassing Crays

Installation of 36-track CFS tapes is cutting in half the number of tapes
formerly kept offline in vaults

Tripling AFS storage will let all NERSC users have AFS access

Fortran 90 now licensed on C-90

C++ version 1.0.1.1 now available on the Cray-2s

X-windows BUFFER articles and NERSC machine room tour now in MOSAIC

T3D arrives at NERSC on July 12

X-windows interface to CUB and ERDP available

-------------------------------------------------------------------------


Morning Session (July 11, 1994)
-------------------------------

GENERAL NOTES:
- THE NSF PLANS TO GET OUT OF THE R & E NETWORKING BUSINESS!  (This could 
  really upset the system of networks. DDK)
- Jack Byers will graduate to emeritus status at the next meeting and Brian 
  Hingerty will become chairperson.
- Fusion was less predominant and chemistry (PNL and ANL) was strong.
- One petabyte = 42,500,000 trees of paper.
- Day 2, the ERPUS contribution, will be put up as a MOSAIC page at NERSC 
  so it will not be included here.

Welcoming Remarks - Jack Byers
------------------------------

It was pointed out that the SPP (Special Parallel Processing) program
has had an impact on the usage of the Crays. Some large single
processor jobs have had difficulty running. Not all has been positive
for the users.

JACK BYERS: Chairman's greeting.
- Pitch for EXERSUG - ERSUG - SAC communication and cooperation.
- Noted importance of memory as a part of the SPP program.
  (Yea!!!! - DDK)

Welcome from Washington - John Cavallini
----------------------------------------

Series of reviews of large computational projects in progress (June 1994)
Budget problems - do more with less
Transition to Massively Parallel - software problems
                                     vendors, conversion of code
                                   performance issues
FY96 budget not yet ready till Aug 94
FICCIT committee active:
  Anita Jones (DOD) covers us - advisory panel being set up

JOHN CAVALLINI: Washington view 1.
- Network, computational projects, etc. reviews are completed.
- Budgets are "not looking good".  No FY96 guidance possible yet.
   Growth from initial $7M to $107M.
- Important concepts: virtual machine, scalable I/O, Mass store.
- "FIXIT" --> "National Science and Technology Committee" and under that 
   "Committee on Information Technology".

Washington View - Tom Kitchens
------------------------------

Meeting is now being recorded on M-BONE video tape for future reference.
Budgets now 6% down in most DOE Offices for FY95.
National Information Infrastructure - important to administration
Communications Committee active
New projects - less money
Personnel changes at DOE
New initiative - advanced computing initiative (ACTI)
                 money from various offices
                 FY 95 $23-40M 
ERSUG requirements document needs update

TOM KITCHENS: Washington Report 2.
- Flat is the best you can assume.
- "National Information Infrastructure" is in focus (DDK comment: Gore)
  Something that was introduced that I don't understand is "Prototype Virtual 
  Agencies" in the discussion of developing this.
- Proposed DOE/ER reorganization would combine a number of parts: OSC/ 
  Technology Transfer / Basic Energy Sciences / Advanced Energy Projects 
  (wow! DDK)
- People Changes: OSC has a lot of turnover in their detailees.  New Federal 
  Employee for "Information Technology"
- New initiative in "Advanced Computational Technology".  It is to deal 
  with petroleum problems.  Funding to come from many sources all of which 
  have been significantly reduced.  FY95 funding will be $23-40M.
- MAJOR REMINDER: Need a "needs" document update by next meeting.  May be 
  able to clip major portions from the blue book.

-----------------------------------------------------------------------

Additional notes supplied by Tom Kitchens (DOE-OSC)
---------------------------------------------------

Washington View - Tom Kitchens
------------------------------

   It's Washington so let's talk about science support, policies and
organization - what else is there?  Most budgets are expected to be
5-10% down next year in the DOE; many other agencies will also have
decreases in FY'95.  A flat budget is suddenly a fat budget for FY'95
and FY'96.  The signals have been changed - High Performance Computing
and Grand Challenges are not completely out but the Administration is
more interested in the National Information Infrastructure (NII) and
National Challenges.  The Administration has its new National Science 
and Technology Council (NSTC) responsible for more facets of technology
than the old FCCSET: The High Performance Computing, Communications,
and Information Technology (HPCCIT) organization is now a subcommittee
of an NSTC committee, the Committee on Information and Communications
(CIC) which is also responsible for NII.  This means that HPCCIT has
dropped in priority and has substantial competition for attention in
its superior committee.  It is going to be far harder to convince this
organization that High Performance Computing (HPC) needs even flat
support.  The ERSUG "green book" is now three years old and must
be updated soon with regard to the needs and what you have learned in 
the last few years: the ERSUG requirements need to be well supported by
statements on the societal impact of the work being done, good
accomplishments and expected milestones if the report is to be read
with any interest.  I hope to see the EXERSUG appoint a strong group
to undertake this task.

   The Administration is also interested in inter-agency and
intra-agency collaboration; in fact, the HPCCIT and the Global
Climate Initiative are being used as pilot "Virtual Agencies" where
components of several agencies are being guided by the President's
Science Advisor's Office, OSTP, to attack a common problem.  This has
plusses and minuses; we must do all the internal budget preparation
reports to DOE as well as to the virtual agency, but we do reap some
additional visibility in the OSTP.  This virtual plus has not yet
improved our budget but maybe there will be an effect in FY'97!  The
message is: working with people from other agencies - or other parts
of your own agency - is good, especially if you can show some
leverage of support.  ERSUG needs to tell more about its members'
work and interact with scientists and technicians both inside and
outside the DOE.  ERSUG needs to be strengthened: having this meeting
videotaped for the Mbone and putting it next to the Energy Research
Power Users Symposium (ERPUS) and the Office of Program Analysis Peer
Review of Energy Research's Computational Science Projects was to make
it easier for more users to attend.  A strong ERSUG will be essential
to maintain even a constant budget for computational resources.

   Another example of a collaborative effort inside the DOE is a program to
aid the Domestic Natural Gas and Oil Industry through a program called
the Advanced Computational Technology Initiative (ACTI) which is
managed and supported by the Office of Fossil Energy, Defense
Programs, and Energy Research.  This program is based on DOE
Laboratories collaborating with the domestic gas and oil companies and
the appropriation committees have pinned its budget somewhere between
$23 and 40 million.  The plans are quite flawed from our perspective at
this time but there are several months to work them out.  The lesson
here is that if you don't initiate the cross-cutting work, someone else
will - and you probably won't like the way they want to do it!

   Inside the Department and inside Energy Research some streamlining
is being proposed.  An early-out program has been in effect at DOE
Headquarters.  The Director of Energy Research, Martha Krebs, is
proposing to form two new divisions in ER, one that would include the
OSC with ER's Technology Transfer Office, SBIR, and BES/Advanced
Energy Projects program office. (This has now happened.)

   In OSC we have had some changes in personnel: George Seweryniak has
joined us to manage much of the ESNET effort, Steve Elbert has
returned to Ames and Wally Ermler has joined us from Stevens Institute.
We expect to add another Federal Employee to handle Information
Technology and other issues. (It can now be told that this person is
Mary Anne Scott who has been in the Office of Fusion Energy and has
often participated in the ERSUG meetings.)

   I want to remind you not to forget the importance of updating and
strengthening the ERSUG computational requirements document (the
'Green book'). I hope you have a good meeting.

-----------------------------------------------------------------------

Jack Byers' comments on need for a new green book for DOE
---------------------------------------------------------

EXERSUG members:
Note Kitchens' appeal for stronger ERSUG.
We all need to work on this.  E-mail between ourselves and Kitchens
clearly isn't enough.  We need plans, suggestions, and mechanisms we can
use that we now don't have or don't use.

Also following is more push for us to get going on the green book.
I will need help.  Am starting to work with McCoy and Mirin (from NERSC)
on division of work between NERSC and EXERSUG.
I am presently struggling with my version of a statement, from the users'
point of view, of needs and requirements, and trying to see that it
fits in with a statement of the NERSC vision by Mike McCoy.
When he and I get to some partial agreement I will send this to you
for editing, modification, etc.  My present idea is that the users'
statement ought to be independent of NERSC or OSC or anybody.
If that makes sense, the NERSC vision would naturally stand as a response
to the users' statement of needs.

It might make sense to plan to have the ERSUG users' statement targeted
elsewhere also, i.e., not to use it only for the green book. This
might serve as an initial action in making ERSUG stronger. Ideas for targets?

I will need help from all of you at least on the science accomplishments
sections in your areas. If you can't do this yourselves, please at least
take responsibility for farming it out to people in your discipline.
Potter has agreed to do the climate section.

I have a lot of good material (3 ERPUS papers) on QCD.  I will take a first cut
at pulling out a statement of needs and accomplishments from those papers.
But I will need a high energy physics person to at least edit that and perhaps 
even rewrite what I can do.

There is some more material from ERPUS that you might use as starting
points, though the most complete ERPUS material seemed to be the QCD papers
and the ocean modeling paper by Malone. Contact me for a list of what
I have. I haven't got anything from the ERPUS papers of Leboeuf, Hammett,
Colella, Kendall and others I think.

You also should look at the previous green book to see what is involved.
If you don't have copies, E-mail Kitchens for them.

There is a possibility that NERSC will hold a workshop to bring the
green book together, early next year.  This is NOT to suggest that we
are off the hook, but rather to point out that all of the rough drafting must
be complete by then, and probably we should try to have each individual
science section completed in final form, so that the meeting could then fill in
the holes, stitch together the pieces, and make coherent summary statements.

---------------------------------------------------------------------

PRODUCTION SYSTEMS PLANS - Moe Jette and Rick Kendall
------------------------------------------------------

--Centralized User Bank (CUB) 
--Portable Batch System (PBS)
--Preparing for arrival of Cray T3D
--Preparing for installation of UNICOS 8.0
--Major file system reconfiguration
--CFS converting to 36-track tapes

CENTRALIZED USER BANK (CUB) - RECENT UPDATES
--------------------------------------------

-Complete X window interface
  In /usr/local/new/xcub on Crays and SAS for beta testing.
  Will be moved to /usr/local/bin/xcub (i.e., production) this summer.
-CFS quotas by user available
  Each user can be allocated some percentage of the repository's
  total CFS allocation. The CFS allocation, like the CRU allocation,
  can be oversubscribed
-Resource accounting report generation
  Monthly accounting reports based upon data from CUB
  The UNICOS timecards are no longer used
-CRU reserve controls for SAC
  SAC members can manage their CRU reserves with CUB
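
The per-user CFS quota item above can be illustrated with a short sketch. The repository size, user names, and percentages below are hypothetical, and this is not CUB's actual implementation; it only shows what it means for per-user shares to be oversubscribed.

```python
# Hypothetical illustration of per-user CFS quotas; not CUB's actual code.

def user_quotas(repo_allocation_gb, shares_percent):
    """Return each user's quota in GB, given percentage shares of the repository."""
    return {user: repo_allocation_gb * pct / 100.0
            for user, pct in shares_percent.items()}

def is_oversubscribed(shares_percent):
    """Shares are oversubscribed when they add up to more than 100 percent."""
    return sum(shares_percent.values()) > 100

# Assumed example: a 200 GB repository allocation shared by three users.
shares = {"alice": 50, "bob": 40, "carol": 30}
quotas = user_quotas(200, shares)

for user, gb in quotas.items():
    print(f"{user}: {gb:.0f} GB")
print("oversubscribed:", is_oversubscribed(shares))
```

Oversubscription works on the expectation that not every user fills an individual quota at the same time.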

CENTRALIZED USER BANK, FUTURE PLANS
-----------------------------------

-CFS reserve controls for SAC
  SAC members will be able to manage their CFS reserves with CUB
  To be completed by September
-Port to SAS and T3D
  We will be able to account for SAS resource use by December.
  Port to T3D completed in July (for account management) and 
  August (for accounting). ER users will have access to about
  7.5 percent of the T3D. We plan to allocate T3D resources
  completely separately from other NERSC resources.
-History report
  This can be used to review recent changes to the database
  (records of move, infuse, modify, and other commands).
-Many improvements for security and reliability
  Update to Oracle 7 for database management.
  Better encryption of message traffic.
  Messages recorded and can be played back.
-Change login name
  Each client will be able to change his login name throughout
  NERSC with a single CUB command. To be available this winter.
-Change password NERSC-wide
  Each client will be able to change his password throughout
  NERSC with a single CUB command. To be available this winter.
-Users in multiple repositories
  Each client will be able to access multiple repositories from
  a single login name and password. He will be able to specify the
  repository to use at login time. To be available this fall.
-One-time password (SecurID cards)
  Since each password will be used only one time, its capture
  by a hacker poses little security threat. We are currently
  working with SecurID cards since we already have the software
  and some cards. We are investigating other one-time password
  schemes including Enigma Logic cards, S/Key (no cards required)
  and others. NERSC would purchase cards for some of its staff
  (those with special privileges). Cards purchased by client
  sites (about $50 per card) could be supported by NERSC without
  additional cost. Send E-mail to jette@nersc.gov for details.
-Restrict access of foreign nationals
  Foreign nationals from specific countries will have 
  restricted access to sensitive data.
-Modification of address, phone number, etc.
  Clients will be able to modify this information as needed with
  a CUB command. To be available this fall.
-Support for Kerberos
  There are several incompatible versions of Kerberos. Our current
  intention is to support Kerberos version 4, as used by AFS.
-Identification of resource use by process
  We intend to record resource use by process for long running
  processes. While the volume of data involved precludes us from
  recording in CUB each process executed, a small number of long
  running processes account for most resources consumed.
-Real time monitoring of resource use by process
  Users will be able to monitor the resource use (CRUs) by a single
  process in real time.
-More information about CUB is available online
  For a brief description type :  man setcub
  For a complete description type: document view setcub
-Help prioritize these tasks
  We welcome your comments to help us prioritize these tasks.
  Direct your comments to jette@nersc.gov

PORTABLE BATCH SYSTEM (PBS)
-POSIX compliant batch system
  This is the only batch system under development which
  complies with the POSIX standards and is well suited
  to satisfy supercomputing needs.
-Being developed with NASA Ames
  NASA Ames developed NQS, the current batch standard.
  PBS fixes many of the shortcomings of NQS.
-In beta test mode
  Many computer vendors are interested in using PBS
  We expect PBS to provide a consistent batch system
  across all NERSC platforms.

  UNICOS 8.0 AVAILABILITY
  -----------------------

--Available from Cray in March 1994
--AFS available from Pittsburgh Supercomputing Center in July 1994
--Preliminary installation schedule
      August    Cray T3D   (after machine's acceptance)
      September Cray/F
      October   Cray/C
      November  Cray/A

  UNICOS 8.0 FEATURES
  -------------------

--Multi-threaded kernel reduces system overhead and improves interactive
  use of the machine
--Unified Resource Manager (URM) - improves control of resources
--POSIX compliant shell and utilities - improves interoperability
--Kerberized clients and utilities (klogin, krsh, kcp, kftp, etc.)

  DISK RECONFIGURATION IMPLEMENTATION
  -----------------------------------

--Removed /tmp/wk3 and /workbig file systems from Cray/F
    Sufficient disk space was available for Cray/F
    without these file systems
--Removed /workbig file system from Cray/A
--Added 53 GB /usr/tmp file system to Cray/A
    The disk space came from the three file systems listed above.
--Constructed 8 GB /usr/fast file system on Cray/A
    We cache /usr/fast on SSD during SPP shots for added speed.
--Both /usr/tmp and /usr/fast have 12 hour purge times
    This short lifetime permits us to keep adequate
    disk space available.
--Neither /usr/tmp nor /usr/fast has data migrated to CFS
    The load could severely impact CFS if data migration
    were used.
--Reconfiguration completed on May 16, 1994.

DISK RECONFIGURATION AND NQS
----------------------------

--Big file queue (bfq) on Cray/A uses /usr/tmp/workbig directory
    /usr/tmp/workbig is a directory in /usr/tmp file system
    rather than an independent file system. This does not
    ensure 16 GB of disk space for big file queue jobs, but
    we will not start a job until at least 150 percent of
    the requested disk space is available. Pre-allocating
    files at the time the job starts should ensure sufficient
    disk space.
--Disk limit for all NQS high priority queues increased to 6 GB
    Several jobs with moderate storage requirements (2 to 6 GB)
    can now execute simultaneously. Formerly jobs with storage
    requirements in excess of 4 GB could only execute one at a time.
--Big file queue still used for jobs requiring over 6 GB of storage
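
The big file queue start rule (at least 150 percent of the requested disk space must be free) amounts to a one-line check. The sketch below is illustrative only; the function name and the sample numbers are assumptions, not NQS code.

```python
# Sketch of the start rule described above; the function name is hypothetical.

def can_start_bigfile_job(requested_gb, free_gb, factor=1.5):
    """Start only when free space is at least 150 percent of the request."""
    return free_gb >= factor * requested_gb

# A 16 GB request needs 24 GB free before the job may start.
print(can_start_bigfile_job(16, 26))
print(can_start_bigfile_job(16, 20))
```

The 50 percent margin leaves room for other jobs writing to the same file system, which is why pre-allocating files at job start is still recommended.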

DISK RECONFIGURATION RESULTS
----------------------------

--No disk space shortage on Cray/F
--Utilization of /tmp/wk# substantially reduced on Cray/A
    These file systems are no longer filling. The reduced 
    load also improves the I/O response time.
--Data migration reduced
    -Thirty percent more data is typically kept on Cray disk
    -Less data is kept on CFS
    -The system overhead has been reduced by a few percent
--New /usr/tmp being used by a small number of NERSC clients
    Although only a small number of NERSC clients use
    /usr/tmp, the storage involved is substantial. Typically
    26 GB of /usr/tmp is used.

ANDREW FILE SYSTEM CHARACTERISTICS
----------------------------------

--NERSC's AFS server is accessible from any AFS client
    One can access data on NERSC's AFS server from anywhere 
    on the Internet
--Good security is provided with the AFS version of Kerberos
--The storage uses inexpensive (commodity) disks
--The bandwidth is significantly lower than Cray disks
    Caching is used and can result in speeds approaching
    Cray disk. Cache size is currently 858 MB.

ANDREW FILE SYSTEM ENHANCEMENTS
-------------------------------

--NERSC's AFS server being expanded from 30 GB to 95 GB
--User quotas may be increased beyond the current 30 MB limit
--Accounts on the AFS server will be available to all NERSC users
--Transarc offers AFS client-only licenses to ER sites for $600
    Send E-mail to afshelp@nersc.gov for details.

COMMON FILE SYSTEM (CFS) STATUS
-------------------------------

--LASL CFS version 61 installed
    This includes numerous performance enhancements.
--Storage Technology (STK) 36-track drive installed
    This doubles the capacity of a 3480 cartridge when it
    is copied from an 18-track tape drive. However, it will
    take years to copy existing cartridges as most of CFS
    capacity goes into directly servicing user requests.
--Number of cartridges on shelf reduced by 50 percent!
    This has taken place over a four month period. A far
    higher proportion of data is now in the STK silos with
    a vastly better performance than operator retrieved and
    mounted cartridges. Shelf tapes are stored in two vaults
    (down from three) in various locations around LLNL.

CFS CARTRIDGE STATUS
--------------------

Date      In Silo     On Shelf      Total
----      -------     --------      -----
Feb 24    31,002      30,315        61,317
June 7    29,029      14,551        43,580

                      ------        ------
                        -52%          -29%
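
The percentages in the table can be re-derived from the cartridge counts. The short check below uses only the figures given in the table; the variable and function names are illustrative.

```python
# Cartridge counts copied from the table above.
counts = {
    "Feb 24": {"silo": 31002, "shelf": 30315},
    "June 7": {"silo": 29029, "shelf": 14551},
}

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

before, after = counts["Feb 24"], counts["June 7"]
shelf_change = percent_change(before["shelf"], after["shelf"])
total_change = percent_change(before["silo"] + before["shelf"],
                              after["silo"] + after["shelf"])

print(f"On shelf: {shelf_change:.0f}%")
print(f"Total:    {total_change:.0f}%")
```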

HOW THIS WAS ACCOMPLISHED
-------------------------

--The cartridge reclaim rate is up by over a factor of five
--CFS administrator added
--Increased operational support
--10,000 "defective" cartridges cleared and removed
     This required re-installation of 18-track tape capability,
     which STK supplied at no charge!
--Online tape category eliminated
     Incoming data was sometimes written on tape then copied almost
     right away to other tapes. Enhancements in CFS version 61
     permit us to monitor tape position with a data compressing
     tape controller and eliminate this extra step.
--Massive cleanup effort to delete migrated, orphaned and abandoned 
     files

WHAT IS NEXT FOR CFS?
---------------------

--Switch to double length tapes
     This potentially doubles the storage capacity.
--Release CFS gateway
     This can increase CFS aggregate bandwidth due to its independent
     data path. This will provide service to workstations.
--Spool up National Storage Laboratory (NSL) Unitree expertise
     We hope to install a prototype system early next year.

FILE SYSTEM CHARACTERISTICS (Cray C90)
--------------------------------------

             File       Total      User
Name       Lifetime     Space     Quota    Back-up
--------   --------     -----     -----    -------
/u  (home) Permanent     4 GB     3.2 MB    Daily
/tmp/wk#     30 days    57 GB       None    None
/usr/tmp    12 hours    53 GB       None    None
/usr/fast   12 hours     8 GB       None    None
/afs       Permanent    95 GB      30 MB    Daily
CFS        Permanent    11 TB     Varies    None

Notes:
--The /tmp/wk1, /tmp/wk2 and /tmp/wk3 file systems each have 19 GB.
--AFS files are available from any computer with AFS client software
    on the Internet. Currently, only SAS and the Cray C90 at NERSC
    have AFS client software.

MOE JETTE: Storage.
- Moving disk from F to C90.  (F doing OK on disk as it deals with smaller 
  jobs.)
- Established large file system to replace multiple smaller ones.
- Expanded AFS service. (Arranged $600 client license available.)
- Set up /usr/tmp to accommodate large temporary storage.  (Not yet heavily 
  used but the large users are thrilled.)
- These changes are the result of a user committee formed at the last meeting.

MOE JETTE: CUB (Central User Bank)
- X window interface is in beta test.
- CFS quotas are now available by user.
- Resource account report generation is in operation (if I heard right).
  This means the old funny tools are finally being replaced.
- SAS and the T3D have been included.
- Lots of system upgrades installed (not of major interest to the user).
- The option of the one-time password will be provided for those needing extra
  security (The cards will cost about $50.).  "KERBEROS will provide DCE 
  capability for the one-time password."  <<-- questioned by the audience.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Description of Kerberos (supplied by Moe Jette)
-----------------------------------------------
Kerberos permits you to authenticate yourself to a collection of 
computer systems one time.  After Kerberos authentication, you 
may securely access a variety of services throughout the collection 
of computer systems without further authentication (e.g., klogin, 
krsh, kcp, kftp, etc.).  

A one-time (or single-use) password is only valid for a single 
use.  After that use, it can no longer be used to provide one 
with authentication.

Kerberos itself does NOT provide the capability for single-use 
passwords, although used together they provide very good security 
and ease of use.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
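
The single-use property described above can be sketched with a hash chain, the general idea behind schemes such as S/Key. This is a minimal illustration under stated assumptions: SHA-256 stands in for whatever hash a real scheme uses, the class and function names are hypothetical, and it does not describe how SecurID cards or NERSC's systems work.

```python
import hashlib
import hmac

def hash_once(value: bytes) -> bytes:
    """One application of the (assumed) hash function."""
    return hashlib.sha256(value).digest()

def hash_n(seed: bytes, n: int) -> bytes:
    """Apply the hash n times to the seed."""
    value = seed
    for _ in range(n):
        value = hash_once(value)
    return value

class Verifier:
    """Server side: stores only the last accepted value in the chain."""

    def __init__(self, seed: bytes, n: int):
        self.stored = hash_n(seed, n)

    def check(self, candidate: bytes) -> bool:
        # A valid password hashes to the stored value; once accepted it
        # replaces the stored value, so it cannot be used again.
        if hmac.compare_digest(hash_once(candidate), self.stored):
            self.stored = candidate
            return True
        return False

seed = b"example secret"        # hypothetical user secret
server = Verifier(seed, 5)

first = hash_n(seed, 4)
accepted = server.check(first)              # first use is accepted
replayed = server.check(first)              # replay of a captured password fails
next_ok = server.check(hash_n(seed, 3))     # next password in the chain works
print(accepted, replayed, next_ok)
```

A captured password is useless to an eavesdropper because the server has already moved one step down the chain, and the hash cannot be inverted to obtain the next password.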

- Six month password expiration is in force.
- Hacker activity is down but can be expected back.  Reference to the book 
  "The Cuckoo's Egg" to illustrate the difficulty of tracking them down.

CFS (Common File System)
- Tape sizes increased.
- CFS Gateway "soon" -- i.e., direct access to information in CFS without 
  going through the CRAY-IBM bottleneck.
- Starting to stage up NSL's Unitree as a replacement.

FOR FURTHER INFORMATION
-----------------------

For further information on any of these issues, contact:

Moe Jette                                     E-mail: jette@nersc.gov
Group Leader, Large Scale Systems Group       tel: 510-423-4856
National Energy Research Supercomputer Center fax: 510-422-0435

Rick Kendall followed Moe Jette with a discussion of storage problems
and possible disk charges to more accurately reflect costs.

RICK KENDALL: for the user committee.
- Committee chair was J.-N. Leboeuf.
- Many suggestions were made and weighed.
- Two major users are impacting the system: Jack Ritchie and Uzi Landman. 
  Ritchie responded to queries and significant relief actually came out of a 
  mutual accommodation.  Landman has never responded.
- Much of the issue turns out to be a subset of the need for BALANCED 
  resource utilization.
- Need to improve NERSC auxiliary workstation load.
- /usr/tmp is working.  This and previous bullet can be put in perspective 
  by noting that in the past TWO CPUs were needed just for data migration.
  That's a big fraction of the CRAY resources.
- If disk charges are reinstated as a control mechanism, they need to be 
  staged, based on a good utilization model, and accompanied by a good 
  set of tools (for example the CFS interface is wholly inadequate).

----------------------------------------------------------------------

Break
-----

----------------------------------------------------------------------

ESNET Plans - Jim Leighton

Backbone Upgrade
----------------
Network Services
----------------
--ATM (Asynchronous Transfer Mode)

Jim Leighton: ESNET
- ESNET is a backbone, ie a network to connect networks.
- Carrying ~1.6 Terabytes/mo. today.  It has had exponential-like growth 
  with peaks each October.  MOSAIC probably produced the peak last October.
- Mostly T1 (1.5 Mb/s) links with a growing number of T3 (45 Mb/s) links.  Currently, 
  PNL -- FNAL -- LANL -- LLNL -- LBL are T3.
- International: Japan, Germany, Italy, Brazil.  Other links are made through 
  other agencies.
- National: two FIX (links to other agency networks) hubs, one on each coast 
  and lots of links to regional networks.
- Video Conferences: A meeting room can be used (outfitted for ~$60K).
  ESNET provides a central hub needed when there are more than two 
  participating stations as well as an on-line reservation system.  The 
  links are dial-up ISDN lines yielding transfer rates of 112-384 Kb.  Over 
  1000 conferences have already been handled.
- Mbone does a multicast when wanting to broadcast to many receivers.  It 
  is home grown using workstations as routing agents.
- The future? ATM = Asynchronous Transfer Mode.  Multiplexing circuits to give 
  OC3c service at 155Mb/s and OC12c service at 622 Mb/s.  ESNET proposal to 
  develop services with it received much interest and then many protests 
  (10!).  All have now been dropped.  So ATM experiment will start "soon"
  (end of the month, maybe): LLNL-GAC-LANL & SNLA-FNAL & ANL-PPPL.
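
For scale, the ~1.6 Terabytes/mo. figure can be converted to an average bit
rate and compared with the link speeds quoted above.  This is a rough sketch
only: it assumes a 30-day month and decimal units (1 Terabyte = 10^12 bytes),
neither of which is stated in the minutes, and the function name is
illustrative.

```python
# Average ESNET load implied by ~1.6 Terabytes/month (figure from the minutes).
# Assumptions (not from the minutes): 30-day month, 1 Terabyte = 1e12 bytes.

SECONDS_PER_MONTH = 30 * 24 * 3600

# Link speeds in Mb/s as quoted in the minutes.
LINK_MBPS = {"T1": 1.5, "T3": 45.0, "OC3c": 155.0, "OC12c": 622.0}


def average_mbps(terabytes_per_month: float) -> float:
    """Return the month-averaged traffic in megabits per second."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6


avg = average_mbps(1.6)
print(f"average load: {avg:.1f} Mb/s")
for name, rate in LINK_MBPS.items():
    print(f"  equals {avg / rate:.2f} x one {name} link ({rate} Mb/s)")
```

Under these assumptions the month-averaged load is roughly 5 Mb/s across the
whole backbone; peak loads on individual links will be higher than any such
average suggests.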

-----------------------------------------------------------------------

X-windows: How to Get From Here To There - Barry Howard and Alice Koniges
-------------------------------------------------------------------------

Outline
-------
-What is X Windows?
-What is the "X-Windows Problem" we are trying to solve?
-How can NERSC help solve the problem?

What is X Windows?
------------------
-The X Window System is a network-based graphical windowing system.
-Developed at MIT in 1984 and now adopted as industry standard.
-Client-server model: Display Server provides windows for multiple
   Clients on multiple machines.
-Provides the bare bones of a window system upon which any style of
   graphical user interface (GUI) can be built.
-Window manager provides "look and feel" for the GUI.
   OSF/Motif, OLWM, TWM, BWM (to be announced)
-Latest release: Version 11 Release 5.

What About Security with X Windows?
-----------------------------------
-Authorization required to access a display
 -Host-based
   User enables any client on specified host(s) to display on server
   Use xhost command: xhost +hostname
   Opens opportunity for sniffing messages between client and server
 -User-based
   Special code called "magic cookie" passed between client and server
   Code stored in .Xauthority file in user's home directory
   Usually set up automatically by display manager (XDM)
   Easiest case is one .Xauthority file in shared home directory
-See "Safe X" article in August, 1993 edition of Buffer
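
As an illustration of the user-based scheme, the sketch below generates a
random 128-bit key of the kind used by MIT-MAGIC-COOKIE-1 and formats the
"xauth add" command that would store it in .Xauthority.  It is a sketch under
assumptions, not NERSC's procedure: XDM normally does this automatically, the
display name "myhost:0" is hypothetical, and the command is only printed, not
run.

```python
# Sketch: build a 128-bit MIT-MAGIC-COOKIE-1 key and the matching xauth command.
# Nothing is executed; the command line is only formatted for inspection.
import secrets


def make_cookie() -> str:
    """Return a random 128-bit key as 32 hexadecimal digits."""
    return secrets.token_hex(16)


def xauth_add_command(display: str, cookie: str) -> str:
    """Format 'xauth add <display> <protocol> <hexkey>' for one display."""
    return f"xauth add {display} MIT-MAGIC-COOKIE-1 {cookie}"


cookie = make_cookie()
# "myhost:0" is a hypothetical display name used only for illustration.
print(xauth_add_command("myhost:0", cookie))
```

A client may connect only if it presents the same key, which is why a single
.Xauthority file in a shared home directory is the easiest case: every host
that mounts the directory already sees the key.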

What's the Problem?
-------------------
There are several:
-Lack of control over
 -GUIs distributed with commercial and public domain software
 -Capabilities and configurations of users' desktops
-NERSC working toward a Unified Production Environment in 1996
 -X windows will be an indispensable part of work space on MPP
   Debugger, performance analysis tools by late 1995
 -A common GUI will be used to access many of the services at NERSC
   Built on X Windows
   Selected application may, in fact, be run on another computer
   Usage accounting for each application
Need a transition path for providing X Windows capacity for "almost all"
NERSC users which minimizes cost and frustration.

What can NERSC do to help solve this problem?
---------------------------------------------
The four step method to Windows in every office.

First Step: Understand what desktop configurations are in use
-----------
-We don't know how many users lack X-capable desktops.
  -Using surveys (ERDP)
  -Site visits
  -Usage statistics of utilities with both character and graphical
     user interfaces
  -Suggestions are welcome!
-We do know that implementations vary greatly among desktops
-Knowledge of which desktop configurations are most popular will help
 NERSC concentrate on providing information on large, but limited,
 solution set.

Second Step: Provide Education and Motivation
-----------
-Buffer - series of articles on X Windows began in April, 1994
-Consulting - eXpert on call during working hours
   Familiar with various hardware/software tested at NERSC
   Help with PC, MAC, X-terminal and workstation configurations
   Familiar with administrative burden attached to each option
-Site visits - town meetings
-Demonstrations - show power of new tools with graphical interfaces
  -Mosaic, xcub, graphics, MPP tools, Unified Production Environment

Motivation - Graphics
----------
-X will replace Tektronix as the lowest common denominator for graphics
 output.
-Currently:
 -All supported graphics packages on Crays are capable of sending output
  directly to an X window.
 -Ditto for all Cray utilities used to view graphics files.
 -All graphics utilities except PV-Wave on SAS rely SOLELY on X.

Third Step: Provide recommendations on tested desktop configurations
----------
-Need transitional roadmap for "almost all" users

Transitional Roadmap - Which desktop to buy?
--------------------------------------------

                  Advantages                   Disadvantages
                  ----------                   -------------
Workstations  -local processing ability      -expensive
              -independent of other hosts    -complex system administration

X Terminals   -inexpensive                   -X server only
              -simple systems administration -dependent on host
              -X performance

PCs           -availability                  -X performance
              -can run PC applications       -dependent on host
                                             -X server only     

Transitional Roadmap - Cheap X
------------------------------
-UNIX and X for PC - LINUX + X386
  Full featured UNIX for 386 and 486 AT machines
  Source available in BSD and System V flavors
  Includes GNU utilities
  Accommodates Ethernet board or modem
    Serial IP options: SLIP, PPP, CSLIP, term
    Modem speeds up to 14400 (38400 compressed)
    Tested successfully at home with Mosaic, FTP, etc.
  Software cost: FREE

-X on top of MS Windows
  Winsock (Windows Sockets) is de facto standard network software
  for MS Windows.
  MicroX is one choice for X server
  Software cost : $150
  Effective bridge between MS Windows and X Windows

Transitional Roadmap - X Windows at Home
----------------------------------------

Two approaches:
-X remote
   Proprietary X protocol for using X-terminal (or PC) at home
-Serial IP
   Public domain protocol for using internet at home
   Examples: SLIP, PPP
   Need X server software plus user interface (MS Windows)
   Available for PC, Mac, UNIX platforms

NERSC uses Cisco terminal server to provide 1-800 dialup service
-Published number for telnet access to NERSC hosts
-Unpublished number for Xremote and Serial IP access
   Used by NERSC staff for using X from home
-Issues like security and cost prevent offering this as service now

Fourth Step: Explore possible new services
------------------------------------------
-XRemote server for NERSC customers
 Control cost by cost-sharing and limiting access to a program
 Control security by requiring one-time passwords
-Provide boot/configuration/font server for remote X-terminals
-Others to be suggested by customers

Barry Howard: DCE
X WINDOWS
- Is the base for a lot of DCE as the industry standard client-server model.
- The GUI is built on top of X.  There are already several: OSF/MOTIF, OLWM,
  TWM and shortly one more: BWM (Barry's window manager).  This is a part of 
  the NERSC commitment (DDK observation).
- Authorizing use: discouraged use of host-based [xhost +hostname] in 
  preference to user based "magic cookie".
- Problems: lack of control because of very different general GUI's and 
  very different desktop capability.  
- NERSC wants to go to an X-based "Unified Production Environment" with 
  a common GUI used on various resources.
- Howard outlined what he called a 4 step solution.  I found the steps fuzzy 
  but they did clearly involve a lot of user interaction.

Alice Koniges then described Graphical User Interfaces (GUIs).

Alice Koniges: APPLICATIONS
- Outlined a common X interface.  It included library interfaces.
- Pointed to their "Data Dimensions Interface" -- looks interesting but not 
  yet understood (by DDK).

-----------------------------------------------------------------------

EXERSUG (ERSUG Executive Committee) then adjourned for a closed lunch.

-----------------------------------------------------------------------

SPP Project Status and Plans for 1995 - Bruce Curtis and Tom Kitchens
-------------------------------------

SPP - Special Parallel Processing in 1994
-----------------------------------------
-Soni (BNL) weak matrix elements of B-mesons
 allocation: 12,900 CRUs. Used 5,644 (44%)
-Kogut (U. Ill) quenched QCD at finite density of large lattices
 allocation: 8,050 CRUs. Used 5,393 (67%)
-Bell (LLNL) natural transition to turbulence in a waveguide mixing layer
 allocation: 7,200 CRUs. Used 1,006 (14%)
-Cohen (LLNL) toroidal gyrokinetic PIC simulation using quasi-ballooning
 coordinates
 allocation: 3,600 CRUs. Used 2,391 (66%)
-Dory (ORNL)  High Resolution Plasma Fluid Turbulence Calculations
 allocation: 3,500 CRUs. Used 1,965 (56%)
-Dunning (PNL) Ab Initio Molecular Dynamics
 allocation: 3,000 CRUs. Used 2,350 (78%)
-Lee (PPL) Gyrokinetic Simulation of Tokamak Plasmas
 allocation: 2,500 CRUs. Used 877 (35%)
-Chen (U.C. Irvine) Alpha/Energetic-Particle Driven Instabilities
 in Tokamak Plasmas
 allocation: 2,500 CRUs.
-Aydemir (U. Texas) Nonlinear Gyrofluid Simulations of ITG Turbulence
 in Tokamaks Using Field-Line Coordinates
 allocation: 2,000 CRUs.
-Fu (PPL) Gyrokinetic MHD Hybrid Simulation of MHD Modes Destabilized 
 by Energetic Particles
 allocation: 1,500 CRUs.
-Hammett (PPL) Gyrofluid Simulations of Tokamak Plasma Turbulence
 allocation: 1,500 CRUs.
-Lee (LBL) High-Resolution Underground Imaging 
 allocation: 1,200 CRUs.
-Dawson (UCLA) Numerical Simulation of Plasma and Energy Transportation
 in Fusion Devices Using 3D Toroidal Gyrokinetic PIC Models
 allocation: 1,000 CRUs. Used 17 (1.7%)
-Stevens (ANL) Benchmarking Comparison of Computational Chemistry Codes
 with MPPs
 allocation: 500 CRUs.
-Zunger (NREL) Atomic Study of Step-flow Growth and Spontaneous Ordering
 of Semiconductor Alloys for Photovoltaic Applications
 allocation: 100 CRUs.
-Le Brun (U. Texas) Global Toroidal Gyrokinetic Simulation of eta_i-mode
 Induced Transport in a Tokamak-like Plasma
 allocation: 100 CRUs.
-Lester (U.C. Berkeley) quantum monte carlo for electronic structure
 and vibrational eigenvalues
 allocation: 75 CRUs.
-Chua (Continuum Dynamics) Computational Combustion Using Gridless
 Particle Methods on Parallel Computers
 allocation: 50 CRUs.
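
The allocation list above can be tallied mechanically.  The sketch below
re-enters the slide's figures (allocation and, where given, CRUs used) and
recomputes the per-project percentages and the overall totals.  Projects for
which the slide lists no usage are marked None rather than assumed to be
zero; the short labels are illustrative.

```python
# (PI, allocation in CRUs, CRUs used, or None where the slide lists no usage)
SPP_1994 = [
    ("Soni", 12900, 5644), ("Kogut", 8050, 5393), ("Bell", 7200, 1006),
    ("Cohen", 3600, 2391), ("Dory", 3500, 1965), ("Dunning", 3000, 2350),
    ("Lee (PPL)", 2500, 877), ("Chen", 2500, None), ("Aydemir", 2000, None),
    ("Fu", 1500, None), ("Hammett", 1500, None), ("Lee (LBL)", 1200, None),
    ("Dawson", 1000, 17), ("Stevens", 500, None), ("Zunger", 100, None),
    ("Le Brun", 100, None), ("Lester", 75, None), ("Chua", 50, None),
]

total_allocated = sum(alloc for _, alloc, _ in SPP_1994)
reported = [(pi, alloc, used) for pi, alloc, used in SPP_1994 if used is not None]
total_used = sum(used for _, _, used in reported)

for pi, alloc, used in reported:
    print(f"{pi:10s} {used:6d} / {alloc:6d} CRUs = {100 * used / alloc:5.1f}%")
print(f"total allocated: {total_allocated} CRUs")
print(f"used by projects reporting usage: {total_used} CRUs "
      f"({100 * total_used / total_allocated:.0f}% of total allocation)")
```

The 18 allocations sum to 51,275 CRUs, and the eight projects with a usage
figure account for 19,643 CRUs, about 38% of that total.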

Problems
--------
Poor utilization during first several months:
-Slow ramp-up. Users weren't ready
-Added unparallelized physics
-Basis bug where error in input file causes infinite loop
-Other bugs in codes

Lesser factors:
-NERSC bugs and CFS problems
-Interactive Interface (15% before Sherwood, 4% otherwise)

Consequently, 1/3 of the total allocation will not be used
----------------------------------------------------------
-Total allocation remaining divided by wall time remaining = 20.75
-If average cpu/wall ratio for remainder is 13.0, then 66% of total
 allocation will be consumed by end of October

                         Recent Runs
                         -----------

                   CPU/Wall   MFlop/cpu-sec/cpu   Gflop/wall sec

   Bell              13.8         384                5.3
   Cohen             13.4         355                4.7
   Dunning           13.4          51                0.7
   Soni              13.3         141                1.9
   Dory              12.3         330                4.1
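
The three columns of the table appear to be related: the wall-clock rate is
the per-CPU rate multiplied by the average number of CPUs kept busy
(CPU/Wall).  The sketch below checks that reading against the listed values.
The interpretation is an assumption, not something stated on the slide.

```python
# (PI, CPU/Wall ratio, MFlop rate per CPU, Gflop per wall second as listed)
RECENT_RUNS = [
    ("Bell", 13.8, 384, 5.3),
    ("Cohen", 13.4, 355, 4.7),
    ("Dunning", 13.4, 51, 0.7),
    ("Soni", 13.3, 141, 1.9),
    ("Dory", 12.3, 330, 4.1),
]


def gflops_per_wall_sec(cpu_per_wall: float, mflops_per_cpu: float) -> float:
    """Aggregate rate in Gflop per wall second (assumed: product of the two)."""
    return cpu_per_wall * mflops_per_cpu / 1000.0


for pi, ratio, mflops, listed in RECENT_RUNS:
    computed = gflops_per_wall_sec(ratio, mflops)
    print(f"{pi:8s} computed {computed:5.2f}  listed {listed:4.1f}")
```

The products agree with the listed column to within 0.1 Gflop; the largest
gap is Cohen's row (about 4.76 computed against 4.7 listed), which may simply
reflect rounding of the inputs.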

Tom Kitchens - comments on SPP
------------------------------
- cap of 50K CRUs
- 10% of total CRUs
- 5% of disk storage
- a few big users/slow startup/conversion problems
- proposal not to expand in FY 95

Byers comments on SPP discussions:
----------------------------------
NERSC had a proposal not to expand at all. Kitchens mostly bought into
this, but others objected. It was left something like this: NERSC 
would run a workshop to encourage new SPP users, and hopefully to get 
them up to speed with less pain and time than experienced by present
users. No one new would be allowed on unless they first went to the 
workshop. How hard and fast this rule is to be was not clear.
Some expansion would then be allowed, the precise number mentioned
by McCurdy.

                   <<< SPP  DISCUSSION >>>

TOM KITCHENS: SAC suggested SPP (Special Parallel Processing) program with 
the expectation that "Special" would eventually become "Standard".  They 
dedicated about 10% of the available computer power at NERSC.

BRUCE CURTIS
- SPP topics included Lattice Gauge/ Fusion / Fluid Dynamics / Ab Initio 
  Molecular Dynamics.
- SPP codes did not perform as well this year and interest was not as high.
- NERSC recommendations:  (1) limit to 50 KCRU's (flat); (2) limit to 18 
  SPP users (also flat) [This got the greatest objections from the floor.
  DDK for one believes it a mistake.]; (3) try not to have a large turnover.

JACK BYERS: primarily reviewed the E-MAIL on the subject [includes DDK opinions]

C. WILLIAM McCURDY: proposed a mandatory SPP workshop.  Users thought it might 
  best be expanded to include users who want to propose for SPP.  [DDK thinks 
  that it is OK but doesn't address primary problems of SPP: shots are once a 
  week so unless one needs huge memory/disk it is not cost effective to the 
  user even if he/she is already parallelized.]

TOM KITCHENS: pointed out that the SPP allocation time frame is 1 December
  for the next round (~ year).

B. CURTIS: noted that we should expand test queues NOW.

[DDK NOTE]: Besides developing parallel code, the SPP is motivated by a need 
for (1) large memory -- it's the only way to get 200Mw; (2) Scheduling 
security -- if you want large memory, you will not get swapped in very often;
(3) Real Time Turnaround -- if you are asking for large memory or long times, 
having a definite shot can be a lifesaver; (4) Allocation -- this is 10% of 
the resources in a new allocation basket.  Within the bounds of the SPP, 
these considerations are more significant than parallelization so the SPP 
should not be taken as an indicator of interest in MPP.
 
--------------------------------------------------------------------

MPP Procurement Status - Michel McCoy
-------------------------------------

Award of competitive procurement expected in April 1995.

MIKE McCOY: MPP Acquisition
- Basic document Blue Book (ER-0587, Feb., 1993) called for MPP FY94.
- Current schedule calls for FCM (Fully Configured Machine) mid-FY96.  It 
  should be capable of both multiple users and multiple programming models; 
  have Math Libraries and Programming Tools; and be balanced in CPU to 
  memory to I/O.  It should be FAST!!!!  It should run GCS (Grand Challenge 
  Scale) codes on 500-1000 nodes.  It will be dedicated to that purpose 
  "at night".
- The PEP (Pilot Early Production) machine has to be available to demonstrate 
  capability by running benchmarks.  It should be available to users as 
  a development machine April 1995.  By preparing codes, it should give some 
  capacity relief.

--------------------------------------------------------------------

Break

--------------------------------------------------------------------  

CAPACITY AND CAPABILITY SOLUTIONS FOR 1996 AND BEYOND - Tammy Welcome
-----------------------------------------------------

Definitions
-----------
Capability System - A capability system possesses sufficient processing
     power and memory to accommodate a single Grand Challenge-scale
     application utilizing all the resources of the system.

Capacity System - A capacity system can support, in a flexible manner,
     a large load of users simultaneously developing, debugging, and
     running a large mix of applications.

The Problem - Capacity versus Capability
----------------------------------------
-- "I'm being forced off the machine"
-- "I'm not getting enough time for my capability solutions"

CURRENTLY THE C90 IS FULLY UTILIZED
----------------------------------

Batch queue length - one measure of demand

Analysis of codes using cycles on the C90
-----------------------------------------
-Snapshot based on accounting records from CUB
-Covers 64 day period beginning April 5
-Filtering of accounting data will enable us to store
 accounting information for greater length of time

A SNAPSHOT OF WHERE CYCLES GO ON THE C90
----------------------------------------

CODE      USER        SITE(S)         CPU        % OF       CUMULATIVE
NAME      NAME(S)                     HOURS      C90        % OF C90

li.x/     Soni        Brookhaven     2056        8.4          8.4
newlu.x

sqed*.x                            1699        6.9          15.3  
chem_1bc  Kogut       U. of Illinois 457         1.9          17.2
spectrum                           423         1.7          18.9

xtreb     Kerbel,     GA,LLNL        354         1.4          20.3
          Waltz

xden6     Sharpe/     LANL           314         1.3          21.6
          Gupta

dtem      Spong/      ORNL           295         1.2          22.8
          Lynch

prg*      Jansen      UCSD           257         1.0          23.8

nrqcd.ds  Sloan       FSU            242         1.0          24.8

lmbh      Klepeis     LLNL           224         0.9          25.7

xvel      Charlton    ORNL           223         0.9          26.6

xg3e      Williams/   LLNL           218         0.9          27.5
          Dimits

amr       Crutchfield LLNL           215         0.9          28.4

wxex*     Park et.al. PPPL           208         0.8          29.2

pies*     Merkel et.  PPPL           206         0.8          30.0
          al.

cup       Strand      NCAR           195         0.8          30.8

gotsy     Turner et.  PPPL           194         0.8          31.6
          al.

xvg04*    Atherton    LLNL           185         0.8          32.4
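
The "% OF C90" column is consistent with dividing each code's CPU hours by
the CPU hours available over the 64-day snapshot.  The sketch below
reproduces the column under two assumptions not stated on the slide: that
the denominator is 64 days x 24 hours x 16 CPUs (16 being the C90 "#PEs"
value quoted later in this talk), and that all CPUs were available
throughout.

```python
DAYS = 64    # length of the accounting snapshot (from the slide)
CPUS = 16    # C90 processor count, taken from the "#PEs" column later in the talk
AVAILABLE_CPU_HOURS = DAYS * 24 * CPUS   # assumed denominator

# (code name, CPU hours, % of C90 as listed on the slide)
TOP_CODES = [
    ("li.x/newlu.x", 2056, 8.4), ("sqed*.x", 1699, 6.9), ("chem_1bc", 457, 1.9),
    ("spectrum", 423, 1.7), ("xtreb", 354, 1.4), ("xden6", 314, 1.3),
    ("dtem", 295, 1.2), ("prg*", 257, 1.0), ("nrqcd.ds", 242, 1.0),
    ("lmbh", 224, 0.9), ("xvel", 223, 0.9), ("xg3e", 218, 0.9),
    ("amr", 215, 0.9), ("wxex*", 208, 0.8), ("pies*", 206, 0.8),
    ("cup", 195, 0.8), ("gotsy", 194, 0.8), ("xvg04*", 185, 0.8),
]


def percent_of_c90(cpu_hours: float) -> float:
    """Share of the machine's assumed available CPU hours, in percent."""
    return 100.0 * cpu_hours / AVAILABLE_CPU_HOURS


cumulative = 0.0
for name, hours, listed in TOP_CODES:
    share = percent_of_c90(hours)
    cumulative += share
    print(f"{name:13s} {hours:5d} h  {share:4.1f}% (listed {listed})  "
          f"cumulative {cumulative:4.1f}%")
```

With that denominator (24,576 CPU hours) the 18 listed codes total 7,965 CPU
hours, or about 32.4% of the machine, matching the last cumulative entry in
the table.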

Additional cycles available soon
--------------------------------
-July 1994 - time available via MPP allocation program is limited
-Late 1995 - PEP system available for capability (and limited
 capacity) computing
-Late 1996 - FCM system available for capacity and capability 
 computing

Proposed Solution
-----------------
-To solve both capacity and capability needs...
  - Make use of PEP system to enable C90 capability codes on
    the MPP, possibly freeing cycles on the C90 for capacity
    codes.
  - Use FCM to support both capability and capacity codes.
  - Make effective use of all NERSC resources!

Why would the research scientist support this plan?
---------------------------------------------------

Significant performance gain...
-------------------------------

System        LFK*     Efficiency      #PEs       System Performance

C90          250 MF       .75            16          3 GF
PEP         15-60 MF**    .30           256          1.15-4.6 GF
FCM          60 MF        .4            512          12.3 GF

* LFK geometric mean
** depending on the technology
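
The "System Performance" column follows from the other three: per-processor
LFK rate x efficiency x number of PEs.  A minimal sketch of that arithmetic,
using only the figures in the table (the function name is illustrative):

```python
def system_gflops(lfk_mflops: float, efficiency: float, pes: int) -> float:
    """Estimated system performance in GF: per-PE rate x efficiency x #PEs."""
    return lfk_mflops * efficiency * pes / 1000.0


# (system, LFK MF per PE, efficiency, #PEs); PEP is given as a low/high range.
SYSTEMS = [
    ("C90", 250, 0.75, 16),
    ("PEP (low)", 15, 0.30, 256),
    ("PEP (high)", 60, 0.30, 256),
    ("FCM", 60, 0.40, 512),
]

for name, lfk, eff, pes in SYSTEMS:
    print(f"{name:10s} {system_gflops(lfk, eff, pes):6.2f} GF")
```

This gives 3.0 GF for the C90, 1.15 to 4.6 GF for the PEP, and about 12.3 GF
for the FCM, in agreement with the table.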


Access to more memory...
------------------------

System       Memory       #PEs        Total Memory

C90                                     2 GB
PEP          64-256 MB*   256           16-64 GB
FCM          256 MB       512           128 GB

* depending on the technology
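
The "Total Memory" column is per-PE memory multiplied by the number of PEs.
The sketch below recomputes it and compares each system with the C90's 2 GB;
it assumes 1 GB = 1024 MB, which the slide does not state.

```python
C90_TOTAL_GB = 2   # from the table


def total_memory_gb(mb_per_pe: int, pes: int) -> float:
    """Aggregate memory in GB, assuming 1 GB = 1024 MB."""
    return mb_per_pe * pes / 1024


# (system, MB per PE, #PEs); PEP is given as a low/high range.
MPP_SYSTEMS = [("PEP (low)", 64, 256), ("PEP (high)", 256, 256), ("FCM", 256, 512)]

for name, mb, pes in MPP_SYSTEMS:
    total = total_memory_gb(mb, pes)
    print(f"{name:10s} {total:5.0f} GB  ({total / C90_TOTAL_GB:.0f}x the C90)")
```

On these figures the PEP would offer 8 to 32 times the C90's memory and the
FCM 64 times.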

Access to unsaturated PEP system...
-----------------------------------
-Improved wall clock turnaround time

Implementing the solution
-------------------------
1. Staff works with research scientists to parallelize capability codes.
 Issues: - Portability of resulting code
        - Parallel platforms on which to run code
        - Single source
        - adoption of modified source code by scientist
2. Staff enables codes to run on PEP on day 1.
 Issues: - Existence of software to support the parallel code
3. Research scientists run capability codes on PEP on day 2.
 Issues: - Training scientist to run in this new environment
4. Research scientists run capacity and capability codes on FCM.

Proposed Solution
-----------------
To solve both capacity and capability needs...
Enable more users on SAS and AFS, providing an attractive platform
for preprocessing, post-processing, and developing codes.
Make effective use of all NERSC resources!

Why would the research scientists support this plan?
----------------------------------------------------
Better interactive service...
-Improved response for user doing code development
-Rich software environment 
-Shared home directories with CRAYs via AFS

Implementing the solution
-------------------------
1. Staff upgrades SAS and AFS to accommodate more users.

RESULT
------
-By parallelizing the top cycle-burning capability codes to
 run on the PEP system, we potentially free up 30% of the C90
 for capacity codes in 1996.
-One year later, we expect the execution environment on the FCM
 to support both capability and capacity codes.
-By shifting some of the interactive workload over to the auxiliary
 servers, we can potentially provide better interactive service 
 to the user.

---------------------------------------------------------------------

ACCESS TO THE HPCRCS AND TO THE LLNL T3D - Tammy Welcome
----------------------------------------

Purpose
-------
- Develop user expertise in anticipation of delivery of NERSC machines
- Broaden base of user codes on parallel platforms
- Develop NERSC staff expertise

Note: ACCESS IS FOR CODE DEVELOPMENT RATHER THAN PRODUCTION USE.
----------------------------------------------------------------

Limited access to HPCRCs and H4P
--------------------------------
- ORNL Paragon XP/S 5 and XP/S 35 approx. 6%
  '5 - 66 PEs, 16 MB, 9.6 GB disk
  '35 - 512 PEs, 32 MB, 144 GB disk
- ORNL KSR1
  64 PEs, 32 MB, 20 GB disk
- LANL CM-5 - approx. 5%
  1024 PEs, 32 MB, 60 GB disk
- LLNL T3D - approx. 7.5%
  128 PEs, 64 MB, 90 GB disk

Timeline for access
-------------------
Early May - request for proposals mailed to PIs
Mid May - request for proposals appears in Mosaic
May-June - request for proposals appears in Buffer
           (May 1994 issue)
June 17 - proposal deadline
??      - MPP access awards
July    - access to Paragon '5, Paragon '35, KSR/1, CM-5
Sept    - access to T3D

A total of 17 proposals were received
-------------------------------------
- 9 have previous parallel experience (3 via SPP)
  8 have NO previous parallel experience
- 10 requested regular consulting support
  5 requested help developing the parallel applications
  1 undecided
  1 unknown

More about the proposals
------------------------

Parallel Platform...
--------------------

      7 - T3D                (11)
      2 - CM-5                (4)
      1 - Paragon             (6)
      1 - KSR                 (2)
      3 - T3D and/or Paragon
      1 - Paragon and/or CM-5
      1 - T3D and/or CM-5
      1 - Paragon and/or KSR-1
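
The numbers in parentheses appear to be the total number of proposals
mentioning each platform, counting the "and/or" requests once per platform
named.  That reading is an assumption; the tally below checks it against the
slide's figures.

```python
from collections import Counter

# (number of proposals, platforms named in the request), as listed above.
REQUESTS = [
    (7, ("T3D",)),
    (2, ("CM-5",)),
    (1, ("Paragon",)),
    (1, ("KSR",)),
    (3, ("T3D", "Paragon")),
    (1, ("Paragon", "CM-5")),
    (1, ("T3D", "CM-5")),
    (1, ("Paragon", "KSR")),
]

total_proposals = sum(count for count, _ in REQUESTS)

mentions = Counter()
for count, platforms in REQUESTS:
    for platform in platforms:
        mentions[platform] += count

print(f"proposals: {total_proposals}")
for platform, n in mentions.most_common():
    print(f"  {platform:8s} mentioned in {n} proposals")
```

The tally gives 17 proposals in all, with the T3D named in 11, the Paragon in
6, the CM-5 in 4 and the KSR in 2, matching the parenthesized values.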

Programming Paradigm
--------------------

      9 - message passing
      2 - data parallel
      1 - shared memory
      3 - undecided
      2 - no comment

List of proposals for MPP access - available from Tammy Welcome

Access awards
-------------
- Funding decisions will be made by OSC
- Awards have a 6 month duration
- PI is responsible for short project status report

TAMMY WELCOME: Access Program (for development only)
- Time available at the two research centers and also on the LLNL T3D.
- Received 17 proposals, half with experience already.
- Five requested development help.
- T3D has been the most popular.
- Message passing preferred --> most likely to be available on other machines.
- Decisions on which proposals to fund are being made by OSC.  Awards are of 
  six month duration [too short a limit] and the PI is responsible for a 
  short report.
- Capacity/Capability issue: Want to use the PEP to move some C90 codes 
  to free up the C90.  The idea is that the top 15 codes use 30% of the C90 
  and are chomping at the bit to use more.  To implement: (1) staff works with 
  scientist to convert code. (2) Staff enables those codes to run on the PEP 
  "day 1".  (3) Research scientists run "day 2".

----------------------------------------------------------------------

STORAGE A NEW VIEW - Steve Louis
--------------------------------

Storage - a new paradigm
------------------------
- "How can you keep on movin' (unless you migrate me, too)"
  ... Ry Cooder, Into the Purple Valley
- S-curve model /Performance or value vs investment or time

Storage - Migration from old to new
-----------------------------------
Old Paradigm
------------
- Expensive mainframe file servers
- Proprietary operating systems
- Non-standard device interfaces
- Slow-speed channel bottlenecks
- Mainframe-centered architectures
- Inefficient access mechanisms

New Paradigm
------------
- Less expensive workstation servers
- Open systems
- Common interfaces
- High-speed scalable, parallel I/O
- Network-centered architectures
- Transparent access mechanisms

Storage - Hardware Technology Trends
------------------------------------
- Disk seek, latency, and data transfer are all improving
- Disk arrays with higher performance and lower cost
- New advanced RAID levels now under development
- Magnetic recording technology appears almost limitless
- Helical and narrow-track longitudinal both improving
- Automated robotics are becoming standard equipment
- Fiber optics becoming transmission technology of choice
- Gigabit networks and applications are more common

Storage - Software Technology Trends
------------------------------------
- Open systems, interoperability, and standards necessary
- Distributed client-server computing more widespread
- Seamless data interchange between applications
- Scalability of capacity and data transfer rates necessary
- Integrated storage system management capabilities
- Transaction and metadata management for storage systems
- More electronically-saved and machine-readable storage
  (for the environmentally concerned: 1 petabyte = 42.5M trees)

NSL - Technology Base System
----------------------------
- IBM RISC System/6000 Model R24 (or 990)
  - 512 MB memory
  - 92 GB SCSI-2 Fast/Wide disk
  - HIPPI, FDDI, and Ethernet connectivity
- 3490E (36-track and compression) robotic archive
  - 4 drives
  - 1,340 cartridges (1 TB uncompressed)
- 16x16 HIPPI crossbar switch
  - connects Crays to Base System
  - connects Crays to Crays
  - can also isolate C-90 from NSL switch problems
- NSL-UniTree software environment

NSL - Technology Extended Base System 
-------------------------------------
- Additional production-level capabilities for Base System
  - compatibility with Base System hardware/software
  - no vendor bias toward existing base configuration
- May take several forms
  - extensions to existing Base System disk or robotics
  - upgrades to existing Base System disk or robotics
  - preliminary version of Fully Configured Storage
- Extended Base System paves way for 
  - HPSS conversion
  - parallel I/O
  - network-attached devices
  - scale-down CFS

NSL - Technology Fully Configured Storage
-----------------------------------------
- Fully Configured Storage is the last step:
  - full HPSS environment
  - full high-performance peripherals
  - scalable, parallel I/O to FCM MPP
  - dismantlement of CFS
- Procurements may be split among several vendors if:
  - best fit is through selection of several technologies
  - no single-vendor solution exists to meet requirements
- Funding issues:
  - dependent on NERSC FY96 and FY97 funding
  - may be coupled with Extended Base System as a
    two-phased procurement similar to the PEP/FCM

STEVE LOUIS: storage
- Old CFS used (1) mainframe file servers; (2) proprietary operating systems;
  (3) non-standard device interfaces; (4) slow channel speeds; and 
  (5) inefficient access mechanisms.
- New system will be based on workstations, UNIX, standard high speed interfaces,
  multiple paths.

---------------------------------------------------------------------

R AND D: TRACKING THE COMPUTATIONAL EXPLOSION - Alice Koniges
-------------------------------------------------------------

As soon as a larger system is provided, it is filled to capacity.

ALICE KONIGES: Outreach
- To unite diversified users with collaboration and education.
- Workshops and Classes: Intro. MPP Computing (thought up at this meeting); 
  Adaptive Grid Methods (Dec.); PVM and Distributed Computing; and more...
- Expanded Visitors Program -- i.e., spend some time at NERSC.
- MOSAIC: Research Highlights Program.

----------------------------------------------------------------------

Open Discussion
---------------

Rick Kendall - large memory jobs are having major problems
               checkpoint failure
               slow queue

DISCUSSION PERIOD: MPP was discussed as we went along.  The problem of poor 
treatment of large memory jobs was brought up by Rick Kendall -- NERSC people 
appeared surprised.

BILL McCURDY: The fragile nature of the effort to put an MPP on the floor 
              in FY96.
- When creating the Bluebook of FY92, the question was asked whether it 
  made sense to pin one's hopes on a high end MPP.  It was concluded that 
  the vector machines would not compete with the workstations.
- The C90 will be in its 4th year by then and so in its "mature latter stages"
  (DDKism)
- The C90 will be paid out by then so what one must hope for is to switch the 
  funding to the MPP.  No hope of an increase, what one hopes for is that 
  support is not taken away.
- The USERS will have to motivate the acquisition as they are the only 
  ones that can do it.  They should remember that as they push for the 
  workstations (that they admittedly must have), they should push for 
  the high end too.
- The two T3D's from the CRADA (~$50M over 3 years) are essentially sold out.
  (128 nodes at LANL and 128 + 128 (local LLNL) nodes at LLNL.)  These, like the 
  RC's at LANL and ORNL, are NOT general access.
- 30% of the C90 (15 codes) will be converted in advance. (The C90 is 75% 
  of the NERSC resources.)
-  NERSC has a commitment to do precisely what the users tell us.
- $2B/yr is assigned to equipment in the DOE and there is a committee to 
   try to save some of this.  (John Fitzgerald is on that committee and 
   pointed out that "lease - not to own" costs state sales tax.  There is 
   an enormous saving possible there.)  Savannah River has the biggest 
   computer budget in the DOE -- why?

   ANS: record retrieval and maintenance for environmental cleanup.  Many 
   of those records are on obsolete equipment and software but they MUST 
   be available.  It is a massive EM problem which unfortunately makes it 
   (falsely) look like computation is getting large funding.

- SSC still has workstation farms there but they can't be moved.


End of meeting - adjourn