A grateful acknowledgement is made to Dale Koelling (DDK in the minutes) for supplying extensive notes from the meeting.
Brian Hingerty Vice-Chair/Secretary beh@ornl.gov
The general theme of the first day of the ERSUG meeting will be realized and projected improvements in the NERSC computing environment, as well as some remaining bottlenecks that inhibit the most efficient access to system resources. A report from the committee to study options for controlling disk space usage will be part of a discussion of integrating host disk systems with AFS and archival storage at NERSC. There will be a progress report from ESNET and a detailed report on the transition to X-windows as a baseline standard for our users as we move into an era in which vendor software is driven by, and only compatible with, X-based standards.
Also on the first day, a new program initiated by the Office of Scientific Computing to arrange for Massively Parallel (MP) computing access will be described. NERSC Principal Investigators are now applying for access to parallel computers at the High Performance Research Supercomputer Centers at Los Alamos and Oak Ridge and to the Livermore T3D. Such access will allow scientists to develop applications in anticipation of the arrival of the NERSC MP system. Following this will be an update on the status of the NERSC procurement and a discussion of the anticipated impact that the transition will have on our users. This will include an approach for addressing the capacity-capability antithesis, not only during the transition period, but also beyond.
Since the meeting is dominated by what is about to happen rather than by what has happened, the NERSC staff will prepare a number of short documents describing recent progress and the status of critical areas. These will have to substitute for detailed presentations and will include topics such as the Energy Research Decision Package (ERDP), Centralized User Bank (CUB), etc.
We expect this meeting to be very stimulating. Its focus on the present will be on extracting the maximum out of existing resources, and its focus on the future will be on planning a feasible and realistic path to effective use of new technologies. In this journey, the major constraint will be to find a reasonable path for all of our users regardless of their computing requirements, since good science is not always the biggest science...
AGENDA
------
Monday, July 11, 1994
---------------------
8:30 Welcoming Remarks - Jack Byers
8:40 Welcome from Washington - John Cavallini
8:45 Washington View - Tom Kitchens
9:05 Production Systems Plans - Moe Jette and Rick Kendall
New storage usage paradigms
Disk reconfiguration committee report
CUB plans
10:05 BREAK
10:15 ESNET Plans - Jim Leighton
Backbone upgrade
Network services
11:00 X-windows: How To Get From Here To There - Barry Howard and
Alice Koniges
12:00 LUNCH (EXERSUG meeting to be held in the Hideaway Room of the
Holiday Inn Restaurant)
1:30 SPP Project Status and Plans for 1995 - Bruce Curtis and
Tom Kitchens
2:15 MPP Procurement Status - Michel McCoy
2:45 Break
3:00 Transition to Parallel Computing - Tammy Welcome and Steve Louis
Access to the HPCRCs and to the LLNL T3D
Capacity and Capability solutions for 1996 and beyond
Storage expands to meet demands
4:00 R and D: Tracking the Computational Explosion - Alice Koniges
4:20 Open Discussion
5:00 Adjourn
Tuesday, July 12, 1994, was used for presentations of the Energy Research
Power Users Symposium (ERPUS). The proceedings of this symposium will be
published elsewhere by DOE.
------------------------------------------------------------------------
NERSC NEWS-FLASH - July 1994
Fast NQS batch queues on C-90 allow 6 GB file space in /usr/tmp
NQS BIGFILE batch queue on C-90 used for jobs declaring maximum file
space over 6 GB, to 16 GB
/usr/tmp file system on the C-90 has 53 GB available
CFS gateway lets workstation users access CFS via FTP, giving them great
storage capacity, bypassing Crays
Installation of 36-track CFS tapes is cutting in half the number of tapes
formerly kept offline in vaults
Tripling AFS storage will let all NERSC users have AFS access
Fortran 90 now licensed on C-90
C++ version 1.0.1.1 now available on the Cray-2s
X-windows BUFFER articles and NERSC machine room tour now in MOSAIC
T3D arrives at NERSC on July 12
X-windows interface to CUB and ERDP available
-------------------------------------------------------------------------
Morning Session (July 11, 1994)
-------------------------------
GENERAL NOTES:
- THE NSF PLANS TO GET OUT OF THE R & E NETWORKING BUSINESS! (This could
really upset the system of networks. DDK)
- Jack Byers will graduate to emeritus status at the next meeting and Brian
Hingerty will become chairperson.
- Fusion was less predominant and chemistry (PNL and ANL) was strong.
- One petabyte = 42,500,000 trees of paper.
- Day 2, the ERPUS contribution, will be put up as a MOSAIC page at NERSC
so it will not be included here.
Welcoming Remarks - Jack Byers
------------------------------
It was pointed out that the SPP (Special Parallel Processing) program
has had an impact on the usage of the Crays. Some large single
processor jobs have had difficulty running. Not all has been positive
for the users.
JACK BYERS: Chairman's greeting.
- Pitch for EXERSUG - ERSUG - SAC communication and cooperation.
- Noted importance of memory as a part of the SPP program.
(Yea!!!! - DDK)
Welcome from Washington - John Cavallini
----------------------------------------
Series of reviews of large computational projects in progress (June 1994)
Budget problems - do more with less
Transition to Massively Parallel - software problems
vendors, conversion of code
performance issues
FY96 budget not ready until Aug 94
FICCIT committee active:
Anita Jones (DOD) covers us - advisory panel being set up
JOHN CAVALLINI: Washington view 1.
- Network, computational projects, etc. reviews are completed.
- Budgets are "not looking good". No FY96 guidance possible yet.
Growth from initial $7M to $107M.
- Important concepts: virtual machine, scalable I/O, Mass store.
- "FIXIT" --> "National Science and Technology Committee" and under that
"Committee on Information Technology".
Washington View - Tom Kitchens
------------------------------
Meeting is now being recorded on M-BONE video tape for future reference.
Budgets now 6% down in most DOE Offices for FY95.
National Information Infrastructure - important to administration
Communications Committee active
New projects - less money
Personnel changes at DOE
New initiative - advanced computing initiative (ACTI)
money from various offices
FY 95 $23-40M
ERSUG requirements document needs update
TOM KITCHENS: Washington Report 2.
- Flat is the best you can assume.
- "National Information Infrastructure" is in focus (DDK comment: Gore)
Something that was introduced that I don't understand is "Prototype Virtual
Agencies" in the discussion of developing this.
- Proposed DOE/ER reorganization would combine a number of parts: OSC/
Technology Transfer / Basic Energy Sciences / Advanced Energy Projects
(wow! DDK)
- People Changes: OSC has a lot of turnover in their detailees. New Federal
Employee for "Information Technology"
- New initiative in "Advanced Computational Technology". It is to deal
with petroleum problems. Funding to come from many sources all of which
have been significantly reduced. FY95 funding will be $23-40M.
- MAJOR REMINDER: Need a "needs" document update by next meeting. May be
able to clip major portions from the blue book.
-----------------------------------------------------------------------
Additional notes supplied by Tom Kitchens (DOE-OSC)
---------------------------------------------------
Washington View - Tom Kitchens
------------------------------
It's Washington so let's talk about science support, policies and
organization - what else is there? Most budgets are expected to be
5-10% down next year in the DOE; many other agencies will also have
decreases in FY'95. A flat budget is suddenly a fat budget for FY'95
and FY'96. The signals have been changed - High Performance Computing
and Grand Challenges are not completely out but the Administration is
more interested in the National Information Infrastructure (NII) and
National Challenges. The Administration has its new National Science
and Technology Council (NSTC) responsible for more facets of technology
than the old FCCSET: The High Performance Computing, Communications,
and Information Technology (HPCCIT) organization is now a subcommittee
of an NSTC committee, the Committee on Information and Communications
(CIC) which is also responsible for NII. This means that HPCCIT has
dropped in priority and has substantial competition for attention in
its superior committee. It is going to be far harder to convince this
organization that High Performance Computing (HPC) needs even flat
support. The ERSUG "green book" is now three years old and must
be updated soon with regard to the needs and what you have learned in
the last few years: the ERSUG requirements need to be well supported by
statements on the societal impact of the work being done, good
accomplishments and expected milestones if the report is to be read
with any interest. I hope to see the EXERSUG appoint a strong group
to undertake this task.
The Administration is also interested in inter-agency and
intra-agency collaboration; in fact, the HPCCIT and the Global
Climate Initiative are being used as pilot "Virtual Agencies" where
components of several agencies are being guided by the President's
Science Advisor's Office, OSTP, to attack a common problem. This has
plusses and minuses; we must do all the internal budget preparation
reports to DOE as well as to the virtual agency, but we do reap some
additional visibility in the OSTP. This virtual plus has not yet
improved our budget but maybe there will be an effect in FY'97! The
message is: working with people from other agencies - or other parts
of your own agency - is good, especially if you can show some
leverage of support. ERSUG needs to tell more about its members'
work and interact with scientists and technicians both inside and
outside the DOE. ERSUG needs to be strengthened: having this meeting
videotaped for the Mbone and putting it next to the Energy Research
Power Users Symposium (ERPUS) and the Office of Program Analysis Peer
Review of Energy Research's Computational Science Projects was to make
it easier for more users to attend. A strong ERSUG will be essential
to maintain even a constant budget for computational resources.
Another example of a collaboration inside the DOE is a program to
aid the Domestic Natural Gas and Oil Industry through a program called
the Advanced Computational Technology Initiative (ACTI) which is
managed and supported by the Office of Fossil Energy, Defense
Programs, and Energy Research. This program is based on DOE
Laboratories collaborating with the domestic gas and oil companies and
the appropriation committees have pinned its budget somewhere between
$23 and 40 million. The plans are quite flawed from our perspective at
this time but there are several months to work them out. The lesson
here is that if you don't initiate the cross-cutting work, someone else
will - and you probably won't like the way they want to do it!
Inside the Department and inside Energy Research some streamlining
is being proposed. An early-out program has been in effect at DOE
Headquarters. The Director of Energy Research, Martha Krebs, is
proposing to form two new divisions in ER, one that would include the
OSC with ER's Technology Transfer Office, SBIR, and BES/Advanced
Energy Projects program office. (This has now happened.)
In OSC we have had some changes in personnel: George Seweryniak has
joined us to manage much of the ESNET effort, Steve Elbert has
returned to Ames and Wally Ermler has joined us from Stevens Institute.
We expect to add another Federal Employee to handle Information
Technology and other issues. (It can now be told that this person is
Mary Anne Scott who has been in the Office of Fusion Energy and has
often participated in the ERSUG meetings.)
I want to remind you not to forget the importance of updating and
strengthening the ERSUG computational requirements document (the
'Green book'). I hope you have a good meeting.
-----------------------------------------------------------------------
Jack Byers' comments on need for a new green book for DOE
---------------------------------------------------------
EXERSUG members:
Note Kitchens' appeal for stronger ERSUG.
We all need to work on this. E-mail between ourselves and Kitchens
clearly isn't enough. We need plans, suggestions, and mechanisms we can use
that we now don't have or don't use.
Also following is more push for us to get going on the green book.
I will need help. Am starting to work with McCoy and Mirin (from NERSC)
on division of work between NERSC and EXERSUG.
I am presently struggling with my version of a statement, from the users'
point of view, of needs and requirements, and trying to see that it
fits in with a statement of NERSC vision by Mike McCoy.
When he and I get to some partial agreement I will send this to you
for editing, modification, etc. My present idea is that the users'
statement ought to be independent of NERSC or OSC or anybody.
If that makes sense, the NERSC vision would naturally stand as a response
to the users' statement of needs.
It might make sense to plan to have the ERSUG users' statement targeted
elsewhere also, i.e., not to use it only for the green book. This
might serve as an initial action in making ERSUG stronger. Ideas for targets?
I will need help from all of you at least on the science accomplishments
sections in your areas. If you can't do this yourselves, please at least
take responsibility for farming it out to people in your discipline.
Potter has agreed to do the climate section.
I have a lot of good material (3 ERPUS papers) on QCD. I will take a first cut
at pulling out a statement of needs and accomplishments from those papers.
But I will need a high energy physics person to at least edit that and perhaps
even rewrite what I can do.
There is some more material from ERPUS that you might use as starting
points, tho the most complete ERPUS seemed to be the QCD papers and
the ocean modeling paper by Malone. Contact me for a list of what
I have. I haven't got anything from the ERPUS papers of Leboeuf, Hammett,
Colella, Kendall and others I think.
You also should look at the previous green book to see what is involved.
If you don't have copies, E-mail Kitchens for them.
There is a possibility that NERSC will hold a workshop to bring the
green book together, early next year. This is NOT to suggest that we
are off the hook, but rather to point out that all of the rough drafting must
be complete by then, and probably we should try to have each individual
science section completed in final form, so that the meeting could then fill in
the holes, stitch together the pieces, and make coherent summary statements.
---------------------------------------------------------------------
PRODUCTION SYSTEMS PLANS - Moe Jette and Rick Kendall
------------------------------------------------------
--Centralized User Bank (CUB)
--Portable Batch System (PBS)
--Preparing for arrival of Cray T3D
--Preparing for installation of UNICOS 8.0
--Major file system reconfiguration
--CFS converting to 36-track tapes
CENTRALIZED USER BANK (CUB) - RECENT UPDATES
--------------------------------------------
-Complete X window interface
In /usr/local/new/xcub on Crays and SAS for beta testing.
Will be moved to /usr/local/bin/xcub (i.e., production) this summer.
-CFS quotas by user available
Each user can be allocated some percentage of the repository's
total CFS allocation. The CFS allocation, like the CRU allocation,
can be oversubscribed.
-Resource accounting report generation
Monthly accounting reports based upon data from CUB
The UNICOS timecards are no longer used
-CRU reserve controls for SAC
SAC members can manage their CRU reserves with CUB
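The per-user CFS quota described above (a percentage of the repository's total CFS allocation, with oversubscription allowed) can be illustrated with a minimal sketch. The function names, user names, and the 200 GB repository figure are hypothetical and chosen only for illustration; this is not the CUB implementation.

```python
def user_cfs_quotas(repo_allocation_gb, user_percentages):
    """Return each user's CFS quota in GB as a percentage of the repository total."""
    return {user: repo_allocation_gb * pct / 100.0
            for user, pct in user_percentages.items()}


def is_oversubscribed(user_percentages):
    """True when the per-user percentages add up to more than 100 percent."""
    return sum(user_percentages.values()) > 100


# Hypothetical repository: 200 GB of CFS space shared by three users.
shares = {"alice": 60, "bob": 50, "carol": 20}  # sums to 130 percent
quotas = user_cfs_quotas(200.0, shares)

for user, gb in quotas.items():
    print(f"{user}: {gb:.1f} GB")
print("oversubscribed:", is_oversubscribed(shares))
```

Oversubscription works on the assumption that not every user fills an individual quota at the same time, so the shares may sum to more than the repository total.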
CENTRALIZED USER BANK, FUTURE PLANS
-----------------------------------
-CFS reserve controls for SAC
SAC members will be able to manage their CFS reserves with CUB
To be completed by September
-Port to SAS and T3D
We will be able to account for SAS resource use by December.
Port to T3D completed in July (for account management) and
August (for accounting). ER users will have access to about
7.5 percent of the T3D. We plan to allocate T3D resources
completely separately from other NERSC resources.
-History report
This can be used to review recent changes to the database
(records of move, infuse, modify, and other commands).
-Many improvements for security and reliability
Update to Oracle 7 for database management.
Better encryption of message traffic.
Messages recorded and can be played back.
-Change login name
Each client will be able to change his login name throughout
NERSC with a single CUB command. To be available this winter.
-Change password NERSC-wide
Each client will be able to change his password throughout
NERSC with a single CUB command. To be available this winter.
-Users in multiple repositories
Each client will be able to access multiple repositories from
a single login name and password. He will be able to specify the
repository to use at login time. To be available this fall.
-One-time password (SecurID cards)
Since each password will be used only one time, its capture
by a hacker poses little security threat. We are currently
working with SecurID cards since we already have the software
and some cards. We are investigating other one-time password
schemes including Enigma Logic cards, S/Key (no cards required)
and others. NERSC would purchase cards for some of its staff
(those with special privileges). Cards purchased by client
sites (about $50 per card) could be supported by NERSC without
additional cost. Send E-mail to jette@nersc.gov for details.
-Restrict access of foreign nationals
Foreign nationals from specific countries will have
restricted access to sensitive data.
-Modification of address, phone number, etc.
Clients will be able to modify this information as needed with
a CUB command. To be available this fall.
-Support for Kerberos
There are several incompatible versions of Kerberos. Our current
intention is to support Kerberos version 4, as used by AFS.
-Identification of resource use by process
We intend to record resource use by process for long running
processes. While the volume of data involved precludes us from
recording in CUB each process executed, a small number of long
running processes account for most resources consumed.
-Real time monitoring of resource use by process
Users will be able to monitor the resource use (CRUs) by a single
process in real time.
-More information about CUB is available online
For a brief description type : man setcub
For a complete description type: document view setcub
-Help prioritize these tasks
We welcome your comments to help us prioritize these tasks.
Direct your comments to jette@nersc.gov
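The one-time password item above mentions S/Key as a scheme that needs no cards. As a rough illustration of why a captured single-use password is of little value, here is a minimal hash-chain sketch in the spirit of S/Key. It is an assumption-laden toy: SHA-256 stands in for the hash actually used by S/Key, the seed is hypothetical, and SecurID cards work on a different principle.

```python
import hashlib


def hash_chain(seed: bytes, n: int):
    """Return [H^1(seed), H^2(seed), ..., H^n(seed)] using SHA-256."""
    chain = []
    value = seed
    for _ in range(n):
        value = hashlib.sha256(value).digest()
        chain.append(value)
    return chain


class OneTimeVerifier:
    """Server side: stores only the most recently accepted chain value."""

    def __init__(self, last_value: bytes):
        self.expected_hash = last_value

    def verify(self, candidate: bytes) -> bool:
        # A valid password hashes to the stored value; it then becomes
        # the stored value, so the same password cannot be used again.
        if hashlib.sha256(candidate).digest() == self.expected_hash:
            self.expected_hash = candidate
            return True
        return False


chain = hash_chain(b"hypothetical secret seed", 5)
server = OneTimeVerifier(chain[-1])     # server is initialized with H^5(seed)

first_login = server.verify(chain[-2])  # user presents H^4(seed)
replay = server.verify(chain[-2])       # an eavesdropper replays it
next_login = server.verify(chain[-3])   # user presents H^3(seed)
print(first_login, replay, next_login)
```

Each password is the preimage of the previous one, so an observer who captures one password cannot derive the next without inverting the hash.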
PORTABLE BATCH SYSTEM (PBS)
-POSIX compliant batch system
This is the only batch system under development which
complies with the POSIX standards and is well suited
to satisfy supercomputing needs.
-Being developed with NASA Ames
NASA Ames developed NQS, the current batch standard.
PBS fixes many of the shortcomings of NQS.
-In beta test mode
Many computer vendors are interested in using PBS
We expect PBS to provide a consistent batch system
across all NERSC platforms.
UNICOS 8.0 AVAILABILITY
-----------------------
--Available from Cray in March 1994
--AFS available from Pittsburgh Supercomputer Center in July 1994
--Preliminary installation schedule
August Cray T3D (after machine's acceptance)
September Cray/F
October Cray/C
November Cray/A
UNICOS 8.0 FEATURES
-------------------
--Multi-threaded kernel reduces system overhead and improves interactive
use of the machine
--Unified Resource Manager (URM) - improves control of resources
--POSIX compliant shell and utilities - improves interoperability
--Kerberized clients and utilities (klogin, krsh, kcp, kftp, etc.)
DISK RECONFIGURATION IMPLEMENTATION
-----------------------------------
--Removed /tmp/wk3 and /workbig file systems from Cray/F
Sufficient disk space was available for Cray/F
without these file systems
--Removed /workbig file system from Cray/A
--Added 53 GB /usr/tmp file system to Cray/A
The disk space came from the three file systems listed above.
--Constructed 8 GB /usr/fast file system on Cray/A
We cache /usr/fast on SSD during SPP shots for added speed.
--Both /usr/tmp and /usr/fast have 12 hour purge times
This short lifetime permits us to keep adequate
disk space available.
--Neither /usr/tmp nor /usr/fast has data migrated to CFS
The load could severely impact CFS if data migration
were used.
--Reconfiguration completed on May 16, 1994.
DISK RECONFIGURATION AND NQS
----------------------------
--Big file queue (bfq) on Cray/A uses /usr/tmp/workbig directory
/usr/tmp/workbig is a directory in /usr/tmp file system
rather than an independent file system. This does not
ensure 16 GB of disk space for big file queue jobs, but
we will not start a job until at least 150 percent of
the requested disk space is available. Pre-allocating
files at the time the job starts should ensure sufficient
disk space.
--Disk limit for all NQS high priority queues increased to 6 GB
Several jobs with moderate storage requirements (2 to 6 GB)
can now execute simultaneously. Formerly jobs with storage
requirements in excess of 4 GB could only execute one at a time.
--Big file queue still used for jobs requiring over 6 GB of storage
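The start rule for big file queue jobs (a job is not started until at least 150 percent of its requested disk space is free) amounts to a simple check. The sketch below only illustrates that rule; the function name and the free-space figures are hypothetical, and it does not reflect how NQS is actually configured.

```python
def can_start_bfq_job(requested_gb, free_gb, headroom=1.5):
    """Apply the stated rule: free space must be at least 150% of the request."""
    return free_gb >= headroom * requested_gb


# Hypothetical examples against the 53 GB /usr/tmp file system.
for requested, free in [(16, 24), (16, 23.9), (10, 26)]:
    verdict = "start" if can_start_bfq_job(requested, free) else "hold"
    print(f"request {requested} GB, free {free} GB -> {verdict}")
```

The extra 50 percent is headroom only. As the notes say, the space is not reserved, which is why pre-allocating files when the job starts is still recommended.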
DISK RECONFIGURATION RESULTS
----------------------------
--No disk space shortage on Cray/F
--Utilization of /tmp/wk# substantially reduced on Cray/A
These file systems are no longer filling. The reduced
load also improves the I/O response time.
--Data migration reduced
-Thirty percent more data is typically kept on Cray disk
-Less data is kept on CFS
-The system overhead has been reduced by a few percent
--New /usr/tmp being used by a small number of NERSC clients
Although only a small number of NERSC clients use
/usr/tmp, the storage involved is substantial. Typically
26 GB of /usr/tmp is used.
ANDREW FILE SYSTEM CHARACTERISTICS
----------------------------------
--NERSC's AFS server is accessible from any AFS client
One can access data on NERSC's AFS server from anywhere
on the Internet
--Good security is provided with the AFS version of Kerberos
--The storage uses inexpensive (commodity) disks
--The bandwidth is significantly lower than Cray disks
Caching is used and can result in speeds approaching
Cray disk. Cache size is currently 858 MB.
ANDREW FILE SYSTEM ENHANCEMENTS
-------------------------------
--NERSC's AFS server being expanded from 30 GB to 95 GB
--User quotas may be increased beyond the current 30 MB limit
--Accounts on the AFS server will be available to all NERSC users
--Transarc offers AFS client-only licenses to ER sites for $600
Send E-mail to afshelp@nersc.gov for details.
COMMON FILE SYSTEM (CFS) STATUS
-------------------------------
--LASL CFS version 61 installed
This includes numerous performance enhancements.
--Storage Technology (STK) 36-track drive installed
This doubles the capacity of a 3480 cartridge when it
is copied from an 18-track tape drive. However, it will
take years to copy existing cartridges as most of CFS
capacity goes into directly servicing user requests.
--Number of cartridges on shelf reduced by 50 percent!
This has taken place over a four month period. A far
higher proportion of data is now in the STK silos with
a vastly better performance than operator retrieved and
mounted cartridges. Shelf tapes are stored in two vaults
(down from three) in various locations around LLNL.
CFS CARTRIDGE STATUS
--------------------
Date       In Silo    On Shelf    Total
------     -------    --------    ------
Feb 24      31,002      30,315    61,317
June 7      29,029      14,551    43,580
                        ------    ------
Change                    -52%      -29%
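The percentage changes in the table can be checked from the cartridge counts. The short sketch below recomputes the totals and the changes; only the counts come from the table, and the variable and function names are illustrative.

```python
counts = {
    "Feb 24": {"silo": 31002, "shelf": 30315},
    "June 7": {"silo": 29029, "shelf": 14551},
}


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


totals = {date: c["silo"] + c["shelf"] for date, c in counts.items()}
shelf_change = pct_change(counts["Feb 24"]["shelf"], counts["June 7"]["shelf"])
total_change = pct_change(totals["Feb 24"], totals["June 7"])

print(f"totals: {totals}")
print(f"on-shelf change: {shelf_change:.1f}%")
print(f"total change:    {total_change:.1f}%")
```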
HOW THIS WAS ACCOMPLISHED
-------------------------
--The cartridge reclaim rate is up by over a factor of five
--CFS administrator added
--Increased operational support
--10,000 "defective" cartridges cleared and removed
This required re-installation of 18-track tape capability,
which STK supplied at no charge!
--Online tape category eliminated
Incoming data was sometimes written on tape then copied almost
right away to other tapes. Enhancements in CFS version 61
permit us to monitor tape position with a data compressing
tape controller and eliminate this extra step.
--Massive cleanup effort to delete migrated, orphaned and abandoned
files
WHAT IS NEXT FOR CFS?
---------------------
--Switch to double length tapes
This potentially doubles the storage capacity.
--Release CFS gateway
This can increase CFS aggregate bandwidth due to its independent
data path. This will provide service to workstations.
--Spool up National Storage Laboratory (NSL) Unitree expertise
We hope to install a prototype system early next year.
FILE SYSTEM CHARACTERISTICS (Cray C90)
--------------------------------------
File                       Total    User
Name         Lifetime      Space    Quota     Back-up
---------    ---------     -----    ------    -------
/u (home)    Permanent      4 GB    3.2 MB    Daily
/tmp/wk#     30 days       57 GB    None      None
/usr/tmp     12 hours      53 GB    None      None
/usr/fast    12 hours       8 GB    None      None
/afs         Permanent     95 GB    30 MB     Daily
CFS          Permanent     11 TB    Varies    None
Notes:
--The /tmp/wk1, /tmp/wk2 and /tmp/wk3 file systems each have 19 GB.
--AFS files are available from any computer with AFS client software
on the Internet. Currently, only SAS and the Cray C90 at NERSC
have AFS client software.
MOE JETTE: Storage.
- Moving disk from F to C90. (F doing OK on disk as it deals with smaller
jobs.)
- Established large file system to replace multiple smaller ones.
- Expanded AFS service. (Arranged $600 client license available.)
- Set up /usr/tmp to accommodate large temporary storage. (Not yet heavily
used but the large users are thrilled.)
= These changes are the result of a user committee formed at the last meeting.
MOE JETTE: CUB (Central User Bank)
- X window interface is in beta test.
- CFS quotas are now available by user.
- Resource account report generation is in operation (if I heard right).
This means the old funny tools are finally being replaced.
- SAS and the T3D have been included.
- Lots of system upgrades installed (not of major interest to the user).
- The option of the one-time password will be provided for those needing extra
security (The cards will cost about $50.). "KERBEROS will provide DCE
capability for the one-time password." <<-- questioned by the audience.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Description of Kerberos (supplied by Moe Jette)
-----------------------------------------------
Kerberos permits you to authenticate yourself to a collection of
computer systems one time. After Kerberos authentication, you
may securely access a variety of services throughout the collection
of computer systems without further authentication (eg. klogin,
krsh, kcp, kftp, etc.).
A one-time (or single-use) password is only valid for a single
use. After that use, it can no longer be used to provide one
with authentication.
Kerberos itself does NOT provide the capability for single-use
passwords, although used together they provide very good security
and ease of use.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Six month password expiration is in force.
- Hacker activity is down but can be expected back. Reference to the book
"The Cuckoo's Egg" to illustrate the difficulty of tracking them down.
CFS (Common File System)
- Tape sizes increased.
- CFS Gateway "soon" -- i.e., direct access to information in CFS without
going through the CRAY-IBM bottleneck.
- Starting to stage up NSL's Unitree as a replacement.
FOR FURTHER INFORMATION
-----------------------
For further information on any of these issues, contact:
Moe Jette E-mail: jette@nersc.gov
Group Leader, Large Scale Systems Group tel: 510-423-4856
National Energy Research Supercomputer Center fax: 510-422-0435
Rick Kendall followed Moe Jette with a discussion of storage problems
and possible disk charges to more accurately reflect costs.
RICK KENDALL: for the user committee.
- Committee chair was J.-N. Leboeuf.
- Many suggestions were made and weighed.
- Two major users are impacting the system: Jack Ritchie and Uzi Landman.
Ritchie responded to queries and significant relief actually came out of a
mutual accommodation. Landman has never responded.
- Much of the issue turns out to be a subset of the need for BALANCED
resource utilization.
- Need to improve NERSC auxiliary workstation load.
- /usr/tmp is working. This and previous bullet can be put in perspective
by noting that in the past TWO CPU's were needed just for data migration.
That's a big fraction of the CRAY resources.
- If disk charges are reinstated as a control mechanism, they need to be
staged, based on a good utilization model, and accompanied by a good
set of tools (for example the CFS interface is wholly inadequate).
----------------------------------------------------------------------
Break
-----
----------------------------------------------------------------------
ESNET Plans - Jim Leighton
Backbone Upgrade
----------------
Network Services
----------------
--ATM (Asynchronous Transfer Mode)
Jim Leighton: ESNET
- ESNET is a backbone, ie a network to connect networks.
- Carrying ~1.6 Terabytes/mo. today. It has had exponential-like growth
with peaks each October. MOSAIC probably produced the peak last October.
- Mostly T1 (1.5Mb) links with growing number of T3 (45Mb) links. Currently,
PNL -- FNAL -- LANL -- LLNL -- LBL are T3.
- International: Japan, Germany, Italy, Brazil. Other links are made through
other agencies.
- National: two FIX (links to other agency networks) hubs, one on each coast
and lots of links to regional networks.
- Video Conferences: A meeting room can be used (outfitted for ~$60K).
ESNET provides a central hub needed when there are more than two
participating stations as well as an on-line reservation system. The
links are dial-up ISDN lines yielding transfer rates of 112-384 Kb. Over
1000 conferences have already been handled.
- Mbone does a multicast when wanting to broadcast to many receivers. It
is home grown using workstations as routing agents.
- The future? ATM = Asynchronous Transfer Mode. Multiplexing circuits to give
OC3c service at 155Mb/s and OC12c service at 622 Mb/s. ESNET proposal to
develop services with it received much interest and then many protests
(10!). All have now been dropped. So ATM experiment will start "soon"
(end of the month, maybe): LLNL-GAC-LANL & SNLA-FNAL & ANL-PPPL.
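To put the traffic and link figures above in perspective, the following sketch converts the monthly volume to an average rate and estimates transfer times on each link type. It rests on several assumptions not stated in the notes: a 30-day month, decimal units (1 Terabyte = 1e12 bytes), link speeds read as megabits per second, and no protocol overhead.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month

# Link speeds quoted in the notes, read as megabits per second.
LINKS_MBPS = {"T1": 1.5, "T3": 45, "OC3c": 155, "OC12c": 622}


def average_mbps(bytes_per_month):
    """Average rate in megabits per second for a given monthly volume."""
    return bytes_per_month * 8 / SECONDS_PER_MONTH / 1e6


def transfer_seconds(size_bytes, link_mbps):
    """Ideal time to move size_bytes over a link, ignoring overhead."""
    return size_bytes * 8 / (link_mbps * 1e6)


avg = average_mbps(1.6e12)
print(f"1.6 TB/month is about {avg:.1f} Mb/s averaged over the month")

for name, mbps in LINKS_MBPS.items():
    minutes = transfer_seconds(1e9, mbps) / 60
    print(f"1 GB over {name:5s} ({mbps} Mb/s): {minutes:.1f} min")
```

Under these assumptions the backbone's monthly average is already several times the capacity of a single T1 link, which is consistent with the move toward T3 and ATM services.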
-----------------------------------------------------------------------
X-windows: How to Get From Here To There - Barry Howard and Alice Koniges
-------------------------------------------------------------------------
Outline
-------
-What is X Windows?
-What is the "X-Windows Problem" we are trying to solve?
-How can NERSC help solve the problem?
What is X Windows?
------------------
-The X Window System is a network-based graphical windowing system.
-Developed at MIT in 1984 and now adopted as industry standard.
-Client-server model: Display Server provides windows for multiple
Clients on multiple machines.
-Provides the bare bones of a window system upon which any style of
graphical user interface (GUI) can be built.
-Windows manager provides "look and feel" for the GUI.
OSF/Motif, OLWM, TWM, BWM (to be announced)
-Latest release: Version 11 Release 5.
What About Security with X Windows?
-----------------------------------
-Authorization required to access a display
-Host-based
User enables any client on specified host(s) to display on server
Use xhost command: xhost + hostname
Opens opportunity for sniffing messages between client and server
-User-based
Special code called "magic cookie" passed between client and server
Code stored in .Xauthority file in user's home directory
Usually set up automatically by display manager (XDM)
Easiest case is one .Xauthority file in shared home directory
-See "Safe X" article in August, 1993 edition of Buffer
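The user-based scheme above can be pictured with a small conceptual sketch. This is not X server code: the dictionary standing in for the .Xauthority file, the display name "myhost:0" and the function names are all assumptions made for illustration. The only protocol fact used is that an MIT-MAGIC-COOKIE-1 cookie is a 128-bit random value that the client must present unchanged.

```python
import hmac
import secrets

COOKIE_BYTES = 16  # MIT-MAGIC-COOKIE-1 uses a 128-bit value


def new_cookie() -> bytes:
    """Generate a random cookie, as a display manager would at login."""
    return secrets.token_bytes(COOKIE_BYTES)


def server_accepts(server_cookie: bytes, presented: bytes) -> bool:
    """Accept a client only if it presents the cookie the server holds."""
    return hmac.compare_digest(server_cookie, presented)


server_cookie = new_cookie()                # held by the display server
xauthority = {"myhost:0": server_cookie}    # hypothetical stand-in for .Xauthority

good_client = xauthority["myhost:0"]        # client that can read the shared file
bad_client = bytes(COOKIE_BYTES)            # client guessing without the file

print(server_accepts(server_cookie, good_client))
print(server_accepts(server_cookie, bad_client))
```

Unlike host-based access, nothing here depends on which machine the client runs on; access follows whoever can read the cookie, which is why a single .Xauthority file in a shared home directory is the easiest case.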
What's the Problem?
-------------------
There are several:
-Lack of control over
-GUIs distributed with commercial and public domain software
-Capabilities and configurations of users' desktops
-NERSC working toward a Unified Production Environment in 1996
-X windows will be an indispensable part of work space on MPP
Debugger, performance analysis tools by late 1995
-A common GUI will be used to access many of the services at NERSC
Built on X Windows
Selected application may, in fact, be run on another computer
Usage accounting for each application
Need a transition path for providing X Windows capacity for "almost all"
NERSC users which minimizes cost and frustration.
What can NERSC do to help solve this problem?
---------------------------------------------
The four-step method to X Windows in every office.
First Step: Understand what desktop configurations are in use
-----------
-We don't know how many users lack X-capable desktops.
-Using surveys (ERDP)
-Site visits
-Usage statistics of utilities with both character and graphical
user interfaces
-Suggestions are welcome!
-We do know that implementations vary greatly among desktops
-Knowledge of which desktop configurations are most popular will help
NERSC concentrate on providing information on a large, but limited,
solution set.
Second Step: Provide Education and Motivation
-----------
-Buffer - series of articles on X Windows began in April, 1994
-Consulting - eXpert on call during working hours
Familiar with various hardware/software tested at NERSC
Help with PC, MAC, X-terminal and workstation configurations
Familiar with administrative burden attached to each option
-Site visits - town meetings
-Demonstrations - show power of new tools with graphical interfaces
-Mosaic,xcub,graphics,MPP tools, Unified Production Environment
Motivation - Graphics
----------
-X will replace Tektronix as the lowest common denominator for graphics
output.
-Currently:
-All supported graphics packages on Crays are capable of sending output
directly to an X window.
-Ditto for all Cray utilities used to view graphics files.
-All graphics utilities except PV-Wave on SAS rely SOLELY on X.
Third Step: Provide recommendations on tested desktop configurations
----------
-Need transitional roadmap for "almost all" users
Transitional Roadmap - Which desktop to buy?
--------------------------------------------
               Advantages                       Disadvantages
               ----------                       -------------
Workstations   -local processing ability        -expensive
               -independent of other hosts      -complex system administration
X Terminals    -inexpensive                     -X server only
               -simple systems administration   -dependent on host
                                                -X performance
PCs            -availability                    -X performance
               -can run PC applications         -dependent on host
                                                -X server only
Transitional Roadmap - Cheap X
------------------------------
-UNIX and X for PC - LINUX + X386
Full featured UNIX for 386 and 486 AT machines
Source available in BSD and System V flavors
Includes GNU utilities
Accommodates Ethernet board or modem
Serial IP options: SLIP, PPP, CSLIP, term
Modem speeds up to 14400 (38400 compressed)
Tested successfully at home with Mosaic, FTP, etc.
Software cost: FREE
-X on top of MS Windows
Winsock (Windows Sockets) is de facto standard network software
for MS Windows.
MicroX is one choice for X server
Software cost : $150
Effective bridge between MS Windows and X Windows
Transitional Roadmap - X Windows at Home
----------------------------------------
Two approaches:
-X remote
Proprietary X protocol for using X-terminal (or PC) at home
-Serial IP
Public domain protocol for using internet at home
Examples: SLIP, PPP
Need X server software plus user interface (MS Windows)
Available for PC, Mac, UNIX platforms
NERSC uses Cisco terminal server to provide 1-800 dialup service
-Published number for telnet access to NERSC hosts
-Unpublished number for Xremote and Serial IP access
Used by NERSC staff for using X from home
-Issues like security and cost prevent offering this as service now
Fourth Step: Explore possible new services
------------------------------------------
-XRemote server for NERSC customers
Control cost by cost-sharing and limiting access to a program
Control security by requiring one-time passwords
-Provide boot/configuration/font server for remote X-terminals
-Others to be suggested by customers
Barry Howard: DCE
X WINDOWS
- Is the base for a lot of DCE as the industry standard client-server model.
- The GUI is built on top of X. There are already several: OSF/MOTIF, OLWM,
TWM and shortly one more: BWM (Barry's window manager). This is a part of
the NERSC commitment (DDK observation).
- Authorizing use: host-based authorization [xhost +hostname] was discouraged
in favor of the user-based "magic cookie".
- Problems: lack of control because of very different general GUI's and
very different desktop capability.
- NERSC wants to go to an X-based "Unified Production Environment" with
a common GUI used on various resources.
- Howard outlined what he called a 4 step solution. I found the steps fuzzy
but they did clearly involve a lot of user interaction.
Alice Koniges then described Graphical User Interfaces (GUIs).
Alice Koniges: APPLICATIONS
- Outlined a common X interface. It included library interfaces.
- Pointed to their "Data Dimensions Interface" -- looks interesting but not
yet understood (by DDK).
-----------------------------------------------------------------------
EXERSUG (ERSUG Executive Committee) then adjourned for a closed lunch.
-----------------------------------------------------------------------
SPP Project Status and Plans for 1995 - Bruce Curtis and Tom Kitchens
-------------------------------------
SPP - Special Parallel Processing in 1994
-----------------------------------------
-Soni (BNL) weak matrix elements of B-mesons
allocation: 12,900 CRUs. Used 5,644 (44%)
-Kogut (U. Ill) quenched QCD at finite density of large lattices
allocation: 8,050 CRUs. Used 5,393 (67%)
-Bell (LLNL) natural transition to turbulence in a waveguide mixing layer
allocation: 7,200 CRUs. Used 1,006 (14%)
-Cohen (LLNL) toroidal gyrokinetic PIC simulation using quasi-ballooning
coordinates
allocation: 3,600 CRUs. Used 2,391 (66%)
-Dory (ORNL) High Resolution Plasma Fluid Turbulence Calculations
allocation: 3,500 CRUs. Used 1,965 (56%)
-Dunning (PNL) Ab Initio Molecular Dynamics
allocation: 3,000 CRUs. Used 2,350 (78%)
-Lee (PPL) Gyrokinetic Simulation of Tokamak Plasmas
allocation: 2,500 CRUs. Used 877 (35%)
-Chen (U.C. Irvine) Alpha/Energetic-Particle Driven Instabilities
in Tokamak Plasmas
allocation: 2,500 CRUs.
-Aydemir (U. Texas) Nonlinear Gyrofluid Simulations of ITG Turbulence
in Tokamaks Using Field-Line Coordinates
allocation: 2,000 CRUs.
-Fu (PPL) Gyrokinetic MHD Hybrid Simulation of MHD Modes Destabilized
by Energetic Particles
allocation: 1,500 CRUs.
-Hammett (PPL) Gyrofluid Simulations of Tokamak Plasma Turbulence
allocation: 1,500 CRUs.
-Lee (LBL) High-Resolution Underground Imaging
allocation: 1,200 CRUs.
-Dawson (UCLA) Numerical Simulation of Plasma and Energy Transportation
in Fusion Devices Using 3D Toroidal Gyrokinetic PIC Models
allocation: 1,000 CRUs. Used 17 (1.7%)
-Stevens (ANL) Benchmarking Comparison of Computational Chemistry Codes
with MPPs
allocation: 500 CRUs.
-Zunger (NREL) Atomic Study of Step-flow Growth and Spontaneous Ordering
of Semiconductor Alloys for Photovoltaic Applications
allocation: 100 CRUs.
-Le Brun (U. Texas) Global Toroidal Gyrokinetic Simulation of eta_i-mode
Induced Transport in a Tokamak-like Plasma
allocation: 100 CRUs.
-Lester (U.C. Berkeley) quantum monte carlo for electronic structure
and vibrational eigenvalues
allocation: 75 CRUs.
-Chua (Continuum Dynamics) Computational Combustion Using Gridless
Particle Methods on Parallel Computers
allocation: 50 CRUs.
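The utilization percentages quoted in the list above are simply CRUs used divided by CRUs allocated. A short sketch that recomputes them for the projects whose usage is listed (the data structure and function name are illustrative only):

```python
# (PI, allocation in CRUs, CRUs used) for the projects whose usage is listed.
spp_usage = [
    ("Soni", 12900, 5644),
    ("Kogut", 8050, 5393),
    ("Bell", 7200, 1006),
    ("Cohen", 3600, 2391),
    ("Dory", 3500, 1965),
    ("Dunning", 3000, 2350),
    ("Lee (PPL)", 2500, 877),
    ("Dawson", 1000, 17),
]


def percent_used(allocation: int, used: int) -> float:
    """Percentage of an allocation that has been consumed."""
    return 100.0 * used / allocation


for pi, allocation, used in spp_usage:
    pct = percent_used(allocation, used)
    print(f"{pi:10s} {used:6d} / {allocation:6d} CRUs = {pct:5.1f}%")
```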
Problems
--------
Poor utilization during first several months:
-Slow ramp-up. Users weren't ready
-Added unparallelized physics
-Basis bug where error in input file causes infinite loop
-Other bugs in codes
Lesser factors:
-NERSC bugs and CFS problems
-Interactive Interface (15% before Sherwood, 4% otherwise)
Consequently, 1/3 of the total allocation will not be used
----------------------------------------------------------
-Total allocation remaining divided by wall time remaining = 20.75
-If average cpu/wall ratio for remainder is 13.0, then 66% of total
allocation will be consumed by end of October
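One way to read the projection above: using up the whole remaining allocation would need an average CPU/wall ratio of 20.75, so a ratio of 13.0 can consume only 13.0/20.75 of what remains. The sketch below assumes charges scale linearly with CPU time. The share of the total already consumed at the time of the estimate is not stated here, so it is left as an input, and the function names are illustrative only.

```python
def fraction_of_remainder_consumed(required_ratio: float, achieved_ratio: float) -> float:
    """Fraction of the remaining allocation consumed at the achieved CPU/wall ratio."""
    return min(1.0, achieved_ratio / required_ratio)


def projected_total_fraction(already_used: float, required_ratio: float,
                             achieved_ratio: float) -> float:
    """Projected fraction of the TOTAL allocation consumed by the end of the period.

    already_used is the fraction of the total consumed so far (an assumed input).
    """
    remainder = 1.0 - already_used
    return already_used + remainder * fraction_of_remainder_consumed(
        required_ratio, achieved_ratio)


share = fraction_of_remainder_consumed(20.75, 13.0)
print(f"Share of remaining allocation consumed: {share:.1%}")
```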
Recent Runs
-----------
           CPU/Wall    MFlop/cpu-    Gflop/wall sec
                       sec/cpu
Bell         13.8         384            5.3
Cohen        13.4         355            4.7
Dunning      13.4          51            0.7
Soni         13.3         141            1.9
Dory         12.3         330            4.1
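The three columns of the Recent Runs table are related: the Gflop rate per wall-clock second is the CPU/wall ratio times the Mflop rate per CPU second, divided by 1000. A short sketch that checks this against the listed values (names are illustrative; the last column appears to be rounded or truncated to one decimal):

```python
# PI -> (CPU/wall ratio, MFlop per cpu-sec, listed Gflop per wall-sec)
recent_runs = {
    "Bell": (13.8, 384, 5.3),
    "Cohen": (13.4, 355, 4.7),
    "Dunning": (13.4, 51, 0.7),
    "Soni": (13.3, 141, 1.9),
    "Dory": (12.3, 330, 4.1),
}


def gflops_per_wall_sec(cpu_wall_ratio: float, mflops_per_cpu_sec: float) -> float:
    """Aggregate rate per wall-clock second, in Gflop/s."""
    return cpu_wall_ratio * mflops_per_cpu_sec / 1000.0


for pi, (ratio, mflops, listed) in recent_runs.items():
    computed = gflops_per_wall_sec(ratio, mflops)
    print(f"{pi:8s} computed {computed:5.2f}  listed {listed:4.1f}")
```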
Tom Kitchens - comments on SPP
------------------------------
- cap of 50K CRUs
- 10% of total CRUs
- 5% of disk storage
- a few big users/slow startup/conversion problems
- proposal not to expand in FY 95
Byers comments on SPP discussions:
----------------------------------
NERSC had a proposal not to expand at all. Kitchens mostly bought into
this, but others objected. It was left, roughly, that NERSC would run a
workshop to encourage new SPP users and, hopefully, to get them up to
speed with less pain and time than experienced by present users. No one
new would be allowed on unless they first went to the workshop. How hard
and fast this rule is to be was not clear. Some expansion would then be
allowed, the precise number mentioned by McCurdy.
<<< SPP DISCUSSION >>>
TOM KITCHENS: SAC suggested SPP (Special Parallel Processing) program with
the expectation that "Special" would eventually become "Standard". They
dedicated about 10% of the available computer power at NERSC.
BRUCE CURTIS
- SPP topics included Lattice Gauge/ Fusion / Fluid Dynamics / Ab Initio
Molecular Dynamics.
- SPP codes did not perform as well this year and interest was not as high.
- NERSC recommendations: (1) limit to 50 KCRU's (flat); (2) limit to 18
SPP users (also flat) [This got the greatest objections from the floor.
DDK for one believes it a mistake.]; (3) try not to have a large turnover.
JACK BYERS: primarily reviewed the E-MAIL on the subject [includes DDK opinions]
C. WILLIAM McCURDY: proposed a mandatory SPP workshop. Users thought it might
best be expanded to include users who want to propose for SPP. [DDK thinks
that it is OK but doesn't address primary problems of SPP: shots are once a
week so unless one needs huge memory/disk it is not cost effective to the
user even if he/she is already parallelized.]
TOM KITCHENS: pointed out that the SPP allocation time frame is 1 December
for the next round (~ year).
B. CURTIS: noted that we should expand test queues NOW.
[DDK NOTE]: Besides developing parallel code, the SPP is motivated by a need
for (1) large memory -- it's the only way to get 200Mw; (2) Scheduling
security -- if you want large memory, you will not get swapped in very often;
(3) Real Time Turnaround -- if you are asking for large memory or long times,
having a definite shot can be a lifesaver; (4) Allocation -- this is 10% of
the resources in a new allocation basket. Within the bounds of the SPP,
these considerations are more significant than parallelization so the SPP
should not be taken as an indicator of interest in MPP.
--------------------------------------------------------------------
MPP Procurement Status - Michel McCoy
-------------------------------------
Award of competitive procurement expected in April 1995.
MIKE McCOY: MPP Acquisition
- Basic document Blue Book (ER-0587, Feb., 1993) called for MPP FY94.
- Current schedule calls for FCM (Fully Configured Machine) mid-FY96. It
should be capable of both multiple users and multiple programming models;
have Math Libraries and Programming Tools; and be balanced in CPU to
memory to I/O. It should be FAST!!!! It should run GCS (Grand Challenge
Scale) codes on 500-1000 nodes. It will be dedicated to that purpose
"at night".
- The PEP (Pilot Early Production) machine has to be available to demonstrate
capability by running bench marks. It should be available to users as
a development machine April 1995. By preparing codes, it should give some
capacity relief.
--------------------------------------------------------------------
Break
--------------------------------------------------------------------
CAPACITY AND CAPABILITY SOLUTIONS FOR 1996 AND BEYOND - Tammy Welcome
-----------------------------------------------------
Definitions
-----------
Capability System - A capability system possesses sufficient processing
power and memory to accommodate a single Grand Challenge-scale
application utilizing all the resources of the system.
Capacity System - A capacity system can support, in a flexible manner,
a large load of users simultaneously developing, debugging, and
running a large mix of applications.
The Problem - Capacity versus Capability
----------------------------------------
-- "I'm being forced off the machine"
-- "I'm not getting enough time for my capability solutions"
CURRENTLY THE C90 IS FULLY UTILIZED
----------------------------------
Batch queue length - one measure of demand
Analysis of codes using cycles on the C90
-----------------------------------------
-Snapshot based on accounting records from CUB
-Covers 64 day period beginning April 5
-Filtering of accounting data will enable us to store
accounting information for greater length of time
A SNAPSHOT OF WHERE CYCLES GO ON THE C90
----------------------------------------
CODE            USER               SITE(S)           CPU     % OF    CUMULATIVE
NAME            NAME(S)                              HOURS   C90     % OF C90
li.x/newlu.x    Soni               Brookhaven        2056    8.4      8.4
sqed*.x                                              1699    6.9     15.3
chem_1bc        Kogut              U. of Illinois     457    1.9     17.2
spectrum                                              423    1.7     18.9
xtreb           Kerbel, Waltz      GA,LLNL            354    1.4     20.3
xden6           Sharpe/Gupta       LANL               314    1.3     21.6
dtem            Spong/Lynch        ORNL               295    1.2     22.8
prg*            Jansen             UCSD               257    1.0     23.8
nrqcd.ds        Sloan              FSU                242    1.0     24.8
lmbh            Klepeis            LLNL               224    0.9     25.7
xvel            Charlton           ORNL               223    0.9     26.6
xg3e            Williams/Dimits    LLNL               218    0.9     27.5
amr             Crutchfield        LLNL               215    0.9     28.4
wxex*           Park et al.        PPPL               208    0.8     29.2
pies*           Merkel et al.      PPPL               206    0.8     30.0
cup             Strand             NCAR               195    0.8     30.8
gotsy           Turner et al.      PPPL               194    0.8     31.6
xvg04*          Atherton           LLNL               185    0.8     32.4
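The "% OF C90" column can be reproduced from the CPU hours if one assumes the denominator is the full capacity of a 16-CPU C90 over the 64-day snapshot (16 x 24 x 64 = 24,576 CPU hours). That denominator is an assumption, not something stated in the table, but every listed percentage matches it. The same sketch shows that the first 15 codes account for about 30% of the machine.

```python
CPUS = 16            # C90 processor count
HOURS_PER_DAY = 24
DAYS = 64            # length of the accounting snapshot
capacity = CPUS * HOURS_PER_DAY * DAYS  # assumed denominator: 24576 CPU hours

# CPU hours and listed "% of C90", in table order.
cpu_hours = [2056, 1699, 457, 423, 354, 314, 295, 257, 242,
             224, 223, 218, 215, 208, 206, 195, 194, 185]
listed_percent = [8.4, 6.9, 1.9, 1.7, 1.4, 1.3, 1.2, 1.0, 1.0,
                  0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8]

computed_percent = [round(100 * h / capacity, 1) for h in cpu_hours]
top15_share = 100 * sum(cpu_hours[:15]) / capacity

print(computed_percent == listed_percent)
print(f"Top 15 codes: {top15_share:.1f}% of the C90")
```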
Additional cycles available soon
--------------------------------
-July 1994 - time available via MPP allocation program is limited
-Late 1995 - PEP system available for capability (and limited
capacity) computing
-Late 1996 - FCM system available for capacity and capability
computing
Proposed Solution
-----------------
-To solve both capacity and capability needs...
- Make use of PEP system to enable C90 capability codes on
the MPP, possibly freeing cycles on the C90 for capacity
codes.
- Use FCM to support both capability and capacity codes.
- Make effective use of all NERSC resources!
Why would the research scientist support this plan?
---------------------------------------------------
Significant performance gain...
-------------------------------
System   LFK*          Efficiency   #PEs   System Performance
C90      250 MF        .75            16   3 GF
PEP      15-60 MF**    .30           256   1.15-4.6 GF
FCM      60 MF         .4            512   12.3 GF
* LFK geometric mean
** depending on the technology
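The System Performance column is the product of the per-processor LFK rate, the assumed efficiency, and the number of PEs. A short sketch of that arithmetic (the function name is illustrative only):

```python
def system_gflops(mflops_per_pe: float, efficiency: float, n_pes: int) -> float:
    """Estimated sustained system rate in GF from the per-PE LFK rate."""
    return mflops_per_pe * efficiency * n_pes / 1000.0


c90 = system_gflops(250, 0.75, 16)
pep_low = system_gflops(15, 0.30, 256)
pep_high = system_gflops(60, 0.30, 256)
fcm = system_gflops(60, 0.4, 512)

print(f"C90: {c90:.1f} GF")
print(f"PEP: {pep_low:.2f}-{pep_high:.1f} GF")
print(f"FCM: {fcm:.1f} GF")
```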
Access to more memory...
------------------------
System   Memory         #PEs   Total Memory
C90                            2 GB
PEP      64-256 MB*      256   16-64 GB
FCM      256 MB          512   128 GB
* depending on the technology
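The Total Memory column is per-PE memory times the number of PEs, with 1 GB taken as 1024 MB. A short sketch (the function name is illustrative only):

```python
def total_memory_gb(mb_per_pe: int, n_pes: int) -> float:
    """Aggregate memory in GB, taking 1 GB = 1024 MB."""
    return mb_per_pe * n_pes / 1024


print(f"PEP: {total_memory_gb(64, 256):.0f}-{total_memory_gb(256, 256):.0f} GB")
print(f"FCM: {total_memory_gb(256, 512):.0f} GB")
```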
Access to unsaturated PEP system...
-----------------------------------
-Improved wall clock turnaround time
Implementing the solution
-------------------------
1. Staff works with research scientists to parallelize capability codes.
Issues: - Portability of resulting code
- Parallel platforms on which to run code
- Single source
- adoption of modified source code by scientist
2. Staff enables codes to run on PEP on day 1.
Issues: - Existence of software to support the parallel code
3. Research scientists run capability codes on PEP on day 2.
Issues: - Training scientist to run in this new environment
4. Research scientists run capacity and capability codes on FCM.
Proposed Solution
-----------------
To solve both capacity and capability needs...
Enable more users on SAS and AFS, providing an attractive platform
for preprocessing, post-processing, and developing codes.
Make effective use of all NERSC resources!
Why would the research scientists support this plan?
----------------------------------------------------
Better interactive service...
-Improved response for user doing code development
-Rich software environment
-Shared home directories with CRAYs via AFS
Implementing the solution
-------------------------
1. Staff upgrades SAS and AFS to accommodate more users.
RESULT
------
-By parallelizing the top cycle-burning capability codes to
run on the PEP system, we potentially free up 30% of the C90
for capacity codes in 1996.
-One year later, we expect the execution environment on the FCM
to support both capability and capacity codes.
-By shifting some of the interactive workload over to the auxiliary
servers, we can potentially provide better interactive service
to the user.
---------------------------------------------------------------------
ACCESS TO THE HPCRCS AND TO THE LLNL T3D - Tammy Welcome
----------------------------------------
Purpose
-------
- Develop user expertise in anticipation of delivery of NERSC machines
- Broaden base of user codes on parallel platforms
- Develop NERSC staff expertise
Note: ACCESS IS FOR CODE DEVELOPMENT RATHER THAN PRODUCTION USE.
----------------------------------------------------------------
Limited access to HPCRCs and H4P
--------------------------------
- ORNL Paragon XP/S 5 and XP/S 35 approx. 6%
'5 - 66 PEs, 16 MB, 9.6 GB disk
'35 - 512 PEs, 32 MB, 144 GB disk
- ORNL KSR1
64 PEs, 32 MB, 20 GB disk
- LANL CM-5 - approx. 5%
1024 PEs, 32 MB, 60 GB disk
- LLNL T3D - approx. 7.5%
128 PEs, 64 MB, 90 GB disk
Timeline for access
-------------------
Early May - request for proposals mailed to PIs
Mid May - request for proposals appears in Mosaic
May-June - request for proposals appears in Buffer
(May 1994 issue)
June 17 - proposal deadline
?? - MPP access awards
July - access to Paragon '5, Paragon '35, KSR/1, CM-5
Sept - access to T3D
A total of 17 proposals were received
-------------------------------------
- 9 have previous parallel experience (3 via SPP)
8 have NO previous parallel experience
- 10 requested regular consulting support
5 requested help developing the parallel applications
1 undecided
1 unknown
More about the proposals
------------------------
Parallel Platform...
--------------------
7 - T3D (11)
2 - CM-5 (4)
1 - Paragon (6)
1 - KSR (2)
3 - T3D and/or Paragon
1 - Paragon and/or CM-5
1 - T3D and/or CM-5
1 - Paragon and/or KSR-1
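The numbers in parentheses in the platform list appear to count every proposal that names a platform, whether alone or as one of two alternatives. A short tally confirms this reading and that the entries sum to the 17 proposals received (platform names are normalized here, e.g. "KSR-1" to "KSR"):

```python
from collections import Counter

# (number of proposals, platforms named in those proposals)
proposals = [
    (7, ("T3D",)),
    (2, ("CM-5",)),
    (1, ("Paragon",)),
    (1, ("KSR",)),
    (3, ("T3D", "Paragon")),
    (1, ("Paragon", "CM-5")),
    (1, ("T3D", "CM-5")),
    (1, ("Paragon", "KSR")),
]

mentions = Counter()
for count, platforms in proposals:
    for platform in platforms:
        mentions[platform] += count

total = sum(count for count, _ in proposals)

print(f"Total proposals: {total}")
for platform, n in mentions.most_common():
    print(f"{platform}: named in {n} proposals")
```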
Programming Paradigm
--------------------
9 - message passing
2 - data parallel
1 - shared memory
3 - undecided
2 - no comment
List of proposals for MPP access - available from Tammy Welcome
Access awards
-------------
- Funding decisions will be made by OSC
- Awards have a 6 month duration
- PI is responsible for short project status report
TAMMY WELCOME: Access Program (for development only)
- Time available at the two research centers and also on the LLNL T3D.
- Received 17 proposals, half with experience already.
- Five requested development help.
- T3D has been the most popular.
- Message passing preferred --> most likely to be available on other machines.
- Decisions on which proposals to fund are being made by OSC. Awards are of
six month duration [too short a limit] and the PI is responsible for a
short report.
- Capacity/Capability issue: Want to use the PEP to move some C90 codes
to free up the C90. The idea is that the top 15 codes use 30% of the C90
and are chomping at the bit to use more. To implement: (1) staff works with
scientist to convert code. (2) Staff enables those codes to run on the PEP
"day 1". (3) Research scientists run "day 2".
----------------------------------------------------------------------
STORAGE A NEW VIEW - Steve Louis
--------------------------------
Storage - a new paradigm
------------------------
- "How can you keep on movin' (unless you migrate me, too)"
... Ry Cooder, Into the Purple Valley
- S-curve model /Performance or value vs investment or time
Storage - Migration from old to new
-----------------------------------
Old Paradigm
------------
- Expensive mainframe file servers
- Proprietary operating systems
- Non-standard device interfaces
- Slow-speed channel bottlenecks
- Mainframe-centered architectures
- Inefficient access mechanisms
New Paradigm
------------
- Less expensive workstation servers
- Open systems
- Common interfaces
- High-speed scalable, parallel I/O
- Network-centered architectures
- Transparent access mechanisms
Storage - Hardware Technology Trends
------------------------------------
- Disk seek, latency, and data transfer are all improving
- Disk arrays with higher performance and lower cost
- New advanced RAID levels now under development
- Magnetic recording technology appears almost limitless
- Helical and narrow-track longitudinal both improving
- Automated robotics are becoming standard equipment
- Fiber optics becoming transmission technology of choice
- Gigabit networks and applications are more common
Storage - Software Technology Trends
------------------------------------
- Open systems, interoperability, and standards necessary
- Distributed client-server computing more widespread
- Seamless data interchange between applications
- Scalability of capacity and data transfer rates necessary
- Integrated storage system management capabilities
- Transaction and metadata management for storage systems
- More electronically-saved and machine-readable storage
(for the environmentally concerned: 1 petabyte = 42.5M trees)
NSL - Technology Base System
----------------------------
- IBM RISC System/6000 Model R24 (or 990)
- 512 MB memory
- 92 GB SCSI-2 Fast/Wide disk
- HIPPI, FDDI, and Ethernet connectivity
- 3490E (36-track and compression) robotic archive
- 4 drives
- 1,340 cartridges (1 TB uncompressed)
- 16x16 HIPPI crossbar switch
- connects Crays to Base System
- connects Crays to Crays
- can also isolate C-90 from NSL switch problems
- NSL-UniTree software environment
NSL - Technology Extended Base System
-------------------------------------
- Additional production-level capabilities for Base System
- compatibility with Base System hardware/software
- no vendor bias toward existing base configuration
- May take several forms
- extensions to existing Base System disk or robotics
- upgrades to existing Base System disk or robotics
- preliminary version of Fully Configured Storage
- Extended Base System paves way for
- HPSS conversion
- parallel I/O
- network-attached devices
- scale-down CFS
NSL - Technology Fully Configured Storage
-----------------------------------------
- Fully Configured Storage is the last step:
- full HPSS environment
- full high-performance peripherals
- scalable, parallel I/O to FCM MPP
- dismantlement of CFS
- Procurements may be split among several vendors if:
- best fit is through selection of several technologies
- no single-vendor solution exists to meet requirements
- Funding issues:
- dependent on NERSC FY96 and FY97 funding
- may be coupled with Extended Base System as a
two-phased procurement similar to the PEP/FCM
STEVE LOUIS: storage
- Old CFS used (1) mainframe file servers; (2) proprietary operating systems;
(3) non-standard device interfaces; (4) slow channel speeds; and
(5) inefficient access mechanisms.
- New system will be based on workstations, UNIX, standard high speed
interfaces, multiple paths.
---------------------------------------------------------------------
R AND D: TRACKING THE COMPUTATIONAL EXPLOSION - Alice Koniges
-------------------------------------------------------------
As soon as a larger system is provided, it is filled to capacity.
ALICE KONIGES: Outreach
- To unite diversified users with collaboration and education.
- Workshops and Classes: Intro. MPP Computing (thought up at this meeting);
Adaptive Grid Methods (Dec.); PVM and Distributed Computing; and more...
- Expanded Visitors Program -- i.e., spend some time at NERSC.
- MOSAIC: Research Highlights Program.
----------------------------------------------------------------------
Open Discussion
---------------
Rick Kendall - large memory jobs are having major problems
checkpoint failure
slow queue
DISCUSSION PERIOD: MPP was discussed as we went along. The problem of poor
treatment of large memory jobs was brought up by Rick Kendall -- NERSC people
appeared surprised.
BILL McCURDY: The fragile nature of the effort to put an MPP on the floor
in FY96.
- When creating the Bluebook of FY92, the question was asked whether it
made sense to pin one's hopes on a high end MPP. It was concluded that
the vector machines would not compete with the workstations.
- The C90 will be in its 4th year by then and so in its "mature latter stages"
(DDKism)
- The C90 will be paid out by then so what one must hope for is to switch the
funding to the MPP. No hope of an increase, what one hopes for is that
support is not taken away.
- The USERS will have to motivate the acquisition as they are the only
ones that can do it. They should remember that as they push for the
workstations (that they admittedly must have), they should push for
the high end too.
- The two T3D's from the CRADA (~$50M over 3 years) are essentially sold out
(128 nodes at LANL and 128 + 128 (local LLNL) nodes at LLNL). These, like
the RC's at LANL and ORNL, are NOT general access.
- 30% of the C90 (15 codes) will be converted in advance. (The C90 is 75%
of the NERSC resources.)
- NERSC has a commitment to do precisely what the users tell us.
- $2B/yr is assigned to equipment in the DOE and there is a committee to
try to save some of this. (John Fitzgerald is on that committee and
pointed out that "lease - not to own" costs state sales tax. There is
an enormous saving possible there.) Savannah River has the biggest
computer budget in the DOE -- why?
ANS: record retrieval and maintenance for environmental cleanup. Many
of those records are on obsolete equipment and software but they MUST
be available. It is a massive EM problem which unfortunately makes it
(falsely) look like computation is getting large funding.
- SSC still has workstation farms there but they can't be moved.
End of meeting - adjourn