

ERSUG Meeting Minutes - Monday, April 6, 1998

The meeting, held in Berkeley Lab's Perseverance Hall, was opened by ExERSUG Chairman Ricky Kendall, who welcomed participants and made routine announcements.

 


 

The View from Washington

The first speaker was NERSC Program Manager Tom Kitchens from DOE's Mathematical, Information and Computational Sciences Division. Tom's "View from Washington" consisted mainly of an overview of the proposed FY99 DOE budget as it pertains to scientific computing.

The budget proposal includes funding for Next Generation Internet (NGI) testbeds, DOE2000, and collaborative facilities, including $10 million for DOE research in support of NGI.

In FY99, there will be two strategic thrusts:

•               The ACTS Toolkit, or Advanced Computational Testing and Simulation: These tools will be further developed to allow simulations on massively parallel processor computers to complement laboratory experiments, making research less expensive and less dangerous.

•               National Collaboratories: DOE will develop a set of tools and capabilities to permit scientists and engineers to access facilities and collaborate on experiments system-wide as easily as if they were in the same building.

Tom also outlined several DOE education programs supporting students and teachers at the graduate, undergraduate, and high school levels. Phase I of the Academic Strategic Alliances Program, funded by ASCI, has been implemented with five universities, and Phase II is in the works.

 


Overview of NERSC

NERSC Director Horst Simon followed with an overview of NERSC research and cited accomplishments of the past year. Horst noted that NERSC is bringing forth one or two innovations per month. In the past year, these included upgrading the Cray J90s, taking delivery of and accepting the Cray T3E-900, joining Intel and UC Berkeley in the Millennium clustered workstation project, collaborating with Sun Microsystems on "Wildfire," placing HPSS into production, and installing the remote visualization server and math server.

Horst reiterated the NERSC model of combining a major facility with appropriate intellectual services. He said it's important to maintain the appropriate balance between the two.

Among the recent NERSC research programs receiving funding are DOE2000 ACTS Toolkit support, a national bioremediation project called NABIR, a climate research project with NCAR and GFDL, and combustion research. Although none of these have received large-scale funding, they represent important steps toward larger efforts involving NERSC.

Horst also introduced new staff members and reviewed the recent revisions to the NERSC organization, including creation of the Advanced Systems Group to appraise new architectures and technologies. The NERSC Division currently consists of 120 employees, with 63 of them working under the NERSC Program.

Horst concluded by distributing copies of two reports: the 1997 NERSC Annual Report and "How Are We Doing?", a self-assessment of NERSC systems and services for FY97.

 


DOE Computing Initiative

Berkeley Lab Computing Sciences Director Bill McCurdy gave an update on the proposed Computing Science initiative now making its way through DOE, with the aim of reaching Congress in time for FY2000 funding. Now called the Strategic Simulation Initiative, or SSI, the proposal began as a plan to expand scientific computing within Energy Research. Proposed funding would start at $100 million and increase to $300 million by FY2004.

After much discussion between various labs and organizations within DOE, the initiative has been fashioned to focus on two main areas: accelerated climate prediction and combustion simulation and modeling. The accompanying science needed to support this effort, such as fluid dynamics and materials science, could help further research in other ER areas, too. Future thrust areas could be developed. The plan is to have the entire package ready by the end of summer to begin working its way through the budget process.

"We haven't had an opportunity like this and aren't likely to in the next 10 years," McCurdy said.

 


Changes in ERCAP

Tom Kitchens outlined several proposals for changing the way time is allocated on the Cray T3Es. The suggestions came out of a SCAC (Scientific Computing Access Committee) meeting held in March. The allocations for the vector computers would remain essentially unchanged.

About half the time on the T3Es now goes to the Grand Challenge researchers, Tom said. The remainder is split between users who are looking to gain expertise in MPP and users who already have the expertise. Requests are currently reviewed by a technical committee to assess the pertinence of the projects to DOE's missions.

Among the suggested changes for ERCAP 99 are:

•               Establish a peer review panel to set recommended allocations. These recommendations would then be sent to DOE for a final decision.

•               Change the NERSC allocation cycle so that proposals are not due in late summer, when many researchers are unavailable. One proposal is to move the beginning of each allocation year from Oct. 1 to June 1.

•               Establish a new allocation and use formula to encourage users to consume their allocations in a timely manner, rather than holding back and trying to use more time at the end of the allocation cycle. Two formulas were presented; each would give more time to heavy users who use most of their allotment early and take time away from those who use less than their monthly allotment, while average users would still receive their full allocation. (A minimal sketch of one such rule appears after this list.)
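
Neither allocation formula was recorded in these minutes. As a purely illustrative sketch, in C, of the kind of "use it or lose it" rule described above, the fragment below adjusts a monthly allotment upward for heavy users and downward for light users; the 25 percent transfer fraction is an assumed placeholder, not a value proposed at the meeting.

    #include <stdio.h>

    /* Hypothetical "use it or lose it" adjustment, sketched from the
     * proposal description: users who burn their monthly allotment early
     * gain time, users who under-use forfeit part of the shortfall, and
     * average users keep their full allocation.  The transfer fraction
     * is an assumption for illustration only, not a SCAC figure. */
    double adjusted_allocation(double monthly_quota, double hours_used)
    {
        const double transfer_fraction = 0.25;   /* assumed value */

        if (hours_used > monthly_quota)          /* heavy user: add a bonus  */
            return monthly_quota + transfer_fraction * (hours_used - monthly_quota);
        if (hours_used < monthly_quota)          /* light user: forfeit part */
            return monthly_quota - transfer_fraction * (monthly_quota - hours_used);
        return monthly_quota;                    /* average user: unchanged  */
    }

    int main(void)
    {
        printf("heavy user : %.0f hours\n", adjusted_allocation(1000.0, 1200.0));
        printf("light user : %.0f hours\n", adjusted_allocation(1000.0,  600.0));
        printf("average    : %.0f hours\n", adjusted_allocation(1000.0, 1000.0));
        return 0;
    }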

 


Archival Storage

Harvard Holmes of NERSC's Mass Storage Group gave a presentation on plans for archival storage quotas. The goal, Harvard said, is to encourage users to use storage resources more efficiently. The plan calls for quotas, not for charging users for storage. The overall quota would be computed by a formula combining file counts, space used, and input/output performed (a rough sketch follows below). The proposed default allocation would accommodate about 90 percent of NERSC users, so only about 10 percent would need to be actively managed. Because the plan is still being developed, Harvard said he would like to hear from users. Whatever plan is adopted, he said, it should not get in the way of doing science.
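
The exact quota formula was not presented; the following is a rough sketch, in C, of how a composite quota combining the three factors Harvard mentioned might be computed. The weights and the default quota value are assumptions for illustration only, not NERSC figures.

    #include <stdio.h>

    /* Illustrative composite storage "charge" combining the three factors
     * mentioned: space used, file count, and I/O performed.  All weights
     * and the default quota are hypothetical placeholders. */
    struct storage_usage {
        double gigabytes_stored;   /* space resident in the archive        */
        double thousand_files;     /* file count, in thousands             */
        double gigabytes_io;       /* data moved in/out during the period  */
    };

    double storage_units(const struct storage_usage *u)
    {
        const double w_space = 1.0;    /* assumed weights for illustration */
        const double w_files = 0.5;
        const double w_io    = 0.1;

        return w_space * u->gigabytes_stored
             + w_files * u->thousand_files
             + w_io    * u->gigabytes_io;
    }

    int main(void)
    {
        struct storage_usage u = { 40.0, 12.0, 200.0 };  /* example user          */
        double quota_units = 100.0;                      /* assumed default quota */

        double used = storage_units(&u);
        printf("used %.1f of %.1f storage units%s\n", used, quota_units,
               used > quota_units ? " (over quota)" : "");
        return 0;
    }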

Small changes and improvements in storage use can make a big difference, Harvard said. Current storage capacity is estimated to be adequate through FY99, but with the growth of data-intensive computing, storage needs to be well managed. Storage, Harvard noted, is more than just file space; it also includes bandwidth, file counts, and other elements.

The new system would try to give early notice to those who consistently exceed their quota. NERSC would have some storage in reserve to quickly address requests for larger quotas. The current plan is to include storage in this year's ERCAP process.

 


CFS/HPSS Migration

Nancy Meyer of NERSC's Mass Storage Group discussed the upcoming migration of file storage from CFS to HPSS. Earlier this year, NERSC changed the UniTree storage system over to HPSS in two days. The changeover involved 700,000 files accessed by 950 users. CFS, on the other hand, holds 3.3 million files used by 5,000 users. Based on the UniTree migration, Nancy said the CFS changeover would take four to six days. The timetable calls for the migration to be done this summer.

Among the benefits of the conversion for CFS users are:

•               An increase in silo capacity from the current 30 terabytes (using 1 GB tapes) to 300 TB (with the 10 GB tapes now used in HPSS). Soon, this should be expanded to 600 TB using 20 GB tapes.

•               Large files will be stored as one file and no longer segmented.

•               File transfers will be faster (an estimated 10 MB/second).

•               File names will no longer be restricted to a certain length, as they were in the past.

Nancy said that although a lot of work remains before the migration is made, there appear to be no insurmountable obstacles. She also emphasized that the conversion won't be done until everything is right.

 


Scientific Visualization

Steve Lau of NERSC's Visualization Group presented an update on the group's activities, including installation of a new remote visualization server. The machine, a Silicon Graphics Onyx 2, features six R10000 processors, 3 GB of memory, a 100 GB disk array, a HIPPI connection to NERSC's Cray supercomputers, a fast Ethernet connection, and an ATM interface.

The server allows users to view their data remotely, which is useful when the data is too large or complex to be viewed with local resources, and it can also be used for collaborations. The server will provide visualization capabilities to users who do not otherwise have access to visualization resources. Initially, the server will support about 35 low-end users and 10 high-end users. Eventually, it should support about 150 users.

The server is currently available only to NERSC users with visualization needs -- the server is not available as a secondary compute server or for uses other than visualization, and use will be monitored. In addition to serving as a testbed for developing and testing new remote visualization and collaboration techniques, the server will be a repository for NERSC visualization software.

The server, named "Escher," will be available to any user upon request, and there are currently no set restrictions on the amount of use. Users will have access to 300 MB of disk space per user directory and both graphics pipelines. The machine features real-time interactive capability, not the batch mode of NERSC's other high performance computers.

For more information about Escher, visit the web site at: http://www.nersc.gov/hardware/servers/vis-server.html

 


Technology Review Panel

At the request of Bill Kramer, a technology review panel was formed to assist him in the evaluation of proposals for the next procurement. Panel members are Bas Braams, Brian Hingerty, Ricky Kendall, Jean-Noel Leboeuf, Mike Minkoff, Steve Pieper, and Robert Ryne.

 


ERSUG Meeting Minutes - Tuesday, April 7, 1998

Tools for Collaborative Science

Stu Loken presented an overview of the DOE2000 collaborative tools, what they can do, and where to get them. DOE2000 will fundamentally change the way DOE does collaborative experiments and computation by developing new capabilities and enhancing existing tools. Collaboratory R&D projects involve shared virtual reality, software infrastructure, collaboration management, security infrastructure, electronic notebooks, floor control, and quality of service. Tools that researchers can use today include electronic notebooks and videoconferencing tools. The two major collaboratory pilot projects are the Diesel Combustion Collaboratory and the Materials Microcharacterization Collaboratory. Some other limited collaboratory efforts are going ahead with non-DOE2000 funds.

A taxonomy of collaboration tools includes:

•               Persistent information: email, news groups, papers, mail, electronic notebooks, legal and records requirements

•               Real-time information exchange: telephone, videoconference, chat/white board, shared authoring and applications, shared VR space, instrument control.

Stu demonstrated the electronic notebook architecture, which includes a notebook client, object, engine (with plug-ins and storage interface), and storage object. The notebooks are based on Web browsers to take advantage of existing standards, familiar interface, cross-platform capabilities, and lots of existing software.

Stu then compared various conference tools, as summarized in the following tables:

 

CONFERENCE TOOLS

    Tool            Platform(s)
    --------------  ------------------
    Mbone tools     Unix, Win95, NT
    NetMeeting      Win95, NT
    CUSeeMe         Mac, PC
    PictureTalk     Web-based
    MeetingPlace    telephone
    ISDN systems    room-based, PC

CONFERENCE TOOL FEATURES

    Tool         Video  Audio  White board  Chat  Share docs  Change docs
    -----------  -----  -----  -----------  ----  ----------  -----------
    Mbone        Yes    Yes    Yes          Yes
    NetMeeting   Yes    Yes    Yes          Yes   Yes         Yes
    CUSeeMe      Yes    Yes
    PictureTalk  ~Yes   Soon   ~Yes         Yes   Yes

 

An old but popular conference tool is ESnet's VCS, which has a 40-port ISDN hub at Berkeley Lab, a full bi-directional Mbone gateway, and Web-based video resource management. More than 110 video rooms around the world are registered. January 1998 statistics showed 406 conferences totaling 3,217.5 hours. ESnet also has a 48-port AudioBridge that provides out-of-band, good-quality audio, either in parallel with any video conference or stand-alone.

A 100-user data conferencing server that provides desktop image sharing can be reserved in parallel with any video conference, audio conference, or stand-alone. The free client software is a Web browser plug-in.

In the future, VCS will become Virtual Collaboration Services, no longer just video-centric. Future services and capabilities will include audio/video recording with playback on demand, fax support, ATM-based and H.323 (packet) conferencing, Webcasting, transcoding, continuous presence, and live video distribution.

The Quality of Service (QoS) project will change the way the Internet works by providing the equivalent of first class, business, and coach classes, with higher-priority traffic paying more for faster speed.

Collaboration tools are available at these Web sites:

•               Mbone: http://doe2k.lbl.gov/doe2k/download/vc_download.html

•               NetMeeting: http://microsoft.com/NetMeeting

•               PictureTalk: http://www-nt.es.net/PictureTalk

•               Electronic notebooks: http://www.epm.ornl.gov/enote/

 

Queue Structure Working Group

A Queue Structure Working Group will begin considering modifications to the existing queue structure. Volunteers for the group are Jim Craw, Bas Braams, Greg Kilcup, Jean-Noel Leboeuf, Doug Olson, and Salman Habib.

Changes in ERCAP (continued)

Tom Kitchens led further discussion of the reallocation of T3E resources. He would like feedback on the proposal within 10 days.

C90 and J90 Status Report

Jim Craw reported on the J90++, the follow-on to the J90se. The J90++ has faster processors and vector caching. In January and February, SGI/Cray met their milestones, which involved final testing and release to production of several key ICs. Cray will test a prototype system in July and ship a 24-processor system to NERSC in the fourth calendar quarter of 1998.

The cost of NERSC's C90 maintenance (hardware and software) has been reduced by 50%. The maintenance contract has been modified to "best effort" instead of 24 x 7 support. The operating system will be upgraded to UNICOS 10.0 in June. The decommissioning of the C90 is set for December 31, 1998.

 


 

Computer Security

Bill Kramer gave an overview of security: responsibilities, incidents, strategies, and future plans. NERSC works within the Berkeley Lab security framework and implements special requirements as needed in cooperation with CIAC and other organizations. NERSC conforms to and often exceeds required standards.

In the two years since NERSC moved to Berkeley, there were five security incidents and two compromises, but no compromises of mission-critical (compute and storage) systems. Incidents included compromise of two C90 user accounts and five staff workstations, threatening phone calls to staff, and a SAS system compromise that resulted in changed files in only one account. Unauthorized attempts to access NERSC systems are increasing, but users have been very cooperative in detecting and addressing attacks.

Bill outlined NERSC's security approach:

•               Use standard, effective services whenever possible, avoiding overlapping services.

◦                                 Users are responsible for protecting their sensitive data.

◦                                 NERSC is responsible for ensuring that standard protections are not bypassed.

◦                                 PIs are responsible for ensuring that their co-investigators have the citizenship required for the access they are granted.

◦                                 If vendors provide additional protections, make them available.

•               Where feasible, monitor and react rather than restrict services.

•               Proactive account management (often the initial access method for attackers).

•               Good system management.

•               Develop approaches that are functional but secure; install new functionality when it is shown to be secure.

Future security plans include:

•               Decrease potential ways of compromise. For example:

◦                                 Timely updates of OS with patches

◦                                 Replacement of the FDDI ring with giga-switch connections, reducing sniffer risk

◦                                 More aggressive account retirement:

▪                                                   Disabled after 90 days of inactivity

▪                                                   Removed after 180 days of inactivity

•               Improve network monitoring (e.g., adding monitoring hardware for second NERSC route).

•               Continue to improve overall security with experiences from other groups and sites.

•               Increase staffing for computer security. Two positions open:

◦                                 NERSC Computer Security Analyst

◦                                 Berkeley Lab Computer Protection Program Manager (CPPM)

•               Develop appropriate access methods.

◦                                 Team being formed.

◦                                 Working to convince SGI/Cray to support SSH.

◦                                 Reduce the use of clear-text passwords on the network.

 


ExERSUG Membership

Ricky Kendall led a discussion of ExERSUG membership. Representation is needed from the following fields:

•               engineering

•               experimental

•               large data set

•               Grand Challenge (biology, combustion, climate)

Criteria for membership are participation and program management support.

 


New ExERSUG Vice Chair

Brian Hingerty and Jerry Potter were nominated, and Brian was elected.

 


ERSUG Meeting at SC98

There was discussion on whether to hold the meeting at the beginning or end of SC98, which will be held November 7-13 in Orlando. Until more details of the conference are known, the ERSUG meeting is tentatively set for Saturday-Sunday (11/7-8) or Sunday-Monday (11/8-9).

 


Greenbook Recommendations

Ricky summarized the Greenbook recommendations:

•               Expand available HPC resources.

•               Focus efforts to transition applications to MPP systems.

•               Strengthen the network.

•               Encourage collaboration and integration with ER programs; promote synergy between physical and computational sciences, computer science, and mathematics.

•               Encourage extensive software development targeted toward DOE program offices, facilitated by NERSC.

•               Encourage more local and distributed computing to reduce capacity load on capability systems, facilitated by NERSC.

 


 

New ExERSUG Chair

Before handing over the reins to Bas Braams, Ricky volunteered to finish the Greenbook and continue maintaining the ERSUG Web site. He also advised members to "step up or step out" -- i.e., participate and help the new chair.

Bas led the group in thanking Ricky for his leadership from 1996-98, and Ricky was presented with a mounted Cray 2 CPU module in recognition of his efforts.

 

Informative Documents

 


    No.  Document                                          Contact
    ---  ------------------------------------------------  ----------------
     1   Common Super Home                                 Jim Craw
     2   Low Cost Desktop Visualization Report             Nancy Johnston
     3   Graphlib/DISPLAY to NCAR Template Documentation   Nancy Johnston
     4   DFS Migration Issues                              Keith Fitzgerald
     5   DOE 2000/ACTS Toolkit activities                  William Saphir
     6   NERSC Information Update                          Bill Kramer

 

 

Users Helping Users (UHU) Home Page

Purpose

Users Helping Users (UHU for short, pronounced "you-who") is a group of users who assist other users by mentoring or answering questions about using NERSC's computational resources. The Co-Chairs are Mike Minkoff from Argonne National Laboratory and Brian Hingerty of Oak Ridge National Laboratory. The UHU group is a volunteer effort initiated by the NERSC user community and NUG.

 


UHU Mentor Goals

•               Provide information and expertise to the Energy Research high performance computing (HPC) community.

•               Ease the transition to new NERSC resources.

•               Help educate the NERSC user community on short- and long-term application development issues.

 


UHU Axioms of High Performance Computing

•               Axiom 1: It's not the compiler's job! Never has been and never will be. 
Explanation: Compilers simply translate application source code into machine code that users then run. The compiler can never know how to get the best performance for the application. Application developers must provide hints or transform their software based on how the compiler generates machine code for the hardware (a small illustration appears after this list).

•               Axiom 2: You must get your hands dirty to understand the field. 
Explanation: To port or develop an application for high performance parallel supercomputers you have to do some work to get enough experience to determine which mechanism (parallel algorithms, communication schemes, computer systems) is best for your application.

•               Axiom 3: There is always a tradeoff between short term and long term priorities.

◦                                 Short Term: functioning and efficient code for application science.

◦                                 Long Term: software that can adapt to future technology changes. Your application software must be:

▪                                                   Modular

▪                                                   Portable

▪                                                   Functionally Complete

▪                                                   Able to deliver High Performance

Explanation: Design and implementation decisions must be analyzed prior to crafting the software. A quick re-engineering of a serial application might provide enough capability to do application science on the currently available parallel supercomputer, but this code might not work well on the next generation system.
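
As a small illustration of Axiom 1 (referenced above), the two C routines below compute the same dot product. In the first, the result is accumulated through a pointer that the compiler must assume may alias the input arrays, so the loop is hard to optimize; in the second, a hand transformation to a local scalar accumulator removes the apparent dependence. This is a generic sketch of the kind of source-level change the axiom calls for, not code presented by UHU.

    #include <stdio.h>

    #define N 1000

    /* Version the compiler must treat conservatively: "*result" may alias
     * x[] or y[], so every iteration appears to depend on the previous
     * store, and the loop is difficult to vectorize or keep in registers. */
    void dot_slow(const double *x, const double *y, int n, double *result)
    {
        int i;
        *result = 0.0;
        for (i = 0; i < n; i++)
            *result += x[i] * y[i];
    }

    /* Same arithmetic after a hand transformation: accumulating into a
     * local scalar removes the apparent dependence, so the compiler can
     * keep the sum in a register and generate much better code.  (A
     * vendor "ivdep"-style directive is another common way to give the
     * compiler this kind of hint.) */
    void dot_fast(const double *x, const double *y, int n, double *result)
    {
        int i;
        double sum = 0.0;
        for (i = 0; i < n; i++)
            sum += x[i] * y[i];
        *result = sum;
    }

    int main(void)
    {
        static double x[N], y[N];
        double r1, r2;
        int i;

        for (i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }
        dot_slow(x, y, N, &r1);
        dot_fast(x, y, N, &r2);
        printf("slow = %.1f, fast = %.1f\n", r1, r2);
        return 0;
    }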

 


UHU Activities

While UHU is new and many of these activities have not been implemented, UHU activities might include:

•               Generic mentor mail list.

•               WWW page for participating mentors.

•               Co-resident visits among mentors and users.

•               Educational information and workshops to help solve users' problems.

 


UHU Mentor List

•               Matti Alatalo ( email ), computational physicist

•               Tom Bettge ( email ), computational meteorologist

•               Tom Blum ( email ), lattice gauge theorist, computational physicist

•               Brian Hingerty ( email ), computational biologist

•               Ricky Kendall ( email ), computational chemist

•               Shichang Liu ( email ), computational physicist

•               Mike Minkoff ( email ), computational and computer scientist

•               Scott Parker ( email ), computational plasma physicist

•               Robert D. Ryne ( email ), computational accelerator physicist

•               Jeffrey L. Tilson ( email ), computational scientist, computational chemist

•               Doug Toussaint ( email ), computational physicist

•               Lu Zhong Yi ( email ), computational physicist