National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

PETASCALE SYSTEMS INTEGRATION INTO LARGE SCALE FACILITIES WORKSHOP

Agenda:

DAY ONE: May 15, 2007 – 8 am start

First Session – Plenary
Introduction and Logistics – Bill Kramer/Yeen Mankin
Welcome – Dan Hitchcock
Motivation for the Workshop – Bill Kramer
System Integration at LLNL – Mark Seager
System Integration at NCAR – Tom Bettge

Break

All breakout session chairs will be asked to report back to the plenary on the following issues:

  1. What are the major challenges in this area?
  2. What methods and technology are currently being used, and how do we use them?
  3. What methods and technology work, and which ones do not?
  4. What tools and technology do we wish we had – particularly for Petascale systems?
  5. Other observations/suggestions/issues

Second Session – Breakouts
            Breakout 1 – Integration Issues for Facilities – Petascale systems are pushing the limits of facilities in terms of space, power, cooling and even weight.  There are many complex issues to deal with when integrating large scale systems, and these will get more challenging with Petascale systems.  While we all hope technology will reverse these trends, can we count on it?  Besides building large facilities (at Moore’s law rates), how can we better optimize facilities?  How can the lead times and costs for site preparation be reduced?  Can real-time adjustments be made rather than over-designing?
            Breakout 1 Leaders - Howard Walter, Gary New (Steve Lowe)
           
            Breakout 2 – Performance Assessment of Systems – There are many tools and benchmarks that help assess performance of systems, ranging from single performance kernels to full applications.  Performance tests can be kernels, specific performance probes and composite assessments.  What are the most effective tools?  What scale tests are needed to set system performance expectations and to assure system performance?  What are the best combinations of tools and tests?
            Breakout 2 Leaders – Tom Engel (NCAR), Rob Pennington (NCSA) (H Wassermann)
           
            Breakout 3 – Methods of Testing and Integration – There is a range of methods for fielding large scale systems, from self-integration and cooperative development to factory testing and on-site acceptance testing.  Each site and system has different goals and selects from the range of methods.  When are different methods appropriate?  What is the right balance between the different approaches?  Are some combinations better than others?
            Breakout 3 Leaders – Brad Comes (DOD), Buddy Bland (ORNL) (N Cardo/F Verdier)

Third Session – Plenary
            Reports from breakouts
           
            Panel – The Vendor Side of Deployments – TBD (Cray), Chulho Kim (IBM), Renato Ribeiro (Sun), Dave Sundstrom (Linux Networx) 
               Petascale HPC Deployments: Sun's Perspective, Renato Ribeiro, Ph.D., Manager, Integrated Systems Marketing

Working Dinner
            Panel – If only I had known! The biggest blunders/mistakes and humorous experiences in large system deployments (All)

DAY TWO: May 16, 2007 – 8 am start

Fourth Session – Breakouts
            Breakout 4 – Systems and User Environment Integration Issues – Breakout session #2 looked at performance and benchmarking tools.  While performance is one element of successful systems, so are effective resource management, reliability, consistency and usability, to name a few.  Beyond performance, what areas are critical to successful integration?  How are these evaluated?
            Breakout 4 Leaders - Mike McCraney (MHPCC), TBD (T Davis)
           
            Breakout 5 - Early Warning signs of problems – detecting and handling – Fielding large scale systems is a major project in its own right, and takes cooperation between site staff, stakeholders, users, vendors, third party contributors and many more.  How can early warning signs of problems be detected?  When they are detected, what should be done about them?  How can they best be handled to achieve the greatest and quickest success?  How do we ensure long-term success vs the pressure of quick milestone accomplishment?  Will the current focus on formal project management methods help or hinder?
            Breakout 5 Leaders - Bob Tomlinson (LANL), Jim Kasdorf (PSC) (R Gerber)

            Breakout 6 – How to keep systems running up to expectations – Once systems are integrated and accepted, is the job done?  If systems pass a set of tests, will they continue to perform at the level they started at?  How can we assure systems continue to deliver what is expected?  What levels and types of continuous testing are appropriate?
            Breakout 6 Leaders - Dave Skinner (NERSC), Kevin Regimbal (PNNL) (T Butler)

Fifth Session – Plenary
            Reports from breakouts

            Panel Session – How will Petascale systems change what we have been doing? – Ray Bair (ANL), Phil Andrews (SDSC), Brad Comes (DODMod)
               How Should Petascale Systems Change what we are Doing?, Ray Bair, Director, Argonne Leadership Computing Facility.
               How will Petascale systems change what we have been doing?, Phil Andrews, Patricia Kovatch, SDSC
               DoD HPC Modernization Program, Bradley Comes

Sixth Session – Plenary
            Report Summary
           
            Conclusion

Workshop adjourns – 5 pm on May 16


Page last modified: Tue, 22 May 2007 19:21:09 GMT
Page URL: http://www.nersc.gov/projects/HPC-Integration/agenda.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
