Many thanks to the 326 users who responded to this year's User Survey -- this represents the highest response level yet in the six years we have conducted the survey. The respondents represent all five DOE Science Offices and a variety of home institutions: see Respondent Demographics.
The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.
You can see the FY 2003 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.
The average satisfaction scores from this year's survey ranged from a high of 6.61 (very satisfied) to a low of 4.67 (somewhat satisfied). See All Satisfaction Questions. Areas with the highest user satisfaction were:
|Topic||Avg Score||No. of Responses|
|Consulting - timely response||6.55||207|
|Consulting - technical advice||6.54||200|
|Local Area Network||6.54||114|
Areas with the lowest user satisfaction were:
|Topic||Avg Score||No. of Responses|
|Access Grid classes||4.67||27|
|Escher visualization software||4.75||8|
|NERSC training classes||4.88||24|
The largest increases in satisfaction over last year's survey were for the IBM SP (Seaborg), HPSS uptime, network connectivity, and available hardware:
|Topic||Avg Score||Increase from 2002||No. of Responses|
|SP Disk Configuration and I/O Performance||6.15||0.18||156|
The areas rated significantly lower this year were:
|Topic||Avg Score||Decrease from 2002||No. of Responses|
|PDSF Fortran Compilers||6.03||-0.42||29|
|PDSF Ability to Run Interactively||5.77||-0.41||64|
|SP Queue Structure||5.69||-0.23||177|
Survey Results Lead to Changes at NERSC
Every year we institute changes based on the survey. NERSC took a number of actions in response to suggestions from the 2002 user survey.
SP resource scheduling:
- Could longer run time limits be implemented across the board?
NERSC response: In March 2003 limits were extended from 8 to 48 hours for jobs running on 32 or more nodes, and from 8 to 12 hours for jobs running on 31 or fewer nodes. The "regular long" class, which provides a 24-hour limit for jobs running on 31 or fewer nodes, was preserved, but with restrictions on the number of jobs that can run simultaneously.
- Could more services be devoted to interactive jobs?
NERSC response: In March 2003 interactive jobs were given an additional system priority boost (placing them ahead of debug jobs).
- Could there be a serial queue?
NERSC response: Two new classes to facilitate pre- and post-processing of data and data transfers to HPSS were introduced in November 2003. Jobs run in these classes are charged for one processor's wall-clock time.
- Could more resources be devoted to the "small node-long runtime" class (more nodes, a longer run time, better throughput)?
NERSC response: Resources were not increased for "regular long" types of jobs; rather, the priority has been to increase resources for jobs running on 32 or more nodes. This is in line with the DOE Office of Science's goal that 1/4 of all batch resources be applied to jobs that use at least 1/8 of the available processors. For FY 2004 this goal has been increased to 1/2 of the batch resources. Perhaps because of this resource prioritization, satisfaction with the SP queue structure dropped by 0.2 points.
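The March 2003 wall-clock limits described under SP resource scheduling involve three cases that depend on both the class and the node count. The sketch below restates them as a lookup function. The class labels "regular" and "regular_long" are hypothetical identifiers chosen for illustration, not the actual LoadLeveler class names, and the behavior for cases the text does not describe (for example "regular long" on 32 or more nodes) is an assumption.

```python
def max_wall_clock_hours(nodes: int, job_class: str) -> int:
    """Wall-clock limit in hours under the March 2003 SP limits (illustrative sketch)."""
    if nodes < 1:
        raise ValueError("a job needs at least one node")
    if job_class == "regular":  # hypothetical label for the standard batch class
        # 48 hours for 32 or more nodes, 12 hours for 31 or fewer
        return 48 if nodes >= 32 else 12
    if job_class == "regular_long":  # hypothetical label for "regular long"
        if nodes <= 31:
            return 24
        # The text only describes "regular long" for 31 or fewer nodes.
        raise ValueError("regular long is only described for 31 or fewer nodes")
    raise ValueError(f"class not covered by this sketch: {job_class}")


for nodes, cls in [(64, "regular"), (16, "regular"), (16, "regular_long")]:
    print(nodes, cls, max_wall_clock_hours(nodes, cls))
```

The sketch omits the separate restriction on how many "regular long" jobs may run simultaneously, since the text gives no number for it.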
SP software enhancements:
- Could the Unix environment be more user-friendly (e.g. more editors and shells in the default path)?
NERSC response: The most recent versions of vim, nano, nedit, gvim, pico, and xemacs are now in all users' paths by default, as are the compression utilities zip and bunzip2. Two new utilities make the batch environment easier to use: llhist shows recently completed jobs, and ll_check_script gives warnings and advice on crafting batch scripts. This year's rating for SP applications went up by 0.3 points.
- Could there be more data analysis software, including matlab?
NERSC response: Matlab and Mathematica are available on the math server, newton. Matlab is not available on the IBM SP because big Matlab jobs can severely affect other users on the interactive nodes. The IDL (Interactive Data Language) package is available on Seaborg for interactive data analysis and visualization of data.
- NERSC needs more computational power overall.
Could a vector resource be provided?
Could mid-range computing or cluster resources be provided?
NERSC response: All the above are excellent suggestions and we certainly understand the desire for more computational resources. The FY 2004 Seaborg allocation requests were for 2.4 times the amount available to allocate. The reality is that there is no budget for additional hardware acquisitions. Last year we were able to double the number of nodes on Seaborg and this year's rating for available computing hardware increased by 0.2 points.
- Provide better searching, navigation, and organization of the information.
NERSC response: The NERSC user web site (http://hpcf.nersc.gov) has been restructured with new navigation links that should make finding information faster and easier. Related information has been consolidated. Printer-friendly links have been added that combine multi-page documents into a single page. The final phase of the update will be to encode descriptions for each page to increase the effectiveness of the search engine.
- Enhance SP documentation.
NERSC response: We have made an effort to keep up-to-date on a wide range of SP topics: IBM compilers, the LoadLeveler batch system, IBM SP specific APIs, and links to IBM redbooks. In addition the presentation of SP information has been streamlined; hopefully information is easier to find now. In August 2003 we received positive comments from ScicomP 8 attendees in regard to how we present IBM documentation.
- Provide more training on performance analysis, optimization and debugging.
NERSC response: Since last year's survey NERSC has emphasized these topics in our training classes, for example: CPU performance analysis on Seaborg, Scaling I/O and Communication, Debugging Parallel Programs with Totalview. See http://www.nersc.gov/nusers/services/training/.
- Provide more information in the New Users Guide.
NERSC response: More information on initial account setup was added to the New User Guide, which was also reformatted for ease of use. See http://hpcf.nersc.gov/help/new_user/.
This year's survey included several new questions:
- How useful were the DOE and NERSC scaling initiatives? [Read the Scaling Initiatives Response Page]
In FY 2003 NERSC implemented initiatives aimed at promoting highly scalable applications as part of the DOE emphasis on large-scale computing. In FY 2003, for the first time, DOE had an explicit goal that "25% of the usage will be accounted for by computations that require at least 1/8 of the total [compute] resource." (Note: for FY 2004 this goal is 50% of the usage, rather than 25%.)
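To make the DOE goal concrete, the sketch below computes the share of usage accounted for by jobs that require at least 1/8 of the machine. It assumes that "usage" is measured in processor-hours (processors times wall-clock hours) and that "1/8 of the total resource" means 1/8 of the machine's processors; the `Job` record, the 1024-processor machine, and the job mix are all hypothetical and do not reflect actual Seaborg data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    processors: int    # processors allocated to the job (hypothetical record)
    wall_hours: float  # elapsed wall-clock hours


def large_job_usage_fraction(jobs: list[Job], total_processors: int) -> float:
    """Share of processor-hours consumed by jobs using >= 1/8 of the machine."""
    threshold = total_processors / 8
    total = sum(j.processors * j.wall_hours for j in jobs)
    if total == 0:
        return 0.0
    large = sum(j.processors * j.wall_hours
                for j in jobs if j.processors >= threshold)
    return large / total


# Hypothetical 1024-processor machine and job mix, for illustration only.
jobs = [Job(256, 10.0), Job(64, 20.0), Job(16, 80.0)]
fraction = large_job_usage_fraction(jobs, total_processors=1024)
print(f"{fraction:.0%} of usage from large jobs")
print("meets FY 2003 goal (25%):", fraction >= 0.25)
print("meets FY 2004 goal (50%):", fraction >= 0.50)
```

In this made-up mix only the 256-processor job clears the 128-processor threshold. Its 2,560 processor-hours are half of the 5,120 total, so the mix would just meet the FY 2004 target even though two of the three jobs are small.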
The 24 respondents who had participated in the Large Scale Jobs Reimbursement Program and the 32 respondents who had worked on scaling their codes with the NERSC consultants rated these initiatives as "very useful" on average. The poe+ tool, used to measure code performance characteristics, had been used by 104 respondents and was also rated "very useful" on average. The 115 respondents who rated Seaborg's new batch class structure, designed to give preference to high-concurrency jobs, gave it an average rating of "somewhat useful".
20 users wrote comments in support of the scaling initiatives, for example:
Please push this project as much as you can. This type of consulting is very important if one goes to the limit of a system in terms of #processors and sustained performance.
11 users stated why they thought these initiatives are misguided. The general theme behind these comments was that it is science output that is important, not scaling per se. Some representative comments:
I believe that they are totally misguided. The emphasis should be on maximizing the SCIENTIFIC output from NERSC. If the best way to do this is for the user to run 100 1-node jobs at a time rather than 1 100-node job, every effort should be made to accommodate him/her. ... In the final analysis, it should be up to the users to decide how they use their allocations. Most, if not all of us, will choose a usage pattern which maximizes our scientific output. Remember that most of us are in computational science, not in computer science. We are interested in advancing our own fields of research, not in obtaining Gordon Bell awards.
Don't freeze out the small-to-moderate user --- the science/CPU hour is often higher for the moderate user.
There is always a tension between massive users and those who want to run smaller jobs. While many researchers use a single node (16 processors), I think it would not be cost effective for DOE to pay them to run on their own machines.
- Why do you compute at NERSC? (What are the reasons NERSC is important to you?) [Read All 229 Responses]
Many of the answers were along the lines of "to run my codes in order to get my science done". Users pointed out that they need powerful compute resources that they can't get elsewhere. Many users specifically mentioned large numbers of processors or parallel computing as a reason to compute at NERSC. Turnaround time (getting results fast) is very important. Data analysis, especially in the context of PDSF computing, is also a common theme. One user even pointed out that the time is "free".
- Has security gotten in the way of your work at NERSC?
Ninety percent of the respondents (217 users) answered no to this question.
- If security has gotten in the way of your work at NERSC, how? [Read All 25 Responses]
25 users answered this question:
- 10 pointed to difficulties accessing NERSC (the change to ssh version 2, FTP retirement, difficulties with tunneling and ports).
- 6 reported password or login attempt problems.
- 3 encountered difficulties with accessing HPSS.
- 3 had grid/distributed computing concerns.
- 3 said "it's inconvenient".
- How do you compare NERSC openness and access to your home site and others? [Read All 146 Responses]
- 49% stated that NERSC has similar or greater openness than other sites they access
- 28% said that NERSC's openness or security measures are good (without making a comparison)
- 9% said that NERSC is less open or too secure
Users are invited to provide overall comments about NERSC:
119 users answered the question "What does NERSC do well?" 69 respondents pointed specifically to NERSC's good hardware management practices, which provide users with excellent access to HPC resources; 62 mentioned User Support and NERSC's responsive staff; 17 highlighted documentation; and 13 mentioned job scheduling and batch throughput. Some representative comments are:
Powerful and well maintained machines, great mass storage facility, and helpful and responsive staff. What more could you want?
As Apple would put it .... "it just works". I get my work done and done fast. Seaborg is up and working nearly all the time. Network, storage, it's all there when I need it. That is what matters most and NERSC delivers.
NERSC simply is the best run centralized computer center on the planet. I have interacted with many central computer centers and none are as responsive, have people with the technical knowledge available to answer questions and have the system/software as well configured as does NERSC.
75 users responded to "What should NERSC do differently?" The area of greatest concern is job scheduling: 14 users expressed concerns about favoring large jobs at the expense of smaller ones, and six wanted more resources devoted to interactive computing and debugging. Next in concern is the need for more hardware: more compute power overall, different architectures, mid-range computing support, and vector architectures. Eight users pointed out the need for better documentation and six wanted more training. Some of the comments from this section are:
NERSC's new emphasis favoring large (1024+ processor) jobs runs contrary to its good record of catering to the scientific community. It needs to remember the community it is serving --- the customer is always right. The queue configuration should be returned to a state where it no longer favours jobs using large numbers of processors.
I'm not in favor of giving highest priority to the extremely large jobs on all nodes of seaborg. I think that NERSC must accommodate capacity computing for energy research that cannot be performed anywhere else, in addition to providing capability computing for the largest simulations.
NERSC should move more aggressively to upgrade its high end computing facilities. It might do well to offer a wider variety of architectures. For example, the large Pentium 4 clusters about to become operational at NCSA provide a highly cost effective resources for some problems, but not for others. If NERSC had a greater variety of machines, it might be able to better serve all its users. However, the most important improvement would be to simply increase the total computing power available to users.
It would be great if NERSC could again acquire a supercomputer with excellent vector-processing capability, like the CRAY systems which existed for many years. The success of the Japanese "Earth Simulator" will hopefully cause a re-examination of hardware purchase decisions. Strong vector processors make scientific programming easier and more productive.
Measure success on science output and not on size of budgets or quantity of hardware.
The overhead on account managers still seems a bit much for what we're getting. I still find the ERCAP process onerous (i.e., more information requested than should be necessary). Also, most of the codes we are using are changing much more from year to year in a scientific sense than a computational sense, it becomes repetitious to have to keep evaluating them computationally each year. You need to keep in mind that most of us are being funded to do science rather than computational research.
65 users answered the question "How does NERSC compare to other centers you have used?" 63% of the respondents stated that NERSC was a good center (no comparison made) or was better than other centers they used. Reasons given for preferring NERSC include good hardware, networking and software management, good user support, and better job throughput. 11% of the respondents said that NERSC was not as good as another center they used. The most common reason for dissatisfaction with NERSC was job scheduling.
Here are the survey results:
- Respondent Demographics
- Overall Satisfaction and Importance; Why do you use NERSC?; Security and Flexible Work Option
- All Satisfaction Questions and Changes from Previous Years
- DOE and NERSC Scaling Initiatives
- Web, NIM, and Communications
- User Services
- Comments about NERSC