
2007/2008 User Survey Results

Response Summary

Many thanks to the 467 users who responded to this year's User Survey. The response rate has significantly increased from previous years:

  • 70 percent of users who had used more than 1 million MPP hours when the survey opened responded
  • 43 percent of users who had used between 10,000 and 1 million MPP hours responded
  • The overall response rate for the 2,804 authorized users during the survey period was 16.3%.

The respondents represent all six DOE Science Offices and a variety of home institutions: see Respondent Demographics.

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

You can see the 2007/2008 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.

Satisfaction Score    Meaning    Number of Times Selected
7    Very Satisfied    9,486
6    Mostly Satisfied    6,886
5    Somewhat Satisfied    1,682
4    Neutral    1,432
3    Somewhat Dissatisfied    485
2    Mostly Dissatisfied    130
1    Very Dissatisfied    81

Importance Score    Meaning
3    Very Important
2    Somewhat Important
1    Not Important

Usefulness Score    Meaning
3    Very Useful
2    Somewhat Useful
1    Not at All Useful

The average satisfaction scores from this year's survey ranged from a high of 6.71 (very satisfied) to a low of 4.46 (neutral). Across 128 questions, users chose the Very Satisfied rating 9,486 times, and the Very Dissatisfied rating 81 times. The scores for all questions averaged 6.07, and the average score for overall satisfaction with NERSC was 6.3. See All Satisfaction Ratings.

For questions that have appeared on the surveys from 2003 through 2007/2008, the change in average rating was tested for significance (using the t test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.

Significance of Change (color key for the Change from 2006 column)
  blue = significant increase (change from 2006)
  red = significant decrease (change from 2006)
  no color = not significant
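
The exact test procedure is not described here, but as an illustration, the following sketch (in Python, with hypothetical placeholder numbers rather than values from either survey) shows one common way such a comparison can be made from summary statistics alone: a two-sample t test on an item's mean rating, standard deviation, and response count for the two years, evaluated at the 90% confidence level. Whether a pooled-variance or unequal-variance form was used is an assumption; Welch's unequal-variance form is used below.

    # Sketch only, not NERSC's statistics code. Tests whether the change in an
    # item's mean rating between two surveys is significant at the 90% level,
    # using only each year's mean, standard deviation, and number of responses.
    from scipy.stats import ttest_ind_from_stats

    # Hypothetical placeholder statistics (not taken from either survey).
    mean_new, std_new, n_new = 5.50, 1.40, 180   # current survey
    mean_old, std_old, n_old = 5.90, 1.30, 160   # previous survey

    # Welch's (unequal-variance) two-sample t test from summary statistics.
    t_stat, p_value = ttest_ind_from_stats(mean_new, std_new, n_new,
                                           mean_old, std_old, n_old,
                                           equal_var=False)

    alpha = 0.10   # 90% confidence level
    if p_value < alpha:
        direction = "increase" if mean_new > mean_old else "decrease"
        print(f"significant {direction} (p = {p_value:.3f})")
    else:
        print(f"not significant (p = {p_value:.3f})")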

Areas with the highest user satisfaction include account support, the NERSC Global Filesystem, the HPSS mass storage system, consulting services, network performance within the NERSC center, and uptime for the Jacquard, Seaborg, and Bassi systems.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    [num who rated it 1, 2, 3, 4, 5, 6, 7]    Total Responses    Average Score    Std. Dev.    Change from 2006
SERVICES: Account support    0  0  2  1  6  82  265    356    6.71    0.57    0.07
NGF: Reliability    0  0  0  1  1  16  47    65    6.68    0.59    0.25
NGF: Uptime    0  0  0  1  1  17  47    66    6.67    0.59    0.32
HPSS: Reliability (data integrity)    0  0  1  3  4  29  111    148    6.66    0.70    -0.04
OVERALL: Consulting and Support Services    0  0  3  9  13  91  310    426    6.63    0.71    0.11
Network performance within NERSC (e.g. Seaborg to HPSS)    0  0  0  4  4  49  111    168    6.59    0.66    0.06
CONSULT: overall    0  0  4  8  7  102  241    362    6.57    0.74    0.10
CONSULT: Timely initial response to consulting questions    1  1  2  4  11  107  229    355    6.55    0.77    -0.02
Jacquard: Uptime (Availability)    0  0  0  4  6  34  82    126    6.54    0.73    -0.04
HPSS: Uptime (Availability)    0  0  1  3  6  45  96    151    6.54    0.73    -0.08
Seaborg: Uptime (Availability)    0  0  0  4  7  38  89    138    6.54    0.73    0.30
Bassi: Uptime (Availability)    1  0  1  5  6  48  122    183    6.54    0.84    0.13
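
As a check on how the Average Score and Std. Dev. columns follow from the count-per-rating columns, here is a minimal sketch in Python (not the actual survey tabulation code) that recomputes the SERVICES: Account support row above from its response counts:

    # Recompute the average rating and standard deviation for one survey item
    # from the number of respondents who selected each rating (1-7).
    import math

    # SERVICES: Account support -- 2, 1, 6, 82, 265 responses at ratings 3-7.
    counts = {3: 2, 4: 1, 5: 6, 6: 82, 7: 265}

    n = sum(counts.values())                           # 356 total responses
    mean = sum(r * c for r, c in counts.items()) / n   # count-weighted average
    # Sample standard deviation (the population formula gives the same value
    # to two decimals for this row).
    var = sum(c * (r - mean) ** 2 for r, c in counts.items()) / (n - 1)
    std = math.sqrt(var)

    print(f"n={n}  average={mean:.2f}  std dev={std:.2f}")   # n=356  average=6.71  std dev=0.57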

Areas with the lowest user satisfaction include batch wait times for Bassi and Jacquard, Franklin availability and I/O, training classes, and data analysis and visualization services.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    [num who rated it 1, 2, 3, 4, 5, 6, 7]    Total Responses    Average Score    Std. Dev.    Change from 2006
OVERALL: Data analysis and visualization facilities    0  0  6  68  28  67  62    231    5.48    1.24    0.11
Jacquard: Batch wait time    2  3  13  6  28  40  34    126    5.47    1.46    -0.40
TRAINING: NERSC classes: in-person    0  0  2  20  3  8  18    51    5.39    1.42    -0.55
Seaborg SW: Visualization software    1  0  1  10  4  12  9    37    5.38    1.42    -0.07
Bassi SW: Visualization software    0  0  2  16  4  16  11    49    5.37    1.27    0.00
Franklin SW: Visualization software    1  0  3  19  7  19  16    65    5.34    1.38
Live classes on the web    0  1  0  23  6  16  14    60    5.30    1.29    -0.46
Franklin: Disk configuration and I/O performance    8  11  20  36  34  73  51    233    5.15    1.63
Franklin: Uptime (Availability)    7  11  48  12  46  86  47    257    5.04    1.64
Bassi: Batch wait time    11  19  36  16  33  46  22    183    4.46    1.80    -1.39

The largest increases in satisfaction over last year's survey are for the now-retired Seaborg IBM POWER3 system, for 24x7 computer and network operations support, and for the software available on our systems.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    [num who rated it 1, 2, 3, 4, 5, 6, 7]    Total Responses    Average Score    Std. Dev.    Change from 2006
Seaborg: Batch wait time    2  2  8  12  33  47  34    138    5.53    1.32    0.59
SERVICES: Computer and network operations support (24x7)    0  0  2  11  9  46  93    161    6.35    0.95    0.31
Seaborg: Uptime (Availability)    0  0  0  4  7  38  89    138    6.54    0.73    0.30
Seaborg: overall    0  0  0  5  12  55  71    143    6.34    0.78    0.25
OVERALL: Available Software    0  0  3  19  43  157  176    398    6.22    0.87    0.24

The largest decreases in satisfaction over last year's survey are shown below.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    [num who rated it 1, 2, 3, 4, 5, 6, 7]    Total Responses    Average Score    Std. Dev.    Change from 2006
Bassi: Batch wait time    11  19  36  16  33  46  22    183    4.46    1.80    -1.39
Jacquard SW: Visualization software    0  0  0  11  4  13  10    38    5.58    1.18    -0.54
Jacquard: Batch wait time    2  3  13  6  28  40  34    126    5.47    1.46    -0.40
Bassi: Batch queue structure    2  2  9  26  25  66  46    176    5.57    1.32    -0.35
Bassi: overall    1  1  7  7  30  74  67    187    5.96    1.11    -0.30

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In 2007 and early 2008, NERSC took a number of actions in response to suggestions from the 2006 user survey.

  1. 2006 user survey: On the 2006 survey, four users expressed concerns that the MOTD was not updated quickly enough after status changes, or that it was too long.

    NERSC response: The computer and network operations support staff have streamlined their procedures for managing status changes during outages, giving users more current status information both in the MOTD and by email (for users registered to receive status emails). The operations staff have also increased their knowledge of account support procedures in order to provide better off-hours support. The satisfaction score for Computer and network operations (24x7) support showed a significant increase of 0.3 points over last year's score.

  2. 2006 user survey: On the 2006 survey a number of users requested longer wall times for the largest machines.

    NERSC response: In January the wall time for Franklin's regular queues was increased from 12 hours to 24, and then to 36 hours in May. The satisfaction score for Franklin queue structure on the 2007/2008 survey was 6.03 out of 7.

  3. 2006 user survey: On the 2006 survey a number of users commented on poor reliability for PDSF disks, and the satisfaction score for PDSF disks had the lowest PDSF hardware rating (5.1).

    NERSC response: NERSC has retired over 90 percent of the old NFS disk vaults and has installed new fiber channel based storage with better failover capabilities. In 2007/2008 the satisfaction score for PDSF disks increased to 5.54.

  4. 2006 user survey: On the 2006 survey a number of users requested more resources for interactive and debug jobs.

    NERSC response: NERSC now reserves nodes for interactive/debug jobs on weekends as well; previously this was done only Monday through Friday. This change did not significantly affect the satisfaction ratings.

  5. 2006 user survey: On the 2006 survey, users asked that we provide more cycles and get Franklin online ASAP.

    NERSC response: Franklin was delivered in January and February 2007 and was initially installed with Catamount on the compute nodes. NERSC and Cray then decided to install, and successfully tested for production use, an early release of Compute Node Linux (CNL). Early users started using Franklin with CNL in July 2007, and all users had access in September. Franklin was accepted in late October 2007. Since then NERSC and Cray have worked together to improve system stability, I/O performance and the user environment. In July 2008 NERSC and Cray began upgrading Franklin's compute nodes from dual-core to quad-core processors, doubling both the number of cores and the amount of memory. We are looking forward to future enhancements, such as integrating the compute nodes with the NERSC Global Filesystem.

Users are invited to provide overall comments about NERSC:

150 users answered the question "What does NERSC do well?"

  • 103 respondents stated that NERSC gives them access to powerful computing resources without which they could not do their science;
  • 59 mentioned excellent support services and NERSC's responsive staff;
  • 24 highlighted good software support or an easy to use user environment;
  • 17 pointed to good queue management or job turnaround;
  • 15 mentioned data services (HPSS, large disk space, purge policy, NGF, data analysis).

Some representative comments are:

 

The NERSC facility is fantastic. I'm very pleased with the hardware available, the people, the help, and the queues.
NERSC is generally excellent, and has both leadership computing power and good ease of use, increasing productivity. This is most of all because the staff are very responsive to user needs and are effective in making leadership class machines work well for user applications. Additionally, the queue structure is clear and usable, the networking is very good, and the storage resources are adequate for large jobs. The analytics and visualization programs and associated software support are very important.
Good computing. Good storage. We always need more.
What NERSC is best at is the combination of large-scale computing facilities with more flexible queuing policies than in other comparable facilities. Also the existence of "small-scale supercomputers" (Jacquard) is very useful to make tests.
NERSC is excellent. Franklin is a great resource - lots of cores. The waiting of queues for large core runs is very nice. [Obviously there is a wait time for 16384 core run for 36 hours :) ]
NERSC does customer service very well. I am always pleased whenever I deal with NERSC. I would also say that NERSC's infrastructure for users is very helpful.

108 users responded to "What should NERSC do differently?"

The top three areas of concern were Franklin stability and performance, job scheduling and resource allocation policies, and the need for more or different hardware resources. NERSC will analyze these comments and implement changes where possible over the next year.

Some of the comments from this section are:

It would be great if NERSC could magically improve the stability of Franklin... Unfortunately, hardware failures increase with the size and complexity of the system.
Need to improve network and load management on the log in nodes for Franklin. At times it is very difficult to get any work done since the response time is so slow.
Providing for long serial queues (~12 hours) and enabling these for applications such as IDL would further improve the usefulness of Franklin in particular. We appreciate your efforts to do this and look forward to finding a solution with you soon.
Less emphasis on INCITE, special users. More emphasis on providing high throughput for production applications.
As computing clusters grow, it would be very interesting/helpful for NERSC to invest in robust queuing systems such as Google's MapReduce model. It seems that all of NERSC's clusters are based upon the premise that failures are abnormal and can be dealt with as a special case. As clusters and job sizes grow, single point failures can really mess up a massively parallel job (Franklin) or a large number of parallel jobs (bad nodes draining queues on PDSF). Companies like Google have succeeded with their computing clusters by starting with the premise that hardware failures will happen regularly and building queuing systems that can automatically heal, rather than relying upon the users to notice that jobs are failing, stop them, alert the help system, wait for a fix, and then resubmit jobs.
I would suggest doing more to discourage single node and small jobs
NERSC's seaborg was a great success because of its reliability and its large amount of per-node memory. That led to the situations that majority of scientific codes ran well on it. The future computer (NERSC6) shall have a configuration with large amount of per-node memory (similar to bassi or larger, but with larger amount CPUs than bassi has).
Bassi has become so busy as to be almost useless to me.
NERSC should tell more about their strategic plans. Hopefully in three years we will be operating differently than we are now (command line submission, manual data management etc.) Is NERSC going to actively help with this, or simply be a resource provider? Is NERSC going to help campaign to get better performance and resiliency tools (fault tolerance) actually put into production vs being left as academic demos?
More disk space to users. The whole point of having a LARGE cluster is to do LARGE simulations. That means LARGE amounts of data. We should get more storage space (individually).

104 users answered the question "How does NERSC compare to other centers you have used?" 61 users stated that NERSC was an excellent center or was better than other centers they have used. Reasons given for preferring NERSC include its consulting services and responsiveness, its security, and its queue management.

25 users said that NERSC was comparable to other centers or gave a mixed review and 11 said that NERSC was not as good as another center they had used. Some users expressed dissatisfaction with user support, with available disk space or with queue turnaround time.

 

Here are the survey results:

  1. Respondent Demographics
  2. Overall Satisfaction and Importance
  3. All Satisfaction, Importance and Usefulness Ratings
  4. Hardware Resources
  5. Software
  6. Visualization and Data Analysis
  7. HPC Consulting
  8. Services and Communications
  9. Web Interfaces
  10. Comments about NERSC

Respondents by DOE Office and User Role:

Office    Respondents    Percent
ASCR 53 11.3%
BER 55 11.8%
BES 133 28.5%
FES 64 13.7%
HEP 58 12.4%
NP 104 22.3%
User Role    Number    Percent
Principal Investigators 71 15.2%
PI Proxies 63 13.5%
Project Managers 7 1.5%
Users 326 69.8%

 

Respondents by Organization:

Organization Type    Number    Percent
Universities 275 58.9%
DOE Labs 145 31.0%
Other Govt Labs 26 5.6%
Industry 16 3.4%
Private labs 5 1.1%
Organization    Number    Percent
Berkeley Lab 70 15.0%
UC Berkeley 30 6.4%
Oak Ridge 14 3.0%
PNNL 12 2.6%
NREL 11 2.4%
Tech-X Corp 11 2.4%
U. Washington 11 2.4%
U. Wisconsin 11 2.4%
UC Davis 11 2.4%
NCAR 8 1.7%
U. Tennessee 8 1.7%
Argonne 6 1.3%
Livermore 6 1.3%
PPPL 6 1.3%
U. Colorado 6 1.3%
U. Maryland 6 1.3%
SLAC 5 1.1%
Colorado State 5 1.1%
Ohio State 5 1.1%
Stanford 5 1.1%
Texas A&M 5 1.1%
U. Chicago 5 1.1%
U. Illinois 5 1.1%
U. Oklahoma 5 1.1%
Vanderbilt 5 1.1%
Organization    Number    Percent
Brookhaven 4 0.9%
Auburn Univ 4 0.9%
MIT 4 0.9%
Rice University 4 0.9%
U. Michigan 4 0.9%
UC Irvine 4 0.9%
UCLA 4 0.9%
APC Lab Astro France 3 0.6%
Cal Tech 3 0.6%
Georgia State 3 0.6%
Harvard 6 1.3%
Jefferson Lab 3 0.6%
Los Alamos Lab 3 0.6%
Louisiana State 3 0.6%
Northwestern 3 0.6%
Princeton 3 0.6%
U. Texas 3 0.6%
UC Santa Barbara 3 0.6%
William & Mary 3 0.6%
NASA GISS 3 0.6%
Other Universities 95 20.3%
Other Gov. Labs 15 3.2%
Other DOE Labs 5 1.1%
Other Industry 5 1.1%
Private labs 5 1.1%

 

Which NERSC resources do you use?

Resource    Responses    Percent    Num who answered questions on this topic    Percent
NERSC Information Management (NIM) System 280 60.0% 363 77.7%
NERSC web site (www.nersc.gov) 278 59.5% 383 82.0%
Cray XT4 Franklin 268 57.4% 293 62.7%
IBM POWER5 Bassi 214 45.8% 225 48.2%
Linux Cluster Jacquard 170 36.4% 185 39.6%
HPSS Mass Storage System 164 35.1% 195 41.8%
Consulting services 164 35.1% 362 77.5%
IBM POWER3 Seaborg (now retired) 147 31.5% 168 36.0%
Account support services 127 27.2% 356 76.2%
PDSF Cluster 75 16.1% 95 20.3%
DaVinci 66 14.1% 115 24.6%
Off-hours 24x7 Computer and Network Operations support 47 10.1% 161 34.5%
NERSC Global Filesystem (NGF) 24 5.1% 73 15.6%
Visualization services 15 3.2% 71 15.2%
NERSC CVS server 11 2.4% 94 20.1%
Grid services 8 1.7% 42 9.0%

 

How long have you used NERSC?

Time    Number    Percent
less than 6 months 89 19.4%
6 months - 3 years 199 43.4%
more than 3 years 171 37.3%

 

What desktop systems do you use to connect to NERSC?

System    Responses
Unix Total 347
PC Total 232
Mac Total 207
Linux 316
OS X 206
Windows XP 181
Windows Vista 41
Sun Solaris 23
Windows 2000 10
IBM AIX 3
HP HPUX 2
MacOS 1
SGI IRIX 1

 

Web Browser Used to Take Survey:

Browser    Number    Percent
Firefox 2 202 43.3%
Safari 79 16.9%
MSIE 7 54 11.6%
Firefox 3 52 11.1%
Firefox 1 35 7.5%
Mozilla 25 5.4%
MSIE 6 17 3.6%
Opera 3 0.6%

 

Operating System Used to Take Survey:

OS    Number    Percent
Mac OS X 170 36.4%
Linux 138 29.6%
Windows XP 128 27.4%
Windows Vista 20 4.3%
Windows Server 2003 5 1.1%
Windows 2000 3 0.6%
SunOS 3 0.6%