2005 User Survey Results

Response Summary

Many thanks to the 201 users who responded to this year's User Survey. The respondents represent all six DOE Science Offices and a variety of home institutions: see Respondent Demographics.

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

You can see the 2005 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.

Satisfaction Score    Meaning
7 Very Satisfied
6 Mostly Satisfied
5 Somewhat Satisfied
4 Neutral
3 Somewhat Dissatisfied
2 Mostly Dissatisfied
1 Very Dissatisfied
Importance Score    Meaning
3 Very Important
2 Somewhat Important
1 Not Important
Usefulness Score    Meaning
3 Very Useful
2 Somewhat Useful
1 Not at All Useful

The average satisfaction scores from this year's survey ranged from a high of 6.73 (very satisfied) to a low of 3.95 (neutral). See All Satisfaction Ratings.

For questions that appeared on both the 2004 and 2005 surveys, the change in rating was tested for significance (using the t-test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.
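
The survey report does not spell out the exact form of the test; as a sketch, assuming the standard two-sample t statistic for independent means (Welch form), the comparison for a given question is

\[
t = \frac{\bar{x}_{2005} - \bar{x}_{2004}}{\sqrt{\dfrac{s_{2005}^{2}}{n_{2005}} + \dfrac{s_{2004}^{2}}{n_{2004}}}}
\]

where \(\bar{x}\), \(s\), and \(n\) are the average score, standard deviation, and number of responses for that question in each survey year; a change is flagged as significant when \(|t|\) exceeds the two-sided critical value at the 90% confidence level.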

Significance of Change
significant increase
significant decrease
not significant

Areas with the highest user satisfaction include the HPSS mass storage system, HPC consulting, and account support services:

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
Account support services     1 1 4 25 119 150 6.73 0.61 0.06
HPSS: Reliability (data integrity)       1 1 19 68 89 6.73 0.54 -0.01
OVERALL: Consulting and Support Services     1 1 2 38 137 179 6.73 0.57 0.06
CONSULT: overall     1 1 4 36 118 160 6.68 0.62 -0.01
HPSS: Uptime (Availability)       2 1 21 65 89 6.67 0.62 0.01

Areas with the lowest user satisfaction include batch wait times on both Seaborg and Jacquard, Seaborg's queue structure, PDSF disk stability, and Jacquard performance and debugging tools:

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
Jacquard SW: Performance and debugging tools 1   4 4 6 15 7 37 5.35 1.44  
Jacquard: Batch wait time 2 1 10 8 12 24 13 70 5.16 1.54  
PDSF: Disk configuration and I/O performance   1 2 8 10 8 6 35 5.14 1.29 -0.45
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39
Seaborg: Batch wait time 17 15 28 13 33 27 5 138 3.95 1.76 0.10

The largest increases in satisfaction over last year's survey are shown below:

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
NERSC CVS server       2 5 15 17 39 6.21 0.86 0.87
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39
PDSF SW: C/C++ compilers         1 9 18 28 6.61 0.57 0.37
Seaborg: Uptime (Availability)       3 2 48 85 138 6.56 0.64 0.30
OVERALL: Available Computing Hardware   3 3 4 37 88 46 181 5.89 0.98 0.24
OVERALL: Network connectivity 1   2 4 5 62 104 178 6.45 0.86 0.18

Only three areas were rated significantly lower this year: PDSF overall satisfaction, PDSF uptime, and the amount of time taken to resolve consulting issues. The introduction of three major systems in the past year, combined with a reduction in consulting staff, explains the latter.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
PDSF: Overall satisfaction       3 4 22 10 39 6.00 0.83 -0.52
PDSF: Uptime (availability)     1 5 3 16 12 37 5.89 1.10 -0.51
CONSULT: Amount of time to resolve your issue   2 1 3 7 54 86 153 6.41 0.89 -0.19

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In 2005 NERSC took a number of actions in response to suggestions from the 2004 user survey.

  1. 2004 user comments: On the 2004 survey, 37 users asked us to change the job scheduling policies on Seaborg, 25 of them requesting more support for midrange jobs.

    NERSC response: In early 2005 NERSC implemented two changes to the queueing policies on Seaborg:

    1. we reduced the scheduling distance between midrange and large jobs
    2. we gave all premium jobs a higher scheduling priority than regular priority large-node jobs (prior to this change the scheduling priority for premium midrange jobs was lower than that for regular priority large-node jobs).

    User satisfaction with Seaborg's batch queue structure increased by 0.39 points on the 2005 survey.

  2. 2004 user comments: On the 2004 survey 25 users requested additional computing resources. In addition, another set of 25 users requested more support for midrange jobs.

    NERSC response: In August 2005 NERSC deployed Jacquard, a Linux cluster with 640 2.2 GHz Opteron CPUs available for computations and a theoretical peak computational performance of 2.8 teraflops. This was followed in January 2006 by Bassi, an IBM Power5 system with 888 processors available for computations and a theoretical peak computational performance of 6.7 teraflops.

    User satisfaction with NERSC's available computing hardware increased by 0.24 points on the 2005 survey.

  3. 2004 user comment: "Faster network connectivity to the outside world. I realize that this may well be out of your hands, but it is a minor impediment to our daily usage."

    NERSC response: During 2005 NERSC upgraded its network infrastructure to 10 gigabits per second. User satisfaction with network connectivity increased by 0.18 points on the 2005 survey.

  4. 2004 user comment: "I want imagemagick on seaborg. Then I could make movies there, and that would complete my viz needs."

    NERSC response: NERSC installed ImageMagick on Seaborg; it is also available on Jacquard and DaVinci.

Users are invited to provide overall comments about NERSC:

82 users answered the question "What does NERSC do well?"   47 respondents stated that NERSC gives them access to powerful computing resources without which they could not do their science; 32 mentioned excellent support services and NERSC's responsive staff; 30 pointed to very reliable and well managed hardware; and 11 said everything. Some representative comments are:

powerful is the reason to use NERSC

65 users responded to the question "What should NERSC do differently?"   The areas of greatest concern are the interrelated ones of queue turnaround times (24 comments), job scheduling and resource allocation policies (22 comments), and the need for more or different computational resources (17 comments). Users also voiced concerns about data management, software, group accounts, staffing, and allocations. Some of the comments from this section are:

The most important improvement would be to reduce the amount of time that jobs wait in the queue; however, I understand that this can only be done by reducing the resource allocations.

A queued job sometimes takes too long to start. But I think that, given the amount of users, probably there would be no efficient queue management anyway.

Over-allocation is a mistake. Long waits in queues have been a disaster for getting science done in the last few years. INCITE had a negative affect on Fusion getting its science work done.

It's much better to have idle processors than idle scientists/physicists. What matters for getting science done is turnaround time. ...

... Interactive computing on Seaborg remains an issue that needs continued attention. Although it has greatly improved in the past year, I would appreciate yet more reliable availability.

Expand capabilities for biologists; add more computing facilities that don't emphasize the largest/fastest interconnect, to reduce queue times for people who want to runs lots of very loosely coupled jobs. More aggressively adapt to changes in the computing environment.

NERSC needs to expand the IBM-SP5 to 10000 processors to replace the IBM-SP3
Continue to test new machines, including the Cray products

NERSC needs to push to get more compute resources so that scientists can get adequate hours on the machine

51 users answered the question "How does NERSC compare to other centers you have used?"   Twenty-six users stated that NERSC was an excellent center or was better than other centers they have used. Reasons given for preferring NERSC include its consulting services and responsiveness, its hardware and software management, and the stability of its systems.

Twelve users said that NERSC was comparable to other centers or gave a mixed review, and seven said that NERSC was not as good as another center they had used. The most common source of dissatisfaction with NERSC was oversubscription of its computational resources and the resulting long wait times. Among PDSF users, the most common dissatisfaction was with disk instability.

 

Here are the survey results:

  1. Respondent Demographics
  2. Overall Satisfaction and Importance
  3. All Satisfaction, Importance and Usefulness Ratings
  4. Hardware Resources
  5. Software
  6. Visualization and Data Analysis
  7. HPC Consulting
  8. Services and Communications
  9. Web Interfaces
  10. Training
  11. Comments about NERSC

Respondent Demographics

Number of respondents to the survey: 201

  • Respondents by DOE Office and User Role
  • Respondents by Organization
  • Which NERSC resources do you use?
  • How long have you used NERSC?
  • What desktop systems do you use to connect to NERSC?
  • Web Browser Used to Take Survey
  • Operating System Used to Take Survey

 

Respondents by DOE Office and User Role:

Office    Respondents    Percent
ASCR 18 9.0%
BER 28 13.9%
BES 40 20.0%
FES 27 13.4%
HEP 39 19.4%
NP 46 22.9%
guests 3 1.4%
User Role    Number    Percent
Principal Investigators 38 18.9%
PI Proxies 23 11.4%
Project Managers 6 3.0%
Users 134 66.7%

 

Respondents by Organization:

Organization Type    Number    Percent
Universities 110 54.8%
DOE Labs 72 35.8%
Other Govt Labs 14 7.0%
Industry 5 2.5%
Organization    Number    Percent
Berkeley Lab 41 20.4%
UC Berkeley 11 5.5%
Livermore 7 3.5%
U. Washington 7 3.5%
PPPL 5 2.5%
SLAC 5 2.5%
U. Wisconsin 5 2.5%
Argonne 4 2.0%
Oak Ridge 4 2.0%
Texas A&M 4 2.0%
U. Chicago 4 2.0%
Organization    Number    Percent
Courant Inst 3 1.5%
NCAR 3 1.5%
Tech-X Corp 3 1.5%
UC Davis 3 1.5%
U. Michigan 3 1.5%
Yale 3 1.5%
Other University 67 33.3%
Other Gov. Labs 11 5.6%
Other DOE Labs 6 3.0%
Other Industry 2 1.0%

 

Which NERSC resources do you use?

Resource    Responses    Percent    Num who answered questions on this topic    Percent
IBM SP (Seaborg) 149 74.1% 144 71.6%
NIM 126 62.7% 145 72.1%
NERSC web site (www.nersc.gov) 118 58.7% 162 80.6%
HPSS 93 46.3% 140 69.7%
Jacquard 82 40.8% 79 39.3%
Consulting services 78 38.9% 179 89.1%
Account support services 57 28.4% 150 74.6%
PDSF 42 20.9% 39 19.4%
DaVinci: overall 20 10.0% 20 10.0%
Computer and Network Operations 13 6.5% 92 45.8%
Visualization services 12 6.0% 36 17.9%
NERSC CVS server 8 4.0% 39 19.4%
Grid services 5 2.5% 43 21.4%

 

How long have you used NERSC?

Time    Number    Percent
less than 6 months 29 14.7%
6 months - 3 years 87 44.2%
more than 3 years 81 41.1%

 

What desktop systems do you use to connect to NERSC?

System    Responses
Unix Total 181
PC Total 96
Mac Total 77
Linux 140
Windows XP 78
OS X 66
Sun Solaris 24
Windows 2000 15
MacOS 11
IBM AIX 6
SGI IRIX 6
HP HPUX 3
Windows ME/98 2

 

Web Browser Used to Take Survey:

Browser    Number    Percent
Mozilla 68 34.0%
Firefox 59 29.5%
Safari 35 17.5%
MSIE 6 34 17.0%
Netscape 4 3 1.5%
Galeon 1 0.5%

 

Operating System Used to Take Survey:

OS    Number    Percent
Linux 76 38.0%
Windows XP 52 26.0%
Mac OS X 52 26.0%
SunOS 9 4.5%
Windows 2000 7 3.5%
IRIX 2 1.0%
MacOS 1 0.5%
Windows 98 1 0.5%

All Satisfaction, Importance and Usefulness Ratings

  • Legend
  • All Satisfaction Topics - by Score
  • All Satisfaction Topics - by Number of Responses
  • All Importance Topics
  • All Usefulness Topics

 

Legend

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Neutral 3.50 - 4.49
Importance    Average Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Significance of Change
significant increase (change from 2004)
significant increase (change from 2003)
significant decrease (change from 2004)
significant decrease (change from 2003)
not significant
Usefulness    Average Score
Very Useful 2.50 - 3.00
Somewhat Useful 1.50 - 2.49

 

All Satisfaction Topics - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004    Change from 2003
Account support services     1 1 4 25 119 150 6.73 0.61 0.06 0.34
HPSS: Reliability (data integrity)       1 1 19 68 89 6.73 0.54 -0.01 0.12
OVERALL: Consulting and Support Services     1 1 2 38 137 179 6.73 0.57 0.06 0.36
CONSULT: overall     1 1 4 36 118 160 6.68 0.62 -0.01 0.34
HPSS: Uptime (Availability)       2 1 21 65 89 6.67 0.62 0.01 0.13
CONSULT: Timely initial response to consulting questions   1   1 3 41 111 157 6.65 0.66 -0.05 0.10
CONSULT: Quality of technical advice       2 3 47 101 153 6.61 0.60 0.03 0.07
NERSC security   1   6 1 44 125 177 6.61 0.75 0.13  
Computer and Network Operations     1 2 1 24 64 92 6.61 0.73 0.10  
PDSF SW: C/C++ compilers         1 9 18 28 6.61 0.57 0.37 0.17
Network performance within NERSC (e.g. Seaborg to HPSS)     1 1 2 31 71 106 6.60 0.67 0.14 0.06
CONSULT: Followup to initial consulting questions   1 1 5 1 37 101 146 6.57 0.83 -0.09 0.08
Seaborg: Uptime (Availability)       3 2 48 85 138 6.56 0.64 0.30 0.14
GRID: Job Submission     1   3 9 27 40 6.53 0.85    
HPSS: Overall satisfaction   1 1   2 34 58 96 6.51 0.79 -0.05 0.05
GRID: Job Monitoring     1 1 1 11 26 40 6.50 0.88    
SP SW: Fortran compilers     2 2 3 33 65 105 6.50 0.81 0.08 0.16
PDSF SW: Programming libraries         1 10 11 22 6.45 0.60 0.32 0.45
OVERALL: Network connectivity 1   2 4 5 62 104 178 6.45 0.86 0.18 0.22
PDSF SW: Software environment       1   14 15 30 6.43 0.68 0.08 0.10
GRID: Access and Authentication     2 2 2 7 30 43 6.42 1.10    
SP SW: Programming libraries     1 6 1 35 57 100 6.41 0.87 0.15 0.14
CONSULT: Amount of time to resolve your issue   2 1 3 7 54 86 153 6.41 0.89 -0.19 0.05
HPSS: Data transfer rates 1     3 4 31 51 90 6.40 0.93    
WEB: Accuracy of information   1 1 2 10 57 82 153 6.40 0.81 -0.01 0.15
SP SW: Software environment   1   3 3 54 60 121 6.39 0.78 0.05 0.15
TRAINING: New User's Guide       4 7 25 45 81 6.37 0.84 0.10 0.11
SP SW: C/C++ compilers   1 1 4 2 25 46 79 6.37 1.00 0.11 0.15
SERVICES: Response to special requests (e.g. disk quota increases, etc.)       6 6 20 46 78 6.36 0.93 0.28 0.01
OVERALL: Mass storage facilities 1 1 2 8 9 35 84 140 6.31 1.11 -0.04 0.19
WEB: NERSC web site overall (www.nersc.gov)     3 5 9 69 76 162 6.30 0.86 -0.02 0.30
GRID: File Transfer     2 2 3 11 25 43 6.28 1.10    
OVERALL: Software management and configuration 1   3 7 9 62 71 153 6.22 1.00 0.03 0.18
TRAINING: Web tutorials     1 4 3 39 31 78 6.22 0.85 0.12 0.15
NERSC CVS server       2 5 15 17 39 6.21 0.86 0.87  
OVERALL: Satisfaction with NERSC   2 1 5 16 93 76 193 6.20 0.87 0.10 -0.17
PDSF SW: Fortran compilers     1 1   5 8 15 6.20 1.21 0.33 0.17
PDSF SW: General tools and utilities       1 2 13 9 25 6.20 0.76 0.37 0.27
OVERALL: Available Software 1   3 11 12 63 80 170 6.19 1.04 -0.05 0.14
CONSULT: Software bug resolution   1 3 5 4 33 42 89 6.17 1.11 0.05 0.53
On-line help desk 1   2 6 9 26 47 91 6.16 1.16 0.00 0.14
SP SW: Applications software     3 2 3 32 27 67 6.16 0.98 0.03 0.16
SERVICES: Allocations process 1   5 5 7 49 57 124 6.16 1.11 0.23 0.47
NIM   2 1 11 14 48 69 145 6.15 1.07 -0.09 0.07
Jacquard SW: C/C++ compilers 1     3 1 24 19 48 6.15 1.09    
PDSF SW: Applications software       2 1 11 8 22 6.14 0.89 0.35 0.27
TRAINING: NERSC classes: in-person       3   5 8 16 6.12 1.15 0.64 1.25
Remote network performance to/from NERSC (e.g. Seaborg to your home institution)   3 6 3 9 47 61 129 6.12 1.19 0.01 0.00
HPSS: User interface (hsi, pftp, ftp) 1 1 5 2 7 27 46 89 6.12 1.29 -0.01 0.14
WEB: Timeliness of information   2 1 6 17 65 58 149 6.12 0.97 -0.05 0.07
Jacquard SW: Software environment 1   1 1 7 32 25 67 6.12 1.02    
SP SW: General tools and utilities     4 3 8 37 34 86 6.09 1.02 0.18 0.11
SERVICES: E-mail lists     1 7 5 22 27 62 6.08 1.06 -0.04  
Seaborg: Disk configuration and I/O performance   1 3 14 6 40 54 118 6.06 1.16 0.12 -0.09
SP SW: Performance and debugging tools 1   3 3 10 40 30 87 6.00 1.10 0.16 0.43
HPSS: Data access time 1 2 3 4 9 29 39 87 6.00 1.31 -0.25 -0.46
PDSF: Overall satisfaction       3 4 22 10 39 6.00 0.83 -0.52 -0.41
PDSF: Batch queue structure       6 2 14 14 36 6.00 1.07 -0.31  
PDSF SW: Performance and debugging tools       2 3 11 7 23 6.00 0.90 0.23 0.69
Jacquard SW: General tools and utilities       7 2 20 15 44 5.98 1.02    
OVERALL: Hardware management and configuration   3 2 8 25 71 55 164 5.98 1.04 0.09 -0.09
WEB: Ease of finding information   2 8 6 20 70 53 159 5.93 1.13 0.04 0.13
Jacquard SW: Programming libraries 1   4 3 2 22 21 53 5.92 1.36    
Seaborg: overall   3 7 2 19 69 44 144 5.92 1.13 0.15 -0.51
PDSF: Uptime (availability)     1 5 3 16 12 37 5.89 1.10 -0.51 -0.46
OVERALL: Available Computing Hardware   3 3 4 37 88 46 181 5.89 0.98 0.24 -0.24
Jacquard: Disk configuration and I/O performance     2 11 5 16 26 60 5.88 1.25    
Jacquard: Uptime (Availability)   2 4 4 14 19 31 74 5.85 1.32    
SERVICES: Visualization services 1     7 3 9 16 36 5.83 1.42 0.43 1.02
PDSF: Batch wait time     1 6 3 14 11 35 5.80 1.16 -0.07 -0.13
PDSF: Ability to run interactively     2 5 5 13 13 38 5.79 1.21 0.11 0.02
Jacquard: overall   2 4 5 12 31 25 79 5.78 1.25    
Jacquard SW: Applications software 1 1 1 3 3 14 13 36 5.78 1.48    
Jacquard SW: Fortran compilers 1 2 1 3 8 20 16 51 5.73 1.40    
Live classes on the web       4 1 9 4 18 5.72 1.07 0.57 1.05
WEB: Searching   2 3 10 11 26 25 77 5.70 1.30 0.07 0.26
SP SW: Visualization software 1 1 2 4 2 18 12 41 5.67 1.47 0.27 0.59
DaVinci: overall   1 1 1 5 5 7 20 5.65 1.42    
OVERALL: Data analysis and visualization facilities 1   3 14 9 33 22 82 5.65 1.26 0.24  
Jacquard: Ability to run interactively   1 4 8 6 25 13 57 5.56 1.28    
Jacquard SW: Visualization software       4 1 8 2 15 5.53 1.06    
Seaborg: Ability to run interactively 3 1 4 18 20 42 31 119 5.53 1.38 0.19 -0.04
Jacquard: Batch queue structure 2 1 4 10 3 37 12 69 5.46 1.42    
DaVinci SW: Visualization software 1   1 1 2 5 4 14 5.43 1.74    
Jacquard SW: Performance and debugging tools 1   4 4 6 15 7 37 5.35 1.44    
Jacquard: Batch wait time 2 1 10 8 12 24 13 70 5.16 1.54    
PDSF: Disk configuration and I/O performance   1 2 8 10 8 6 35 5.14 1.29 -0.45 -0.55
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39 -0.63
Seaborg: Batch wait time 17 15 28 13 33 27 5 138 3.95 1.76 0.10 -1.29

 

All Satisfaction Topics - by Number of Responses

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
OVERALL: Satisfaction with NERSC   2 1 5 16 93 76 193 6.20 0.87 0.10
OVERALL: Available Computing Hardware   3 3 4 37 88 46 181 5.89 0.98 0.24
OVERALL: Consulting and Support Services     1 1 2 38 137 179 6.73 0.57 0.06
OVERALL: Network connectivity 1   2 4 5 62 104 178 6.45 0.86 0.18
NERSC security   1   6 1 44 125 177 6.61 0.75 0.13
OVERALL: Available Software 1   3 11 12 63 80 170 6.19 1.04 -0.05
OVERALL: Hardware management and configuration   3 2 8 25 71 55 164 5.98 1.04 0.09
WEB: NERSC web site overall (www.nersc.gov)     3 5 9 69 76 162 6.30 0.86 -0.02
CONSULT: overall     1 1 4 36 118 160 6.68 0.62 -0.01
WEB: Ease of finding information   2 8 6 20 70 53 159 5.93 1.13 0.04
CONSULT: Timely initial response to consulting questions   1   1 3 41 111 157 6.65 0.66 -0.05
CONSULT: Quality of technical advice       2 3 47 101 153 6.61 0.60 0.03
CONSULT: Amount of time to resolve your issue   2 1 3 7 54 86 153 6.41 0.89 -0.19
WEB: Accuracy of information   1 1 2 10 57 82 153 6.40 0.81 -0.01
OVERALL: Software management and configuration 1   3 7 9 62 71 153 6.22 1.00 0.03
Account support services     1 1 4 25 119 150 6.73 0.61 0.06
WEB: Timeliness of information   2 1 6 17 65 58 149 6.12 0.97 -0.05
CONSULT: Followup to initial consulting questions   1 1 5 1 37 101 146 6.57 0.83 -0.09
NIM   2 1 11 14 48 69 145 6.15 1.07 -0.09
Seaborg: overall   3 7 2 19 69 44 144 5.92 1.13 0.15
OVERALL: Mass storage facilities 1 1 2 8 9 35 84 140 6.31 1.11 -0.04
Seaborg: Uptime (Availability)       3 2 48 85 138 6.56 0.64 0.30
Seaborg: Batch wait time 17 15 28 13 33 27 5 138 3.95 1.76 0.10
Remote network performance to/from NERSC (e.g. Seaborg to your home institution)   3 6 3 9 47 61 129 6.12 1.19 0.01
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39
SERVICES: Allocations process 1   5 5 7 49 57 124 6.16 1.11 0.23
SP SW: Software environment   1   3 3 54 60 121 6.39 0.78 0.05
Seaborg: Ability to run interactively 3 1 4 18 20 42 31 119 5.53 1.38 0.19
Seaborg: Disk configuration and I/O performance   1 3 14 6 40 54 118 6.06 1.16 0.12
Network performance within NERSC (e.g. Seaborg to HPSS)     1 1 2 31 71 106 6.60 0.67 0.14
SP SW: Fortran compilers     2 2 3 33 65 105 6.50 0.81 0.08
SP SW: Programming libraries     1 6 1 35 57 100 6.41 0.87 0.15
HPSS: Overall satisfaction   1 1   2 34 58 96 6.51 0.79 -0.05
Computer and Network Operations     1 2 1 24 64 92 6.61 0.73 0.10
On-line help desk 1   2 6 9 26 47 91 6.16 1.16 0.00
HPSS: Data transfer rates 1     3 4 31 51 90 6.40 0.93  
HPSS: Reliability (data integrity)       1 1 19 68 89 6.73 0.54 -0.01
HPSS: Uptime (Availability)       2 1 21 65 89 6.67 0.62 0.01
CONSULT: Software bug resolution   1 3 5 4 33 42 89 6.17 1.11 0.05
HPSS: User interface (hsi, pftp, ftp) 1 1 5 2 7 27 46 89 6.12 1.29 -0.01
SP SW: Performance and debugging tools 1   3 3 10 40 30 87 6.00 1.10 0.16
HPSS: Data access time 1 2 3 4 9 29 39 87 6.00 1.31 -0.25
SP SW: General tools and utilities     4 3 8 37 34 86 6.09 1.02 0.18
OVERALL: Data analysis and visualization facilities 1   3 14 9 33 22 82 5.65 1.26 0.24
TRAINING: New User's Guide       4 7 25 45 81 6.37 0.84 0.10
SP SW: C/C++ compilers   1 1 4 2 25 46 79 6.37 1.00 0.11
Jacquard: overall   2 4 5 12 31 25 79 5.78 1.25  
SERVICES: Response to special requests (e.g. disk quota increases, etc.)       6 6 20 46 78 6.36 0.93 0.28
TRAINING: Web tutorials     1 4 3 39 31 78 6.22 0.85 0.12
WEB: Searching   2 3 10 11 26 25 77 5.70 1.30 0.07
Jacquard: Uptime (Availability)   2 4 4 14 19 31 74 5.85 1.32  
Jacquard: Batch wait time 2 1 10 8 12 24 13 70 5.16 1.54  
Jacquard: Batch queue structure 2 1 4 10 3 37 12 69 5.46 1.42  
SP SW: Applications software     3 2 3 32 27 67 6.16 0.98 0.03
Jacquard SW: Software environment 1   1 1 7 32 25 67 6.12 1.02  
SERVICES: E-mail lists     1 7 5 22 27 62 6.08 1.06 -0.04
Jacquard: Disk configuration and I/O performance     2 11 5 16 26 60 5.88 1.25  
Jacquard: Ability to run interactively   1 4 8 6 25 13 57 5.56 1.28  
Jacquard SW: Programming libraries 1   4 3 2 22 21 53 5.92 1.36  
Jacquard SW: Fortran compilers 1 2 1 3 8 20 16 51 5.73 1.40  
Jacquard SW: C/C++ compilers 1     3 1 24 19 48 6.15 1.09  
Jacquard SW: General tools and utilities       7 2 20 15 44 5.98 1.02  
GRID: Access and Authentication     2 2 2 7 30 43 6.42 1.10  
GRID: File Transfer     2 2 3 11 25 43 6.28 1.10  
SP SW: Visualization software 1 1 2 4 2 18 12 41 5.67 1.47 0.27
GRID: Job Submission     1   3 9 27 40 6.53 0.85  
GRID: Job Monitoring     1 1 1 11 26 40 6.50 0.88  
NERSC CVS server       2 5 15 17 39 6.21 0.86 0.87
PDSF: Overall satisfaction       3 4 22 10 39 6.00 0.83 -0.52
PDSF: Ability to run interactively     2 5 5 13 13 38 5.79 1.21 0.11
PDSF: Uptime (availability)     1 5 3 16 12 37 5.89 1.10 -0.51
Jacquard SW: Performance and debugging tools 1   4 4 6 15 7 37 5.35 1.44  
PDSF: Batch queue structure       6 2 14 14 36 6.00 1.07 -0.31
SERVICES: Visualization services 1     7 3 9 16 36 5.83 1.42 0.43
Jacquard SW: Applications software 1 1 1 3 3 14 13 36 5.78 1.48  
PDSF: Batch wait time     1 6 3 14 11 35 5.80 1.16 -0.07
PDSF: Disk configuration and I/O performance   1 2 8 10 8 6 35 5.14 1.29 -0.45
PDSF SW: Software environment       1   14 15 30 6.43 0.68 0.08
PDSF SW: C/C++ compilers         1 9 18 28 6.61 0.57 0.37
PDSF SW: General tools and utilities       1 2 13 9 25 6.20 0.76 0.37
PDSF SW: Performance and debugging tools       2 3 11 7 23 6.00 0.90 0.23
PDSF SW: Programming libraries         1 10 11 22 6.45 0.60 0.32
PDSF SW: Applications software       2 1 11 8 22 6.14 0.89 0.35
DaVinci: overall   1 1 1 5 5 7 20 5.65 1.42  
Live classes on the web       4 1 9 4 18 5.72 1.07 0.57
TRAINING: NERSC classes: in-person       3   5 8 16 6.12 1.15 0.64
PDSF SW: Fortran compilers     1 1   5 8 15 6.20 1.21 0.33
Jacquard SW: Visualization software       4 1 8 2 15 5.53 1.06  
DaVinci SW: Visualization software 1   1 1 2 5 4 14 5.43 1.74  

 

All Importance Topics

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important
Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3    Total Responses for Importance    Average Importance Score    Std. Dev.    Total Responses for Satisfaction    Average Satisfaction Score    Std. Dev.    Change from 2004    Change from 2003
OVERALL: Available Computing Hardware 1 29 137 167 2.81 0.41 181 5.89 0.98 0.24 -0.24
OVERALL: Satisfaction with NERSC 2 30 142 174 2.80 0.43 193 6.20 0.87 0.10 -0.17
SERVICES: Allocations process 1 22 86 109 2.78 0.44 124 6.16 1.11 0.23 0.47
OVERALL: Network connectivity 1 37 124 162 2.76 0.44 178 6.45 0.86 0.18 0.22
OVERALL: Consulting and Support Services 4 43 127 174 2.71 0.50 179 6.73 0.57 0.06 0.36
Account support services   39 92 131 2.70 0.46 150 6.73 0.61 0.06 0.34
Computer and Network Operations 2 27 59 88 2.65 0.53 92 6.61 0.73 0.10  
SERVICES: Response to special requests (e.g. disk quota increases, etc.) 3 21 50 74 2.64 0.56 78 6.36 0.93 0.28 0.01
OVERALL: Hardware management and configuration 4 54 92 150 2.59 0.55 164 5.98 1.04 0.09 -0.09
OVERALL: Software management and configuration 7 49 82 138 2.54 0.59 153 6.22 1.00 0.03 0.18
OVERALL: Available Software 10 67 76 153 2.43 0.62 170 6.19 1.04 -0.05 0.14
OVERALL: Mass storage facilities 18 50 72 140 2.39 0.71 140 6.31 1.11 -0.04 0.19
NERSC security 20 71 82 173 2.36 0.68 177 6.61 0.75 0.13  
OVERALL: Data analysis and visualization facilities 43 41 32 116 1.91 0.80 82 5.65 1.26 0.24  
SERVICES: Visualization services 20 17 14 51 1.88 0.82 36 5.83 1.42 0.43 1.02
SERVICES: E-mail lists 21 27 13 61 1.87 0.74 62 6.08 1.06 -0.04  
NERSC CVS server 19 17 5 41 1.66 0.69 39 6.21 0.86 0.87  

 

All Usefulness Topics

3=Very useful, 2=Somewhat useful, 1=Not useful

Item    Num who rated this item as: 1  2  3    Total Responses    Average Score    Std. Dev.
SERVICES: E-mail lists 1 42 104 147 2.70 0.47
TRAINING: New User's Guide 1 23 49 73 2.66 0.51
TRAINING: Web tutorials 2 32 44 78 2.54 0.55
MOTD (Message of the Day) 13 55 55 123 2.34 0.66
SERVICES: Announcements web archive 13 55 48 116 2.30 0.66
Phone calls from NERSC 25 25 32 82 2.09 0.83
Live classes on the web 9 13 8 30 1.97 0.76
TRAINING: NERSC classes: in-person 14 9 7 30 1.77 0.82

Hardware Resources

 

  • Legend
  • Hardware Satisfaction - by Score
  • Hardware Satisfaction - by Platform
  • Max Processors Effectively Used on Seaborg
  • Hardware Comments

 

 

Legend:

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Neutral 3.50 - 4.49
Significance of Change
significant increase
significant decrease
not significant

 

Hardware Satisfaction - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
HPSS: Reliability (data integrity)       1 1 19 68 89 6.73 0.54 -0.01
HPSS: Uptime (Availability)       2 1 21 65 89 6.67 0.62 0.01
Network performance within NERSC (e.g. Seaborg to HPSS)     1 1 2 31 71 106 6.60 0.67 0.14
Seaborg: Uptime (Availability)       3 2 48 85 138 6.56 0.64 0.30
HPSS: Overall satisfaction   1 1   2 34 58 96 6.51 0.79 -0.05
HPSS: Data transfer rates 1     3 4 31 51 90 6.40 0.93  
NERSC CVS server         2 5 4 11 6.18 0.75 0.85
Remote network performance to/from NERSC (e.g. Seaborg to your home institution)   3 6 3 9 47 61 129 6.12 1.19 0.01
HPSS: User interface (hsi, pftp, ftp) 1 1 5 2 7 27 46 89 6.12 1.29 -0.01
Seaborg: Disk configuration and I/O performance   1 3 14 6 40 54 118 6.06 1.16 0.12
HPSS: Data access time 1 2 3 4 9 29 39 87 6.00 1.31 -0.25
PDSF: Overall satisfaction       3 4 22 10 39 6.00 0.83 -0.52
PDSF: Batch queue structure       6 2 14 14 36 6.00 1.07 -0.31
Seaborg: overall   3 7 2 19 69 44 144 5.92 1.13 0.15
PDSF: Uptime (availability)     1 5 3 16 12 37 5.89 1.10 -0.51
Jacquard: Disk configuration and I/O performance     2 11 5 16 26 60 5.88 1.25  
Jacquard: Uptime (Availability)   2 4 4 14 19 31 74 5.85 1.32  
PDSF: Batch wait time     1 6 3 14 11 35 5.80 1.16 -0.07
PDSF: Ability to run interactively     2 5 5 13 13 38 5.79 1.21 0.11
Jacquard: overall   2 4 5 12 31 25 79 5.78 1.25  
DaVinci: overall   1 1 1 5 5 7 20 5.65 1.42  
Jacquard: Ability to run interactively   1 4 8 6 25 13 57 5.56 1.28  
Seaborg: Ability to run interactively 3 1 4 18 20 42 31 119 5.53 1.38 0.19
Jacquard: Batch queue structure 2 1 4 10 3 37 12 69 5.46 1.42  
Jacquard: Batch wait time 2 1 10 8 12 24 13 70 5.16 1.54  
PDSF: Disk configuration and I/O performance   1 2 8 10 8 6 35 5.14 1.29 -0.45
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39
Seaborg: Batch wait time 17 15 28 13 33 27 5 138 3.95 1.76 0.10

 

Hardware Satisfaction - by Platform

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
CVS server         2 5 4 11 6.18 0.75 0.85
DaVinci Analytics Server   1 1 1 5 5 7 20 5.65 1.42  
HPSS: Reliability (data integrity)       1 1 19 68 89 6.73 0.54 -0.01
HPSS: Uptime (Availability)       2 1 21 65 89 6.67 0.62 0.01
HPSS: Overall satisfaction   1 1   2 34 58 96 6.51 0.79 -0.05
HPSS: Data transfer rates 1     3 4 31 51 90 6.40 0.93  
HPSS: User interface (hsi, pftp, ftp) 1 1 5 2 7 27 46 89 6.12 1.29 -0.01
HPSS: Data access time 1 2 3 4 9 29 39 87 6.00 1.31 -0.25
Jacquard: Disk configuration and I/O performance     2 11 5 16 26 60 5.88 1.25  
Jacquard: Uptime (Availability)   2 4 4 14 19 31 74 5.85 1.32  
Jacquard: overall   2 4 5 12 31 25 79 5.78 1.25  
Jacquard: Ability to run interactively   1 4 8 6 25 13 57 5.56 1.28  
Jacquard: Batch queue structure 2 1 4 10 3 37 12 69 5.46 1.42  
Jacquard: Batch wait time 2 1 10 8 12 24 13 70 5.16 1.54  
Network performance within NERSC (e.g. Seaborg to HPSS)     1 1 2 31 71 106 6.60 0.67 0.14
Remote network performance to/from NERSC (e.g. Seaborg to your home institution)   3 6 3 9 47 61 129 6.12 1.19 0.01
PDSF: Overall satisfaction       3 4 22 10 39 6.00 0.83 -0.52
PDSF: Batch queue structure       6 2 14 14 36 6.00 1.07 -0.31
PDSF: Uptime (availability)     1 5 3 16 12 37 5.89 1.10 -0.51
PDSF: Batch wait time     1 6 3 14 11 35 5.80 1.16 -0.07
PDSF: Ability to run interactively     2 5 5 13 13 38 5.79 1.21 0.11
PDSF: Disk configuration and I/O performance   1 2 8 10 8 6 35 5.14 1.29 -0.45
Seaborg: Uptime (Availability)       3 2 48 85 138 6.56 0.64 0.30
Seaborg: Disk configuration and I/O performance   1 3 14 6 40 54 118 6.06 1.16 0.12
Seaborg: overall   3 7 2 19 69 44 144 5.92 1.13 0.15
Seaborg: Ability to run interactively 3 1 4 18 20 42 31 119 5.53 1.38 0.19
Seaborg: Batch queue structure 6 3 14 17 17 53 16 126 5.06 1.58 0.39
Seaborg: Batch wait time 17 15 28 13 33 27 5 138 3.95 1.76 0.10

 

What is the maximum number of processors your code can effectively use for parallel computations on Seaborg?   51 responses

Processor Count    Number of Responses    Number of respondents who actually ran codes on this number of processors    Percent
4,560 - 6,000 3 2 1.4%
4,096 1 6 4.3%
2,016 - 3,074 1 10 7.2%
1,008 - 1,728 1 16 11.6%
512 - 768 7 25 18.1%
256 - 400 11 14 10.1%
112 - 192 8 22 15.9%
64 - 96 5 25 18.1%
32 - 48 8 9 6.5%
≤16 6 9 6.5%

 

Hardware Comments:   38 responses

  • 5 overall hardware comments
  • 2 comments by Bassi Users
  • 8 comments by Jacquard Users
    3   Need more scratch space
    3   Queue issues
  • 3 comments by HPSS Users
  • 3 comments on networking performance
  • 5 comments by PDSF Users
  • 16 comments by Seaborg Users
    11   Turnaround too slow
    3   Queue/job mix policies should be adjusted

 

Overall Hardware Comments:   5 responses

It appears that the load on the resources implies need for much larger and/or faster computers.

Cannot run interactive jobs in the night. This should be fixed, considering there are many people who are willing to work at night.

I have generally been quite satisfied with NERSC's hardware resources, though of course faster machines are always helpful, and I look forward to seeing what Bassi can do. Copying large datasets to my home machines for visualization can be a bottleneck, as scp seems to be limited to about 600 kB/s.

I feel that NERSC should focus on providing computing resources for real world parallel applications rather than focusing on machines with high theoretical performance and poor performance with realistic parallel applications which use domain decomposition.

I am a long-time nersc user, but only recently began work with a large parallel application: it is too early to have an opinion on many questions in this survey and thus you find them unanswered.

 

Comments by Bassi Users:   2 responses

The IBM-SP5 should be expanded as soon as possible since it is an order of magnitude faster than the IBM-SP3

I could comment here on Bassi. I am mostly satisfied with Bassi. I run on 48 processors. The present queue structure is a pain with its 8 hour time limits, but I understand that this will almost certainly change when it goes into production next week.

 

Comments by Jacquard Users:   8 responses

Need more scratch space

Availability of scratch disk on Jacquard is a major restriction on the usefulness of that system. The default 50 GB scratch is not enough for many runs. While temporary extensions are useful and have been granted, permanent increases would be a great improvement to the usefulness of the system. ...

Increasing the available scratch space on Jacquard would significantly improve the environment from my perspective.

... My major issue with jacquard arises from insufficient /scratch file system space to store even one year of inputs and outputs for our minimal-resolution model configuration. I am working around this currently by downloading half-year outputs as they are produced, but this requires me to offload outputs before submitting the next run segment to batch. This cramped file system also makes jacquard impractical for higher-resolution simulations that we will need in the future. I expect that other jacquard users are facing similar limitations. Is additional storage for jacquard /scratch prohibitively expensive?

NERSC response: Users requiring large amounts of scratch space should consider using the NERSC Global Filesystem.

Queue issues

... I hope that Jacquard will be more user friendly for our applications that require 10 -100 processors.

The only complaint I have is the wait time to run short large jobs (128 or 256 nodes) on Jacquard. It seems like it takes about a week to get these jobs through, where other systems can get them through the queue in a day or two.

My biggest complaints are over-allocation leading to long queue wait times, and the inability of PBS on Jacquard to run larger than average jobs. PBS should be replaced, or most of the nodes should be exclusively allocated to queues with a minimum job size of 16 nodes.

NERSC response: NERSC is investigating alternatives to PBS Pro for Jacquard scheduling, including using the Maui scheduler.

Other

The lack of any compiler option on Jaquard except for the pathscale compilers makes this machine useless to me. Pathscale is just not up to the standard of freely available compilers available from INTEL or commercial compilers like NAG. I don't understand why INTEL compilers can't be installed on Jaquard

Still working on best setup for jacquard.

 

Comments by HPSS Users:   3 responses

... htar is terrible -- random crashes without returning error codes, etc. hsi would be significantly improved by allowing standard command line editing. HPSS is ok, but the interfaces to it are poor.

Though the HPSS mass storage is very good already, I found a system installed at the supercomputer in Juelich easier to use. There, the data migration is done by the system software and the migrated data can be accessed the same way as a regular file on disk. This data storage system is very convenient. ...

when data migrates to tape on HPSS, it sure takes a long time to retrieve it

 

 

Comments on Network Performance:   3 responses

... NERSC to LBL bandwidth needs improvement for X based applications to be usable (including Xemacs) ...

... Network performance appears to be limited by LBL networking not NERSC, so the neutral answer reflects the fact that I cannot really evaluate NERSC performance.

Accessing PDSF for interactive work from Fermilab, BNL is slow. Probably due to latency. Not sure whether this is intrinsic (distance) or a problem in the network.

 

Comments by PDSF Users:   5 responses

Some of the PDSF disks which store STAR data become unavailable quite often due to users running with un-optimised dvio requirements. This makes it impossible for other users to do their work. It would be great if we could somehow come up with a system for ensuring that users cannot run jobs irresponsibly - i.e. perhaps setting a safe minimum dvio requirement that everyone has to adhere to? Or perhaps developing a system tool which users can use to benchmark their code and better judge what dvio requirement to run with? These comments obviously refer mainly to users, not to PDSF hardware. In general, I have no problems with the hardware itself - it's how we use it that is sometimes not optimal!

It seems that even "normal" system usage (on PDSF where I do all of my work) is too much for the system to handle. I can always expect some choppiness in responsiveness, even when doing simple tasks like text editing. In periods of high usage, the system may even stop responding for seconds at a time, which can be very frustrating if I'm just trying to do an "ls" or quickly edit a file. Is the home directory one giant disk in which everyone is working? If so, I would suggest dividing users' home directories among multiple disks to ease the situation.

... PDSF diskvaults are very unreliable, but they are upgrading to GPFS which works much better. I am dissatisfied with the current situation but quite happy about the future plans. ...

NERSC response: NERSC has made a substantial investment by providing GPFS licenses for PDSF. This has allowed us to consolidate and convert the many NFS filesystems to GPFS. NERSC expects GPFS to be a more robust and reliable filesystem.

The PDSF cluster has been instrumental for our data analysis needs and overall we are very satisfied with the system. Because the system is build out of commodity hardware, older hardware that may no longer meet the computing requirements, but is otherwise in good condition, is reused for other tasks. One such task is providing cheap disk servers, which is very valuable for our data intensive applications.

Refresh time just running emacs on pdsf is slow. Network?? Recently there have been times where one could not effectively do anything even at the command line. Surprising it took days to find and fix.

 

Comments by Seaborg Users:   16 responses

Turnaround too slow

If we have to wait for more than 2 weeks to get a batch job run on Seaberg, it diminishes the usefulness of NERSC facility.

batch jobs had to wait too long to really get some work done.

Of course I would prefer shorter queue times, but having seen the load distribution on Seaborg using the NERSC website that seems to be a function of demand, not mismanagement. One feature I would find useful (perhaps it is available and I am unaware of it) is email notification when a job has begun.

Seaborg is user-friendly but somewhat old and vastly oversubscribed. As a result, I find it necessary to use premium queue for routine production jobs, or see my jobs spend several days waiting to do 6-hour runs. I'm glad to learn of the arrival of bassi.nersc.gov and would appreciate hearing more about it.

Your queueing structure is optimized for the wrong thing. It shouldn't be maximum CPU utilization. It should be maximum scientist productivity. If the wait time in the queue is long, then it really doesn't matter how fast the machine is; it's equivalent to being able to use a much slower machine right away. The total turnaround time is what matters for a scientist being able to get work done, and wait time in the queue is a huge part of that.

... The turn around time for large seaborg jobs is very long. I sometimes would appreciate that my production jobs cannot prevent my debug jobs from starting. As it is now, a couple of large jobs block my queue for weeks even for the small test jobs. It would be better to restrict the number of jobs in each class, so that debug jobs are still running though I am waiting for production jobs to start.

Seaborg is a great resource, but it is heavily utilized and turn around times are sometimes quite slow. If at all possible, it would be a real asset to scientific computing if NERSC could expand their facilites.

INCITE-driven wait times (and their seasonal fluctuations) are killing us. Occasional super-priority has been a huge boon and is much appreciated.

During most of last year queue wait times on seaborg were very bad. This has changed dramatically in the last few weeks, and wait times are now very good.

The Seaborg wait times of 2 weeks make the machine useless or hard to use. Some very cycle hungry GYRO users in may group still use it, but I didn't use much last year. ...

My primary issue with seaborg is that batch wait times can vary wildly with little notice (for example: in October 2005, a "reg_1" queue turnaround time of overnight abruptly increased to more than a week, and even "premium" jobs were subject to waits of more than two days). I am not certain if this is an unavoidable issue with shared supercomputer resources, but is it possible to provide some warning of batch wait times to be expected by a job? ...

NERSC response: NERSC has made several changes based on the over-allocation that occurred in 2005 and the resulting long turnaround times:

  • Time for new systems will never be pre-allocated; it will be allocated shortly before the system enters full production.
  • Time has not been over-allocated in 2006.
  • NERSC is investigating with DOE options to under-allocate resources in order to improve turnaround times while still meeting the DOE mission.
  • NERSC has suggested to DOE that the allocation process align allocations more closely with the mission and metrics DOE has for NERSC.

However, many things that affect turnaround time are outside of NERSC's control, such as large numbers of projects with similar deadlines.

 

Queue/job mix policies should be adjusted

SEABORG is getting more and more difficult to use for our kind of needs where the inter-processor communication is large. This limits our use of the resources to about 100 processors. These kind of jobs are heavily penalized with very low priority compared to larger jobs using thousands of processor. I understand that this is the policy of NERSC. I hope that Jacquard will be more user friendly for our applications that require 10 -100 processors.

We can't use very many processors at one time, but we need processors for a significant portion of every day to get reasonable turn around on our model runs. Queues on Seborg have always been a terrible headache, but this last year they were abysmal. It takes just under 24 hours to execute one model year. Our model is getting only one run slot per real-time week in the regular queue this fall. At this speed, it will take two years to finish the run. Because of restrictions on the queue, there is no way for us to use the time allocated to us, except by running our jobs in the premium queue all the time.

Jobs using 64-128 processors in the seaborg batch queue seem to be unfairly penalized as compared with larger jobs.

NERSC response: Users running jobs on fewer than 512 processors should consider using the Jacquard Linux cluster or the Bassi Power5 system.

Other

increase memory

It seems that the processors run not fast as anticipated. As compared with other supercenter, the same parallel programs run at seaborg even two times lower! I think there should be some more and more upgrades needed.

Software

 

  • Legend
  • Software Satisfaction - by Score
  • Software Satisfaction - by Platform
  • Software Comments

 

 

Legend:

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Significance of Change
significant increase
not significant

 

Software Satisfaction - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
PDSF SW: C/C++ compilers         1 9 18 28 6.61 0.57 0.37
GRID: Job Submission     1   3 9 27 40 6.53 0.85  
GRID: Job Monitoring     1 1 1 11 26 40 6.50 0.88  
SP SW: Fortran compilers     2 2 3 33 65 105 6.50 0.81 0.08
PDSF SW: Programming libraries         1 10 11 22 6.45 0.60 0.32
PDSF SW: Software environment       1   14 15 30 6.43 0.68 0.08
GRID: Access and Authentication     2 2 2 7 30 43 6.42 1.10  
SP SW: Programming libraries     1 6 1 35 57 100 6.41 0.87 0.15
SP SW: Software environment   1   3 3 54 60 121 6.39 0.78 0.05
SP SW: C/C++ compilers   1 1 4 2 25 46 79 6.37 1.00 0.11
GRID: File Transfer     2 2 3 11 25 43 6.28 1.10  
PDSF SW: Fortran compilers     1 1   5 8 15 6.20 1.21 0.33
PDSF SW: General tools and utilities       1 2 13 9 25 6.20 0.76 0.37
SP SW: Applications software     3 2 3 32 27 67 6.16 0.98 0.03
Jacquard SW: C/C++ compilers 1     3 1 24 19 48 6.15 1.09  
PDSF SW: Applications software       2 1 11 8 22 6.14 0.89 0.35
Jacquard SW: Software environment 1   1 1 7 32 25 67 6.12 1.02  
SP SW: General tools and utilities     4 3 8 37 34 86 6.09 1.02 0.18
SP SW: Performance and debugging tools 1   3 3 10 40 30 87 6.00 1.10 0.16
PDSF SW: Performance and debugging tools       2 3 11 7 23 6.00 0.90 0.23
Jacquard SW: General tools and utilities       7 2 20 15 44 5.98 1.02  
Jacquard SW: Programming libraries 1   4 3 2 22 21 53 5.92 1.36  
Jacquard SW: Applications software 1 1 1 3 3 14 13 36 5.78 1.48  
Jacquard SW: Fortran compilers 1 2 1 3 8 20 16 51 5.73 1.40  
SP SW: Visualization software 1 1 2 4 2 18 12 41 5.67 1.47 0.27
Jacquard SW: Visualization software       4 1 8 2 15 5.53 1.06  
DaVinci SW: Visualization software 1   1 1 2 5 4 14 5.43 1.74  
Jacquard SW: Performance and debugging tools 1   4 4 6 15 7 37 5.35 1.44  

 

Software Satisfaction - by Platform

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item    Num who rated this item as: 1  2  3  4  5  6  7    Total Responses    Average Score    Std. Dev.    Change from 2004
DaVinci SW: Visualization software 1   1 1 2 5 4 14 5.43 1.74  
GRID: Job Submission     1   3 9 27 40 6.53 0.85  
GRID: Job Monitoring     1 1 1 11 26 40 6.50 0.88  
GRID: Access and Authentication     2 2 2 7 30 43 6.42 1.10  
GRID: File Transfer     2 2 3 11 25 43 6.28 1.10  
Jacquard SW: C/C++ compilers 1     3 1 24 19 48 6.15 1.09  
Jacquard SW: Software environment 1   1 1 7 32 25 67 6.12 1.02  
Jacquard SW: General tools and utilities       7 2 20 15 44 5.98 1.02  
Jacquard SW: Programming libraries 1   4 3 2 22 21 53 5.92 1.36  
Jacquard SW: Applications software 1 1 1 3 3 14 13 36 5.78 1.48  
Jacquard SW: Fortran compilers 1 2 1 3 8 20 16 51 5.73 1.40  
Jacquard SW: Visualization software       4 1 8 2 15 5.53 1.06  
Jacquard SW: Performance and debugging tools 1   4 4 6 15 7 37 5.35 1.44  
PDSF SW: C/C++ compilers         1 9 18 28 6.61 0.57 0.37
PDSF SW: Programming libraries         1 10 11 22 6.45 0.60 0.32
PDSF SW: Software environment       1   14 15 30 6.43 0.68 0.08
PDSF SW: Fortran compilers     1 1   5 8 15 6.20 1.21 0.33
PDSF SW: General tools and utilities       1 2 13 9 25 6.20 0.76 0.37
PDSF SW: Applications software       2 1 11 8 22 6.14 0.89 0.35
PDSF SW: Performance and debugging tools       2 3 11 7 23 6.00 0.90 0.23
SP SW: Fortran compilers     2 2 3 33 65 105 6.50 0.81 0.08
SP SW: Programming libraries     1 6 1 35 57 100 6.41 0.87 0.15
SP SW: Software environment   1   3 3 54 60 121 6.39 0.78 0.05
SP SW: C/C++ compilers   1 1 4 2 25 46 79 6.37 1.00 0.11
SP SW: Applications software     3 2 3 32 27 67 6.16 0.98 0.03
SP SW: General tools and utilities     4 3 8 37 34 86 6.09 1.02 0.18
SP SW: Performance and debugging tools 1   3 3 10 40 30 87 6.00 1.10 0.16
SP SW: Visualization software 1 1 2 4 2 18 12 41 5.67 1.47 0.27

 

Comments about Software:   15 responses

  • 6 overall software comments
  • 2 comments by Grid users
  • 4 comments by Jacquard Users
  • 2 comments by PDSF users
  • 7 comments by Seaborg users

 

General Software Comments

  6 responses

It would be nice to have ncdump and nco (netcdf operators) installed on the NERSC computers.

NERSC response: netcdf and nco are installed on all NERSC computers (Seaborg, Bassi, Jacquard and Davinci). To access the ncdump utility and/or nco (the netcdf operators), first load the module "netcdf".

I'd like to see LaTeX on a platform that can access the new global file system.

NERSC response: To request software for a NERSC machine please fill out the software request form.

Can I use Matlab? It's important for me.

NERSC response: MATLAB is available on Jacquard and Davinci.

I would like to have CLAPACK available

NERSC response: The LAPACK Fortran library is available on all NERSC platforms in vendor-provided libraries. LAPACK routines can be called from C-language programs; make sure that you follow each platform's rules for mixing Fortran and C. Note that CLAPACK (which uses a Fortran-to-C conversion utility called f2c) would have lower performance than LAPACK, which is why we recommend that C users call LAPACK directly.
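
As a purely illustrative sketch (not NERSC-supplied code), a C program can call the Fortran LAPACK routine DGESV along the following lines. The trailing-underscore symbol name, the column-major array layout, and the pass-by-reference arguments are exactly the kinds of platform-dependent Fortran/C mixing rules referred to above; some compilers (for example IBM's xlf on AIX) do not append the underscore, and the library to link against differs from machine to machine.

    #include <stdio.h>

    /* Fortran LAPACK routine: every argument is passed by reference and
       matrices are stored column-major.  The "dgesv_" spelling assumes a
       compiler that appends an underscore to Fortran symbol names. */
    extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                       int *ipiv, double *b, int *ldb, int *info);

    int main(void)
    {
        /* Solve the 2x2 system A x = b, with A stored column by column. */
        int n = 2, nrhs = 1, lda = 2, ldb = 2, info;
        int ipiv[2];
        double a[4] = { 3.0, 1.0,    /* first column of A  */
                        1.0, 2.0 };  /* second column of A */
        double b[2] = { 9.0, 8.0 };  /* right-hand side; overwritten with x */

        dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);

        if (info == 0)
            printf("x = (%g, %g)\n", b[0], b[1]);   /* expect x = (2, 3) */
        else
            printf("dgesv failed: info = %d\n", info);
        return 0;
    }

Linking against the vendor-provided LAPACK library is platform-specific; consult each machine's documentation for the correct libraries and compiler flags.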

... Although I have only used them slightly, some of the NERSC-developed performance analysis tools look to be helpful in increasing code efficiency

Our research focuses on demanding numerical problems in computational structural biology. In this regard, we would suggest two improvements:
It would be great to get the molecular dynamics software NAMD working better. We have successfully used it on seaborg, eventhough some inexplicable errors have occurred. On Jacquard, we have so far been unable to use it, despite communication with the consulting team.
Furthermore, adding the Poisson-Boltzmann PDE solver APBS to the list of available software would be of interest to us and possibly the larger molecular modeling community.

 

Comments by Grid Users:   2 responses

File transfer using grid resources (GLOBUS) is clumsy. Common tasks like recursively copying a directory structure don't seem to work. it seems fine for copying large single files, but not more complicated tasks.

The GRID-based parallel transfer of files is very useful but still awkward to use.

 

Comments by Jacquard Users:   4 responses

There have been major problems running the CAM climate code on Jacquard, that appear to be system related. Apparently there is a fix, but trying to test CAM on the development portion of Jacquard has been slow because there have been many issues related to a spartan environment (tools not installed, different versions of libraries, etc).

NERSC response: The kernel bug that affected CAM was fixed on Jacquard on Dec 13, 2005. You should now be able to run CAM on Jacquard.

I would prefer to use compilers from Portland Group.

A parallel debugger on jacquard would be of great help.

Need hardware performance monitors on Jacquard and Bassi.

 

Comments by PDSF Users:   2 responses

We need a group account so we would not be dependent upon individual users having to run group jobs. This is a major issue for us.

I am very happy with the recent decision to switch to GPFS. We already noticed much better performance and look forward to have the complete center switch to this file system.

 

Comments by Seaborg Users:   7 responses

It is appreciated if Matlab is available on seaborg.

NERSC response: MATLAB is available on Jacquard (the Linux cluster) and on DaVinci (the SGI Altix). We can't easily justify installing MATLAB on Seaborg because big MATLAB jobs would affect other users on the interactive nodes. DaVinci, which is tuned for interactive use, is the best platform for MATLAB.

Add ferret on seaborg

NERSC response: FERRET 5.81 is available on the SGI Altix, DaVinci. To access it, first load the module "ferret". This version is not yet available for AIX, so it is not on Seaborg.

I have been very happy with the development environment at NERSC, particularly on Seaborg, although the relative leniency of its compilers in their default settings sometimes allows errors to persist that only get caught on other machines. I am glad to see that Bassi will be so similar.

Debugging is still a daunting task on Seaborg. I avoid doing it and, thus far, have not faced a situation where I have needed to do anything complicated (fingers crossed for the future!). GUI interfaces are frustrating for remote users; debugging with large numbers of processors is always awkward and frustrating within a batch environment. This is not a NERSC-specific issue. However, some innovation on this problem would be welcome. I rue the day when I find myself with a code that runs just fine on 1000 processors, but fails on 2000! ...

Since batch jobs had to wait too long in queue, I am wondering if in the future each use is limited to submit/run two jobs at one time so that every user has a chance to get job run at any time.

I'm not enough of an expert to comment on this intelligently. All compilers look alike to me (though I'm aware they don't to my codes).

It is frustrating to not have good syntax highlighting in VIM. The default terminal you provide isn't good, and the ansi terminal isn't much better. Don't you have enough people using linux to support a linux color terminal?
Can't you set up some sort of passwordless ssh so that it isn't such a pain to check many directories in and out of CVS?
Can't you tell your linker that it doesn't matter if symbols are unresolved if they aren't actually used? That is a gigantic pain and prevents some of our test code from compiling because some *UNUSED* symbols in a linked library aren't resolved.
Can't you fix your password checking? For a long password, where "a" is an alpha character, "s" is a symbol and "d" is a digit, aaaaaaaaaaasdd is not recognized properly, but sddaaaaaaaaaaa is recognized as satisfying your password security requirement policy.

Visualization and Data Analysis

Where do you perform data analysis and visualization of data produced at NERSC?

Location    Responses    Percent
All at NERSC 13 6.7%
Most at NERSC 19 9.7%
Half at NERSC, half elsewhere 39 20.0%
Most elsewhere 51 26.2%
All elsewhere 65 33.3%
I don't need data analysis or visualization 8 4.1%

Are your data analysis and visualization needs being met? In what ways do you make use of NERSC data analysis and visualization resources? In what ways should NERSC add to or improve these resources?

[Read all 55 responses]

22   Yes, data analysis needs are being met
14   Do vis locally / don't need
12   Requests for additional services
7   Moving data is an inhibitor / disk issues
4   Need more information / training
3   Network speed is an inhibitor

 

Yes, data analysis needs are being met / positive comments:   22 responses

Yes.

I would like to acknowledge a great deal of help by Cristina Siergerist.

Yes, they are met. Although reading the question, I don't use most of the visualization tools mentioned.

NERSC is fully equipped with all the facilities which an analyst uses.

I use PDSF often to analyze the STAR data. I hope the system at interactive and computing nodes are stable.

Our data analysis needs are based on the ROOT framework

I do mostly batch generation of plots from HDF5 files using GnuPlot.

yes, the data visualization group has been very useful

The visualization packages that I use at NERSC depend on IDL, and the output file processing depends on netcdf (I use NCO to manipulate files). ...

Visualization tools with GEANT4 applications

I use IDL on Seaborg and Jacquard.

Most with IDL, visualization group also try to use the other visualization software. I have got good consulting help from the visualization group. ...

Yes. I use IDL, occasionally on the serial queue if I have a large data set.

I am moving IO to HDF5 format. I will see how good the visualization software is at dealing with this format (presumably OK vis-à-vis IDL)

have previously had accounts on escher for analysis and visualization but mostly used seaborg for access as it seemed the most stable environment. have yet to use DaVinci.

Yes... great to see Visit in use.

I use serial jobs on Seaborg for some trivial post-processing of NERSC-generated data, but otherwise do all analysis and visualization on my local machines.

The consultants have been very helpful with visualization needs

My data analysis and visualization needs are being met very well. I use DaVinci and visualization software developed together with the vis group. PLEASE KEEP the vis group running, they do a very good job!!!!!!

I use the PDSF queues for processing high energy physics data with my experiment's software. I am satisfied with the speed at which my jobs are processed.

I use IDL on davinci

I use NERSC to visualize some very large datasets. I have some visualization software written within our group, which I sometimes use on Davinci. I need to use NERSC for special rendering options and publication-quality visualization. For this I have used the help of a NERSC consultant and was very satisfied. The consultants made lots of good suggestions for viewing the data and ran the software to produce images that we have used several times in publications. They also helped me get started with graphics packages that I could use on my own.

 

Do vis locally / don't need:   14 responses

Our group does all visualization locally (LINUX / MAC OSX) using open-source programs such as plotmtv and xmgrace.

I don't make use of data analysis and visualization facilities at all. I find it is more expedient to relegate that responsibility to my local machines.

I do all of my visualization elsewhere.

Do not use Data visualization at NERSC

I do my analysis at the local computer

I have not used the data analysis/vis tools- it's something I'd like to try, but haven't gotten around to. I don't know the tools yet, and haven't had the time to learn.

Don't need analysis and visualization at NERSC.

Not needed

the visualization we have used so far this past year has been done at NAVO and ERDC centers (DoD). It was easier for us to keep using these consultants at DoD ....

I don't use NERSC for this

I do not currently use visualization facilities at NERSC. However, I am planning to do so in the future, once I have assembled all the needed pieces at my end.

I currently do not use NERSC analysis and visualization services.

 

Requests for additional services:   12 responses

Requests for additional software

Our group works with the visualization group and we have a definite need for visualization. We would like information visualization software tools such as Spotfire to be available at NERSC. Ideally, it would be good to integrate information visualization with our system.

... It would be helpful to have LaTeX available for a specific post-processing step, but I don't see it anywhere at NERSC.

NERSC response: Please submit a software request.

My data analysis and visualization needs have not been met. Ferret could not be used on seaborg. At least, I would like NERSC adds Ferret on seaborg. ...

NERSC response: The older AIX Ferret binary from the Ferret download site does not work on seaborg (operating system incompatibility). We have contacted the Ferret developers who are working on a current AIX port in conjunction with creating a 64-bit version of Ferret for many different operating system versions. The Ferret developers have not provided a target date when we might expect Ferret to be ready. Meanwhile, we have installed a 32-bit version of Ferret on DaVinci; that version is operational.

not all of the needs are met. using idl for data visualization and simple pre/post analysis. please add 64-bit visualization software if possible

NERSC response: The operating systems on Bassi, DaVinci, Jacquard and Seaborg are all 64-bit; most of the visualization software installed on those systems is a native 64-bit version. On DaVinci, there are a few commercial applications whose vendors have not yet provided a native IA64 port (Matlab, IDL) but fully native 64-bit versions of those applications are installed on other platforms. If there is a specific visualization application you need for your project, please submit a software request.

In order to process data files, I need the NCO operators, which can manipulate netcdf files. If these were available at NERSC, I would do all my analysis there (using NCO, then MATLAB).

NERSC response: netcdf and nco are installed on all NERSC computers (Seaborg, Bassi, Jacquard and Davinci). To access nco (the netcdf operators) first load the module "netcdf". MATLAB is available on Jacquard and DaVinci.
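
As a sketch only (hypothetical file names; module names as stated in the response above, and exact NCO operator usage should be checked against the NCO documentation):

   % module load netcdf                       # makes the netcdf library and NCO operators available
   % ncra jan.nc feb.nc mar.nc winter_avg.nc  # ncra averages the record dimension across the input files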

The main data analysis tools I use are CDAT and VCDAT, developed at LLNL. I am not sure if they are supported now. If they are supported on a NERSC platform, I would use them.

NERSC response: Please submit a software request.

No, there are programs that we need that are not on the systems.

Requests for consulting assistance

I have not used NERSC for my visualization needs for several years although I relied heavily on them in years past. I was very satisfied with the computers and consultants at the time. With the large user base of the NIMROD code and its emergence as the defacto standard for computational plasma physics it would be very helpful if NERSC would devote some consultant resources to NIMROD visualization. This would provide a much needed resource to the fusion energy community.

I do all my post-processing and data viz on my Mac, but this is getting out of hand with my recent interest in 3D data. I'll contact NERSC in the near future for help.

I do all my own analysis. I need to figure out what analysis capability NERSC has to offer and how I may use it.

I like to use Matlab for rough graphic purpose. I tried Matlab at nersc. It seemed it didn't work well for me. ...

Other

it seems odd that the 'vis machine' has no graphics pipes. it's faster for us to render our visualizations on our own machines that have OpenGL than it is to use the CPU renderer on davinci.

NERSC response: The Center plans to upgrade DaVinci in the 2007 time frame, increasing the processor count and adding graphics hardware.

 

Moving data is an inhibitor / disk issues:   7 responses

... Also, a large amount of data needs to be moved from seaborg to davinci for analysis. It is very inconvenient. ...

... It is also not convenient if we need to transfer data from seaborg to other nersc machines for data analysis.

... I don't like to transfer data into DaVinci to do the visualization.

Used mostly interactive runs (IDL) on seaborg and struggled with the 30 mins time limit and memory limitations. Recently switched to using DaVinci, which solved both these problems. The need to copy files from seaborg to DaVinci is an issue here, as is the different endianness. A common file system like the one shared between Jacquard and DaVinci is a great way to go from my point of view.

NERSC response: The NERSC Global Filesystem has been deployed to address the above issues. It is a large, shared filesystem that can be accessed from all the computational systems at NERSC.

Mostly yes, but diskpool storage remains a problematic issue. [PDSF user]

... The diskspace of scratch is not big enough. [Seaborg user]

I wish there were automatic backups for DaVinci

 

Need more information / training:   4 responses

We might like to use these services, but are mostly unaware of what is available.

NERSC response: The NERSC Visualization services are documented on the web. If the documentation doesn't answer your questions, please contact a NERSC consultant.

A more helpful FAQ on using Mathematica (especially the remote fonts) on Jacquard or Davinci would be extremely welcome. My local system administrator is still unable to configure my Linux machine so that the fonts work correctly over X11. As a result, this somewhat expensive software is unusable. Several of my coworkers have exactly the same problem; none of them have been able to resolve it.

I think it would be useful to have a greater awareness of what visualization resources are available. Users may not be aware of what is available to them unless they specifically ask a consultant.

I mostly do my data analysis and visualization on local computers. Maintenance of serial queues on Seaborg and other newer platforms (Jacquard, Bassi) is a good idea. Classes on newer visualization software (e.g., Ensight) would be helpful.

 

Network speed is an inhibitor:   3 responses

Network connectivity between PDSF and LBL building 50 is so poor that data visualization via X is not practical (nor is even an X based editor session such as xemacs). I have to copy all my data to local machines for reasonable visualization.

No. I have mostly done my data analysis on seaborg and davinci. On seaborg, I use IDL and it works very well. On davinci, I use Matlab and it is slow. ... I want to use the software Visit on my windows XP system but it is slow too. I am not sure it is because of the slow network or davinci is slow itself.

The network latency to my work place (East Coast) is too high and makes software with heavy graphical interfaces almost impossible to use. Debugging sessions with Totalview, for example, are very frustrating. IDL is usable as long as the interface stays simple. We do all the intensive visualization locally.

NERSC response: The adverse effect of network latency on interactive remote visualization is a well-known phenomenon. While we cannot reduce the network latency itself, we are evaluating an emerging technology for remote desktop sharing (a la "VNC") that eliminates most of the round-trip Xlib protocol traffic between NERSC and the remote site; such round-trip traffic contributes significantly to usability and performance problems. The new approach will not eliminate the issue completely, but it should result in a noticeable performance improvement. We will post more information when it becomes available later in 2006.
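
As a generic illustration of the VNC-style model mentioned above (this is not a description of NERSC's eventual deployment; host names, display numbers, and ports are hypothetical):

   remote%  vncserver :1                                  # start a remote desktop; display :1 listens on port 5901
   local%   ssh -L 5901:localhost:5901 user@remotehost    # tunnel the VNC port over ssh
   local%   vncviewer localhost:1                         # connect; only compressed screen updates cross the network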

HPC Consulting

Legend:

SatisfactionAverage Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Significance of Change
significant decrease
not significant

Satisfaction with HPC Consulting

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2004
1234567
HPC Consulting overall     1 1 4 36 118 160 6.68 0.62 -0.01
Timely initial response to consulting questions   1   1 3 41 111 157 6.65 0.66 -0.05
Quality of technical advice       2 3 47 101 153 6.61 0.60 0.03
Followup to initial consulting questions   1 1 5 1 37 101 146 6.57 0.83 -0.09
Amount of time to resolve your issue   2 1 3 7 54 86 153 6.41 0.89 -0.19
Software bug resolution   1 3 5 4 33 42 89 6.17 1.11 0.05

 

Comments about Consulting:   46 responses

20   Good service
3   Mixed evaluation
3   Unhappy
Good service:   20 responses

Nice job!

The consultants do well in a tough job.

NERSC's technical help has been excellent. Each issue I brought to their attention was dealt with quickly, many times within the same day, if not while I was on the phone with them.

good work and thank you.

I've only talked to them about password resets and when some machines are going to be up, but they were helpful.

really very helpful and knowledgeable

I think the consultants at NERSC are doing a great job!

I continue to be impressed with the competence and helpfulness of the NERSC consulting staff.

This group is great. They have been very helpful.

They are good

My experiences with consulting at NERSC have been uniformly positive

In general I find consulting at NERSC superb.

NERSC consultants are always very helpful and prompt at replying. I have accounts on a number of other computing centers and rank NERSC the highest in this regard

Very good

The high-quality online consulting system is one of the most valuable aspects of NERSC.

The consultants are excellent and set NERSC apart from other major computing centers

Great Job, folks. Keep up the good work!

NERSC has by far the best consulting service of all the computing centers that I know.

NERSC has the highest level of consulting support that I can imagine. Consultants are knowledgeable, friendly, and have dedication to customer service. If only everybody else in other parts of my life were as good!

I have found the consultants very useful, prompt to reply, and very helpful with whatever problem I encountered. Thanks!

Mixed evaluation:   3 responses

Iwona is fantastic at responding to and resolving PDSF problems in a timely manner. My experience with non-PDSF support staff has been much worse -- slow response for even simple tasks like rebooting a bad diskvault or compute node. The advertised 4 hour turnaround for online help requests seems to only be true for Iwona.

I'm very happy with the consulting in general. Sometimes (if people are on vacation, on travel, during weekends), there seems to be a lack of experts on call/duty for pdsf. The staff on the 'general' hotline is not necessarily very helpful for pdsf problems. I don't blame them but there should be somebody in reach to resolve (or at least look at) the problems which arise during these times.

I mostly use PDSF. When the PDSF consultant is in everything is great. Iwona is a wonderful help to all of us at KamLAND and when she is on travel it is noticeable because the performance of PDSF is often noticeably worse and the response to questions and problems (usually from consultants not specializing in PDSF), while well-intentioned, has sometimes not been particularly useful. If there is one complaint I have, it's that there does not appear to be enough staff to cover the consulting needs for PDSF. I get the impression that the personnel resources for PDSF are stretched very thin.

NERSC response: For the above three comments: Iwona is a part-time sysadmin for PDSF. The other NERSC consultants do not perform system administration tasks on PDSF. NERSC will work to improve the PDSF response time for such issues.

Unhappy:   3 responses

somewhat improve on the C++ knowledge

NERSC response: We recognize the need for more C++ expertise and expect that newly hired consultants will be knowledgeable in C++.

As a followup to my last time, I still have not gotten a serious answer to the problems. The consultants waited so long that I gave up working on Jacquard for a while. Then they said everything was updated and changed so I should try to solve my problem again and see if it persists. Well, when I have some free time, I'll give it a go, but it seems that if I run into a problem, I might be on my own anyway.

I had bad luck with fortran compilers on Jacquard. I use OpenMP+MPI parallelization and the compilers were not able to handle OMP directives. It is not clear why more robust pg compilers were not used for this linux computer.

NERSC response: For the above two comments: Vendor support of the user environment and software on jacquard is not as tightly integrated as the support for Seaborg and Bassi, and resolution of these issues has taken longer. However, the consultants could have done a better job of at least letting the users know that we have not forgotten the issues, and our goal is to do better in this area.

Services and Communications

 

  • Legend
  • Satisfaction with NERSC Services
  • How Important are NERSC Services to You?
  • How useful are these methods for keeping you informed?
  • Are you well informed of changes?
  • Comments about Services and Communications

 

 

Legend:

SatisfactionAverage Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
ImportanceAverage Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Significance of Change
significant increase
not significant
UsefulnessAverage Score
Very Useful 2.50 - 3.00
Somewhat Useful 1.50 - 2.49

 

Satisfaction with NERSC Services

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2004
1234567
Account support services     1 1 4 25 119 150 6.73 0.61 0.06
Computer and Network Operations     1 2 1 24 64 92 6.61 0.73 0.10
Response to special requests (e.g. disk quota increases, etc.)       6 6 20 46 78 6.36 0.93 0.28
NERSC CVS server       2 3 10 13 28 6.21 0.92 0.88
Allocations process 1   5 5 7 49 57 124 6.16 1.11 0.23
E-mail lists     1 7 5 22 27 62 6.08 1.06 -0.04
Visualization services 1     7 3 9 16 36 5.83 1.42 0.43

 

How Important are NERSC Services to You?

3=Very important, 2=Somewhat important, 1=Not important

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
Allocations process 1 22 86 109 2.78 0.44
Account support services   39 92 131 2.70 0.46
Computer and Network Operations 2 27 59 88 2.65 0.53
Response to special requests (e.g. disk quota increases, etc.) 3 21 50 74 2.64 0.56
Visualization services 20 17 14 51 1.88 0.82
E-mail lists 21 27 13 61 1.87 0.74
NERSC CVS server 19 17 5 41 1.66 0.69

 

How useful are these methods for keeping you informed?

3=Very useful, 2=Somewhat useful, 1=Not useful

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
SERVICES: E-mail lists 1 42 104 147 2.70 0.47
MOTD (Message of the Day) 13 55 55 123 2.34 0.66
SERVICES: Announcements web archive 13 55 48 116 2.30 0.66
Phone calls from NERSC 25 25 32 82 2.09 0.83

 

Are you well informed of changes?

Do you feel you are adequately informed about NERSC changes?

AnswerResponsesPercent
Yes 161 99.4%
No 1 0.6%

Are you aware of major changes at least one month in advance?

AnswerResponsesPercent
Yes 137 85.6%
No 23 14.4%

Are you aware of software changes at least seven days in advance?

AnswerResponsesPercent
Yes 152 95.6%
No 7 4.4%

Are you aware of planned outages 24 hours in advance?

AnswerResponsesPercent
Yes 155 96.9%
No 5 3.1%

 

Comments about Services and Communications:   10 responses

MOTD / Communication of down times

MOTD is rarely useful b/c it mixes so many systems and is so long that the relevant pieces usually scroll off my screen before I see them. e.g. if PDSF logins would only show PDSF MOTD, that would be useful. ...

The MOTD seems never up to date.

I have noticed that the machine status does not always reflect reality. Several times I have been informed that Seaborg or Jacquard is "Up and available" when they are in fact down.

NERSC response: We are revamping the MOTD so it will be more concise and accurate. In particular, it will never show machines as being "Up and available" when they are not.

Authentication issues

... Account support: NERSC has very poor account support for collaborative computing (e.g. production accounts that are independent of individual user accounts). Currently essentially nothing exists for this and it has been frustratingly slow to get any action on this front. I consider this a major shortcoming of NERSC's computing framework.

How's about raising the number of incorrect login attempts before a lockout to, say, 20? If you make password requirements so strict, it's easy to forget the exact password, and if only a few guesses at my own password locks me out, then it's a waste of all our time to have it unlocked.

NERSC response: NERSC must abide by DOE password guidelines. These specify that "Three failed attempts to provide a legitimate password for an access request will result in an access lockout".

Other

It would be nice to have some kind of mechanism to restore jobs that have crashed due to unexpected outages

NERSC response: This actually happens in many cases and users never know it. Sometimes this is not possible, however.

Enabling a PI to email everyone in their repo would be a great help.

NERSC response: We will investigate this.

Enabling a PI to hold queued/kill running jobs submitted by members of their repo would also help.

NERSC response: This is probably not possible for technical reasons. Among other issues: users can belong to multiple repos. PIs and Account Managers can limit the time available to each individual in their repo via the NIM accounting web interface.

Increase the maximum of time for a user code, if possible.

NERSC response: The longest wall times are on Seaborg, where the maximum wall time is 48 hours for 512 or more processors, and 24 hours for fewer than 512 processors, and Jacquard, where the maximum wall time is 48 hours for 2 to 32 processors and 24 hours for up to 64 processors. If this doesn't meet your needs, we urge you to contact the NERSC Users' Group, NUG. NUG has a significant influence on queue policies.
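
For illustration only (not a complete job script; required keywords, class names, and node counts should be taken from the NERSC batch documentation for each system), the wall-clock limit is requested via a batch directive, e.g.:

   # LoadLeveler on Seaborg
   # @ wall_clock_limit = 48:00:00

   # PBS on Jacquard
   #PBS -l walltime=24:00:00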

Have been very happy with extent to which personnel made contact directly to help us use NERSC most efficiently.

Generally satisfied with NERSC services. Applaud move to clusters. Allocation process is needlessly burdensome and machines are overallocated, leading to long wait times.

Given the sometimes dodgy nature of computer systems, I cannot think of anything that NERSC staff could do to keep us better informed of impending issues.

Web Interfaces

Legend:

SatisfactionAverage Score
Mostly Satisfied 5.50 - 6.49
Significance of Change
not significant

Satisfaction with Web Interfaces

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2004
1234567
Accuracy of information   1 1 2 10 57 82 153 6.40 0.81 -0.01
NERSC web site overall (www.nersc.gov)     3 5 9 69 76 162 6.30 0.86 -0.02
On-line help desk 1   2 6 9 26 47 91 6.16 1.16 0.00
NIM   2 1 11 14 48 69 145 6.15 1.07 -0.09
Timeliness of information   2 1 6 17 65 58 149 6.12 0.97 -0.05
Ease of finding information   2 8 6 20 70 53 159 5.93 1.13 0.04
Searching   2 3 10 11 26 25 77 5.70 1.30 0.07

 

Comments about web interfaces:   8 responses

nim.nersc.gov is useful for allocation status

The NIM account website is too cluttered and as a PI it is difficult to find out information about the users. I have to relearn my way every time I use the NIM web pages.

I still find the NIM interface for getting history of personal usage a bit clunky. I doubt the GOOGLE people would rate it as an A+ in user friendliness.

NERSC response: Granted. Given our resources we have emphasized functionality over interface design. We try to make improvements where we can. Specific suggestions are always welcome.

Would be nice to have a single password for NIM and seaborg.

NERSC response: NERSC is moving toward single-password authentication; many computational systems and NIM share a password. The version of LDAP, which is the authentication mechanism NERSC uses for this, is not supported by AIX 5.2 on Seaborg. NERSC is weighing costs and benefits associated with implementing a custom solution to align Seaborg with the other systems.

A lot of important information is buried in the FAQ. Finding it requires knowing the right search keywords. It would be nicer if the information was better organized somehow. [PDSF user]

Need to check faqs for dead links. I tried to look how to back up on HPSS and hit a lot of dead ends. [PDSF user]

NERSC response: NERSC does not currently have the resources to devote to better FAQ maintenance. We are exploring the possibility of user-maintained FAQs or user-helping-user bulletin board systems.

I find web site cumbersome. If you can't find what button to click in 5 seconds per page the page is too cluttered

typing in things you know are there in the documentation somewhere does not turn up the document, and it can be difficult to find the appropriate document.

Training

 

  • Legend
  • Satisfaction with Training
  • How Important are these Training Methods?
  • What Training Methods should NERSC Offer?
  • Comments about Training

 

 

Legend

SatisfactionAverage Score
Mostly Satisfied 5.50 - 6.49
ImportanceAverage Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Significance of Change
not significant

 

Satisfaction with Training

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2004
1234567
New User's Guide       4 7 25 45 81 6.37 0.84 0.10
Web tutorials     1 4 3 39 31 78 6.22 0.85 0.12
In-person classes at your site       3   5 8 16 6.12 1.15 0.64
Live classes on the web       4 1 9 4 18 5.72 1.07 0.57

 

How Important are these Training Methods?

3=Very important, 2=Somewhat important, 1=Not important

MethodNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
New User's Guide 1 23 49 73 2.66 0.51
Web tutorials 2 32 44 78 2.54 0.55
Live classes on the web 9 13 8 30 1.97 0.76
In-person classes at your site 14 9 7 30 1.77 0.82

 

What Training Methods should NERSC Offer?

MethodResponses
Web documentation 115
Web tutorials on specific topics 102
Live web broadcasts with teleconference audio 16
Live classes at LBNL 15
Live classes on the web 11
In-person classes at your site 8

 

Comments about Training:   8 responses

It would be helpful to have an orientation workshop and/or lectures on the basics of using NERSC facilities (overview, compilation, running jobs, HPSS, etc.) early on, say within a week or two after allocation announcements.
I also would like to distinguish the help from David Skinner and Cristina Siergerist

The more info that can be put up on transitioning codes from Seaborg to the Linux cluster the better.

Training needs to be broadcast over Internet II
Also, Training sessions at other sites than NERSC, like at the NCCS at ORNL

The Access Grid is a good tool but does not work well most of the time. Live classes are very important since the trainers usually give extra information that does not appear in the web documentation.

Despite the advances of technology, nothing beats live training. Combining training with other events (such as the combination of Jacquard training and the NUG meeting this past October) is an ideal for me. When time permits, the airplane is still the best vehicle for delivering training (or at least the trainers/trainees).
I am not opposed to using the Access Grid. It is, however, of little use to me personally.

While the resources available at NERSC are vital to the success of my project, I do not utilize most facilities that are available; therefore, the need for training is minimal.

On web classes and tutorials: it is probably more my problem than NERSC's. I need to do a better job of availing myself of them.

don't use

Comments about NERSC

What does NERSC do well?

[Read all 82 responses]

47 Provides access to powerful computing resources, parallel processing, enables science
32 Excellent support services, good staff
30 Reliable hardware, well managed center
11 Everything, general satisfaction
11 HPSS, data and visualization services
6 Software resources and management
4 Documentation, NERSC web sites

What should NERSC do differently?

[Read all 65 responses]

24 Improve queue turnaround times
22 Change job scheduling / resource allocation policies
17 Provide more/new hardware; more computing resources
8 Data Management / HPSS Issues
5 Software improvements
5 Account issues
5 Staffing issues
4 Allocations / charging issues
2 Network improvements
2 Web Improvements

How does NERSC compare to other centers you have used?

[Read all 51 responses]

26 NERSC is the best / overall NERSC is better / positive response
12 NERSC is the same as / mixed response
7 NERSC is less good / negative response
6 No comparison made

 

What does NERSC do well?   82 responses

Note: individual responses often include several response categories, but in general appear only once (in the category that best represents the response). A few have been split across response categories (this is indicated by ...).

  Provides access to powerful computing resources, parallel processing, enables science:   47 responses

powerful is the reason to use NERSC

I compute at NERSC because I cannot afford to purchase and maintain clusters with fast interconnect

NERSC provides powerful computing resources and excellent services. It enables people to perform challenging, difficult computational work.

NERSC provides superior hardware computing capabilities, and professional user support and consulting. These two areas are where I see NERSC's core strengths; here NERSC offers resources that are not easily matched by any local cluster or computer farm setup.

I like the new hardware (jacquard and bassi).

... I compute at NERSC because I have to analyze large amounts of data stored on PDSF and I need the parallel computing facilities to enable my analysis to run fast. I would not be able to do my research without a cluster like PDSF.

I am not a super-power user but find NERSC essential for getting <1 day turn around for runs that on a fast PC would take 1 day to 1 month or more.

NERSC provides me the resources badly needed that can not be obtained elsewhere. It is very important to my scientific research program.

INCITE project

NERSC (in particular Seaborg) is extremely valuable for our research in theoretical nuclear structure and reaction physics. While we are able to write and debug our Fortran-95 codes on our local workstations (LINUX and MAC OSX), we need a supercomputer to do production runs.

The machines are powerful, well administered, reasonably responsive, and accessible to groups which do not have the resources to build their own large clusters.

Online allocation process is convenient. Since NCSC is closed in our place, NERSC is the only facility that I can rely on for large numerical calculations. I am mostly satisfied with the performance of NERSC. Thanks.

it allows to do things that take too long or take too much memory on a linux box

Speed, data accessibility

Excellent resources and support.

I mostly use NERSC to run my Titanium benchmarks on lots of nodes.

It is the primary source of computing assigned by DOE to carry out the computational work which is part of my contractual obligations with DOE.

NERSC computer resources are very important for my research. It gives me access to high performance computers and the mass storage that I need to store intermediate and final results. I am very satisfied with the flexible way NERSC handles allocations of CPU time and mass storage.

I compute at NERSC because of the availability of massively parallel resources. It is one of only a few facilities which offers the ability to compute across 100s or 1000s of processors. I am most pleased with the stability of the NERSC machines; most of the time one can run jobs on a very large number of processors and the machine will usually be stable in this time.

NERSC provides generally reliable hardware and software, and very good user support from consultants. It provides a significant portion of my overall computing resources.

The increases of computer performance (CPUs & Memories) are important for us to do the large-scale computing at NERSC.

The resources provided by NERSC are essential to my DOE-funded research. NERSC does a very good job of making massively parallel computing resources available and usable.

Access to large parallel systems is very useful.

Very useful for large "batch" projects

I compute at NERSC because it provides enough processors with the required memory and speed for my applications.

NERSC is doing a good job and is very important to me. There is a large amount of computing resource and I need to run hundreds of jobs to analyze the data.

NERSC has the largest and fastest computers I have access to.

I compute with Jacquard and analyze with Davinci. Both have been really helpful, fast and reliable. I have some comments on the batch system, which I think needs improvement. My jobs were sometimes killed because of an NFS file handle error, so that I would have to wait in the queue to restart (continue) that run. I think that in the event of a run killed due to a system error, any *dependent* job related to that killed job should be given the highest priority to start. Also, there have been problems with jobs getting stuck in the queue for a week or more, overtaken by jobs submitted more recently. I think the batch system should be improved to avoid such scenarios.

  Excellent support services, good staff:   32 responses

There is a distinct PROFESSIONALISM in the way NERSC conducts business. It should be a model for other centers

Provide a source of parallel computing that I do not have. Great resource & a Great Consulting staff. Without them I could not do and learn the advantages of parallel processing

Consulting is great. Management is great. DOE strategy is not so great.

NERSC is important because it combines state of the art HW/SW but most important the combination of state of the art HW/SW with excellent first class consulting/collaboration.

Iwona is great at PDSF support. PDSF node availability is quite good.

very useful service
really fast to help with different problems

... I am also impressed with the fast turnaround time for solving issues. ...

I'm very happy with the support and (at least in general) the timeliness of the replies to requests. The staff makes NERSC (= pdsf in my case) a nice place to work.

support and consulting are very good

Processors available and good consulting support.

NERSC provides good help line support. I don't do much except get our codes to compile on the seaborg compiler. Others in my group are the ones who actually run the codes.

Consulting is excellent. Up until recently the computing power has been second to none.

It is a great place for high end computing. The services at NERSC are wonderful.

Good user service. We are cycle hungry. Very pleased with move to clusters. We have been getting a lot of uncharged time from Cheetah and Phoenix at ORNL-CCS. Cheetah is gone, and we don't know how well the new Phoenix allocations will work. Got a lot done on Jacquard this year but it is now overloaded.

NERSC tries to be a user facility and tailor its systems for the user, although it is not as good at this as it used to be. It does not discourage actually talking to its consultants, unlike some other places.

I am extremely satisfied with NERSC! In particular I would like to thank Francesca Verdier; she has always and very promptly responded to all my concerns/questions/requests and has always shown a willingness to help that is not usual!

Services of the personnel are excellent. Antiquated and inadequate hardware is the main issue from what I see. Bassi helps but more is needed.

So far NERSC is doing very well on flexibility. NERSC can meet with users requests timely. NERSC is important because it is very close to my University. I am supposed to be able to do complex data visualization and analysis interactively although it turns out not as I expected. The consulting services at NERSC are also very good compared to other centers.

  Reliable hardware, well managed center:   30 responses

NERSC is and remains to be the best run central computing facility on the planet. I have been working at NERSC/MFECC for 25 years and continue to use it for highly productive work.

Management of large-scale multi-processing.

I think the ability to provide support for a wide variety of codes, that scale only to 64/128 proc to those that scale all the way to 4096. This flexibility given different scientific applications is crucial for its success.

NERSC maintains a high performance computing center with minimal downtime which is very reliable. I am very impressed with how short the amount of downtime is for such complicated systems. ...

NERSC systems are generally stable and well maintained. This is great for development. But, NERSC becomes a victim of its own success because many people want to use it, resulting in long batch queues.

I use seaborg primarily because of how much memory it has on each node. And I've had very few problems with it - a definite plus.

Stable computing environment and fair queuing system.

NERSC has good machines, the overall setup is quite good.

I run at NERSC's Seaborg because the performance is the most consistent and reliable of any of the facilities at which I compute. I never have trouble running on even 1000+ processors - that's the most important thing for me. The few times I've needed questions answered or account changes, the consultants gave me a very quick and satisfying response and I really appreciate

Recently I do lots of simulations at NERSC. NERSC is really a world-class facility, in my opinion. The administration of the system is really great.

Capacity and capability computing; flexibility and responsiveness locally.

Machines are up and running most of the time, and the waiting times for job execution are quite acceptable

Generally, it runs well at NERSC. There are so many computing nodes and the system is very robust and reliable. There are many scientific programs needed to be run on the NERSC. That's why we choose and use it.

An excellent machine offering reliable and excellent service. But we need a bigger one and more allocation.

  HPSS, data and visualization services:   11 responses

We generate very large amounts of data and having a reliable partner who can store at large scale, plan and manage the storage capacity etc is very valuable.

Provides good computing support for large computer jobs, including an easy to use and remotely access archiving system.

free; convenient; HPSS has now become indispensable for me

Mostly everything works well when there are no network problems or other issues that either prevent me from logging on or make my sessions crawl. NERSC is very important to me because I do all of my data analysis on PDSF and the data for our experiment (KamLAND) is all processed at NERSC.

(We only use PDSF and HPSS:) I am very satisfied with the overall facility. The HPSS system is the main repository for all our experimental data and we are quickly becoming its largest user. The system is efficient when there are not many connections to it and once you have an HPSS connection, the transfer of data onto HPSS is very fast.
PDSF is our main analysis cluster and was essential for doing our analysis for all our papers. The many available processors and the fact that the facility is shared among several experimental groups that have similar needs, make it very useful for us. The flexibility of the machine and staff allow things to get switched around quickly if necessary.
I am very happy that NERSC decided to use the GPFS file system. We converted our disks to this new file system a month or so ago and see dramatic improvements in the number of concurrent clients and throughput over what was used previously.

NERSC is best for interactive running and visualization. The time on NERSC is important to the completion of my projects; however, more time is spent waiting in the Seaborg queue than on other machines.

  Many things, general satisfaction:   11 responses

NERSC is doing a great job and continuously improving. I am very satisfied with your job. Thanks!!

NERSC is doing a terrific job. It has a very helpful consulting system, provides all kinds of software and libraries we need, and the computing systems are well customized. I can get help, find documentation, solve my problems, and get my work done very quickly.

NERSC is best at delivering the combination of leading-edge (or near leading-edge) hardware and system reliability in a production setting. High-performance scientific computing requires both computing muscle as well as systems reliability. NERSC has always been able to manage the balance between the two to give us (the users) an impressive product. As a result, my colleagues and I have been able to run simulations this year that we could not have done at any other (open) facility. Understandably, we are very pleased and hope that NERSC will have the funding to both upgrade its systems and maintain its consulting excellence.

NERSC runs a very tight ship. Although the queues are often long, downtime is rarely a significant issue and the facilities are well maintained to keep up with the demand. This reliability and the extent of both hardware and software resources allow me to run simulations in approximately 1 week (including queue time) that would take on the order of a month to run on my local machines. Additionally, the availability of a dedicated computing cluster allows me to handle all necessary data processing while the "next run" is in progress, thereby maximizing my efficiency.

NERSC staff are excellent. Machine uptime and access is excellent. queue structure is very good.

Ease of supercomputer use
Computing power of machines is very good

Support (consulting) is excellent
Nersc helped me and my entire community with a special project (see qcd.nersc.gov)

I am mainly very satisfied with the PDSF facility. This satisfaction and its proximity are the main reasons for use.

NERSC's strengths are a combination of its dedicated and knowledgeable consulting and support staff; and its very large systems, which make it possible to run very large parallel jobs on the same machine where they are developed and tested serially or on a few processors. The presence of good analysis and debugging tools, such as Totalview, is also critical.

I compute at NERSC because the systems are always up and running, the software is available and up to date, and the consultants always have the answers to my questions.

In large-scale computing, *many* things are *very important* (any single one going wrong can ruin your day, possibly your cpu bank account), as I checked. nersc does them well!

Stable super computing environment with almost always up to date libraries good technical support

  Documentation, NERSC web site:   3 responses

The help documents online. I need NERSC to handle parallel computation for time consuming problems.

Abundant and very helpful website information about scientific computing.

NERSC has great computers and except for the last 5-6 months, great support. Things tend to run most of the time and one can get interactive runs done quickly. Web site is comprehensive and quite extensive.

  Used to be Good:   2 responses

I have not used the facilities significantly in the last few years and cannot answer detailed questions in a meaningful way. In prior years, I found NERSC to have excellent state of the art computers and a dedicated support staff. The main problem was the rate at which things changed. When not using the facility intensively it became more and more difficult to keep up with changes. Right now I would like an excellent, stable facility rather than a state-of-the-art computing facility.

NERSC used to be a superb place for supercomputing. I've done very substantial simulations in previous years. The problem is that the seaborg (the main resource at NERSC) is very much outdated. This problem is exacerbated by overloading the system with too many jobs. We could not even use our last year allocation because it was difficult to run jobs.

 

 

 

What should NERSC do differently? (How can NERSC improve?)   65 responses

  Improve queue turnaround times:   24 responses

The most important improvement would be to reduce the amount of time that jobs wait in the queue; however, I understand that this can only be done by reducing the resource allocations.

* to improve turn around time for jobs waiting in the batch ...

A queued job sometimes takes too long to start. But I think that, given the amount of users, probably there would be no efficient queue management anyway.

The turnaround time should be much better, although it has improved since last year (Especially since the inclusion of Jacquard). ...

run more jobs and faster

I wish the queue time for large short jobs on Jacquard got through faster. Also, now that Jacquard seems to be stable, please minimize its down time. Thanks.

Queue on seaborg is a bit slow due to heavy use. I recommend another machine.

Batch Scheduling needs to be upgraded so that wait time in queue is reduced ??

Limit the waiting time in the queues, especially on seaborg (it is pretty good on jacquard).

Decrease waiting time for jobs.

The batch queue waits can be very long. It is difficult to get jobs through with time requests for more than 8 hours of computing.

I am mostly dissatisfied with the batch wait time. Sometimes I had to wait for more than a week for a job to get into the run mode. The situation has become that the turnaround time sometimes is slower than our local cluster for those jobs that we can afford to run here. Any ways to reduce the turnaround time will be useful.

Less waiting time on queues

... * The queue wait times are ludicrous on seaborg. I don't want to use that machine anymore, although it's a great machine.

Reduce batch queue hold time.

My major complaint of the past year was the time spent waiting in batch queues on Seaborg. The introduction of Jacquard has helped this somewhat, though queue times are occasionally still long (one week for 1000 processors). I certainly appreciate being given "boosts", when circumstances demand it. However, relying on this a lot slows things down for users who are in a more ordinary mode of operation. Short of additional systems capacity, I'm not certain of the best solution...

NERSC response: NERSC has made several changes based on the over allocation that occurred in 2005 and the resulting long turnaround times:

  • Time for new systems will never be pre-allocated but allocated shortly before the system enters full production.
  • Time has not been over allocated in 2006.
  • NERSC is investigating with DOE options to under allocate resources in order to improve turnaround times while still meeting the DOE mission.
  • NERSC has suggested to DOE that the allocation process align allocations more to the mission and metrics DOE has for NERSC.
However, many things that affect turnaround time are outside of NERSC's control, such as large numbers of projects that have similar project deadlines.

 

  Change job scheduling / resource allocation policies:   22 responses

Don't over allocate

Overallocation is a mistake.
Long waits in queues have been a disaster for getting science done in the last few years. INCITE had a negative effect on Fusion getting its science work done.

Under allocate to reduce queue waits. Look for better queue systems that don't involve priorities. Keep large ps queues like 64ps and 32ps on Jacquard always open. ...

NERSC should never over-allocate its resources. It should seek to decrease the wait times in the queues. It should work with DOE to enhance resources (Seaborg is kind of old now).

It's much better to have idle processors than idle scientists/physicists. What matters for getting science done is turnaround time.
Don't measure the GFlops the machine is getting, measure the GFlops the scientist is getting. For example, my job might do an average of 800 GFlops continuously over 8 hours. But that doesn't matter if I have to wait two weeks (336 hours) in the queue, in which case I really only get
(8/(8+336)) * 800 GFlops = 18.6 GFlops.
So the turnaround time is 43 times longer than what it could be if it weren't for the wait in the queue!!! So the supercomputer is effectively turned into a much slower machine. Although that already looks bad, it's actually even worse: scientific research is often an iterative process.
For example, one has to wait several weeks (for the queue) and a day (for the run) to obtain a result. That result then needs to be studied, so that an appropriate next run can be decided. Then another several week wait before those results are available. When turnaround time is this long, some problems simply aren't solvable (even though they could be if turnaround time was about the same as compute time). So big science research is being harmed, not by Seaborg's hardware, but by the way you are allocating Seaborg's hardware.
Don't over-allocate! It seems like you think that "no idle processors" means "we're getting good efficiency". But it *really* means: "we have inadequate computing resources for the number of jobs we're accepting".

NERSC response: NERSC agrees that over allocation is not the right thing to do. Time has not been over allocated in 2006 and we did not pre-allocate time for the new Bassi Power5 system. NERSC is investigating with DOE options to under allocate resources in order to improve turnaround times while still meeting the DOE mission.

Capacity computing needs

Although this problem has been addressed somewhat in the last year or so, the batch queue system on Seaborg can make it difficult to run jobs for a long time on a modest number of nodes rather than in bursts of a few hours on larger numbers.

My work would be advanced more effectively if NERSC were oriented toward what is often termed 'capacity computing'. I most emphatically have no need for high-end systems that can apply thousands of CPUs to one process and break speed records. Very few projects will ever be awarded a large fraction of such a system, so money is being wasted to provide a capability that is not really used much of the time. A user with a large allocation of, say, 500000 hours can use it up in a year by running on 64 CPUs all the time (fewer CPUs on the newer machines). What is the point of emphasizing massively parallel capabilities if users can't afford to routinely run on huge numbers of processors? We should aim for high MFLOPs/dollar, not high total speed on one job.

Realize that, in general, the user knows best and uses the facilities in a way that maximizes the scientific output from his/her allocation. Don't put barriers which try to force a particular style of usage and which discourage other types of usage. For example, recognize that 100 1-node jobs are as good as 1 100-node job, if that enables the user to get his/her work done in a timely fashion.

NERSC response: NERSC hopes that the introduction of the 640-CPU Opteron cluster, Jacquard, and the 888-CPU Power5 system, Bassi, has helped to meet capacity computing needs.

More interactive/debug resources

... Interactive computing on Seaborg remains an issue that needs continued attention. Although it has greatly improved in the past year, I would appreciate yet more reliable availability.

As a user, I would like to see debug jobs available most of the time. I think that making debug jobs fast and convenient is the biggest savings because it saves the user's time (not computer time).

At this moment, I think the time limit (half an hour) for the interactive queue seems too short. It is not possible for debugging a large scientific program within only half an hour. I suggest this time limit can be extended to 2 hours. Or add another queue such as express for such kind of purpose. Thank you.

NERSC response: Increasing the time limit on the interactive/debug queues would increase their turnaround time. We think that 30 minutes is a good compromise. For longer debug runs, please consider using the premium class.

It would be useful if there was higher priority given to interactive multiprocessor jobs --- it can be very slow getting 4-8 processor runs executing. I find the poe stuff a pain in the ass in terms of having to specify the wait time between retries and the number of retries.

I think the most important issue are the batch queuing policies on seaborg. The decision that two production jobs can block even the debug queue slows down the program development, since production and development interfere.

Different queue policies

... The time limits on the queues should be larger, although I know it would affect turnaround. I would suggest skewing the processor allocation towards jobs which are a bit less parallelizable. Not by much, but in that way jobs wouldn't sit forever and then run 24 hrs on 16 processors.

... * to have a mechanism of restoring jobs that crashed due to unexpected outage ...

I accept that when debugging, small 1/2-processor jobs are essential, but I would like to see a restriction to stop multiple (i.e. > 10) being submitted and swamping the batch queue.

Queues queues queues.
INCITE needs to be reformulated so that it does not disable every other user of the Center for months at a time.

NERSC might be able to improve by specializing in large-scale computations by the following points:
1) account charges for large-scale computations
2) job scheduling for large-scale computations
3) compute nodes for large-scale computations
1) & 2) are already tried and now installed at Seaborg. I think the people who are doing the large-scale computations at Seaborg have really benefited from these concerns. I hope NERSC to continue to try these adjustments and the people who run the large-scale jobs can take full advantage of NERSC computer resources.
As far as 3), some compute nodes can be reserved for large-scale computations. For example, 64-128 nodes might be enough for this purpose. Of course, if there is no use for large-scale computations, NERSC should arrange these nodes for the computations with small number of nodes to share.
Also it might be possible by combining 2) & 3) that NERSC can set the class charge priority for large-scale computations higher at some compute nodes.

  Provide more/new hardware; more computing resources:   17 responses

Expand the computational ability:
Reduce the waiting time;
Increase the maximum time for a running code.

Expand capabilities for biologists; add more computing facilities that don't emphasize the largest/fastest interconnect, to reduce queue times for people who want to runs lots of very loosely coupled jobs. More aggressively adapt to changes in the computing environment.

Our group would like to use a CRAY supercomputer again (I have used NERSC's CRAY SV1 several years ago).

Aside from increasing the amount of computational resources, I think NERSC is serving my needs.

We've been less active taking advantage of the computing facilities mainly due to the perceived complexity of getting the kinds of software/systems needed to do computational biology installed on the cluster. This is an historical perception, based on previous attempts to get software like blast running. I would like to try this again, as it makes more sense to use centralized clusters.

NERSC needs to expand the IBM-SP5 to 10000 processors to replace the IBM-SP3
Continue to test new machines, including the Cray products

NERSC needs to push to get more compute resources so that scientists can get adequate hours on the machine

6000 or more P5-processors to replace Seaborg's P3-processors!

The overloaded queues on Seaborg and other systems are an ongoing problem. DOE should give more money so that NERSC can buy new hardware.

... Dump Seaborg and move to clusters... change from a supercomputer center to a cluster center. The total speed of a single computer (cycles/sec) is completely irrelevant. How many cycles per year NERSC can deliver is what you should shoot for.

It would be great to increase the size of Jacquard.

Obtain major increase in hardware available to the user.

NERSC response: NERSC has recently deployed two new computational platforms: the 640-CPU Opteron cluster, Jacquard, and the 888-CPU Power5 system, Bassi. In addition, the NERSC-5 procurement is underway, with the goal of providing a substantial increase in sustained performance over the existing NERSC systems. The increase is expected to be at least 3 to 4 times the computational power of Seaborg. NERSC-5 is expected to start arriving at NERSC during FY 07.

  Data Management / HPSS Issues:   8 responses

Disk space, especially scratch on Jacquard but also more generally, is often a restriction.

I hope NERSC can have a stronger data visualization platform and make all data accessible to all platforms without physically moving data around.

NERSC response: NERSC has deployed the NERSC Global Filesystem, a large shared filesystem that can be accessed from all of the compute platforms.

Disk storage (at PDSF) unfortunately remains a headache: disk servers are not very robust. This may change with GPFS. Also, cheaper local storage on compute nodes is at the moment of limited use, given the relatively frequent downtime of individual nodes and scheduling difficulties (or the impossibility of accessing data from other nodes using, e.g., rootd).
HPSS is also not as robust as could be hoped for.

... Improve HPSS interface software (htar and hsi).

... - I would like to see more tape drives on HPSS to help in getting our data off of tapes. We now occasionally have to wait more than 12 hrs to stage data. I realize that there are never enough tape drives, but this would help us.

... File system quotas seem to favor small numbers of large files, rather than the reverse, which is occasionally a difficulty when one is trying to concurrently maintain and develop several versions of a code divided into hundreds of source files.
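The trade-off described above is between a byte quota and a file-count quota. As a purely illustrative sketch (not part of the survey response, and making no reference to NERSC's actual quota values), the following walks a directory tree and reports both totals, which shows why a code split into hundreds of small source files can exhaust a file-count limit long before a byte limit:

    import os

    def tree_usage(root):
        """Return (file_count, total_bytes) for everything under root."""
        file_count = 0
        total_bytes = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                file_count += 1
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
        return file_count, total_bytes

    # Hypothetical directory holding several versions of a source tree.
    count, size = tree_usage("my_code_versions")
    print(f"{count} files, {size / 1e6:.1f} MB")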

... Remove the distinction between permanent and scratch disk space. Many of us need large amounts of disk space on a long term basis.

  Software improvements:   5 responses

... * to have faster compilers (say, from Portland Group) ...

The lack of flexibility in supplying alternate compilers on Jacquard has rendered that machine useless to me and my group, even though in principle, it could be a highly useful resource.

NERSC could work towards improving the scaling of Gaussian 03 with the number of processors.

The software support on Seaborg and Jacquard leaves room for improvement; also, more software could be supported.

  Account issues:   5 responses

allow group accounts

... Allow production accounts for collaborations to use. ...

... Another request, on behalf of my collaboration: we could use a dedicated account for running our data production jobs. It would greatly simplify and improve our data production process, and I believe our liaison has proposed a viable solution that is compatible with the DOE user account mandate.

- I would like to see the return of production accounts. We continue to have trouble with running large-scale production by multiple individuals. We run into obvious problems like file permissions. These can be solved by users being diligent when creating directories, but that doesn't solve the fact that some programs do not honor the set umask when creating files - these all have to be fixed by hand at a later stage (they create files with a 0644 mode instead of the more permissive 0666). Other, more problematic issues are: job control for long-running jobs (the user may not be available and someone else should be able to take control of the production) and ensuring a common environment to make sure that all processing is done in the same way (it turns out that many of the people running our production are also developers and may accidentally run the production using their developer environment instead of the production environment). Another issue that we continue to run up against is the maximum number of simultaneous connections that HPSS allows. On a distributed computing system like PDSF, one needs the ability to open many simultaneous connections in order to extract the data from HPSS to the local machines. The HPSS group set a limit (I think 15) on the number of connections that any user can have, but is willing to increase that limit on an individual basis. We would like our production account to have this higher limit. It would ensure that normal users would not be able to abuse the HPSS system, yet allow our production to function properly.
Production account issues were discussed extensively in April/May, and we had a viable solution that would fulfill the DOE mandate of needing to track users (the solution involved having production accounts that are not accessible from outside of NERSC; the user would always have to log into a system at NERSC as himself and only then ssh/su to the production account, allowing full tracking of who did what at what time). I am disappointed that this system has not yet been implemented; it would solve quite a few problems/annoyances we have at the moment when running large-scale production.
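As a minimal illustrative sketch (not part of the survey response, and using a hypothetical directory path), the after-the-fact cleanup described above - restoring permissions on files written by programs that ignore the umask - might look like this:

    import os
    import stat

    # Hypothetical shared production area; not an actual NERSC path.
    PROD_DIR = "/path/to/shared/production"

    def widen_permissions(root):
        """Add group/other write to every file under root, so files left
        at 0644 by programs that ignore the umask end up at the 0666 the
        respondent expects."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                mode = stat.S_IMODE(os.stat(path).st_mode)
                # Only add permission bits; never remove existing ones.
                os.chmod(path, mode | stat.S_IWGRP | stat.S_IWOTH)

    if __name__ == "__main__":
        widen_permissions(PROD_DIR)

A production account running with a properly set umask would make this kind of cleanup unnecessary, which is part of the point of the request.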

Password management is awkward.
A single password should apply to all NERSC hosts.
A password change should propagate almost instantly to all NERSC hosts.
Forced password changes are a security vulnerability and waste valuable time thinking of new ones.
NERSC web services should not depend on vulnerable cookie and JavaScript technology.

  Staffing issues:   5 responses

* Consulting needs to actually answer one's technical questions. They seem competent but highly understaffed. What's the point of a fancy computer if you can't run things!?!? ...

Increase staff for maintenance/support. There should be somebody skillful available all the time (even when the one person in charge is on vacation or away for the weekend). [PDSF user]

My biggest request is: please hire more PDSF specialists. The staff appear to me to be stretched very thin, and there are significant and noticeable downtimes for these machines that affect my work schedule almost weekly. I think the staff at PDSF are doing a fantastic job, but there aren't enough of them, and I know that there has been some big reorganization there lately. I think it would help everyone if more resources could be devoted to hiring or training new personnel specifically for PDSF. ...

... - More staff at PDSF would help tremendously. The staff maintaining the cluster is very good and helpful, but appears to be stretched to the maximum at the moment.
- There appears to be somewhat of a separation between NERSC and PDSF. I would like to see PDSF treated as a full member of NERSC, just like HPSS and the supercomputers are. The combination of PDSF and HPSS is very powerful, and it appears that not everyone at NERSC realizes this. This may of course just be a perception issue on my part, but I do not have the same perception with HPSS, for instance. It could also be due to the fact that users of PDSF are in a much narrower field than the general users of Seaborg and Jacquard. The PDSF users are overwhelmingly in nuclear, particle, or astro-physics, but these fields all have enormous computing requirements and this is something that NERSC does well - use it to your advantage!

Extend remote data mining *vis group*

  Allocations / charging issues:   4 responses

The most important thing for NERSC is a fair and equitable resource allocation. It should be based on what the researcher has accomplished in the prior year or years, not on outlandish promises that are never fulfilled and change all the time, however sexy or politically correct they may sound.

Not catering so much to special interests such as INCITE, and paying more attention to much-needed ER production computing needs.

It seems like my allocations keep going down year after year, despite the fact that I always request the same amount of CRUs, namely the minimum allowed in the NIM application form. I think this is happening because I don't do parallel computations, so this is probably not considered "sexy" by the allocations committee. I challenge anybody on the committee, however, to prove to me that my NERSC project is not at the forefront of DOE's current HEP mission. I believe I deserve more generous allocations.

The only thing that I am not sure about is the charging system. I would probably not use the machine charge factor. I would have the same charge for all systems; if some resource becomes computationally inexpensive, that system will develop a larger queue, which will eventually encourage people to use a different system that is slower but has a shorter queue. In any case, I would like to know why you choose one system instead of the other!
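For readers unfamiliar with the mechanism the respondent refers to, here is a minimal sketch of how a machine charge factor converts raw usage into charged allocation units; the factor values and system names are purely hypothetical, not NERSC's actual rates:

    # Hypothetical charge factors; actual values are set per allocation year.
    CHARGE_FACTORS = {
        "system_a": 1.0,  # baseline machine
        "system_b": 3.0,  # faster per CPU, so each CPU-hour is charged more
    }

    def charged_units(system, cpus, wall_hours):
        """Charged units = CPUs x wall-clock hours x machine charge factor."""
        return cpus * wall_hours * CHARGE_FACTORS[system]

    # The same 64-CPU, 12-hour job draws down an allocation three times
    # faster on the faster machine; the respondent would set both factors to 1.
    print(charged_units("system_a", 64, 12))  # 768.0
    print(charged_units("system_b", 64, 12))  # 2304.0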

  Network improvements:   2 responses

Better network bandwidth to LBL.

increase network speed

  Web improvements:   2 responses

Better web pages. By this I mean: make it easier to selectively scroll through (click through) the information.

Better web documentation on using the systems would be useful.

  Other suggestions:   2 responses

... * to have an orientation workshop on basics of using NERSC after allocation announcements
* to have a shorter survey form :)

I sometimes feel that changes which are to be made could be announced with a little bit more time for users to make the changes that are required. [PDSF user]

  Don't change, no suggestions:   4 responses

I think you are doing very well.

no comments

--blank--

Looking forward to using the resources.

 

How does NERSC compare to other centers you have used?   51 responses

  NERSC is the best / overall NERSC is better / positive response:   26 responses

Excellent! I (we) have used the ORNL facility and NSF-supported centers. NERSC is the best in my opinion.

Better than ORNL for the standard user.

There is a distinct PROFESSIONALISM in the way NERSC conducts business. It should be a model for other centers

very good

One of the best known to me (comparing to RZG Garching and SDSC).

NERSC is very good--the best, or in the top 2, in my experience. Others I have used:
NCSA
PCS
SDSC
TACC
Various TeraGrid sites (above and others)
Several European facilities (including CERN, the Swiss Center for Supercomputing, and NIKHEF in the Netherlands).

NERSC is the most useful computing center I have used.

Much better than HRLN (Hanover, Germany). On a different plane than our local computational center OSCER.

Centers used:
Stanford BioX cluster
Stanford Linear Accelerator Center (SLAC) Computing Farm
Local group workstations
NERSC is more professional and far more resourceful compared to the above centers. The main drawback is obviously the need to apply for computer time and the limit on available computer time. Generally, we fully support an expansion of NERSC's activities and an increase in their total computational resources.

I think NERSC compares very favorably to other computer centers (e.g., CCS at ORNL). There seems to be greater manpower available at NERSC to resolve consulting issues and any machine problems.

I can compare NERSC to the RHIC Computing Facility (RCF) and the MIT-LNS computing facility. I would say that NERSC compares very favorably to RCF, with better equipment, better people, better setup. NERSC compares similarly to the computing facility at MIT, but the MIT center was much smaller in scale.

NERSC is the best.

At present, I am very satisfied with NERSC computer resources, comparing to the following centers:
RCNP SX-5, Japan
Titech GRID, Japan

It works! (e.g., Columbia at Ames still can't perform large data reads!)

NERSC > ORNL as far as available computing resources are concerned

NERSC training is good for remote users; the remote class on Jacquard let me participate without traveling.

I haven't had previous experience with large off-site storage systems, but my overall impression is that the storage system, in particular the management and accounting, is very well thought through and has most of the features I expected.

NERSC compares favorably to other centers I have used, such as the newer computing facilities at ORNL. Hardware resources seem to be greater at NERSC, resulting in much shorter queues; and more stable, with considerably less downtime. NERSC also excels in user support, where other facilities can seem comparatively short-staffed.

NERSC is superior to other places when it comes to consulting and web services.

Very happy with NERSC as compared to NAVO and ERDC. The documentation at NERSC is considerably better than that from NAVO and ERDC.

I am comparing with ORNL, LLNL, ESC, and HPLS.
NERSC has more stable systems and better consulting service. Significantly more software is available on the NERSC machines, and it is always up to date.

NERSC is the number one center to me: LANL, CSCS, CERN

NERSC has generally been more responsive to the user than other centres, although recent demands that the user use large numbers of processors for his/her job have moved it away from that model. NERSC has tended to be more reasonable with disk allocations, although I would prefer that all disk allocations were considered permanent.

I would rate NERSC substantially higher than TeraGrid in terms of information available (web documentation, on-line allocation process, etc.) and ease of use. In terms of resources, I would say that in my experience they are comparable.

NERSC is still head and shoulders above all other pretenders. In making this ranking, I am judging in terms of both systems and personnel. NERSC has the most usable hardware capacity, system stability, and HPSS stability. Complementing that is a staff that is knowledgeable, professional, and available.
One often undermentioned aspect of NERSC, which I especially appreciate, is the approachability of management at NERSC (both mid-level and senior). NERSC management seems much better tied in with both the needs of the scientific community and the "realities of the world" than I experience at other centers.
I have used countless computing centers in my career (including numerous NSF and DOD centers, industry-specific centers, as well as other DOE centers) and I make my comparisons based on this experience. However, for my current project, much of my comparison is based on experiences at ORNL/CCS, at which I have also done a fair bit of large-scale computing. It saddens me to say this, but the ORNL organization seems to have neither the vision nor the technical savvy to accomplish the mission to which they aspire.

I can only compare Jacquard with our local Beowulf cluster. Jacquard is both faster and more reliable than what I use locally, so it has been a key component in my work over these last few months.

  NERSC is the same as / mixed response:   12 responses

NERSC is similar to PSC and SDSC.

On the whole a good center. As good as the now defunct Pittsburgh center, which was truly outstanding in my opinion. Miles ahead of SDSC which I could never get to use without hassle.

The NERSC staff are very knowledgeable and professional. NERSC suffers somewhat from its own success; I find many of the resources to be oversubscribed.

RCF @ BNL: NERSC is a much friendlier and more 'open' place. I got the impression that RCF was messier, but it looks like they have improved quite a bit. Due to disk access problems at NERSC (LBL RNC disks) I moved part of my production back to RCF. Disk access seems to be more stable there.

It's equally good compared with other facilities.

I also use the Ohio Supercomputing Center (OSC) and the National Center for Supercomputing Applications (NCSA). Each has its advantages and disadvantages.
Comparing to Seaborg:
OSC: Slightly higher performance; shorter queue waits; cannot run on 1000+ procs.; Consulting not quite as good.
NCSA: Much higher performance (NCSA's Cobalt machine runs about 3 or 4 times faster than Seaborg in my experience.); Similar queue wait times; cannot run on 1000+ procs; Not as good about keeping users informed on updates/changes.
In my fantasy world, I would have NERSC upgrade Seaborg to get performance comparable to NCSA's Cobalt and reduce queue wait times a bit. But, I'm still quite happy even if that never happens. NERSC sets the standard for reliability and user relations, and allows for users to easily run on 1000+ processors.

NERSC (PDSF) is very easy to contact, keeps users informed about what is going on, and is accessible for questions and remarks about operations. This is more so than at other facilities (RCF, the CERN computer center). Performance is comparable between these centers, although centralized disk storage is more robust at RCF and CERN. Also, CERN seems to have a superior tape system (although I have no recent experience with it).

I have some experience with the NIC in Juelich. Here, the archive file system is transparent for the user. One can access migrated files as if they were stored on a regular disk of the computer. This is convenient. In terms of computational resources, documentation and support NERSC certainly compares very well.

User service at NERSC is better than ORNL-NCCS, but we have had more "uncharged" cycles from ORNL-NCCS. I don't know how the new NCCS program will work.

The NERSC hardware and software are more stable than those at NCCS. However, the turnaround time at NCCS is much faster once the job gets started.

NCSA, ARL (DoD): mostly less waiting time in the queue at DoD, but they are weaker at system configuration and at keeping the compilers and libraries updated.

SDSC has faster processors and usually better wait times than Seaborg.
LLNL computers have faster processors, but the wait times and queue structure are erratic. Jacquard should address the problem of wait times for smaller jobs, making it more advantageous to use NERSC.

  NERSC is less good / negative response:   7 responses

Compared to Fermilab and SLAC, NERSC is terrible with regard to collaborative computing; NERSC seems entirely oriented toward single user computing which is unrealistic for most of my needs. SLAC's use of AFS is very effective; I miss that at NERSC.

Computing resources at NERSC are still small compared to other centers like NCSA and SDSC.

The system has not been as stable recently as RCF at BNL.

The computing resources at SLAC. It's not what they do, it's what they don't do:
They don't over-allocate their machines, so we can get jobs running very soon.
A computing result from Seaborg is only really useful if it can be obtained in about the same amount of time as it takes Seaborg to produce it.
It's much better to have idle processors than idle people.

It is falling behind SDSC in hardware and batch turnaround.
Interactive computing is much better at SDSC.

A simpler and clearer allocation process, such as at NCSA, would be useful.

LLNL has much more in the way of resources.

  No comparison made:   6 responses

Seaborg is my first exposure to massively parallel computing resources--I have nothing to compare it with.

I have some experience with the CCS at Oak Ridge, but not so much that I could really compare the centers.

I have not used other centers to compare it to.

I use no other centers.

no comments

Recently I have started using LLNL's Thunder and their staff has provided excellent and friendly assistance.

NERSC provides superior hardware computing capabilities and professional user support and consulting. These two areas are where I see NERSC's core strengths; here NERSC offers resources that are not easily matched by any local cluster or computer farm setup.

Speed, data accessibility

There is a distinct PROFESSIONALISM in the way NERSC conducts business. It should be a model for other centers

NERSC is important because it combines state-of-the-art HW/SW, but most important is the combination of state-of-the-art HW/SW with excellent, first-class consulting/collaboration.
