
2010/2011 User Survey Results

Response Summary

A special thanks to the 411 users who responded to the 2011 survey, which was conducted from June 6-30, 2011. This represents a 13.1 percent response rate from the 3,130 users who had been active in the 12 months prior.  Your responses are important to us because they provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve.

The survey strives to be representative of all NERSC users. The hours used by the respondents represent about 71 percent of all MPP hours (on Hopper, Franklin, or Carver) used at the time the survey closed.  MPP respondents were classified according to their usage:

  • 60 respondents had used over 1.5 million hours, generating a response rate of 77% from this community of "large MPP users".
  • 131 respondents had used between 100,000 and 1.5 million hours, generating a 38% response rate from the "medium MPP users".
  • 149 respondents had used fewer than 100,000 hours, generating a 14% response rate from the "small MPP users".
  • 70 respondents were not MPP users - they were either Principal Investigators or project managers supervising the work of their NERSC users, or they were users of other NERSC resources, such as HPSS, PDSF, Euclid, or Dirac.

On this survey users scored satisfaction on a seven-point scale, where “1” is “very dissatisfied” and “7” indicates “very satisfied.”  The average satisfaction scores from this year's survey ranged from a high of 6.79 to a low of 5.16; the average score was 6.29.

Satisfaction Score | Meaning | Number of Times Selected
7 Very Satisfied 9,159
6 Mostly Satisfied 5,333
5 Somewhat Satisfied 1,280
4 Neutral 941
3 Somewhat Dissatisfied 210
2 Mostly Dissatisfied 62
1 Very Dissatisfied 42
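
For reference, the overall average of 6.29 quoted above is simply the count-weighted mean of the scores in this table. A minimal Python sketch of that arithmetic (the counts are copied from the table; the variable names are only for illustration):

    # Count-weighted mean of all satisfaction ratings submitted on the survey.
    # Keys are satisfaction scores, values are the number of times each was selected.
    counts = {7: 9159, 6: 5333, 5: 1280, 4: 941, 3: 210, 2: 62, 1: 42}

    total_ratings = sum(counts.values())                          # 17,027 ratings in all
    weighted_sum = sum(score * n for score, n in counts.items())

    average = weighted_sum / total_ratings
    print(f"{total_ratings} ratings, overall average = {average:.2f}")   # prints 6.29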

For questions that spanned previous surveys, the change in scoring was tested for significance (using the t test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.
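
The survey report describes the per-question significance tests only at this level of detail. As an illustration of the kind of test involved, the sketch below runs a two-sample (Welch) t-test on two sets of ratings for a single question and checks the result at the 90% confidence level; the rating data are hypothetical, and the exact procedure NERSC used may differ.

    # Hedged illustration of a year-over-year significance test for one question.
    # The rating lists are made-up example data, not actual survey responses.
    from scipy import stats

    scores_2010 = [5, 6, 6, 7, 4, 6, 5, 7, 6, 6]   # hypothetical 2010 ratings
    scores_2011 = [6, 7, 6, 7, 6, 7, 5, 7, 7, 6]   # hypothetical 2011 ratings

    t_stat, p_value = stats.ttest_ind(scores_2011, scores_2010, equal_var=False)

    # Two-sided test at the 90% confidence level (alpha = 0.10).
    if p_value < 0.10:
        direction = "increase" if t_stat > 0 else "decrease"
        print(f"significant {direction} (p = {p_value:.3f})")
    else:
        print(f"not significant (p = {p_value:.3f})")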

Areas with Highest User Satisfaction

Areas with the highest user satisfaction are those with average scores of more than 6.5.  NERSC resources and services with average scores in this range were:

  • Global homes, project and scratch
  • HPSS mass storage system
  • Account support and technical consulting
  • Services and Security
  • NERSC overall
  • Carver
  • NERSC's internal network

The top 6 of the 18 questions that scored over 6.5 are shown below.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
GLOBAL HOMES: Reliability       2 3 35 188 228 6.79 0.49 0.15
HPSS: Reliability (data integrity)       4 3 29 127 163 6.71 0.63 0.02
HPSS: Uptime (Availability)       3 5 31 127 166 6.70 0.62 0.05
GLOBAL HOMES: Uptime     1 3 6 46 173 229 6.69 0.63 0.09
PROJECT: Reliability       3 1 27 88 119 6.68 0.62 0.05
SERVICES: Account support   1   9 13 56 262 341 6.67 0.72 0.06

Areas with Lowest User Satisfaction

Areas with the lowest user satisfaction are those with average scores of less than 5.5.

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
FRANKLIN: Batch wait time 5 5 16 24 66 88 43 247 5.34 1.35 0.46
CARVER: Batch wait time   8 10 18 17 42 19 114 5.16 1.47 -0.65

Significant Increases in Satisfaction

16 questions scored significantly higher in 2011 compared with 2010.  NERSC has never before had so many increases in satisfaction!

Most of the significant improvements from 2010 were related to the Hopper transition from a small Cray XT5 to a large Cray XE6 (No. 5 on the November 2010 TOP500 list) and NERSC training initiatives.

The two lowest scores on the 2010 survey — Hopper and Franklin batch wait times — improved significantly in 2011, thanks to the new Hopper system.

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
PDSF SW: STAR       1 1 13 8 23 6.22 0.74 0.69
HOPPER: Batch wait time 2 3 7 13 44 114 77 260 5.86 1.13 0.67
TRAINING: NERSC classes   1   18 10 29 40 98 5.90 1.19 0.51
HOPPER: Ability to run interactively 1   2 23 19 52 97 194 6.11 1.14 0.49
FRANKLIN: Batch wait time 5 5 16 24 66 88 43 247 5.34 1.35 0.46
SERVICES: Ability to perform data analysis   1   8 11 52 46 118 6.13 0.94 0.40
SERVICES: Data analysis and visualization assistance       12 7 35 37 91 6.07 1.01 0.40
HOPPER: Overall 1 2   3 12 92 156 266 6.47 0.82 0.38
HOPPER: Batch queue structure 2 2 6 16 25 106 102 259 6.03 1.13 0.37
NERSC SW: Data analysis software       37 15 40 52 144 5.74 1.20 0.27
OVERALL: Available Computing Hardware     3 5 24 134 236 402 6.48 0.73 0.25
HOPPER: Uptime (Availability)   1 2 4 15 83 157 262 6.47 0.79 0.24
FRANKLIN: Batch queue structure 3   5 20 34 104 78 244 5.89 1.13 0.18
GLOBAL HOMES: Reliability       2 3 35 188 228 6.79 0.49 0.15
GLOBAL HOMES: Overall     1 2 12 58 164 237 6.61 0.66 0.14
OVERALL: Satisfaction with NERSC     3 5 14 134 251 407 6.54 0.69 0.14

Significant Decreases in Satisfaction

The largest decrease in satisfaction came from batch wait times on the Carver cluster.  NERSC plans to address this by increasing the size of the Carver system with hardware from the Magellan project, which will conclude in late 2011.

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
CARVER: Batch wait time   8 10 18 17 42 19 114 5.16 1.47 -0.65
PDSF: Uptime (availability)   2 1   2 14 19 38 6.16 1.28 -0.55

Satisfaction Patterns for Large, Medium and Small MPP Respondents

The MPP respondents were classified as "large" (if their usage was over 1.5 million hours), "medium" (usage between 100,000 and 1.5 million hours) and "small". Satisfaction differences between these three groups are shown in the table below.
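
For reference, a minimal Python sketch of this binning rule (the thresholds come from the text above; the function name is only for illustration, and the treatment of exactly 100,000 hours is an assumption since the text does not specify it):

    def classify_mpp_user(hours_used):
        """Bin an MPP respondent by the MPP hours they used."""
        if hours_used > 1_500_000:
            return "large"
        elif hours_used >= 100_000:     # boundary case assumed to fall in "medium"
            return "medium"
        else:
            return "small"

    print(classify_mpp_user(250_000))   # medium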

The top increases in satisfaction for the large and medium MPP users were for Hopper, analytics, and training.  For the small MPP users the top three areas were the PDSF physics cluster, Franklin, and Hopper.

Item | All Users: Avg Score | Large MPP Users: Num Resp, Avg Score, Change 2010 | Medium MPP Users: Num Resp, Avg Score, Change 2010 | Small MPP Users: Num Resp, Avg Score, Change 2010
GLOBAL HOMES: Reliability 6.79 46 6.80 0.16 83 6.83 0.19 75 6.77 0.13
GLOBAL HOMES: Uptime 6.69 47 6.72 0.12 83 6.69 0.08 75 6.75 0.14
SERVICES: Account support 6.67 54 6.74 0.14 114 6.73 0.12 118 6.66 0.06
GLOBAL HOMES: Overall 6.61 49 6.55 0.08 86 6.64 0.16 76 6.67 0.17
OVERALL: Satisfaction with NERSC 6.54 60 6.80 0.40 130 6.53 0.13 148 6.47 0.08
OVERALL: Available Computing Hardware 6.48 60 6.62 0.39 129 6.50 0.27 148 6.48 0.25
HOPPER: Uptime (Availability) 6.47 55 6.69 0.46 107 6.38 0.15 86 6.43 0.20
HOPPER: Overall 6.47 55 6.62 0.53 109 6.41 0.33 88 6.44 0.36
WEB SERVICES: Accuracy of information 6.46 52 6.27 -0.10 90 6.58 0.21 108 6.48 0.11
CONSULT: Special requests (e.g. disk quota increases, etc.) 6.44 40 6.63 0.35 64 6.53 0.25 58 6.31 0.03
WEB: System Status Info 6.44 52 6.13 -0.38 93 6.57 0.06 109 6.50 -0.01
NERSC SW: Software environment 6.35 52 6.44 0.16 115 6.49 0.20 113 6.26 -0.03
PDSF SW: Software environment 6.35 1 6.00   0     9 6.67 0.44
WEB SERVICES: www.nersc.gov overall 6.35 56 6.34 -0.01 98 6.50 0.15 114 6.20 -0.15
WEB SERVICES: Timeliness of information 6.34 51 6.16 -0.13 87 6.49 0.21 107 6.33 0.04
NERSC SW: Applications software 6.33 45 6.44 0.23 105 6.40 0.18 102 6.25 0.04
PDSF SW: Programming libraries 6.25 1 6.00   0     9 6.56 0.52
TRAINING: Web tutorials 6.24 27 6.22 0.17 57 6.44 0.38 49 6.12 0.07
TRAINING: New User's Guide 6.23 31 6.19 0.03 62 6.44 0.27 74 6.11 -0.06
PDSF SW: STAR 6.22 1 6.00   0     5 6.20 0.68
OVERALL: Mass storage facilities 6.18 56 6.41 0.24 108 6.16 -0.01 126 6.10 -0.08
SERVICES: Ability to perform data analysis 6.13 16 6.19 0.46 40 6.05 0.32 38 6.03 0.30
HOPPER: Ability to run interactively 6.11 41 6.44 0.82 74 5.97 0.35 65 6.06 0.44
WEB SERVICES: Ease of finding information 6.07 52 6.08 -0.02 95 6.29 0.20 112 5.93 -0.17
SERVICES: Data analysis and visualization assistance 6.07 12 6.17 0.50 33 6.21 0.55 30 5.87 0.20
HOPPER: Batch queue structure 6.03 56 6.13 0.46 106 6.04 0.37 84 6.05 0.38
HOPPER: Disk configuration and I/O performance 5.99 51 6.31 0.44 96 5.95 0.07 79 5.80 -0.08
TRAINING: NERSC classes 5.90 18 5.94 0.56 30 5.90 0.52 36 5.86 0.48
FRANKLIN: Batch queue structure 5.89 47 5.79 0.08 95 5.77 0.06 85 6.04 0.32
HOPPER: Batch wait time 5.86 56 6.02 0.83 106 5.93 0.74 84 5.77 0.58
NERSC SW: Visualization software 5.70 19 6.05 0.59 59 5.64 0.18 54 5.57 0.11
FRANKLIN: Batch wait time 5.34 48 5.29 0.42 95 5.18 0.31 86 5.47 0.59
CARVER: Batch wait time 5.16 18 4.89 -0.92 46 5.30 -0.51 41 5.02 -0.79

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In 2010 and early 2011 NERSC took a number of actions in response to suggestions from the 2009/2010 user survey.

On the 2009/2010 survey NERSC training workshops received the third lowest score, with an average satisfaction rating of 5.38 / 7.  In response, NERSC renewed its training efforts in 2010. In addition to its traditional training during the annual NERSC Users Group (NUG) Meeting, NERSC conducted a two-day workshop for Cray XE6 users at its facility in Oakland, joining with members of the Cielo team from Los Alamos National Laboratory and staff from Cray, Inc. Both the NUG training and the XE6 training were concurrently broadcast over the web. In addition, NERSC held a number of web-based training events (webinars) through 2010–2011. In all, NERSC put on eight events for its users from July 1, 2010 to June 30, 2011, with an aggregate attendance of about 375.

NERSC’s users responded positively to the training classes as indicated by the satisfaction score increase of 0.51 points on the 2010/2011 User Survey. Additional surveys were conducted after each class, with 97.8% of respondents indicating that the training was “useful to me.”

Data analysis and visualization was another area that received lower satisfaction ratings on the 2009/2010 survey.  In 2010 NERSC hired a new consultant to enhance NERSC’s visualization and data analysis software and services. A significant accomplishment was the robust implementation of an NX server that enabled remote X-Windows based graphical software. NERSC aggressively publicized this new service to its users and held training sessions. NERSC also re-organized the analytics materials on the new web site. As a result of these efforts, users’ satisfaction as measured by the 2010/2011 user survey increased significantly for three data analysis and visualization ratings.

In 2010 NERSC also replaced its data analysis/visualization machine, DaVinci, with a new Sun Sunfire platform, Euclid. User satisfaction with the new system was evident in the survey results: Euclid received a satisfaction score of 6.10/7, an increase of 0.27 points over DaVinci’s 2009/2010 rating.

Respondent Demographics

Respondents by DOE Office

Office | Respondents | Percent
ASCR 42 10.2%
BER 55 13.4%
BES 149 36.3%
FES 59 14.4%
HEP 52 12.7%
NP 54 12.9%

Respondents by Project Class

Project Class | Respondents | Percent
DOE Base 312 75.9%
SciDAC 74 18.0%
Startup 10 2.4%
NISE 7 1.7%
Education 4 1.0%
ALCC 3 0.7%
CSGF 1 0.2%

Respondents by User Role

User Role | Respondents | Percent
Principal Investigators 70 17.0%
PI Proxies 69 16.8%
Users 272 66.2%

Respondents by Type of Organization

Organization Type | Respondents | Percent
Universities 251 61.1%
DOE Labs 114 27.7%
Other Govt Labs 23 5.6%
Industry 23 5.6%

Respondents from the Most Represented Organizations

Organization | Respondents | Percent
Berkeley Lab 46 11.2%
UC Berkeley 27 6.6%
Tech-X Corp 16 3.9%
Oak Ridge 13 3.2%
U. Washington 12 2.9%
Cal Tech 11 2.7%
PNNL 7 1.7%
PPPL 7 1.7%
University of Wisconsin - Madison 7 1.7%
Brookhaven Lab 6 1.5%
University of Maryland 6 1.5%
Vanderbilt University 6 1.5%
Argonne Lab 5 1.2%
Auburn University 5 1.2%
Livermore Lab 5 1.2%
MIT 5 1.2%
NCAR 5 1.2%
NREL 5 1.2%
General Atomics 4 1.0%
Los Alamos Lab 4 1.0%
Princeton University 4 1.0%
Sandia Lab CA 4 1.0%
UC Santa Barbara 4 1.0%
Yale University 4 1.0%

How Long Have You Used NERSC?

Time | Respondents | Percent
less than 1 year 110 27.6%
1 - 3 years 121 30.4%
more than 3 years 167 42.0%

What Desktop System Do You Use to Connect to NERSC?

System | Respondents
Unix Total 280
Mac Total 191
PC Total 189
Linux 267
OS X 191
Windows 7 101
Windows XP 67
Windows Vista 21
FreeBSD 4
Sun Solaris 4
Other Unix 3
MacOS 1
Other Mac 1

Score Legend

Satisfaction Score Legend

Satisfaction | Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied - High 6.00 - 6.49
Mostly Satisfied - Low 5.50 - 5.99
Somewhat Satisfied 4.50 - 5.49

Importance Score Legend

Importance | Average Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49

Significance of Change Legend

Significance of Change
significant increase
significant decrease
not significant

Satisfaction and Importance Scores

The average Overall Satisfaction with NERSC score for 2011 (6.54 / 7) was the highest ever recorded in the 13 years the survey has been in its current form. In 2011 NERSC increased its consulting staff and put a large, stable Hopper system into production.

Overall Satisfaction with NERSC

410 of the 411 respondents answered questions in this section.  The average score was 6.39 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010 | Change from 2009
OVERALL: Services     4 10 12 89 279 394 6.60 0.76 0.03  
OVERALL: Satisfaction with NERSC     3 5 14 134 251 407 6.54 0.69 0.14 0.31
OVERALL: Available Computing Hardware     3 5 24 134 236 402 6.48 0.73 0.25 0.48
OVERALL: Mass storage facilities 1   4 28 37 105 173 348 6.18 1.03 0.01 0.12
OVERALL: Available Software   3 3 32 22 134 143 337 6.11 1.04 0.01 -0.10

All Satisfaction Topics

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010 | Change from 2009
GLOBAL HOMES: Reliability       2 3 35 188 228 6.79 0.49 0.15  
HPSS: Reliability (data integrity)       4 3 29 127 163 6.71 0.63 0.02 0.04
HPSS: Uptime (Availability)       3 5 31 127 166 6.70 0.62 0.05 0.07
GLOBAL HOMES: Uptime     1 3 6 46 173 229 6.69 0.63 0.09  
PROJECT: Reliability       3 1 27 88 119 6.68 0.62 0.05 0.13
SERVICES: Account support   1   9 13 56 262 341 6.67 0.72 0.06 0.01
PROJECT: Overall       3 4 32 88 127 6.61 0.67 0.04 0.31
GLOBAL HOMES: Overall     1 2 12 58 164 237 6.61 0.66 0.14  
OVERALL: Services     4 10 12 89 279 394 6.60 0.76 0.03  
CONSULT: Overall   2 1 8 12 62 212 297 6.58 0.81 -0.06 0.06
PROJECT: Uptime 2     2 2 29 87 122 6.58 0.93 0.00 0.03
GLOBAL SCRATCH: Reliability     2 6 5 32 116 161 6.58 0.83    
CONSULT: Response time     1 8 14 72 200 295 6.57 0.74 -0.03 -0.04
OVERALL: Security 1   1 9 14 70 215 310 6.56 0.80 0.01 0.17
OVERALL: Satisfaction with NERSC     3 5 14 134 251 407 6.54 0.69 0.14 0.31
CONSULT: Quality of technical advice 1   2 10 14 66 201 294 6.53 0.86 -0.04 0.05
CARVER: Uptime (Availability)       4 5 33 74 116 6.53 0.74 0.14  
NETWORK: Network performance within NERSC (e.g. Seaborg to HPSS)     2 4 9 60 130 205 6.52 0.76 -0.03 0.01
GLOBAL SCRATCH: Uptime 1   3 7 3 40 111 165 6.48 0.97    
OVERALL: Available Computing Hardware     3 5 24 134 236 402 6.48 0.73 0.25 0.48
HOPPER: Uptime (Availability)   1 2 4 15 83 157 262 6.47 0.79 0.24  
HOPPER: Overall 1 2   3 12 92 156 266 6.47 0.82 0.38  
GRID: Access and Authentication     1 3 3 21 47 75 6.47 0.86 -0.03 0.03
WEB SERVICES: Accuracy of information   1 2 6 14 98 171 292 6.46 0.79 0.09 0.14
CONSULT: Special requests (e.g. disk quota increases, etc.)       18 10 31 129 188 6.44 0.97 0.16 0.12
WEB: System Status Info 2 2 2 8 14 86 184 298 6.44 0.96 -0.08  
GRID: File Transfer       2 5 25 37 69 6.41 0.75 0.02 0.12
CONSULT: Time to solution   2 1 13 20 80 171 287 6.40 0.91 0.00 0.01
GLOBAL SCRATCH: Overall     3 6 13 48 102 172 6.40 0.90    
PDSF: Batch queue structure       2 1 14 19 36 6.39 0.80 0.16 0.18
GLOBAL HOMES: File and Directory Operations   1 2 12 15 58 129 217 6.37 0.95 0.04  
PROJECT: File and Directory Operations   1   9 4 34 68 116 6.36 0.97 -0.05 0.16
GLOBAL HOMES: I/O Bandwidth     1 12 18 63 123 217 6.36 0.89    
NERSC SW: Programming libraries   1 5 10 13 117 162 308 6.36 0.88 0.09 0.03
NERSC SW: Software environment   1 3 12 14 132 168 330 6.35 0.84 0.07 0.00
SERVICES: Allocations process 1     17 21 89 163 291 6.35 0.91 0.09 0.32
PDSF: Ability to run interactively     1 1 2 11 19 34 6.35 0.95 0.10 0.20
PDSF SW: Software environment       2 1 14 17 34 6.35 0.81 0.13 -0.07
WEB SERVICES: NIM web interface 1   1 12 30 111 177 332 6.35 0.87 -0.03 -0.04
WEB SERVICES: www.nersc.gov overall     4 7 20 129 155 315 6.35 0.80 -0.01 0.07
PROJECT: I/O Bandwidth       6 9 40 61 116 6.34 0.84 -0.03 0.11
WEB SERVICES: Timeliness of information   1 1 9 21 112 145 289 6.34 0.82 0.06 0.14
DIRAC: Uptime (Availability)     1 3 1 7 21 33 6.33 1.11    
NERSC SW: Applications software   1   11 21 118 144 295 6.33 0.81 0.11 0.23
GRID: Job Monitoring       5 5 22 37 69 6.32 0.90 0.01 -0.25
GRID: Job Submission       6 4 22 38 70 6.31 0.93 -0.10 -0.18
DIRAC: Overall     1 3 2 5 21 32 6.31 1.15    
CARVER: Overall       3 11 51 54 119 6.31 0.75 -0.04  
HPSS: Data transfer rates   1 5 7 13 45 96 167 6.30 1.05 -0.02 0.05
HPSS: Overall satisfaction 2 1 4 6 10 56 102 181 6.30 1.11 -0.16 -0.14
CONSULT: On-line help desk   3   17 8 31 101 160 6.29 1.15 -0.03 -0.06
EUCLID: Uptime (Availability)       6 2 11 27 46 6.28 1.05    
PDSF SW: General tools and utilities       1 3 16 14 34 6.26 0.75 0.06 0.12
PDSF SW: Programming libraries       1 3 15 13 32 6.25 0.76 0.22 -0.09
GLOBAL SCRATCH: I/O Bandwidth     3 12 13 48 85 161 6.24 1.01    
TRAINING: Web tutorials       9 19 54 74 156 6.24 0.88 0.18 0.20
GLOBAL SCRATCH: File and Directory Operations   1 7 9 7 46 87 157 6.24 1.13    
TRAINING: New User's Guide     2 6 27 71 90 196 6.23 0.87 0.06 0.09
PDSF SW: STAR       1 1 13 8 25 6.22 0.74 0.69 0.03
NETWORK: Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 3 8 27 98 121 260 6.22 0.99 0.11 0.07
HPSS: Data access time     5 6 18 54 80 163 6.21 0.99 -0.05 -0.14
EUCLID: Ability to run interactively     2 5 2 9 26 44 6.18 1.23  
OVERALL: Mass storage facilities 1   4 28 37 105 173 348 6.18 1.03 0.01 0.12
PDSF: Overall satisfaction     2 2   19 17 40 6.18 1.03 -0.23 -0.11
PDSF: Uptime (availability)   2 1 2   14 19 38 6.16 1.28 -0.55 -0.20
PDSF SW: CHOS       2 3 11 11 27 6.15 0.91 0.22 0.03
PDSF SW: Performance and debugging tools       2 1 19 9 31 6.13 0.76 -0.12 0.17
SERVICES: Ability to perform data analysis   1   8 11 52 46 118 6.13 0.94 0.40 0.14
HOPPER: Ability to run interactively 1   2 23 19 52 97 194 6.11 1.14 0.49  
OVERALL: Available Software   3 3 32 22 134 143 337 6.11 1.04 0.01 -0.10
FRANKLIN: Overall   3 7 7 24 114 98 253 6.11 1.01 -0.01 0.36
EUCLID: Overall       7 4 14 23 48 6.10 1.08    
WEB SERVICES: Ease of finding information 1 1 10 12 35 126 120 305 6.07 1.05 -0.03 0.12
SERVICES: Data analysis and visualization assistance       12 7 35 37 91 6.07 1.01 0.40 0.23
CARVER: Disk configuration and I/O performance 1   1 16 6 29 51 104 6.05 1.23 -0.02  
HOPPER: Batch queue structure 2 2 6 16 25 106 102 259 6.03 1.13 0.37  
FRANKLIN: Uptime (Availability)   2 9 12 29 106 90 248 6.01 1.06 0.02 1.10
FRANKLIN: Ability to run interactively 2     19 24 60 72 177 6.00 1.12 0.06 0.24
CARVER: Ability to run interactively 1   3 10 6 25 40 85 6.00 1.28 0.08  
HOPPER: Disk configuration and I/O performance   1 5 26 28 83 97 240 5.99 1.10 0.12  
EUCLID: Disk configuration and I/O performance     1 7 5 11 21 45 5.98 1.20    
DIRAC: Ability to run interactively 1     5 3 6 17 32 5.97 1.45  
NERSC SW: Performance and debugging tools 1   2 27 34 83 86 233 5.94 1.08 0.04  
DIRAC: Disk configuration and I/O performance       8 1 4 15 28 5.93 1.33    
PDSF SW: Applications software       4   17 6 27 5.93 0.92 -0.04 -0.31
TRAINING: NERSC classes   1   18 10 29 40 98 5.90 1.19 0.51 0.30
FRANKLIN: Disk configuration and I/O performance 2 1 3 27 29 76 83 221 5.90 1.19 -0.06 0.29
FRANKLIN: Batch queue structure 3   5 20 34 104 78 244 5.89 1.13 0.18 -0.01
HOPPER: Batch wait time 2 3 7 13 44 114 77 260 5.86 1.13 0.67  
PDSF: Disk configuration and I/O performance   2 1 1 3 17 11 35 5.86 1.31 -0.40 -0.09
CARVER: Batch queue structure   1 4 15 14 42 37 113 5.80 1.19 -0.12  
WEB SERVICES: Searching   1 2 28 41 61 65 198 5.79 1.11 0.02 0.11
HPSS: User interface (hsi, pftp, ftp) 5   17 14 15 42 79 172 5.77 1.56 -0.17 -0.25
NERSC SW: Data analysis software       37 15 40 52 144 5.74 1.20 0.27 -0.10
NERSC SW: Visualization software 1   3 35 19 38 57 153 5.70 1.29 0.23 -0.21
FRANKLIN: Batch wait time 5 5 16 24 66 88 43 247 5.34 1.35 0.46 -0.21
CARVER: Batch wait time   8 10 18 17 42 19 114 5.16 1.47 -0.65  

How Important To You Is?

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important

Item | Num who rated this item as 1 through 3 | Total Responses | Average Score | Std. Dev.
OVERALL: Available Computing Hardware   27 360 387 2.93 0.26
OVERALL: Satisfaction with NERSC 3 49 338 390 2.86 0.37
OVERALL: Services 7 118 260 385 2.66 0.51
OVERALL: Mass storage facilities 40 144 182 366 2.39 0.68
SERVICES: Ability to perform data analysis 28 40 86 154 2.38 0.78
OVERALL: Available Software 44 135 171 350 2.36 0.70
SERVICES: Data analysis and visualization assistance 38 44 60 142 2.15 0.82

HPC Resources

397 of the 411 respondents answered questions in this section.  The average score was 6.23 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
GLOBAL HOMES: Reliability       2 3 35 188 228 6.79 0.49 0.15
HPSS: Reliability (data integrity)       4 3 29 127 163 6.71 0.63 0.02
HPSS: Uptime (Availability)       3 5 31 127 166 6.70 0.62 0.05
GLOBAL HOMES: Uptime     1 3 6 46 173 229 6.69 0.63 0.09
PROJECT: Reliability       3 1 27 88 119 6.68 0.62 0.05
PROJECT: Overall       3 4 32 88 127 6.61 0.67 0.04
GLOBAL HOMES: Overall     1 2 12 58 164 237 6.61 0.66 0.14
PROJECT: Uptime 2     2 2 29 87 122 6.58 0.93 0.00
GLOBAL SCRATCH: Reliability     2 6 5 32 116 161 6.58 0.83  
CARVER: Uptime (Availability)       4 5 33 74 116 6.53 0.74 0.14
NETWORK: Network performance within NERSC (e.g. Seaborg to HPSS)     2 4 9 60 130 205 6.52 0.76 -0.03
GLOBAL SCRATCH: Uptime 1   3 7 3 40 111 165 6.48 0.97  
HOPPER: Uptime (Availability)   1 2 4 15 83 157 262 6.47 0.79 0.24
HOPPER: Overall 1 2   3 12 92 156 266 6.47 0.82 0.38
GRID: Access and Authentication     1 3 3 21 47 75 6.47 0.86 -0.03
GRID: File Transfer       2 5 25 37 69 6.41 0.75 0.02
GLOBAL SCRATCH: Overall     3 6 13 48 102 172 6.40 0.90  
GLOBAL HOMES: File and Directory Operations   1 2 12 15 58 129 217 6.37 0.95 0.04
PROJECT: File and Directory Operations   1   9 4 34 68 116 6.36 0.97 -0.05
GLOBAL HOMES: I/O Bandwidth     1 12 18 63 123 217 6.36 0.89  
PROJECT: I/O Bandwidth       6 9 40 61 116 6.34 0.84 -0.03
DIRAC: Uptime (Availability)     1 3 1 7 21 33 6.33 1.11  
GRID: Job Monitoring       5 5 22 37 69 6.32 0.90 0.01
GRID: Job Submission       6 4 22 38 70 6.31 0.93 -0.10
DIRAC: Overall     1 3 2 5 21 32 6.31 1.15  
CARVER: Overall       3 11 51 54 119 6.31 0.75 -0.04
HPSS: Data transfer rates   1 5 7 13 45 96 167 6.30 1.05 -0.02
HPSS: Overall satisfaction 2 1 4 6 10 56 102 181 6.30 1.11 -0.16
EUCLID: Uptime (Availability)       6 2 11 27 46 6.28 1.05  
PDSF: Batch queue structure       4 1 14 19 38 6.26 0.95 0.04
GLOBAL SCRATCH: I/O Bandwidth     3 12 13 48 85 161 6.24 1.01  
GLOBAL SCRATCH: File and Directory Operations   1 7 9 7 46 87 157 6.24 1.13  
PDSF: Ability to run interactively     1 3 2 11 19 36 6.22 1.07 -0.13
PDSF SW: Software environment       4 1 14 17 36 6.22 0.96 -0.00
NETWORK: Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 3 8 27 98 121 260 6.22 0.99 0.11
HPSS: Data access time     5 6 18 54 80 163 6.21 0.99 -0.05
EUCLID: Ability to run interactively     2 5 2 9 26 44 6.18 1.23  
PDSF SW: General tools and utilities       3 3 16 14 36 6.14 0.90 -0.07
PDSF SW: Programming libraries       3 3 15 13 34 6.12 0.91 0.08
HOPPER: Ability to run interactively 1   2 23 19 52 97 194 6.11 1.14 0.49
FRANKLIN: Overall   3 7 7 24 114 98 253 6.11 1.01 -0.01
EUCLID: Overall       7 4 14 23 48 6.10 1.08  
PDSF: Overall satisfaction     2 4   19 17 42 6.07 1.11 -0.33
PDSF: Uptime (availability)   2 1 2 2 14 19 40 6.05 1.34 -0.66
CARVER: Disk configuration and I/O performance 1   1 16 6 29 51 104 6.05 1.23 -0.02
PDSF SW: STAR       3 1 13 8 25 6.04 0.93 0.52
HOPPER: Batch queue structure 2 2 6 16 25 106 102 259 6.03 1.13 0.37
FRANKLIN: Uptime (Availability)   2 9 12 29 106 90 248 6.01 1.06 0.02
FRANKLIN: Ability to run interactively 2     19 24 60 72 177 6.00 1.12 0.06
CARVER: Ability to run interactively 1   3 10 6 25 40 85 6.00 1.28 0.08
PDSF SW: CHOS       4 3 11 11 29 6.00 1.04 0.07
PDSF SW: Performance and debugging tools       4 1 19 9 33 6.00 0.90 -0.25
HOPPER: Disk configuration and I/O performance   1 5 26 28 83 97 240 5.99 1.10 0.12
EUCLID: Disk configuration and I/O performance     1 7 5 11 21 45 5.98 1.20  
DIRAC: Ability to run interactively 1     5 3 6 17 32 5.97 1.45  
DIRAC: Disk configuration and I/O performance       8 1 4 15 28 5.93 1.33  
FRANKLIN: Disk configuration and I/O performance 2 1 3 27 29 76 83 221 5.90 1.19 -0.06
FRANKLIN: Batch queue structure 3   5 20 34 104 78 244 5.89 1.13 0.18
HOPPER: Batch wait time 2 3 7 13 44 114 77 260 5.86 1.13 0.67
CARVER: Batch queue structure   1 4 15 14 42 37 113 5.80 1.19 -0.12
PDSF SW: Applications software       6   17 6 29 5.79 1.01 -0.17
HPSS: User interface (hsi, pftp, ftp) 5   17 14 15 42 79 172 5.77 1.56 -0.17
PDSF: Disk configuration and I/O performance   2 1 3 3 17 11 37 5.76 1.34 -0.50
FRANKLIN: Batch wait time 5 5 16 24 66 88 43 247 5.34 1.35 0.46
CARVER: Batch wait time   8 10 18 17 42 19 114 5.16 1.47 -0.65

Software

390 of the 411 respondents answered questions in this section.  The average score was 6.16 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
NERSC SW: Programming libraries   1 5 10 13 117 162 308 6.36 0.88 0.09
NERSC SW: Software environment   1 3 12 14 132 168 330 6.35 0.84 0.07
NERSC SW: Applications software   1   11 21 118 144 295 6.33 0.81 0.11
NERSC SW: Performance and debugging tools 1   2 27 34 83 86 233 5.94 1.08 0.04
NERSC SW: Data analysis software       37 15 40 52 144 5.74 1.20 0.27
NERSC SW: Visualization software 1   3 35 19 38 57 153 5.70 1.29 0.23

Services

379 of the 411 respondents answered questions in this section.

Satisfaction with NERSC Services

The average score was 6.37 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1 through 7 | Num Resp | Average Score | Std. Dev. | Change from 2010
SERVICES: Account support   1   9 13 56 262 341 6.67 0.72 0.06
CONSULT: Overall   2 1 8 12 62 212 297 6.58 0.81 -0.06
CONSULT: Response time     1 8 14 72 200 295 6.57 0.74 -0.03
OVERALL: Security 1   1 9 14 70 215 310 6.56 0.80 0.01
CONSULT: Quality of technical advice 1   2 10 14 66 201 294 6.53 0.86 -0.04
WEB SERVICES: Accuracy of information   1 2 6 14 98 171 292 6.46 0.79 0.09
CONSULT: Special requests (e.g. disk quota increases, etc.)       18 10 31 129 188 6.44 0.97 0.16
WEB: System Status Info 2 2 2 8 14 86 184 298 6.44 0.96 -0.08
CONSULT: Time to solution   2 1 13 20 80 171 287 6.40 0.91 0.00
SERVICES: Allocations process 1     17 21 89 163 291 6.35 0.91 0.09
WEB SERVICES: NIM web interface 1   1 12 30 111 177 332 6.35 0.87 -0.03
WEB SERVICES: www.nersc.gov overall     4 7 20 129 155 315 6.35 0.80 -0.01
WEB SERVICES: Timeliness of information   1 1 9 21 112 145 289 6.34 0.82 0.06
CONSULT: On-line help desk   3   17 8 31 101 160 6.29 1.15 -0.03
TRAINING: Web tutorials       9 19 54 74 156 6.24 0.88 0.18
TRAINING: New User's Guide     2 6 27 71 90 196 6.23 0.87 0.06
SERVICES: Ability to perform data analysis   1   8 11 52 46 118 6.13 0.94 0.40
WEB SERVICES: Ease of finding information 1 1 10 12 35 126 120 305 6.07 1.05 -0.03
SERVICES: Data analysis and visualization assistance       12 7 35 37 91 6.07 1.01 0.40
TRAINING: NERSC classes   1   18 10 29 40 98 5.90 1.19 0.51
WEB SERVICES: Searching   1 2 28 41 61 65 198 5.79 1.11 0.02

How Useful Are These Services To You?

Usefulness Ratings: 3=Very useful, 2=Somewhat useful, 1=Not useful

Item | Num who rated this item as 1 through 3 | Total Responses | Average Score | Std. Dev.
WEB: System Status Info 5 84 225 2.70 0.49
SERVICES: E-mail lists 6 92 227 2.68 0.51
TRAINING: New User's Guide 8 51 136 2.66 0.56
TRAINING: Web tutorials 17 55 110 2.51 0.66
MOTD (Message of the Day) 21 111 179 2.51 0.62
TRAINING: NERSC classes 30 61 52 2.15 0.74

Are You Adequately Informed About NERSC Changes?

Yes 273 97.8%
No 6 2.2%

How Important Are Analytics Services?

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important

Item | Num who rated this item as 1 through 3 | Total Responses | Average Score | Std. Dev.
SERVICES: Ability to perform data analysis 28 40 86 154 2.38 0.78
SERVICES: Data analysis and visualization assistance 38 44 60 142 2.15 0.82

Where Do You Perform Data Analysis and Visualization of Data Produced at NERSC?

All at NERSC 20 5.7%
Most at NERSC 50 14.4%
Half at NERSC, half elsewhere 58 16.7%
Most elsewhere 96 27.6%
All elsewhere 105 30.2%
I don't need data analysis or visualization 19 5.5%

Analytics Comments

Software Comments

It would be very useful to have better interactive access to data on local $SCRATCH systems for VisIt use. I create large data sets on Hopper and Franklin local scratch space. I'd like to perform VisIt analysis in place, so I don't need to move several TB of data to global scratch. But I can only get 30 minutes of interactive time on the large systems, and even then my job might not start promptly -- not useful if I'm not available to use it when it does start. It would be very useful to allow Euclid or some other interactive system to access Hopper and Franklin $SCRATCH, or perhaps have special interactive queues with high priority and a wall-clock limit of at least several hours, limiting the number of cores if necessary.

I'd like to be able to use VisIt with data on Hopper, but when I start a job, I cannot see my job in the queue. So instead I move my data to global scratch & use VisIt on Euclid w/ NX. NX works great, but it could be better (faster) to use Hopper's multiple processors to render images.

I have only just begun to use the VisIt software, and haven't used the distributed computing features yet. As I get familiar with it I will have a better understanding of just what my needs are and how well they will be met.

Not a complaint; I very much like the new NX server and use it all the time. I don't mind that I can only use FVWM2 as my window manager. But I am a pretty light-weight visualizer (mostly 2D graphs).

The connection via X or NX is just too slow.

More matlab licenses would be nice.

NERSC doubled the number of Matlab licenses this summer (2011). 

I think it is good to have Mathematica and IDL available, though I used python locally in the recent time for visualization. If I should have to deal with mathematically more tricky issues than right now, I would use Mathematica again. IDL is my backup solution if I get too frustrated with the nesting structure and confusing documentation of python's matplotlib.

It would be nice if the (python library + ) gui software p4vasp was installed on carver & hopper. This + the new NX would be fantastic and save me a lot of time transferring large files.

would like to have ParaView running in client/server mode on Hopper

I use homegrown data analysis software written in IDL. For a short period during the earlier part of this year, there seemed to be too few IDL licenses and I was waiting quite some time to use it, but the problem seemed to clear up. It's fine now.

My main concerns have been:
1) I use R and some specific R packages and a couple of times in the past years I haven't been able to access either R or a package that at one point was available and then became unavailable when I tried to go back to a project a couple months later. Daniela has been very helpful in resolving these issues, but it's not clear to me why the system setup is not stable.
2) For jobs requiring more than about 8 nodes, wait times for batch jobs can be fairly long (a day or more)

My data is trivial to analyze or I use gnuplot. I'm high-tech :-)

Other Comments

I feel like the I/O performance isn't good enough to process my huge datasets, or maybe I haven't found the correct machine to do this on. Euclid is way over subscribed...

I am usually over my quota for scratch and have to spend most of my time moving data from scratch to analyze/visualize with my local tools so that I can submit new jobs to the queue. I probably need to begin using HPSS.

My work involves porting and benchmarking. So my use of NERSC machines doesn't require data analysis or visualization tools.

We could use some more interactive nodes on PDSF. Right now there are only 4 nodes and they are often very loaded.

Not enough available time on PDSF.

My need is being met but I want to explain why anyway. I write my own post-processing tools. Because of the size of the data that needs to be in memory and problems with transferring hundreds of TB off-site, I analyze them at NERSC, typically at lower concurrency than the runs themselves. The post-processing is io and memory intensive while the computation is cpu intensive. The post-processing is developed iteratively (there is no canned analysis). The debug queue is excellent for developing the required tools on a reduced set of data. It would be difficult to imagine doing this work without the debug queue. Then later, when everything is ready, I will process a whole data set in the normal queue.

Have not done much visualization yet. Expect to do this in future.

I need more help getting software up and running (but I just got a nersc account, so I plan on contacting people to get help).

NERSC supercomputers are the best computers I have used. I most used Hopper to run my VASP jobs this year. It would be perfect if Hopper has a queue with 48-hour walltime limit. Or even 36-hour limit. Because some of my jobs could not finish in 24 hours (reg_small) with 1 node. Using 2 nodes for a VASP job is waste of resource because VASP does not perform very well with 48 cores. Anyway, job well done you guys.

I think I am used to working with my local tools and I am satisfied at the moment. More information about the potentialities of working on the NERSC machines could be beneficial. I honestly cannot say that this information is not available, since I did not feel the need to look into this possibility.

I'm not really aware of what is available to me, but that is probably my fault. I should check out what NERSC has in this avenue.

Can't reliably read long, broad time series data from netCDF files, which is bizarre. No problem after copying same files to Mac.

Mine is a simple preference to analyze my data on my home system.

I do use compute nodes to produce the intermediate gaussian cube files I use for much of my analysis; these are then used for analysis of multipole moments and visualization in xcrysden. If you could come up with something that could do fast, direct visualization from plane-wave-based wavefunctions, that would be awesome. Right now a great deal of disk grinding is necessary to go from reciprocal space (10's of gigs) to a realspace grid of voxel-based density data (100's of megs) that is then pretty trivial to display (but still slow with tools like xcrysden).

Due to the large size and amount of data generated from my computation, I have to perform most post-processing, data analysis and visualization in my account at NERSC. Unfortunately, transferring data from the scratch space of the computing machines such as Franklin and Hopper to the analysis/visualization machine such as Euclid has become increasingly inconvenient and inefficient. The ability to perform data analysis/visualization at NERSC has been strongly compromised. It would be highly desirable to have a shared, global scratch file system that allows simultaneous accesses from both the computing machines and the analysis/visualization machine, just as the $HOME directory has been set up.

I have no problems with the data analysis and visualization facilities at NERSC, since I am not using visualization techniques at present. My data analysis is either built into our codes or it is very simple and can be done easily without using the data analysis tools available at NERSC. Thanks for asking.

What Additional Services or Information should be on the NERSC Web Site?

Suggestions for New Services

- Mobile optimized webpage/MyNersc
- Portal like ability to see files, run/manage jobs - backup to hpss etc...
- Detailed Usage Statistics and Analytics (i.e. google analytics type info)

NERSC is working to create a mobile portal where users can see remaining balance, check the queues and view the MOTD.  Stay tuned.

I would like the message of the day to be available as an RSS feed, at least when there is a situation other than normal. Currently I must log in or check the website to know if there is a problem.

Just for fun - it would be cool to make a javascript widget with a message like "I have used x hours at NERSC". HPC people could add it to their websites to advertise NERSC usage.

Twitter feed for machine status/downtimes/uptimes?

Status Information

For the system availability page, more accuracy in when Franklin is up or down would be helpful, as would a projection as to when it is likely to come back up.

It would be more convenient to have an extra column, "Idle Cores", on the NERSC webpage "For Users" --> "System Status". There are four columns on the present page: System, Status, Jobs Running, and Cores in Use. Currently, "idle cores" can only be obtained after logging in.

My biggest complaint with the NERSC website is that the machine status page is often out of date. Also it is missing two important pieces of info:
1) If the machine is up, when is the next scheduled downtime? It helps me to plan backfill jobs.
2) If the machine is down, when is the estimated up time? That helps me manage workflow around the downtime.

I find that the amount of time that elapses between, say, a login session becoming unresponsive and the webpage telling me that, yes, something is wrong with the machine I'm on, can take longer than I would like. I understand it takes time to detect a problem, but sometimes it can take 10-20 minutes before I understand that yes, that machine will be down for a while.

Overall the web pages are kept up to date very well, but I have noticed a significant lag between when systems become unexpectedly unavailable and when that information is propagated onto the system status page, the MOTD etc. I find this frustrating.

We aim to update the MOTD as soon as we can confirm a problem.  With large systems such as Hopper, one component of the system can get jammed, such as the file system or scheduler, but the system will often recover on its own.  Other times we notice a system is showing signs of trouble and we are actively working to make sure it stays up.  We will try to make notes on the status page when we are experiencing system trouble, but aren't in a clear downtime.

Job Information

A more straightforward access to the queues. It always seems buried in "For Users". There should be an option under "For Users" that says "Global Queue Look". This is just my opinion.

The structure of the website makes it hard to quickly find some essential info, such as queue policies on each system.

A number of users gave us the same feedback.  Queues can now be easily found under "For Users".  Thanks for the feedback.

Accurate showstart information for my jobs

In general, I think the website has enough services (although I think a link to NERSC Information Management may be absent or very difficult to find), but some commonly used features are difficult to navigate to. In particular, the queue information page (which I frequently use) is very well designed, but the link to it is small and not prominent. Also, the login sequence to see this and other secure pages is strange:
1) Click original link.
2) "You must login" page with "login" button appears. Press button.
3) Login page
4) Final page.
Step (2) seems unnecessary.

Documentation

More vendor documentation.

I really like the fact that the "Training and Tutorials" section gives a link to an HPC course given outside of NERSC. It would be great if more links to external resources were included. It would make NERSC a "one-stop-shop" for HPC resources and training. The downside is the difficulty in maintaining those links though...

I would like the software documentation to be current.

- A simple and clean summary of all resources much like the Teragrid would be great: https://portal.teragrid.org/systems-monitor
- I know the system status page (http://www.nersc.gov/users/live-status/) provides some of that information, but knowing how many jobs remain in the queue helps gauge the wait time.

- More links to web resources for the novice programmer.
- Additionally, while available software is clearly listed, it would help to be able to sort by the kinds of mathematical/visualization tools provided by the different software packages.

I would like to have some documentation on POSIX access control lists, and which NERSC systems they're available for -- even if the documentation is only a set of links to online man pages. I've been using UNIX for nearly 20 years but didn't know that these existed.

When running batch jobs, it is best to give the shell which is able to run program on.

Even though I have scripts and generators for them that work on Franklin the docs for Hopper have been insufficient for me to port them successfully. More detail on the PBS configuration on the machines would be helpful. But that's what consult is for.

probably a bit easier navigation on the site and more homogeneous description of software at different platforms, and links to external sources and publications would be useful

More information about cloud computing resources and how one can access them would be useful as well.

Searching / Finding Information

Searches on the website yield many hits before what I was looking for. When using google for general searches I get what I want easier. Still the search is better than in other institutions I worked with.

I haven't checked the web site in a month or two, but it used to be hard to find things on it.

The NIM portal link should be more "up front"

Other Comments

A few months ago I missed more information or a link to further resources about NX. I had a problem and it took me quite some time to find out how to start the administrator mode. There was a discrepancy between information on the website and the actual state. I didn't check if this changed by now.

I was quite happy with the previous organization of the web site and interface. I am not sure it was necessary to monkey with this.

It is honestly a bit less user-friendly and hard to navigate than the previous version.

I liked the old NERSC website better than the new one. The new interface is more convoluted than the old one and what bothers me the most is that it's difficult (if not impossible) to access my usage statistics.

Plots of repository and my usage vs time seem to have disappeared in the new format. ?

I think the web site is very satisfying.

It is great!

If possible turnaround time for Carver should be improved for batch jobs.

Comments - What can NERSC do to make you more productive?

Queue suggestions and comments

Longer wall times

Longer wallclock times available on some of the large machines would help me.
longer available job times
Longer job running times.
If possible, l need longer Wallclock time for my jobs.
Increase walltime when one runs VASP.

At the moment you can open an additional long-time queue for VASP. The code is well scaled up to ~1024 cores and would benefit much more from uninterrupted runs. Would be great to have at least 48 hours of running. Thank you

Increase max run time for jobs

The walltime is usually somewhat short and it sometimes makes me work less productively.

A more flexible queue structure may help. For example, for some smaller-core jobs, extending the wall clock limit can be very helpful to me and save me time.

I think maybe allowing some more special queues for longer jobs.

Increase walltime one can request. There are some jobs we just can't finish in the allotted walltime and the problem is that they can't be restarted that well.

I wish I could request more hours (several to many days) for queued jobs

If the time for the queue can be facilitated, it would be great.

more flexibility with the walltime limits for parallel jobs

Increase max walltime in Hopper to 48 hours when using less than 16 nodes.

Introduce on Hopper a 48 hour queue.

Increase the wallclock limits on Hopper to 48 hours to bring it into line with Franklin and Carver/Magellan.

increase the maximum run time from 24 to 48 (or longer) on Hopper. Many of us need to run the same job for days if not weeks and having to restart the job every 24 hours is very counterproductive. The present queues on Hopper are not so long as that some fraction of the machine (say 25%) could be allocated to allow for longer queues (particularly for smaller jobs). I know this would be appreciated by a large number of users.

Again as I commented. It would be awesome and helps me a lot if Hopper has a longer queue (e.g., 36 hours or even 48 hours) than that 24-hour queue. Some of my jobs could not be finished in 24 hours in the reg_small queue, and I had to resubmit to finish them.

The running time limit right now for Hopper and Franklin is at most 24 hours. Therefore our jobs have to frequently write to a restart file and we have to manually restart jobs using the restart files every 24 hours. Many of our jobs would require running time much longer than 24 hours to produce statistically satisfying data. If for some jobs the running time limit can be extended to something like 48 hours, then we will be much more productive without babysitting jobs all the time.

The time limit of franklin is too short, and it is supposed to be able to run 48 hours. But it is not. The reason is unclear.

Many Hopper queues have been increased to 36 hours.  We will consider increasing it to 48 hours in the new allocation year.

Shorter queue wait times

As always, decrease wait time, increase wall time.

NERSC should have better turnaround time in the queues, as well as longer runtime queues.

get batch wait times down ... hard to do without more resources.

Reduce queue times :)

Hum... Buy more computers so that my jobs don't have to wait in the queue.

Reduce queue times. Improve the queuing system?

My only quibble could be with batch wait times and turn-arounds, but this is really no problem (when we place this issue in the greater scheme of things).

Less wait time in Queues. ...

More computing resources = less wait time.

Make the number of hours allotted per job on the queues longer without increasing wait time!

The only remaining big problem is the several-day batch queue wait times. I don't know how to change that, unless there is a general policy of over-allocation of the resources. If that is the case, it may reduce the wait times to better align the allocations with the time available. If the large job discount is heavily used, that should be taken into consideration when determining if the systems are over-allocated.

shorten the queue time of Franklin

Decrease wait time on Franklin.

There were times during which my throughput on franklin seemed very low. Hopper has been a valuable addition.

Try to improve the wait time on reg_small on Franklin.

One of the biggest limiting factors is the long queue wait times on Carver.

make the carver queue go faster (increase the size of carver)

Hard to get jobs thru on Carver.

better interactive/debug queue on hopper, sometimes the delay is too long

the new hopper system is great (maybe you should have another one of these to increase the speed of the queue)

Better queue turn-around on smaller (8-128) node jobs.

I run lots of very short jobs (~15 minutes or less) jobs that are highly parallel (256 to 8K processors) for debugging purposes, to debug problems that don't occur with fewer processors. If there were a way to speed up turnaround time for such jobs it would improve my productivity dramatically.

Queue management suggestions and comments

A more accurate "showstart" command would be very useful for planning computing jobs and analysis.

It would be useful to have an estimated start time for submitted jobs. Although jobs enter the queues in order, certain jobs that are either small or short run much sooner than other jobs. In terms of planning work, it would help to know, roughly, when a given job should run.

I would love to see a column on the "queue look" webpage that gives the estimated time to job start

We agree that the showstart command is not useful for any job other than the top one waiting in line.  We've found predicting start times to be inaccurate because running jobs often end sooner than the time requested.  Additionally, jobs submitted to the high priority interactive, debug and large queues change start time estimates.  On the queue pages we have added a column that shows a job's position in the queue, which we hope will give users a better idea of how much longer they have to wait.

Queues are a bit slow. My jobs sometimes fail in the middle of a run for no reason. I can restart and finish them, but I have to wait in the long queue again. Could there be some mechanism for identifying failed jobs so that they can be given higher priority when they are restarted?

NERSC can develop a system for monitoring VASP jobs. There is a possibility to interrupt VASP softly, allowing it to write the charge density matrix. The problem is to estimate whether to start a new optimization step or write the matrix and finish the job. Killed jobs leave no charge density, making restart more time consuming.

Increase the number of small jobs one can have running simultaneously on Hopper.
Allow automatic reassignments of Carver jobs to Magellan, when Magellan is underused.
I'm sure there might be some non-trivial scheduling issues associated with this, but there are certain aspects of our work that require large numbers of jobs (100s-1000s) that are not very parallel (i.e., don't scale well beyond 1-4 nodes). Clearly it would be nice to run these on NERSC because the overall computing requirement is still large, but the batch queues don't really facilitate running many "small" jobs (despite being relatively easy to schedule as "backfill"). The most obvious solution I can think of -- essentially limiting max number of simultaneous cores instead of max number of simultaneous jobs -- defeats one of the main purposes of NERSC and is thus not a good one. Nonetheless, it would be nice if there were a way to do this. Obviously we're pretty happy with NERSC though, so this is not a deal-breaker...

Right now the queues are pretty heavily biased against users running many smaller jobs on fewer CPUs, but this has been improving!   For my systems, the former limit of 12 H on Hopper for small jobs limited the usefulness of hopper, but now the 24 H limit is very favorable. The same can be said for increasing the queue limits to 8 running and 8 queued jobs. I would prefer slightly higher queue limits for smaller jobs, but the recent increases have been very welcome.

On the occasions when I need to run many jobs requesting only one node, the scheduler only allows for a handful to be run at one time.

NERSC recognizes that many science applications require thousands of jobs on a relatively low number of cores.  We have recently increased the number of concurrent running jobs allowed on Hopper and we've also added a serial queue on Carver.  Users with special requests should contact the consultants.  We have the capability to reserve a block of time or nodes for users with special needs.

Quantum Monte Carlo is one of a few methods that can readily exploit high concurrency computations. From this point of view, discounted charge factors for large queues would be helpful.

It will be very nice if the charging factor of the computing time is much lower, because the cost of large job is really expensive.

better back-up (xfer) queue management

One suggestion I had from the evaluation period was to create an xfer queue on Hopper, but I see this was recently done. 

allow time-critical batch jobs to have higher priority. Not sure if this is feasible or not.

Users can submit to the premium queue to gain higher priority.  The premium queue has double the charge factor and so should be used only for time critical jobs.

better control over job requests, e.g. being able to request physically contiguous sets of nodes on hopper.

Add a premium batch queue to Hopper.

A premium queue now exists on Hopper.

I have noticed several times recently that sometimes there are no jobs in the eligible queue and Hopper is not running at full capacity. I think it would be a good idea that when this happens the maximum number of jobs that a user can run be dynamically increased from 6 to a higher number, so that Hopper does not go underutilized at any time. Wasted cycles are never good for anyone.

I like the concept of the 'scavenger' mode at Kraken, under TeraGrid. Jobs whose results would provide insight into the production calculation, or serve as a further check of the result, can be run there at no cost to the main account, but at very low priority. It would be nice if NERSC had a queue that one could submit to in times when a machine is underutilised:

1) It would have slightly less priority than low
2) It would fulfil .... 'just one last check under this condition'
3) I need to profile my code, but i don't want to burn the main account
4) It might produce an interesting scientific result, when the code is run under an extreme condition(s), but you would not risk the time usually.

A scavenger queue has been added to Hopper and Franklin.  Thanks for the input.

The long run ....... (bend the rules slightly option)
For long runs (48 hrs to 96 hrs), people could submit jobs at, say, up to double time, with half the priority, and perhaps 200% more cost. So it would enable the 'last check' run, but with enough cost that they don't do it on a regular basis.

Regular queue drains on hopper sound good; I am the type of user who can  take great advantage of those.

 I find that I only use HOPPER. I used to also use FRANKLIN, but I started to have trouble submitting jobs, so I just stayed with HOPPER instead. For me personally, it is more efficient to work on a single machine. Much easier than having to remember which jobs are running where. But as long as HOPPER isn't overloaded, I don't mind other users working on other machines!

Software suggestions and comments

Would be nice to see the Intel compilers on Hopper.
The presence of Intel compilers on Hopper and Franklin!!! We have problems compiling some codes based on advanced C++ template techniques (mostly in the Boost library) with the PGI, Pathscale, and Cray compilers. GNU is OK, but we also need some additional features, such as quadruple floating-point precision in Fortran, etc.
I also like to use Midnight Commander. However, I haven't seen it on any supercomputer. Don't know why :).
I am currently unable to use hopper due to a plethora of problems of my application code with the various compilers/libraries available on hopper. Ironically, my Cactus-based code works just fine on NICS's Kraken (an XT5), but has all sorts of memory/MPI/OpenMP issues on hopper, leading to incorrect results. We are still working on tracking down the problem, but it has been a major drain of time for my group. Unfortunately, our code is so complex that we cannot easily hand it over to the consultants to 'let them fix the problem'. What would help, though, would be getting the Intel compiler suite installed on hopper.
Please, please, please provide an intel compiler environment on hopper2. Our group tried hard using the pgi compiler, but the invested time so far isn't worth it and we now switched to gcc despite the possible decrease in simulation performance. We have a lot of experience with the intel compilers on other systems. It typically produces the fasted binaries from our codes even on non-intel cpus, and while also the intel compiler has some problems, they are much more managable than with pgi.
The Intel compilers have been added to Hopper. (They are also on Carver.) Thanks for the feedback.
Make compiling and linking easier.
Cleaner development environment.
We have recently had problems building VORPAL or necessary packages with new PGI compilers and/or their Cray wrappers. (An example of this is the missing std::abs(double) on freedom.) Better compiler reliability before making a version the default would be good.
VASP on hopper2 is not very stable; the speed is not so good, and it would suddenly stop sometimes.
To the extent that the environment on NERSC machines mirrors the environment on my workstations and laptops, my work is more productive. Also, I can often do initial development on my (much smaller) machines and then transfer it to NERSC facilities when it is more mature.
To that end, things like support for shared libraries and software environments like Python are very helpful and increase my productivity. This has been more and more true for the newer systems, e.g., "hopper", but I want to stress that it is very important and should continue to be supported and developed.
Also, it would be REALLY nice to be able to cross-compile on my workstations and/or laptops for the NERSC machines. It would make the compilation process much more decoupled from the NERSC user resources and allow me to use personal workstations with 16-48 processors to accelerate compilation times!
Would be nice to see XEmacs installed.
Python is better supported than previously, but I still find things that don't work.
Module system is getting out of hand.
Wider choice of software and tuning tools.
More python libraries available through "module load numpy" for example.
I still can not figure out how to run replica exchange molecular dynamics (REMD) with NAMD on Hopper and Franklin. If you can help me with it, I will appreciate it.
Also, is it possible to install Gaussian on Franklin or other clusters?
If Molpro were available on Hopper, our research would benefit greatly.
NERSC can add visualization software such as SMS and GIS, to ease my analysis of output data.
The NX server may greatly improve interactivity; that had been poor in the past.
Faster response for remote X windows.
Add gv to Hopper software for viewing .ps files
Add nedit to Hopper software for editing files 
Interactive debugging is a bit painful. Regrettably, I don't know how to make it less painful, but if you think of anything
Reliable, usable debuggers can come in handy. Totalview hadn't worked for me in years so I gave up on it. Always possible to fall back on print statements, of course.
Weird how IDL can't read some of my netCDF files.
I like profiling tools that I can interface with my code at the compile line.
Good: poe+, IPM.
Bad: CrayPat (setenv PAT_RT_HWPC 5)... if I have to Google the profiling option, it has just failed. Or at least provide a default script that would simplify this for the user and take the pat_build, relabel, resubmit-again hassle away.
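A default wrapper of the kind requested here does not currently exist at NERSC; purely as a rough illustration, it might look something like the hypothetical Python sketch below. Only pat_build, pat_report, and PAT_RT_HWPC come from the comment above; the script itself, its build/report modes, and the .xf file glob are assumptions, and the instrumented binary would still be launched through the batch system (e.g. with aprun).

    import glob
    import subprocess
    import sys

    # Hypothetical helper sketching the "default script" asked for above: it wraps
    # the CrayPat steps named in the comment (pat_build, PAT_RT_HWPC, pat_report).
    # Illustrative only -- not a NERSC-provided tool; option names and experiment
    # file naming should be checked against the installed CrayPat version.

    def instrument(binary):
        """Run pat_build; by default it writes <binary>+pat next to the original."""
        subprocess.run(["pat_build", binary], check=True)
        return binary + "+pat"

    def report(instrumented):
        """Run pat_report on the experiment (.xf) files left by the instrumented run."""
        for xf in glob.glob(instrumented + "*.xf"):
            subprocess.run(["pat_report", xf], check=True)

    if __name__ == "__main__":
        mode, exe = sys.argv[1], sys.argv[2]
        if mode == "build":
            print("Instrumented binary: " + instrument(exe))
            print("Run it in a batch job with PAT_RT_HWPC=5 set, then rerun this script with 'report'.")
        elif mode == "report":
            report(exe)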
One problem I have had is regarding visualization software for my field. The software has been installed, but there is not someone to help with troubleshooting. I realize that users may need to be on their own in some instances, but it is a bit confusing to me that installed software is not supported by the consulting staff. If software is installed, it should be supported in my opinion. In my case, I search out HPC resources that have software I need installed, with the assumption that it will be supported by the consult staff. If this is not common or perhaps my assumption is faulty, then I realize this is not a NERSC issue but rather my own.

Hardware suggestions and comments

More FLOPS, disk and network bandwidth! (Does everybody answer this?)
Get sufficient DOE funding to be able to run separate capacity and capability systems.
More money for larger computers would be very helpful!
NERSC should buy a Blue Gene/Q system to better support low-memory applications with well-behaved communication behavior.
Also, it would be helpful to have more large-memory nodes available (at least 4GB/core).
Add some disk space to some carver nodes.
I really think that if we could have directly attached storage on some machines, my productivity would increase 2-fold.
It would be helpful if the number of nodes on Carver could be increased. Its per-core performance is so much better than Hopper or Franklin.
increase the size of carver
The hopper login nodes seem overwhelmed. I frequently see many users and several long-running tasks on the nodes. Difficult to find a "free" node. Would be nice to see more login nodes.
My primary productivity "liability" is lack of time to invest in doing things better. I don't think NERSC can help with that. There are 2 things I wish could be better, however:
A Hopper with faster processors! It's great to have access to so many cores, but the slower speed (say vs. IBM bluefire at NCAR) means one has to use nearly twice as many to get the same throughput. At least Hopper is faster than Franklin, which is rather slow.
More available nodes on PDSF.

File Storage and I/O suggestions and comments

On Franklin, the default quota in scratch is a bit small. Of course, we can make a request to increase it for a while, but the default could be a bit larger than 750 [GB].
Increase my quota on [Franklin] scratch from 750GB to 3TB.
More than 40 gigs of [HOME] storage space would be quite nice. I tend to fill up that space with a few high fidelity simulations and have to transfer off data to HPSS that I will need to access to do my data analysis.
The HD space in home could also be enlarged to avoid wasting time on long back-up procedures.
User quotas are a bit tight, and therefore heavy use must be made of moving stuff back and forth between SCRATCH and GSCRATCH. This is especially true when I have tried to move data back and forth to euclid, e.g. to run a matlab session.
More available disk space between global scratch and PDSF would be helpful for cloud-based computing buffered on Carver.
Allow users to keep their raw data in the global scratch disk for a longer time (3-6 months, for instance).
Keep /scratch2/scratchdirs/mai available on hopper.
I have a comment about the scratch purging policy. For the most part, my group backs up our important files from scratch locally (i.e. on our own drives) or on HPSS, but every once in a while someone forgets (particularly new group members).
I wonder if you have considered the following: when doing a purge, first create a list of all the files that have not been touched since the cutoff date, then sort those files by size, then delete files starting with the largest until a threshold has been reached, leaving most small files intact. This has the benefit of deleting most large intermediate files while preserving the small but important input/output/result files.
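As a rough illustration of the size-ordered purge idea described above, the selection logic might look like the Python sketch below. The cutoff, the byte threshold, and the use of access time are assumptions for illustration only; this is not NERSC's actual purge policy or code.

    import os
    import time

    # Sketch of the suggested purge: delete the largest untouched files first,
    # stopping once enough space has been freed so that most small files survive.
    def purge_largest_first(root, cutoff_days=84, bytes_to_free=10 * 2**40):
        cutoff = time.time() - cutoff_days * 86400
        candidates = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue
                if st.st_atime < cutoff:          # not touched since the cutoff date
                    candidates.append((st.st_size, path))
        candidates.sort(reverse=True)             # largest files first
        freed = 0
        for size, path in candidates:
            if freed >= bytes_to_free:            # threshold reached; keep the rest
                break
            os.remove(path)
            freed += size
        return freed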
The global scratch is great; however, I cannot write my data directly there because of the I/O performance. I first need to run on Hopper /scratch and then transfer my data, which takes a bit of time. How about having a serial queue available on Hopper to do this kind of thing? An xfer queue to transfer data to HPSS would also be helpful.

It would also be great if Franklin were linked to /global/scratch.
Due to the large size and amount of data generated by my computations, I have to perform most post-processing, data analysis and visualization in my account at NERSC. Unfortunately, transferring data from the scratch space of the computing machines such as Franklin and Hopper to the analysis/visualization machine such as Euclid has become increasingly inconvenient and inefficient. The ability to perform data analysis/visualization at NERSC has been strongly compromised.
It would be highly desirable to have a shared, global scratch file system that allows simultaneous accesses from both the computing machines and the analysis/visualization machine, just as the way the $HOME directory has been setup.

Increase the speed of tab completion. For some reason (I believe it's latency of the global file system, but I'm not sure) tab completion of commands and file names has been slow on NERSC machines lately. I know it's a small complaint, but slow auto-complete can really break your command line rhythm.

I just feel that the I/O on Hopper $SCRATCH is sometimes slow and not very responsive. Hope this can be solved after the planned upgrade.

Nothing that you're not trying to improve already - specifically the performance of disk space.

Improve I/O performance for data intensive computations.

The time to build code on the login nodes for Franklin and Hopper is very slow compared to other big parallel machines I have used, specifically, running configure scripts and compiling Fortran code. When doing a lot of development, this performance is very important and improving it would be helpful. 

smooth out internode i/o on new Cray systems [there have been a few software issues on hopper that affected parallelization of some internode i/o intensive jobs]

HPSS suggestions and comments

I don't know if this is possible, but it'd be nice if the hsi command line interface had tab completion. For example, if you enter paths or file names. I've requested this before, & I think the answer was that it was not possible.
A better shell interface to HPSS with hsi would be nice; for instance tab completion would make using the system easier.
HPSS sometimes wants to drop the ftp connection from my desktop and I am finding it easier to move the data to scratch and scp it from there.
If I can somehow store my data during batch jobs directly to the HPSS and also be able to retrieve data from HPSS directly to my local computer, I see my productivity increasing significantly.

It would be great to have the globus connection to the HPSS again.
htar really needs to support larger files. My largest wave function files are now rejected, which is extremely inconvenient and requires using hsi to transfer just the one big file or compressing it separately. I'd like to have a backup archive of entire calculations, and htar is failing. It should support the largest files possible on the systems at NERSC; realistically, developers are going to avoid fragmenting files that are logically single just for the convenience of support applications.

HPSS file management can be somewhat time consuming.
The interface to hpss, via hsi and htar is a bit clunky. It would be good to browse tar files moved to hpss more easily than htar -t. Maybe some graphical browser?

Maybe there is a reason for this, but the interface for HPSS is just atrocious when I use hsi. Why can't I delete batches of files and use the commands usually available to me in Linux? Perhaps it is to protect me from myself in terms of not deleting my data, but it drives me crazy how bad the interface is. It takes me probably 5 times as long to do things in HPSS with hsi because of this.
Is it possible to copy (synchronize) data (directory) to the storage server?
I don't like having to come up with my own backup solution. It seems that there are great tools available for backing up, yet I have to do this myself to HPSS. I've actually lost a lot of time when /scratch was erased trying to recover -- not data, but just my state. It seems like NERSC has decided to put the burden of backing up on the users. This criticism assumes that the disk quota for home is not sufficient.

Allocations and Account Management suggestions and comments

Would like increased allocation

Other than the increased allocation, ...

Larger allocations and machines

More hours.
Secondly, I think my allocation time should be increased each year. I normally applied for 2 million hours, but always got 1 million hours. But in the meantime, I notice some other projects got more than needed. Then, we are allowed to ask for additional hours. This practice has a limitation. It is difficult to plan things ahead of time. I think this is probably the responsibility of DOE not NERSC. DOE can check the usage of previous years for one user group, and then determine the future amount of time. Just an idea!
More allocation opportunities during the year.
more flexibility with the resource allocations

ERCAP allocation request process

Simplify annual ERCAP renewal process and only require it on a 2-3 year schedule instead of every year.
Regarding reversions of ERCAP allocations: I appreciate (and have benefited from) that account usage is monitored for unused CPU hours, and that such hours are then given to other users that can use extra time. However, users in our group are not doing "production" computations, but rather research/development on improved boundary conditions, elliptic solvers, etc. -- all in the general march towards using more cores. Typically there is much analysis of the code (as opposed to resulting data) required in this work (our code is an atmospheric model). Further, we all have other responsibilities that may dominate from time to time. As a result, we may have entire months where very few CPU hours are used. When some advance has been developed and coded, then we may have months in which we use a lot of CPU time as we examine various test cases. I don't know how this kind of usage can be better "monitored", but the CPU reversions are problematic for us. However, in most reversion cases to date, we have gotten back at least the amount of the reversion by the end of the ERCAP year. So we are tending to watch for that possibility at year's end and to be prepared to take advantage of it.
Please try to make the yearly web resource request (ERCAP) as "updateable" as possible, so we don't need to type in much new info; there is typically not much change year to year.

Account management

One minor complaint: the login security does not allow more than 3-4 (?) password failures and I often need a "reset"; more "failures" should be allowed.
The login lock-out policy due to repeated unsuccessful attempts is too restrictive: it is very easy to get locked out by a few failures to type in the correct password. This is particularly problematic since remote access through the NX Server is sometimes faulty, so the user does not know whether it is just the usual trouble or whether the wrong password has been typed (perhaps since he/she was forced to change the password recently). Moreover, the procedure to unlock the login is VERY annoying, since it requires calling the NERSC HELP desk, which is not staffed 24/7 -- a login lockout at the wrong time could easily mean losing a whole day of access and work. Either the HELP desk should be staffed at all times or, better yet, an automatic password reset or other system should be implemented for users in the same manner as many publicly available account services.

The number of login failures allowed has been increased slightly.  NERSC Operations Staff are available 24/7 to help users clear login failures.

More flexibility in dealing with group authorizations and shared control of resources, processes & data.

Consulting suggestions and comments

Often it is hard to get complex problems worked out with the consultants. Some tickets are not resolved for a long time. If a solution is not obvious, the attitude is to avoid solving the problem.

Ticket system not adequate for group use: one can't search for and comment on tickets opened by colleagues

Response to questions and online help.

Have someone on call for software/environment problems on weekends (especially long weekends).

On that note, allowing wider access to exchanges with NERSC consultants via email or the online interface would be useful. Often I'm working together with other VORPAL developers (usually John Cary) to resolve build issues, so all of us being able to participate in discussions with consultants would be nice.

The amount of email coming from NERSC is a bit excessive.

Training suggestions and comments

More tutorials for new users
Would be nice to see more examples online. Examples of all sorts of tasks.
More specific test examples for using libraries like SuperLU, PARPACK and sparse matrix packages, so one could test quickly before making the switch.
Parallel computing courses
I hope NERSC can offer more workshops/seminars about their systems. For example, I would love to learn more about NERSC cloud computing and the GPU computing facility. I hope NERSC can have this kind of seminar/web seminar more often.

Availability, Reliability and Variability suggestions and comments

Inconsistency in calculation times continues to be an issue for my large runs. I think this is something of an endemic feature to HPC and is only likely to get worse as system (and problem) sizes scale up. Variations on the order of 50% make it difficult to streamline the batch system by requesting only the needed time.
There have been some issues with lustre robustness and we've noticed significant model timing variability on hopper (2).
Reliability issues (jobs lost to node failures) are still frequent, and require a fair amount of time to check that jobs finished properly and to rerun failed jobs.
I have mostly used franklin at NERSC so far. My jobs often crash due to a persistent system error of "node failed or halted event".
Franklin has been in operation for several years, but is still not stable. Lean on Cray to fix the problem.
Reduce Franklin downtime.
Two big problems: PDSF downtime for maintenance/upgrades is too frequent and intrusive; at the very minimum, it should be carried out ONLY at night and/or weekends (preferably weekend nights).
More uptime is always great when in need of it.
more uptime
Not much really, perhaps except extending uptimes of the systems.
Need to devise a method to improve fault tolerance. Lately I have been running jobs that require ~16384 cores on Franklin. After 2-5 days of waiting in the queue to gain access to the cores, more than 50% of these jobs fail with 'Node failed'. It only takes one failure to kill the entire calculation. There must be a way to request an extra two or four cores that sit there idle during the simulation, so that when one of the cores being used in the calculation drops out, the information it had in memory could be handed over to one of these waiting cores, MPI could be told that this new core is the one handling the work, and the job could move forward again. Anyway, my biggest problem running large jobs lies in this issue. It also wastes a large portion of my allocation. It would significantly help productivity if NERSC could devise a relatively turn-key strategy for users to deal with machine-level failures.
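The spare-core failover described above is not something the MPI implementations of the time provided; the usual user-level mitigation is periodic application checkpointing, so that a job killed by a node failure can be resubmitted and resumed from its last checkpoint instead of restarting from scratch. A minimal, hypothetical Python sketch of that pattern follows (the file name, state dictionary, and step loop are all placeholders, not NERSC-provided code):

    import os
    import pickle

    CHECKPOINT = "state.chk"   # hypothetical checkpoint file name

    def save_state(step, state):
        # Write to a temporary file and rename, so a failure mid-write
        # cannot corrupt the last good checkpoint.
        tmp = CHECKPOINT + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump((step, state), f)
        os.replace(tmp, CHECKPOINT)

    def load_state():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                return pickle.load(f)          # resume after a 'node failed' abort
        return 0, {"energy": 0.0}              # placeholder initial state

    def advance(state):
        state["energy"] += 1.0                 # placeholder for one simulation step
        return state

    def run(total_steps=100, checkpoint_every=10):
        step, state = load_state()
        while step < total_steps:
            state = advance(state)
            step += 1
            if step % checkpoint_every == 0:
                save_state(step, state)
        return state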

Networking suggestions and comments

Also, transfers from NERSC dtn_new to the user's desktop are 1/3 as fast as uploads to NERSC.
Increasing the bandwidth between NERSC and external users will remain a popular demand, but this is not necessarily dependent solely on NERSC.
Increase data transfer bandwidth with BNL

No Suggestions / NERSC is doing well

productivity roadblocks are pretty much on my end (for now)
I have a hard time thinking of anything that NERSC could do better.
You are already quite an efficient organization.
Keep doing what they have been doing for the last few years
Keep doing what you are doing!
Keep doing an outstanding job!

Comments - What Does NERSC Do Best

Multiple Things Done Well / NERSC is Overall a Good Center

In most aspects, I'm very satisfied.

Compared to other national computer centers, NERSC does an excellent job.

Provide reliable/stable access to state-of-the-art HPC resources. Balance the needs of a large number and range of users. Provide exceptional levels of service as required.

It provides high performance computing hardware and software as well as user service.

just about everything is done well at NERSC

Since the introduction of Hopper, NERSC has been excelling at providing a stable machine and environment with very reasonable queue times. For the most part, NERSC provides all the computing resources and tools you need without getting in your way.

They keep the machines up & running, set up accounts quickly, proactively contact us when allocations are running low. Web pages are very useful, help line & support is friendly & reliable. As long as jobs move quickly through the queues, everyone is happy!

Excellent, stable computing environment and quick response to problems.

Well managed, well organized, user oriented, few bureaucratic demands, and much more.

NERSC has consistently done a super job. I think they are a role model for other national labs. I can tell anybody they are the best among any labs. Congratulations!

The support and the machines are very good.

99% VERY WELL

NERSC has state-of-the-art computing resources which are maintained at an equally high level in both software and services. Both the global home and scratch are very convenient. NERSC NIM provides a very nice interface to manage and monitor account usage. The MOTD provides on-time updates. Overall, NERSC is the best.

For me, NERSC sets the standard to which I compare the other HPC providers I interact with. Keep doing what you're doing!

Keeping systems running with little interruption to work, customer service, providing the right set of tools made easy to install.

Provides great resources. Manages various systems well. Consulting is timely and effective. Adapt well and constantly provide new services.

Very good maintenance of the machines. Very good consulting/help desk. Very good security. Very good workshops.

Offers hassle-free computing with access to good software.

The regular addition of new machines, complete with comprehensive programming libraries allows us to continually update our codes to run on the latest hardware. Most of the software we need is installed by NERSC staff, whenever new machines are added. The redistribution of account allocations throughout the year ensures that our requests for CPU time are always met. Timely warnings about maintenance-related downtime allows us to plan our work accordingly.

For my project, NERSC was the only option for developing co-array fortran codes. It also provides quite a lot of information on supercomputing through the web site. Supercomputing facilities at NERSC are outstanding. Thank you.

Scientific computing! I have been using NERSC for several years now, and I LOVE NERSC. The availability of sympathetic consultants is reassuring, even if they are not needed.

Allocations and computing resources. Time allowed to keep files in the scratch area before purging. Computer reliability and uptime - very little downtime.

System availability and resources are good. User support is generally helpful.

NERSC does everything that I can think of at the moment extremely well. I'd say that of the supercomputer systems we have access to, NERSC's machines and support stand above all others in having their act together. In terms of consulting quality (technical accuracy, response time, etc.) NERSC absolutely "crushes" consulting from the TeraGrid computing system. It's not even comparable really. When I eventually start up my own research group, getting resources at NERSC will be a top priority, as the environment is extremely conducive to scientific research.

Reliable computing resource and great user services.

It provides me with computational resources and software that are preoptimized. This helps me get productive results without too many headaches.

Manage and maintain its computing resources, both software and hardware. Solve users' problems and answer their questions.

very good hardware. very good support. very good information about the machine. very good efficiency

Great resources...Easy and simple to connect from outside - for instance using ssh...Excellent and very helpful support team...

Uptime is good on most of the machines. Consulting help is good with day to day issues. With hopper onboard this year, queue wait time is down to comfortable levels.

It is over all very satisfactory.

Perfect

consulting, management of allocation opportunities, acquisition of systems

In addition to the computational facilities, NERSC has always been and continues to be fantastic with their support. Much better than other supercomputing centers.

Everything that is important to me

large number of computing nodes. good back-up facility. well documented. good connectivity from outside world

Very well. I was a bit unhappy at first, but once I read all the documentation on the web, I realized how great a resource it is and how everything is well thought out.

I am pretty satisfied with NERSC overall.

NERSC does well in most aspects.

NERSC is the best HPC center in all the DOE. Much better than ORNL and loads better than ANL. Keep it up! NERSC seems to be really connected with the scientist. Great staff, fair allocation process.

The new Hopper Cray XE6 is good at parallel efficiency. The scratch drive space is abundant and fast. The help desk staff respond to our questions extremely fast and professionally.

Great user support, software support is exhaustive on mature systems, and machine uptime is impressive.

Easy to use development environment with straightforward progression to full-scale production. Support for machines and software environment so that computational scientists don't have to worry about it.

Maintenance; Communication with the users; Consulting

I find the webpage to be generally helpful from a content perspective. The systems are reliable and with the addition of Hopper, the wait times have decreased to the point where using NERSC is very useful for my work.

User services, help desk, general human interactions are the best I've found. NERSC's machines are the only machines I have access to where the problems to which I need answers can be performed. I really enjoy running at NERSC.

1. Allocation time 2. Data transfer 3. Website information 4. Scratch system

I think the support service and the resources availability are very good.

The biggest improvement in the past year has been the reduction in wait time on Hopper/Franklin jobs less than 2304ps (which is my max...mostly I do 768ps recently), particularly in the past few months. The wait time really should not exceed the compute time...on average that has only been true recently. The wait time on the 30min debug queue has typically been only a few minutes, which is extremely important in code development and debugging. (In years past...the wait time has gotten to 7-10x the compute time, making machines useless....I've been a strong critic....not sure why the turnaround...machine loading is highly variable...so I hope the recent "good times" last.) The up-time has been reasonably good except for the recent down time from the security breach. Consulting services via phone (and email) are quite prompt...at least on trivial queries.

Make it more economical to run larger jobs. Easy to get extra time. Easy to add new users.

Computers run well, queues are set up well and run jobs how I would expect in terms of timeliness. I have had no problems with data corruption or instability, as I've had on smaller clusters. The scratch and home directories are easy to navigate. HPSS is pretty simple to use as well.

Maintains large supercomputers with sensible queuing and allocations. I have always found support staff to be helpful.

The compute resources at NERSC are awesome. Constantly improving. The consultants are great and we can usually find a solution to any problem. Carver is a GREAT machine because it is very forgiving, is fast at most everything, and has the Intel compilers.

I think the machines are run with a high level of reliability. The lustre file system seems to perform well compared to, say, kraken. I am impressed by the consultants. Dirac has been a good platform on which to explore GPU computing.

Everything

I am quite happy with NERSC resources. I can't think of any problem.

I feel NERSC sets up and maintains its facilities very well. This doesn't simply include the hardware and hardware support; it extends to the programming environment, software, and user support. Their systems are usable, accessible, and powerful.

Provides good access to massive, reliable computational resource, through competent, outgoing staff. Sure appreciated the free time on hopper!

Great software support. Allocation according to need. Plentiful computational resources. Variety of queue structures to accommodate a wide range of different types of jobs.

NERSC provides a very high level of user support and a good mix of large and small(er) systems. The staff are the most consistently helpful of any HPC systems I have ever used. The increase in disk quota on $GSCRATCH was long overdue but has helped tremendously. The global file system (/project) is a big plus, though the space is typically small. I personally find the global homes annoying, since the systems are not all binary compatible so I need multiple versions of executables.

Good execution of their responsibilities. Good uptime and accessibility of high performance computing resources. Extensive and well-maintained software libraries.

System availability is good. Downtimes are scheduled after consultation with users. Some tickets are answered quickly and efficiently.

Modern computers, up to date website, timely notifications, and excellent uptime

A great many cores in a sufficiently stable environment that we actually have enough allocation to use. Large fast jobs are encouraged, which is where I think the focus of computational development should be in terms of human scientists learning more; I've never been a fan of sitting on my hands for a week or two for a result. HPSS is really quite nice in general.

I have found the staff to be very helpful. The computing system is up for a good fraction of the time, and messages are informative about what is going on. The software you have installed works well, and you do a good job keeping up with versions.

I'm very satisfied with overall NERSC performance.

For our group and my specific uses, NERSC provides "heavy lifting" computing resources that we could not possibly afford on our own. I know that our work -- primarily molecular dynamics (MD) simulations -- doesn't come close to needing the number of CPU cores that some of the truly large fluid, particle, and plasma physics codes use, but nonetheless jobs that would require a complete takeover of our group's cluster (e.g., those needing 500-1000 cores) can be performed simultaneously with ease on NERSC's machines. It is therefore a tremendous resource for us and has enabled a scale of MD simulations that would not otherwise be possible. In turn, these resources have enabled our group to become one of the leaders in force field development, the modeling of intrinsically disordered proteins, etc. Beyond simply furnishing computer time and resources, I have found that NERSC's consulting services are extremely helpful. Whenever I have run into a problem, the help desk has jumped in to investigate and almost always solved the issue within a day or two. For a large computing facility, this is a fabulous service. In addition, we recently had a couple of folks from NERSC come down to "interview" our group about our current and future computing needs to use in planning the next big cluster. This sort of user-centric planning is no doubt part of what makes NERSC great.

Support teams at NERSC are excellent!

Making their HPC resources user-friendly.

NERSC is overall an excellent facility! In particular, response times to all inquiries are very short, and the answers are quite helpful more often than not. Also, the maintenance schedule is quite compact, and machines rarely go down at other times.

System uptime; system stability; programming environment; help desk/customer service; plenty of disk space on all systems.

I believe the hardware and software available at NERSC is world-class, with state-of-the-art supercomputational resources which have been meeting the demands of thousands of computational scientists all over the world. The NERSC facility is becoming a sine qua non for a world-wide community of researchers in diverse fields of various scientific and engineering disciplines. NERSC does it excellently and should be congratulated. I have been using the NERSC facilities from Canada for more than a decade. I am most grateful to Prof. Walter Loveland, the PI, and Dr. Ted Barnes, Monitor, Nuclear Physics Division, for providing me access to NERSC. Ms. Francesca Verdier has been most helpful and her kind advice and guidance are highly appreciated. Mr. David Turner has guided me and ironed out many problems which I encountered during my interaction with Carver. But for the kind help and judicious advice of the above-mentioned user-friendly scientists and administrators, I doubt very much if I could have accomplished anything in our research in the physics and chemistry of superheavy elements. In conclusion, my sincerest thanks to all at NERSC.

I think NERSC does very well.

They make state-of-the-art supercomputing resources available to the DOE research community, with a strong (and important) emphasis on usability and scientific productivity.

Maintain HPC systems with the necessary environment and software so that I can create workflows for my computing jobs that are reliable and easy to use.

NERSC Provides Good Computational Machines and Cycles

Big computing.

Massive parallel computations. Nersc is very good for running huge parallel jobs.

providing a robust, diverse computing platform for codes with different needs

good, reliable, huge file systems

provide compute nodes with good access to high-volume data storage

NERSC provides reliable and highly usable high-performance computing resources for my scientific computing needs.

Professional more or less seamless maintenance of the hardware and its integration with the applied software.

Simulations on Hopper run extremely stably.

VASP on Carver is fast and very stable to run.

Everything I need them to. They are a clean shop that knows what they are doing. I know that the HPC part of my research will be seamless and I won't have to worry about unexpected problems because I use NERSC.

time of availability

Lots of resources are available

PDSF: Interactive and batch performance is good, NXServer access is very fast, disk I/O is adequate, the programming/analysis environment and software is user-friendly.

And I like that big jobs get to top of the queue quickly.

In general uptime is pretty good, and queue wait times are reasonable.

I am extremely happy with the quality of service on the Dirac machine, which I know is not operated as a production resource. Nonetheless, it has been extremely stable and easy to use. Dirac led to a huge boost in research productivity on GPU systems.

Offer excellent compute resources in a well organized fashion to users.

I was very impressed by the way the staff brought Hopper on line. The startup was smooth (from the user perspective at least). Hopper is running very nicely.

Provide high performance computing to many users. In particular, the limit on job length (24 hours to 48 hours max) allows users to access the cluster without having to wait for an excessive amount of time. Also, maintenance is done quickly, minimizing down time.

Good queuing system. Good that admins tell us when the system is down.

Compared with Jaguarpf, the downtimes are relatively few.

Up time for STAR jobs is very good, with lots of available CPU for jobs when necessary.

The DIRAC GPU cluster is a terrific resource for evaluating the suitability of the technology for our applications, without having to invest in hardware that might not be useful.

Provides services to a wide variety of codes and for a wide variety of platforms.

Deploy huge hardware, run that hardware well

The new Hopper system is a great resource that has really helped our research.

Supply computing cycles!

Getting my work completed on time has finally become possible, thanks to the fast Hopper nodes and their queue time.

The available computing resources and the programming environment are simply excellent.

Good choice of computing systems. Easy to access and run large jobs.

I am amazed that I have access to all these computing resources. Even though I can get frustrated when a system is unavailable, it's so nice to be able to use these resources whenever I want. I especially appreciate the Dirac cluster and being able to experiment on the most recent GPUs.

Hopper has been working quite well lately.

HPC resources are generally available. Easy navigation from one machine to another. Easy job submission system.

NERSC tends to be more responsive to users than other centres. Although it took many years to get to this point, NERSC recognizes that people need sizable amounts of permanent disk space which is accessible to MPP jobs, and provides it through the /projects file system. It was also good that you made Hopper's queue structure job-size neutral, unlike Franklin.

NERSC Provides Good Services

Wow, where to start? I think you have the best technical support I have ever used. Very quick, accurate, and done with a smile ;-) I think you do a fantastic job of informing the user of upcoming changes; this really helps the user to prepare for the week/month. I think you listen well to the user needs, and respond to those needs.

Organizing and integrating its user community, reaching out to the users, addressing user requests.

User Services and Viz/Analysis support are outstanding.

good ability to store and analyse data

Very prompt and responsive to user concerns and questions. I want to just convey a big thank you to all at NERSC for their help and service.

Working with many different software packages for HPC systems

Consulting is excellent - rapid responses, and they are eager to help with good advice.

User support is proactive and easily the best I've seen anywhere. Communication about the software environment is likewise excellent.

NERSC consulting has been helpful with environment variables for my memory-intensive large runs.

Account support. The help desk is very responsive.

Very good basic user documentation.

The new NERSC website is great. The navigation is very intuitive and it is easy to find information when I look for it.

easy to run interactively; new NX server is wonderful, changed my way of working; great support

I have found that a phone call to the help line at NERSC has always been extremely helpful and efficient! Thank you.

The consultants are very good, and usually solve problems quickly.

NERSC's website has an enormous amount of good information and Google frequently takes me to it when I am looking for generic answers. The new website has been very easy to navigate.  NERSC's people are very talented and are on par with the staff at the best supercomputing centers in the world. I currently run on 8 petaflop/s worth of supercomputers on two continents, and I count NERSC among the top 3 places I use. For reference, CSCS and ALCF are the other two I think are excellent. For Cray-specific help, NERSC is much better than NCCS/NICS. Only part of this can be attributed to the relative IQ distribution of the two host states.  Katie Antypas and Francesca Verdier have been extremely helpful to me with access to Hopper2 and with NERSC-related issues I've had over the years. Zhengji Zhao does an excellent job with the extremely challenging responsibility of maintaining electronic structure codes at NERSC. Viraj Paropkari and Hemant Shukla have done an excellent job with Dirac.

Customer support, software updates, responsive to requests.

Technical consulting and responsiveness are very good, especially compared to other centers.

Lots of very good support, very very helpful technical assistance!

Flexibility in allocating additional CPU time.

Assistance with problems

support staff are excellent

I think the consulting services are really good. Machine maintenance is also quite good, with dates for down times given well in advance. There are relatively few surprises. Quite generally, the interest of NERSC staff in helping us out is excellent, and very welcome.

The consulting is good and they always follow through. With webinars the training has become easily available.

NERSC has a very competent group of people.

Service reliability and user support.

Help Desk is really exceptional in responsiveness

NERSC has superb technical support staff. I'll never forget the experience of calling at 3am on a weekend for a password reset and getting a staff member right away who was much more awake than I was. Their informative messages are helpful, especially the ones announcing that particular queues are mostly empty.

Works with users to make codes work and to optimize performance.

Consulting is usually fast and very helpful. The once a year allocation process works well, along with the ability of department heads to allocate additional hours if needed.

Your consulting staff are helpful, courteous, and timely. The same is true for your accounts staff.

Manage accounts well. Prompt and effective responses. Good solutions and suggestions.

Excellent service. I like the fact that one can get a free initial allocation.

NERSC's support is done very well. I always receive the help I need in a timely manner.

Support

Communication is quite clear and consistent. Good response to users' needs.

Responds quickly when issues are reported. Keeps users well informed about coming changes, and doesn't appreciably alter the way things work at the user level without giving due warning and explanation-of-what-I'm-going-to-have-to-do-differently. The NX client interface and the make-data-available-through-publicly-accessible-URLs capability are very nice.

Support of all kinds (technical, password assistance, etc.)

The NERSC consulting/support group is very efficient. Responses offered by the personnel in this group are always very helpful to me.

Support staff and advice is excellent.

NERSC is very responsive to requests. The information is always up-to-date and comprehensive. The website is also well designed and the NIM is pretty useful.

Customer support

I very much appreciate the help from Francesca Verdier on how to get allocation time and from Zhengji Zhao on how to run VASP.
