
2012 User Survey Results

Response Summary

A special thanks to the 552 users who responded to the 2012 survey, which was conducted from January 11 through February 4, 2013. This represents an 11.8 percent response rate from the 4,659 users who had been active in 2012.  User responses are important to us because they provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve.

The survey strives to be representative of all NERSC users. The hours used by the respondents represent about 61 percent of all MPP hours used (on Hopper, Franklin or Carver) in 2012.  MPP respondents were classified according to their usage:

  • 98 respondents had used over 2 million hours, generating a response rate of 67% from this community of "large MPP users".
  • 197 respondents had used between 250,000 and 2 million hours, generating a 36% response rate from the "medium MPP users".
  • 149 respondents had used fewer than 250,000 hours, generating an 8% response rate from the "small MPP users".
  • 108 respondents were not MPP users - they were either Principal Investigators or project managers supervising the work of their NERSC users, or they were users of other NERSC resources, such as HPSS, PDSF, Genepool, Euclid, or Dirac.

NERSC has been providing computational support for Berkeley Lab’s Joint Genome Institute (JGI) since 2010, and in 2011 it deployed a cluster for JGI, called Genepool, purchased with JGI funds. The survey results reported here exclude those JGI users who use Genepool exclusively and do not use other NERSC systems (71 respondents). The results described below thus come from 481 respondents.

On this survey users scored satisfaction on a seven-point scale, where “1” indicates “very dissatisfied” and “7” indicates “very satisfied.”  The average satisfaction scores from this year's survey ranged from a high of 6.79 to a low of 5.16; the average score was 6.29.

Satisfaction Score | Meaning | Number of Times Selected | Percent of Scores
7 Very Satisfied 10,843 57.1%
6 Mostly Satisfied 5,477 28.8%
5 Somewhat Satisfied 1,264 6.7%
4 Neutral 898 4.7%
3 Somewhat Dissatisfied 353 1.9%
2 Mostly Dissatisfied 95 0.5%
1 Very Dissatisfied 58 0.3%
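Each topic's "Average Score" reported in the tables that follow is the count-weighted mean of its 1-7 ratings (the reported counts reproduce the reported averages). The short Python sketch below illustrates the arithmetic using the rating counts reported later for "WEB: Ease of finding information" (366 responses); it is an illustration only, not part of the survey tooling.

    # Compute a topic's average satisfaction score as the count-weighted
    # mean of its 1-7 rating counts. The counts below are those reported
    # for "WEB: Ease of finding information" (366 responses).
    rating_counts = {1: 0, 2: 1, 3: 4, 4: 9, 5: 33, 6: 146, 7: 173}

    num_resp = sum(rating_counts.values())
    average = sum(score * n for score, n in rating_counts.items()) / num_resp
    print(num_resp, round(average, 2))  # prints: 366 6.29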

For questions that spanned previous surveys, the change in scoring was tested for significance (using the t test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.
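As a rough sketch of that procedure (the report does not state which t-test variant was applied, so a two-sample Welch test on the raw 1-7 ratings is assumed here, and the example ratings are hypothetical):

    # Year-over-year significance check for one topic: compare its 1-7
    # ratings from two survey years with a t-test at the 90% confidence
    # level. Welch's unequal-variance test is assumed; the report does
    # not specify the exact variant NERSC used.
    from scipy import stats

    def change_direction(ratings_prev, ratings_curr, alpha=0.10):
        """Return +1, -1, or 0 for a significant increase, decrease, or no change."""
        t_stat, p_value = stats.ttest_ind(ratings_curr, ratings_prev, equal_var=False)
        if p_value >= alpha:
            return 0                    # not significant at the 90% level
        return 1 if t_stat > 0 else -1  # sign of t gives the direction of change

    # Hypothetical ratings for one topic in two consecutive surveys
    print(change_direction([5, 6, 6, 7, 5, 6, 4], [6, 7, 7, 7, 6, 7, 6]))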

Areas with Highest User Satisfaction

Twenty-three of the 97 topics received an average score higher than 6.5.  NERSC resources and services with average scores in this range were:

  • HPSS uptime and reliability
  • Global project, homes, and scratch reliability and uptime
  • Account support
  • Security
  • Technical consulting
  • NERSC website and the New User's Guide
  • Services overall
  • Hopper uptime
  • NERSC's internal network
  • NERSC Information Management (NIM) system

The top 7 of the 23 questions that scored over 6.5 are shown below.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
HPSS: Uptime (Availability)      1 5 2 34 156 198 6.71 0.66 0.01
PROJECT: Reliability      1 1 7 34 136 179 6.69 0.63 0.01
HPSS: Reliability       7 3 34 152 196 6.69 0.68 -0.02
GLOBAL HOMES: Reliability   2 2 6 6 42 222 280 6.68 0.79 -0.12
SERVICES: Account Support   1 4 6 10 68 294 383 6.67 0.74 0.00
OVERALL: Security   2 1 13 6 46 256 324 6.66 0.82 0.09
CONSULT: Overall     6 1 17 63 263 350 6.65 0.74 0.06

Significant Increases in Satisfaction

Ten topics received significantly higher satisfaction ratings in 2012 as compared with 2011 (as measured by the t-test at the 90% confidence level).  The largest increase came from the Carver Batch Wait Time score.  This was the only topic that scored below the satisfactory level (5.25 out of 7) in 2011, having received an average score of 5.16.  Its score improved by almost half a point in 2012, reaching 5.64.

This increase in satisfaction is explained by a significant improvement in Carver batch wait times in 2012. There are two major reasons for this improvement. First, in 2012 Carver added three racks of nodes from the ARRA Magellan Project to its existing five racks and so had 60% more compute resources. In addition, during 2012 NERSC introduced and perfected new services on Hopper, in particular CCM (Cluster Compatibility Mode), that allowed users — for example, Gaussian users — to run on Hopper for the first time, thus alleviating the load on Carver. Carver is a small resource compared to Hopper, contributing 5% to the 2012 MPP allocation.

The table below compares Carver batch wait times for the three months preceding the 2011 and 2012 surveys, and shows the dramatic impact of increasing the amount of computational resources.

Number of Cores in the Carver Regular Priority Queues | March - May 2011 Wait Times | Oct - Dec 2012 Wait Times
1 - 31 13h 17m 07h 39m
32 - 63 83h 53m 12h 08m
64 - 511 31h 46m 32h 50m
512 - 1023 29h 10m 16h 20m

Other significant improvements from 2011 were related to NERSC's training initiatives (New User's Guide, classes, online tutorials), an increased focus on supporting data intensive computing (data analysis software), our website modernization which started in 2011 (searching, ease of finding information, website overall), HPSS bandwidth improvements, and a small NIM redesign effort.

The table below shows the questions that scored significantly higher in 2012 compared with 2011.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
CARVER: Batch wait time  3  4 7 9 29 57 44 153 5.64 1.38 0.48
TRAINING: New User's Guide       5 10 66 141 222 6.55 0.69 0.32
TRAINING: NERSC classes       8 10 23 42 83 6.19 0.99 0.29
Data Analysis Software 1   2 20 14 61 66 164 6.01 1.12 0.26
WEB: Searching   4 5 19 30 90 105 253 6.02 1.13 0.24
WEB: Ease of finding information   1  4 9 33 146 173 366 6.29 0.85 0.22
HPSS: Overall      4 5 8 59 133 209 6.49 0.84 0.19
NIM (NERSC Information Management)   1  7 11 13 86 253 371 6.52 0.88 0.17
TRAINING: Web tutorials       4 9 61 76 150 6.39 0.72 0.16
WEB: Overall      1 7 12 146 219 385 6.49 0.67 0.15

Areas with Lowest User Satisfaction

Fourteen of the 97 topics received an average score below 6.0 out of 7. Of these, only one fell below the satisfactory level of 5.25: Hopper Batch Wait Time, which received an average score of 4.90, a decrease of nearly one point from 2011. Comparing wait times for the three-month periods preceding each of the two surveys does not reveal dramatic increases, but the waits during the weeks leading up to the last day of the allocation year (December 1, 2012–January 7, 2013) were about double those of the previous year.  The 2012 survey opened at the end of the allocation year, when users try to use up their remaining time balance.

The other event of note that explains the decrease in satisfaction is that we responded to user requests for increased wall limits on Hopper. In 2012 these limits were raised from the 24-hour limit that was in place during 2011 to as much as 168 hours for the throughput queue (users can only use one or two nodes — up to 48 cores — with this queue), 96 hours for the Cluster Compatibility Mode queue (1–16,368 cores), 48 hours for the regular 1–16,368 core queue, and 36 hours for the 16,369–98,304 core queues. As predicted, these longer wall limits have resulted in longer wait times, and the "very dissatisfied" scores tended to come from users who need to run long jobs. Sample comments from these users:

"My simulations require moderate CPU resources (i.e. a few nodes) but very long runtimes (1–2 weeks). Hopper regular queue jobs are limited to short periods, and have very long queue times (multiple days for a 12 hour job)."

"Hopper is a capability system optimized for *huge* jobs. It would be nice to have a capacity system that would allow high-throughput of medium size jobs (e.g. of order 1000 cores instead of order 20,000 cores). It is incredibly difficult to push jobs with ~3000 cores. Jobs with < 512 cores and much larger than 10,000 cores seem to start just fine. Our jobs are mid-size and take more time in the queue than they actually request in wall time. This is very frustrating."

The 14 questions that scored below 6.0 are shown below.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
TRAINING: Video tutorials     1 8 4 17 22 52 5.98 1.15  
CARVER: Ability to run interactively 1 1 1 17 9 36 49 114 5.95 1.25 -0.05
TRAINING: Presentations 2     12 13 25 42 94 5.95 1.28  
DIRAC: Ability to run interactively     2 2 3 7 11 25 5.92 1.29 -0.05
HOPPER: Batch queue structure 3 3 15 26 41 164 115 367 5.86 1.18 -0.17
HPSS: User interface (hsi, pftp, ftp) 1 3 13 19 18 61 83 198 5.85 1.36 0.09
EUCLID: Overall 3     4 7 14 21 49 5.82 1.56 -0.29
NERSC SW: Visualization software 1   3 29 19 53 60 165 5.81 1.21 0.11
CARVER: Batch queue structure 3 3 4 18 13 54 57 152 5.80 1.40 -0.00
NX: Overall 2 5 5 5 12 22 39 90 5.69 1.63  
CARVER: Batch wait time 3 4 7 9 29 57 44 153 5.64 1.38 0.48
WEB: Ease of use with mobile devices 1 1   14 5 11 22 54 5.63 1.48  
WEB: Mobile Web Site (m.nersc.gov) 2     16 2 12 17 49 5.45 1.57
HOPPER: Batch wait time 13 17 52 34 87 128 38 369 4.90 1.55 -0.96

Significant Decreases in Satisfaction

Six topics received significantly lower satisfaction ratings in 2012 as compared with 2011 (as measured by the t-test at the 90% confidence level). Four of the topics are related to the amount of compute cycles available. NERSC ended three years of flat allocations in 2012, and users had much pent-up demand. As one user put it on the 2012 User Survey when asked, “Does NERSC provide the full range of systems and services you need to meet your scientific goals? If not, what else do you need?”

"No, we are never satisfied. We need more compute power and more on-line disk storage, and we need it yesterday."

The table below shows the questions that scored significantly lower in 2012 compared with 2011.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
HOPPER: Batch wait time  13 17 52 34 87 128 38 369 4.90 1.55 -0.96
OVERALL: Available Computing Hardware 1 2 8 6 49 177 228 471 6.28 0.92 -0.20
HOPPER: Overall   1 7 5 31 156 175 375 6.29 0.85 -0.18
HOPPER: Batch queue structure  3 3 15 26 41 164 115 367 5.86 1.18 -0.17
GLOBAL HOMES: Overall  1 2 8 9 12 55 204 291 6.47 1.04 -0.14
GLOBAL HOMES: Reliability   2 2 6 6 42 222 280 6.68 0.79 -0.12

Areas of Most Importance to Users

Assuming that the number of responses correlates with importance, the table below shows the 12 topics that were of highest importance to the users.

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
OVERALL: Satisfaction with NERSC   1 6 3 22 165 281 478 6.48 0.76 -0.05
OVERALL: Available Computing Hardware 1 2 8 6 49 177 228 471 6.28 0.92 -0.20
OVERALL: Services     4 10 20 119 297 450 6.54 0.76 -0.05
OVERALL: Mass storage facilities   1 4 39 30 128 195 397 6.18 1.03 -0.00
WEB: www.nersc.gov overall     1 7 12 146 219 385 6.49 0.67 0.15
NERSC SW: Software environment   1 6 12 21 134 210 384 6.37 0.88 0.02
SERVICES: Account support   1 4 6 10 68 294 383 6.67 0.74 0.00
OVERALL: Available Software     6 31 22 137 186 382 6.22 0.98 0.11
HOPPER: Overall   1 7 5 31 156 175 375 6.29 0.85 -0.18
HOPPER: Uptime (Availability) 1 1 2 6 18 99 248 375 6.54 0.80 0.07
WEB SERVICES: NIM web interface   1 7 11 13 86 253 371 6.52 0.88 0.17
HOPPER: Batch wait time 13 17 52 34 87 128 38 369 4.90 1.55 -0.96

Scores for Large, Medium, and Small MPP Users Compared

Large MPP users are defined to be those who used more than 2 million MPP hours in 2012 (2 million hours is about 0.15% of total MPP usage). There were 145 such users, of which 98 took the survey. This group of users was the unhappiest of all with Hopper batch wait times and with the provisioning of compute resources in general. They had three scores significantly lower than last year's and eight significantly higher. Overall, they remained very satisfied with NERSC.

There were 542 medium MPP users, who used between 250,000 and 2 million hours in 2012; 197 responded to the survey. This group was the happiest overall, with 15 significantly higher scores than last year, and three significantly lower. This group also remained very satisfied with NERSC.

The small MPP users, of which there were 1,847 in 2012, were on average less satisfied with NERSC than the larger users. They had seven scores significantly lower than last year's and only three significantly higher.

The table below compares key scores for users based on the size of their NERSC allocation. Significant increases in satisfaction from 2011 are shown in blue, significant decreases in red.

Topic | Large MPP Score | Medium MPP Score | Small MPP Score
Average of all scores 6.29 6.37 6.24
Satisfaction with NERSC 6.53 6.53 6.40 (-0.14)
Available Computing Hardware 6.16 (-0.32) 6.25 (-0.23) 6.25 (-0.23)
Services 6.67 6.60 6.47
Available software 6.31 (+0.21) 6.22 6.12
Mass storage facilities 6.12 6.20 6.09
Global Homes overall 6.54 6.55 6.26 (-0.35)
Consulting overall 6.64 6.69 6.65
Account support 6.70 6.80 (+0.13) 6.47 (-0.20)
Web overall (www.nersc.gov) 6.48 6.57 (+0.23) 6.42
NIM (nim.nersc.gov) 6.51 6.59 (+0.25) 6.53 (+0.18)
Hopper overall 6.35 6.26 (-0.21) 6.27
Hopper batch wait time 4.60 (-1.27) 4.91 (-0.96) 5.13 (-0.74)

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In late 2011 and in 2012, NERSC took a number of actions in response to suggestions from the 2010/2011 user survey.

The lowest score on the 2011 survey was Carver batch wait time.  As explained above in the Significant Increases in Satisfaction section, NERSC implemented changes to give Carver users more resources to run their jobs.

Another area NERSC focused on was continuous improvement of the NERSC website. When NERSC implemented its new website design in 2011, the primary goals were to improve the organization of content, navigation, and searching. To that end, we conducted usability studies with selected users both before and after the design phase. NERSC spent considerable time in 2011 tuning its Google search appliance to improve search results on the NERSC website. We have also made changes that make it easier for Google to crawl our site (a sitemap.xml file and the use of meta tags where applicable), and we use Google Webmaster Tools and Google Analytics to help with tuning.  The results of these efforts showed up in the 2012 user survey, with increased satisfaction for three web topics (searching, ease of finding information, and the website overall).

Respondent Demographics

Respondents by DOE Office

Office | Respondents | Percent
ASCR 38 7.9%
BER 99 20.6%
BES 166 34.5%
FES 65 13.5%
HEP 70 14.6%
NP 40 8.3%

Respondents by Project Class

Project Class | Respondents | Percent
DOE Base 380 79.0%
SciDAC 51 10.6%
NISE 15 3.1%
Data Pilot 14 2.9%
Startup 9 1.9%
ALCC 7 1.5%
Guest 3 0.6%
CSGF 2 0.4%

Respondents by User Role

User Role | Respondents | Percent
Principal Investigators 122 25.4%
PI Proxies 73 15.2%
Users 286 59.5%

Respondents by Type of Organization

Organization Type | Respondents | Percent
Universities 247 51.4%
DOE Labs 191 39.7%
Other Government Labs 24 5.0%
Industry 15 3.1%
Private Labs & Non Profit 4 0.8%

Respondents from the Most Represented Organizations

Organization | Respondents | Percent
Berkeley Lab 87 18.1%
UC Berkeley 21 4.4%
PNNL 20 4.2%
Oak Ridge 16 3.3%
PPPL 15 3.1%
Brookhaven Lab 13 2.7%
MIT 12 2.5%
University of Wisconsin - Madison 12 2.5%
Argonne Lab 10 2.1%
UC Irvine 9 1.9%
Sandia NM 7 1.5%
Auburn University 6 1.2%
Cal Tech 6 1.2%
Los Alamos 6 1.2%
NCAR 6 1.2%
Princeton 6 1.2%
University of Colorado 6 1.2%

How Long Have You Used NERSC?

Time | Respondents | Percent
less than 1 year 81 17.1%
1 - 3 years 199 42.1%
more than 3 years 193 40.8%

Score Legend

Satisfaction Score Legend

Satisfaction | Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied - High 6.00 - 6.49
Mostly Satisfied - Low 5.50 - 5.99
Somewhat Satisfied 4.50 - 5.49

Importance Score Legend

Importance | Average Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49

Significance of Change Legend

Significance of Change
significant increase
significant decrease
not significant

Satisfaction and Importance Scores

The average Overall Satisfaction with NERSC score for 2012 (6.48 / 7) was the second highest ever recorded in the 14 years the survey has been in its current form. Continuing efforts to enhance the NERSC and NIM (NERSC Information Management) web sites and to provide extensive training opportunities, as well as a doubling of HPSS bandwidth and capacity, contributed to these favorable results.

Overall Satisfaction with NERSC

The purpose of this section is to elicit responses from many users for a few very broad areas.  All respondents answered this section.  The average score for these five topics was 6.35 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
OVERALL: Services     4 10 20 119 296 449 6.54 0.76 -0.05
OVERALL: Satisfaction with NERSC   1 6 3 22 165 280 477 6.48 0.76 -0.05
OVERALL: Available Computing Hardware 1 2 8 6 49 177 227 470 6.27 0.92 -0.21
OVERALL: Available Software     6 31 22 137 185 381 6.22 0.98 0.11
OVERALL: Mass storage facilities   1 4 39 29 128 195 396 6.18 1.03 0.00

All Satisfaction Topics

The average score for the 97 topics was 6.32 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Item | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
HPSS: Uptime (Availability)     1 5 2 34 156 198 6.71 0.66 0.01
PROJECT: Reliability     1 1 7 34 136 179 6.69 0.63 0.01
HPSS: Reliability (data integrity)       7 3 34 152 196 6.69 0.68 -0.02
GLOBAL HOMES: Reliability   2 2 6 6 42 222 280 6.68 0.79 -0.12
SERVICES: Account support   1 4 6 10 68 294 383 6.67 0.74 0.00
OVERALL: Security   2 1 13 6 46 256 324 6.66 0.82 0.09
CONSULT: Overall     6 1 17 63 263 350 6.65 0.74 0.06
GLOBAL HOMES: Uptime 1   4 7 9 46 216 283 6.62 0.85 -0.07
CONSULT: Response time   2 6 2 15 70 254 349 6.60 0.83 0.03
PROJECT: Overall     1 2 11 44 127 185 6.59 0.70 -0.02
PROJECT: Uptime     3 1 7 47 122 180 6.58 0.75 -0.00
CONSULT: Quality of technical advice     6 7 12 76 238 339 6.57 0.81 0.04
WEB: Accuracy of information     1 8 11 102 225 347 6.56 0.70 0.10
GLOBAL SCRATCH: Uptime       7 13 56 156 232 6.56 0.74 0.07
WEB: My NERSC     2 8 9 84 196 299 6.55 0.74  
TRAINING: New User's Guide       5 10 66 141 222 6.55 0.69 0.32
OVERALL: Services     4 10 20 119 297 450 6.54 0.76 -0.05
HOPPER: Uptime (Availability) 1 1 2 6 18 99 248 375 6.54 0.80 0.07
WEB: System Status Info   1 5 9 10 74 214 313 6.53 0.86 0.10
NETWORK: Network performance within NERSC (e.g. Hopper to HPSS)     3 8 10 67 172 260 6.53 0.81 0.00
GLOBAL SCRATCH: Reliability     1 10 11 52 154 228 6.53 0.82 -0.05
WEB SERVICES: NIM web interface   1 7 11 13 86 253 371 6.52 0.88 0.17
CONSULT: Special requests (e.g. disk quota increases, etc.)     2 13 8 41 151 215 6.52 0.90 0.07
WEB: www.nersc.gov overall     1 7 12 146 219 385 6.49 0.67 0.15
HPSS: Overall satisfaction     4 5 8 59 133 209 6.49 0.84 0.19
CONSULT: On-line help desk   1 1 10 9 40 127 188 6.48 0.92 0.19
OVERALL: Satisfaction with NERSC   1 6 3 22 165 281 478 6.48 0.76 -0.05
GLOBAL HOMES: Overall 1 2 8 9 12 55 204 291 6.47 1.04 -0.14
CONSULT: Time to solution   3 7 7 22 79 223 341 6.45 0.96 0.05
WEB: Timeliness of information   1 2 12 16 106 203 340 6.45 0.83 0.11
DIRAC: Uptime (Availability)     1 2   5 19 27 6.44 1.09 0.11
CARVER: Uptime (Availability)   1 2 4 8 56 92 163 6.40 0.87 -0.12
NERSC SW: Applications software 1 1 3 13 17 114 200 349 6.40 0.90 0.07
TRAINING: Web tutorials       4 9 61 76 150 6.39 0.72 0.16
GLOBAL HOMES: File and Directory Operations 2 1 6 12 14 61 175 271 6.39 1.08 0.02
GLOBAL SCRATCH: Overall       9 23 73 133 238 6.39 0.81 -0.01
NERSC SW: Software environment   1 6 12 21 134 210 384 6.37 0.88 0.02
NERSC SW: Programming libraries     6 17 16 125 199 363 6.36 0.89 0.00
SERVICES: Allocations process   1 1 17 24 103 179 325 6.35 0.89 -0.00
PROJECT: File and Directory Operations 2 1 4 3 12 44 100 166 6.34 1.12 -0.02
NETWORK: Remote network performance to/from NERSC (e.g. Hopper to your home institution)   1 5 10 28 98 168 310 6.33 0.92 0.11
HOPPER: Overall   1 7 5 31 156 175 375 6.29 0.85 -0.18
HPSS: Data transfer rates 1 2 3 11 7 67 109 200 6.29 1.07 -0.01
WEB: Ease of finding information   1 4 9 33 146 173 366 6.29 0.85 0.22
PDSF: Uptime (availability)       4 3 15 24 46 6.28 0.93 0.12
OVERALL: Available Computing Hardware 1 2 8 6 49 177 228 471 6.28 0.92 -0.20
CARVER: Overall     5 5 16 56 86 168 6.27 0.96 -0.04
GLOBAL SCRATCH: File and Directory Operations   1 5 16 17 55 124 218 6.26 1.08 0.02
SERVICES: Ability to perform data analysis       8 8 42 50 108 6.24 0.88 0.11
PROJECT: I/O Bandwidth 1 2 4 5 15 56 91 174 6.24 1.10 -0.11
SERVICES: Data analysis and visualization assistance       6 7 7 31 51 6.24 1.09 0.17
HPSS: Data access time   1 2 15 12 65 96 191 6.23 1.00 0.02
DIRAC: Disk configuration and I/O performance       3 1 6 12 22 6.23 1.07 0.30
OVERALL: Available Software     6 31 22 137 186 382 6.22 0.98 0.11
PDSF: Batch queue structure     1 4 4 9 24 42 6.21 1.12 -0.17
SERVICES: NERSC databases       6 1 13 21 41 6.20 1.05  
TRAINING: NERSC classes       8 10 23 42 83 6.19 0.99 0.29
PDSF: Overall satisfaction 1     3 1 21 21 47 6.19 1.12 0.02
OVERALL: Mass storage facilities   1 4 39 30 128 195 397 6.18 1.03 -0.00
GLOBAL SCRATCH: I/O Bandwidth   4 6 10 27 58 120 225 6.17 1.15 -0.07
EUCLID: Disk configuration and I/O performance     1 4 4 11 22 42 6.17 1.10 0.19
NERSC SW: Performance and debugging tools     3 25 33 91 118 270 6.10 1.01 0.15
PDSF: Disk configuration and I/O performance     2 4 2 15 20 43 6.09 1.15 0.24
PROJECTB: Reliability (data integrity)     2 3 8 6 24 43 6.09 1.21  
EUCLID: Ability to run interactively   1 2 4 4 7 25 43 6.07 1.37 -0.11
PDSF: Ability to run interactively   1 1 4 5 9 22 42 6.05 1.29 -0.31
HOPPER: Disk configuration and I/O performance 2 2 8 27 34 126 144 343 6.04 1.13 0.05
DIRAC: Overall   1   3 1 12 12 29 6.03 1.21 -0.28
HOPPER: Ability to run interactively 1 2 6 31 23 78 124 265 6.03 1.20 -0.08
WEB: Searching   4 5 19 30 90 105 253 6.02 1.13 0.24
SERVICES: NERSC Science Gateways       9 1 13 20 43 6.02 1.16  
EUCLID: Uptime (Availability) 3   2 2 1 12 27 47 6.02 1.67 -0.26
NERSC SW: Data analysis software 1   2 20 14 61 66 164 6.01 1.12 0.26
PROJECTB: Overall satisfaction     4 2 6 12 22 46 6.00 1.26  
TRAINING: Video tutorials     1 8 4 17 22 52 5.98 1.15  
CARVER: Ability to run interactively 1 1 1 17 9 36 49 114 5.95 1.25 -0.05
TRAINING: Presentations 2     12 13 25 42 94 5.95 1.28  
DIRAC: Ability to run interactively     2 2 3 7 11 25 5.92 1.29 -0.05
HOPPER: Batch queue structure 3 3 15 26 41 164 115 367 5.86 1.18 -0.17
HPSS: User interface (hsi, pftp, ftp) 1 3 13 19 18 61 83 198 5.85 1.36 0.09
PROJECTB: I/O bandwidth   1 4 3 4 12 19 43 5.84 1.43  
EUCLID: Overall 3     4 7 14 21 49 5.82 1.56 -0.29
NERSC SW: Visualization software 1   3 29 19 53 60 165 5.81 1.21 0.11
CARVER: Batch queue structure 3 3 4 18 13 54 57 152 5.80 1.40 -0.00
PROJECTB: Uptime (availability)   1 4 3 7 11 19 45 5.78 1.41  
NX: Overall 2 5 5 5 12 22 39 90 5.69 1.63  
CARVER: Batch wait time 3 4 7 9 29 57 44 153 5.64 1.38 0.48
PROJECTB: File and directory (metadata) operations   2 6 1 6 11 18 44 5.64 1.59  
WEB: Ease of use with mobile devices 1 1   14 5 11 22 54 5.63 1.48  
WEB: Mobile Web Site (m.nersc.gov) 2     16 2 12 17 49 5.45 1.57
GENEPOOL: Overall satisfaction     3 3 1 7 5 19 5.42 1.46  
GENEPOOL: Batch queue structures     5 2 2 6 4 19 5.11 1.56  
GENEPOOL: Batch wait time 1 1 2 2 4 6 4 20 5.05 1.73  
HOPPER: Batch wait time 13 17 52 34 87 128 38 369 4.90 1.55 -0.96
GENEPOOL: Ability to run interactively 1 1 1 3 2 1 4 13 4.77 2.01  
GENEPOOL: Uptime (Availability) 2   2 5 3 6 3 21 4.76 1.76  
GENEPOOL: Scratch configuration and I/O performance   1 5 3 2 2 4 17 4.65 1.73  

All Importance Topics

Users were asked to rate the importance of selected topics.

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important

Topic | Num who rated this item as 1-3 | Total Responses | Average Score | Std. Dev.
OVERALL: Available Computing Hardware 1 37 407 445 2.91 0.29
OVERALL: Satisfaction with NERSC   49 400 449 2.89 0.31
SERVICES: Ability to perform data analysis 5 16 72 93 2.72 0.56
DATA: Live storage space (real-time access) 14 63 244 321 2.72 0.54
OVERALL: Services 3 133 286 422 2.67 0.49
DATA: I/O bandwidth 13 83 230 326 2.67 0.55
SERVICES: Data analysis and visualization assistance 6 16 40 62 2.55 0.67
OVERALL: Available Software 20 124 208 352 2.53 0.60
OVERALL: Mass storage facilities 16 156 190 362 2.48 0.58
DATA: Archival storage space 28 113 166 307 2.45 0.66
SERVICES: NERSC Science Gateways 8 11 26 45 2.40 0.78
DATA: Adequate bandwidth and space to support checkpointing 21 86 103 210 2.39 0.66
SERVICES: NERSC databases 7 13 23 43 2.37 0.76
DATA: Metadata performance 40 126 108 274 2.25 0.69
DATA: Access to a large shared-memory system for data analysis and vis 43 69 84 196 2.21 0.78
DATA: Data management tools 36 79 50 165 2.08 0.72
DATA: Science Data Gateways 43 51 42 136 1.99 0.79
DATA: Analytics and vis assistance from NERSC 52 75 35 162 1.90 0.73
DATA: Databases 57 45 32 134 1.81 0.80
DATA: Read-Only data files or databases 61 52 25 138 1.74 0.75

All Usefulness Topics

Users were asked to rate the usefulness of selected topics.

 Usefulness Ratings: 3=Very useful, 2=Somewhat useful, 1=Not useful

Topic | Num who rated this item as 1-3 | Total Responses | Average Score | Std. Dev.
TRAINING: New User's Guide 6 34 184 224 2.79 0.47
COMMUNICATIONS: E-mail announcements 4 103 260 367 2.70 0.48
COMMUNICATIONS: Web Live Status 11 87 249 347 2.69 0.53
TRAINING: Web tutorials 11 45 116 172 2.61 0.61
COMMUNICATIONS: MOTD on the computers 24 124 196 344 2.50 0.63
TRAINING: Presentations 18 47 60 125 2.34 0.72
TRAINING: NERSC classes 23 52 51 126 2.22 0.74
TRAINING: Video tutorials 27 41 32 100 2.05 0.77

HPC Resources

466 of the 481 respondents answered questions in this section.  The average score was 6.23 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
HPSS: Uptime (Availability)     1 5 2 34 155 197 6.71 0.66 0.01
PROJECT: Reliability     1 1 6 34 136 178 6.70 0.62 0.02
HPSS: Reliability (data integrity)       7 3 34 152 196 6.69 0.68 -0.02
GLOBAL HOMES: Reliability   2 2 6 5 42 221 278 6.68 0.78 -0.11
GLOBAL HOMES: Uptime 1   4 7 8 46 215 281 6.63 0.85 -0.06
PROJECT: Overall     1 2 10 44 127 184 6.60 0.69 -0.02
PROJECT: Uptime     3 1 6 47 122 179 6.59 0.74 0.00
GLOBAL SCRATCH: Uptime       7 12 56 156 231 6.56 0.73 0.08
HOPPER: Uptime (Availability) 1 1 2 6 18 99 246 373 6.54 0.80 0.07
GLOBAL SCRATCH: Reliability     1 10 10 52 154 227 6.53 0.81 -0.04
NETWORK: Network performance within NERSC (e.g. Hopper to HPSS)     3 8 10 66 172 259 6.53 0.81 0.01
HPSS: Overall satisfaction     4 5 8 59 133 209 6.49 0.84 0.19
GLOBAL HOMES: Overall 1 2 8 9 11 55 203 289 6.47 1.04 -0.14
DIRAC: Uptime (Availability)     1 2   5 19 27 6.44 1.09 0.11
CARVER: Uptime (Availability)   1 2 4 8 56 92 163 6.40 0.87 -0.12
GLOBAL SCRATCH: Overall       9 22 73 133 237 6.39 0.81 -0.00
GLOBAL HOMES: File and Directory Operations 2 1 6 12 13 61 174 269 6.39 1.08 0.02
PROJECT: File and Directory Operations 2 1 4 3 11 44 100 165 6.35 1.12 -0.02
NETWORK: Remote network performance to/from NERSC (e.g. Hopper to your home institution)   1 5 10 28 97 168 309 6.33 0.93 0.11
HOPPER: Overall   1 7 5 31 155 174 373 6.29 0.86 -0.18
HPSS: Data transfer rates 1 2 3 11 7 67 108 199 6.29 1.07 -0.01
PDSF: Uptime (availability)       4 3 15 24 46 6.28 0.93 0.12
CARVER: Overall     5 5 16 56 86 168 6.27 0.96 -0.04
GLOBAL SCRATCH: File and Directory Operations   1 5 16 16 55 124 217 6.26 1.08 0.03
PROJECT: I/O Bandwidth 1 2 4 5 14 56 91 173 6.24 1.10 -0.10
HPSS: Data access time   1 2 15 12 65 96 191 6.23 1.00 0.02
DIRAC: Disk configuration and I/O performance       3 1 6 12 22 6.23 1.07 0.30
PDSF: Batch queue structure     1 4 4 9 24 42 6.21 1.12 -0.17
PDSF: Overall satisfaction 1     3 1 21 21 47 6.19 1.12 0.02
GLOBAL SCRATCH: I/O Bandwidth   4 6 10 26 58 120 224 6.18 1.15 -0.06
EUCLID: Disk configuration and I/O performance     1 4 4 11 22 42 6.17 1.10 0.19
PROJECTB: Reliability (data integrity)     2 3 7 6 24 42 6.12 1.21  
PDSF: Disk configuration and I/O performance     2 4 2 15 20 43 6.09 1.15 0.24
EUCLID: Ability to run interactively   1 2 4 4 7 25 43 6.07 1.37 -0.11
PDSF: Ability to run interactively   1 1 4 5 9 22 42 6.05 1.29 -0.31
HOPPER: Disk configuration and I/O performance 2 2 8 26 34 126 143 341 6.04 1.13 0.05
DIRAC: Overall   1   3 1 12 12 29 6.03 1.21 -0.28
HOPPER: Ability to run interactively 1 2 6 31 23 78 123 264 6.03 1.20 -0.08
PROJECTB: Overall satisfaction     4 2 5 12 22 45 6.02 1.27  
EUCLID: Uptime (Availability) 3   2 2 1 12 27 47 6.02 1.67 -0.26
CARVER: Ability to run interactively 1 1 1 17 9 36 49 114 5.95 1.25 -0.05
DIRAC: Ability to run interactively     2 2 3 7 11 25 5.92 1.29 -0.05
HPSS: User interface (hsi, pftp, ftp) 1 3 12 19 18 61 83 197 5.87 1.34 0.10
HOPPER: Batch queue structure 3 3 15 26 41 163 114 365 5.86 1.18 -0.17
PROJECTB: I/O bandwidth   1 4 3 3 12 19 42 5.86 1.44  
EUCLID: Overall 3     4 7 14 21 49 5.82 1.56 -0.29
CARVER: Batch queue structure 3 3 4 18 13 54 57 152 5.80 1.40 -0.00
PROJECTB: Uptime (availability)   1 4 3 6 11 19 44 5.80 1.42  
NX: Overall 2 5 5 5 12 22 39 90 5.69 1.63  
PROJECTB: File and directory (metadata) operations   2 6 1 5 11 18 43 5.65 1.60  
CARVER: Batch wait time 3 4 7 9 29 57 44 153 5.64 1.38 0.48
GENEPOOL: Overall satisfaction     3 3 1 7 5 19 5.42 1.46  
GENEPOOL: Batch queue structures     5 2 2 6 4 19 5.11 1.56  
GENEPOOL: Batch wait time 1 1 2 2 4 6 4 20 5.05 1.73  
HOPPER: Batch wait time 13 17 51 34 86 128 38 367 4.90 1.55 -0.96
GENEPOOL: Ability to run interactively 1 1 1 3 2 1 4 13 4.77 2.01  
GENEPOOL: Uptime (Availability) 2   2 5 3 6 3 21 4.76 1.76  
GENEPOOL: Scratch configuration and I/O performance   1 5 3 2 2 4 17 4.65 1.73  

Software

458 of the 481 respondents answered questions in this section.  The average score was 6.24 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
NERSC SW: Applications software 1 1 3 13 17 113 200 348 6.40 0.90 0.07
NERSC SW: Software environment   1 6 12 21 133 210 383 6.37 0.88 0.02
NERSC SW: Programming libraries     6 17 16 124 199 362 6.36 0.90 0.00
NERSC SW: Performance and debugging tools     3 25 32 91 118 269 6.10 1.01 0.16
NERSC SW: Data analysis software 1   2 20 14 61 66 164 6.01 1.12 0.26
NERSC SW: Visualization software 1   3 29 19 53 60 165 5.81 1.21 0.11

Services

454 of the 481 respondents answered questions in this section.  The average score was 6.46 / 7.

Satisfaction Ratings: 7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

Topic | Num who rated this item as 1-7 | Num Resp | Average Score | Std. Dev. | Change from 2011
SERVICES: Account support   1 4 6 10 67 294 382 6.67 0.74 0.00
OVERALL: Security   2 1 13 6 46 255 323 6.66 0.82 0.09
CONSULT: Overall     6 1 17 62 263 349 6.65 0.74 0.07
CONSULT: Response time   2 6 2 15 69 254 348 6.60 0.83 0.03
CONSULT: Quality of technical advice     6 7 12 75 238 338 6.57 0.81 0.04
WEB: Accuracy of information     1 8 11 101 224 345 6.56 0.70 0.10
TRAINING: New User's Guide       5 9 66 141 221 6.55 0.68 0.32
WEB: My NERSC     2 8 9 84 194 297 6.55 0.74  
WEB: System Status Info   1 5 9 9 74 214 312 6.54 0.86 0.10
WEB SERVICES: NIM web interface   1 7 11 13 85 253 370 6.52 0.88 0.18
CONSULT: Special requests (e.g. disk quota increases, etc.)     2 13 8 40 151 214 6.52 0.90 0.08
WEB: www.nersc.gov overall     1 7 12 145 219 384 6.49 0.67 0.15
CONSULT: On-line help desk   1 1 10 9 40 127 188 6.48 0.92 0.19
CONSULT: Time to solution   3 7 7 22 78 223 340 6.45 0.96 0.06
WEB: Timeliness of information   1 2 12 16 106 203 340 6.45 0.83 0.11
TRAINING: Web tutorials       4 9 61 76 150 6.39 0.72 0.16
SERVICES: Allocations process   1 1 17 24 102 179 324 6.35 0.89 -0.00
WEB: Ease of finding information   1 4 9 33 145 173 365 6.29 0.85 0.22
SERVICES: Ability to perform data analysis       8 8 42 50 108 6.24 0.88 0.11
SERVICES: Data analysis and visualization assistance       6 7 7 31 51 6.24 1.09 0.17
SERVICES: NERSC databases       6 1 13 21 41 6.20 1.05  
TRAINING: NERSC classes       8 10 23 42 83 6.19 0.99 0.29
WEB: Searching   4 5 19 30 90 105 253 6.02 1.13 0.24
SERVICES: NERSC Science Gateways       9 1 13 20 43 6.02 1.16  
TRAINING: Video tutorials     1 8 4 17 22 52 5.98 1.15  
TRAINING: Presentations 2     12 13 25 42 94 5.95 1.28  
WEB: Ease of use with mobile devices 1 1   14 5 11 22 54 5.63 1.48  
WEB: Mobile Web Site (m.nersc.gov) 2     16 2 12 17 49 5.45 1.57

How Useful Are These Services To You?

Usefulness Ratings: 3=Very useful, 2=Somewhat useful, 1=Not useful

Item | Num who rated this item as 1-3 | Total Responses | Average Score | Std. Dev.
TRAINING: New User's Guide 6 33 184 223 2.80 0.46
COMMUNICATIONS: E-mail announcements 4 103 260 367 2.70 0.48
COMMUNICATIONS: Web Live Status 11 87 249 347 2.69 0.53
TRAINING: Web tutorials 11 45 116 172 2.61 0.61
COMMUNICATIONS: MOTD on the computers 24 124 196 344 2.50 0.63
TRAINING: Presentations 18 47 60 125 2.34 0.72
TRAINING: NERSC classes 23 52 51 126 2.22 0.74
TRAINING: Video tutorials 27 41 32 100 2.05 0.77

How Important Are These Services To You?

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important

Item | Num who rated this item as 1-3 | Total Responses | Average Score | Std. Dev.
SERVICES: Ability to perform data analysis 5 16 72 93 2.72 0.56
SERVICES: Data analysis and visualization assistance 6 16 40 62 2.55 0.67
SERVICES: NERSC Science Gateways 8 11 26 45 2.40 0.78
SERVICES: NERSC databases 7 13 23 43 2.37 0.76

Are You Adequately Informed About NERSC Changes?

Yes 290 96.3%
No 11 3.7%

Where Do You Perform Data Analysis and Visualization of Data Produced at NERSC?

All at NERSC 25 6.1%
Most at NERSC 61 15.0%
Half at NERSC, half elsewhere 80 19.7%
Most elsewhere 114 28.0%
All elsewhere 109 26.8%
I don't need data analysis or visualization 18 4.4%

What Additional Services or Information should be on the NERSC Web Site?

Comments about online documentation

More complete documentation on the compilers, MPI, OpenMP, etc., or at least make it easier to find.

A central point with all commonalities and differences between systems would be good. A description of what is good for what. Descriptions of commonalities and differences between queues on different systems.

common errors and maybe more examples for executing important queries should be useful. I also think that most of the bugs I have had already appeared before, so If there would have been some kind of a forum where users can help each other it could be good.

More more more use case examples/tutorial material. Can't have too much.

There should be more examples on running batch jobs. It is very mysterious and the documentation is sparse.

More information about the usage/compilation of the libraries and software installed in the NERSC machines.

It would be great if the website could provide more information on how to effectively use Mathematica, Matlab and other softwares.

More online training -- what's there is a great start. Video courses are good. Quality is high so far. Useful. 2. Ability to click on jobs in queue and change their status (qdel, qrls, etc). Ability to queue jobs from a preset directory. Display time remaining for running jobs. All this for mobile as well. Overall excellent, though.

NERSC Response: These are great suggestions. By early summer, we will release improvements to MyNERSC that allow job/file control. Some of these features are already available at m.nersc.gov, but expect new features for both soon.

I wish it were easier to allocate more memory per process on Hopper (currently, the advice from the website is to simply allocate fewer processes per node).

The online instructions are accurate but not always complete. For example, just a few weeks ago I was trying to use a job array on PDSF and wanted to know the automatic variable to use to refer to the array iteration number of a particular job. This info is nowhere to be found on the web pages. Instead I had to dig through man pages, which gave two different possibilities, and then use trial-and-error to finally figure out the right one.

I've heard many times that slides and presentations from genepool training etc... would be made available online, I've yet to see this happen. The documentation pages for genepool are out of date, policies change frequently and necessitate code changes on our part, the documentation on the web hasn't kept up.

Comments about live status, job information, and the mobile website

A more useful resource monitor like the one at XSEDE (https://portal.xsede.org/group/xup/resource-monitor) would be helpful. A summary of how many jobs are running, queued on each resource would be more useful than what's on the live status page currently.

NERSC Response: We have recently added a few more useful columns to the live status page.

easy links to add remove people from repo; better approximations to job start (queue view is awesome, but its hard to tell when the job might start) -- maybe worst/best case / average case?

A NERSC mobile app with an interactive queue structure and realtime notifications

tool for figuring out why jobs are waiting in queue so long; live estimated queue time

Mobile app for android...

NERSC Response: This is in development. Stay tuned.

I have not used the mobile web site but a texting feature to find out balance hours on, say Hopper, would be great. Similar to texting your bank to find out account balance. If this feature already exists, please ignore.

NERSC Response: In the mobile website, and soon in the mobile native apps, you can check your account balances. We don’t currently have plans to implement a texting feature (if we get a few more requests for this, we will take another look).

Comments about NIM

Within NIM, I am always confused as to how much allocation I have personally used versus the entire project.

It would be nice to see running/queued batch jobs from the NIM interface. (Maybe you can and I didn't see where.)

I would like my listed available hours to be updated more regularly. This is especially a problem when my remaining hours are low, as it can be difficult to know how large a job I can run.

Other comments

The ability to unsubscribe from emails.

A user wiki, perhaps?

I have requested in the past (and I think there is still a ticket for it) that a Google Calendar be created to show the same information (planned and unplanned outages) as the MOTD and status mailing list.

NERSC Response: This has now been implemented.

I want a web portal (science gateway), accessible from the nersc web site.

NERSC consultants working with JGI, Doug Jacobsen and Kirsten Fagnan, are doing a fantastic job. They are both extremely helpful and knowledgeable and together they have identified or implemented solutions to numerous computing problems at JGI.

The software stack and support has been great. The biggest issues I have are queue waits (way way to long) and variation in wallclock run time. It takes about 2 days to run a 3 hour job, so my throughput is not very good. I think variability in job wallclock is mostly due to I/O, esp run during day vs night, making it harder to keep my est time for batch jobs accurate. Also have had issues with slow filesystem response from hopper frontend nodes, both on GSCRATCH, SCRATCH, and home areas. Often compile times are very slow (not always) and things like ls, cd, or just getting your login prompt can take quite a while. These are not on directories with thousands of files. My guess is all of this is due to lack of physical resources. All my experiences with nersc help and staff have been excellent.

Stay the way you are! Thanks a lot!

Please keep the light shining. Thank you all for the wonderful services provided. Best regards to you all.

The information contained is already complete and satisfies my needs as a user. I cannot think of anything more that should be included in the web site.

My few experiences with consulting have been very good. They understand that they are there to help the users accomplish their goals instead of the other way around. Not all computing centers have figured this out.

Everything is fine.

Seems good for everything I've needed.

NERSC Website together with NIM is pretty comprehensive and very informative as it is.

It is very good.

I think the website is good as is.

Analytics Comments

Comments about analytics software

With most of my data analysis I'm working with SciDB. Not only that SciDB saves me a lot of execution time it also keeps the data in a very convenient form (matrices and vectors) and makes my data analysis easy and intuitive. In my data analysis I'm doing practically the same analysis with different bins of data, saving them all and letting me get only the parts I want is very convenient. for example - for a regular action I do ~3 times a week (normalizing spectra) it usually takes about ~8 hours for each; with scidb it takes no more than an hour (for the most heavy queries). I would like to see a production service offered at NERSC for scidb.

Scidb is extremely useful and the support we got for using it has been very thorough.

I realize this is a space to complain or ask for something, but instead I just want to remark how much we appreciate the fact that NERSC has adequate IDL licenses for its users. We have found that other computing centers have too few. This becomes ever more important as we create larger and larger output data sets, which require us to do our analysis at the supercomputer center rather than copying the files home and doing analysis there.

Our visualization tool needs IDL7.1 to run with, but hopper don't have it. IDL8.0 don't support our tool. Could you install IDL7.1 on hopper?

Sometimes I need a newer version of IDL to do analysis (for example 8.2, while current is 8.0). I got into a jam recently and had to upgrade our own cluster to do the analysis.

having idl and python available on Hopper is all I need. The visualization tools are home grown.

It would be nice if MATLAB were available to run in parallel on more than 1 node at a time.

More matlab licenses on euclid would be nice.

Most of my calculations were based on NAMD software. The data analysis were performed mostly with VMD tcl scripts, and to some extend with R, perl , and python scripts. Time to time I need to visualize my results with VMD to check the data. Unfortunately the network is very slow to visualize, so I transfer the data to local machines and perform the analysis. I am not sure this is because of nersc network speed or local network speed. Another important reason is that the hard disk space provided for the project is insufficient for storing all the data for analysis. Therefore I had to transfer a lot of data to the local machine and keep them till my simulations were complete and then perform data analysis. It would be great to have some additional hard disk space in /project.

I have the Gaussview visualization software on my home and office computers.

I would note that GaussView program is the most useful for me for data analysis and data preparation to execute calculations since I use Gaussian 03 and Gaussian 09 suite of programs. GaussView is not presented among programs available on NERSC. GaussView is a proprietary software.

when uvcdat runs with MPI i will be much more productive

Visualization is a bull buzzword. My visualization needs would be complete if someone made something to plot stuff efficiently over the network through something akin to pylab. But "visualization" experts instead prefer focusing on useless stuff.

PDSF is way behind on the current ROOT version. Right now it has 5.22.00 and the latest version is 5.34.xx. There are a lot of features that I use regularly which aren't in 5.22 but are in 5.34. This alone has discouraged me from using PDSF more. However, this could be more an issue with the STAR framework than with PDSF itself.

I do a lot of my visualization and analysis using MathCad worksheets and there is no support for that at NERSC.

This is probably not a NERSC problem, but at Tech-X we have not yet figured out how to install Vorpal Composer on Hopper. Also, I've used Euclid for post-processing. I haven't missed it yet, but I wonder if I will. NX is nice, but the FVWM GUI could be better. For example, there are other GUIs that allow the user to change font size and color of terminals.

My data analysis/visualization is using a proprietary program designed by model developers which is not a very useful tool. My ability to analyze data is limited by the program itself and the slowness of how it runs. I doubt it has anything to do with NERSC resources.

Visualization software such as GaussView is not available.

I have everything set up locally. It is not easy to get the libraries right to recompile analysis programs on the NERSC machines.

1. It would help my work if there is firefox installed in Carver as in Euclid, since Euclid is going to retire soon. The firefox helps us to view the html figure files which are created when using programs like AMWG diagnostics or ice diagnostics that we use to analyse the climate simulations using CESM. 2. Since Euclid will be retiring soon, a visualization software VCDAT/UVCDAT installed on Carver will be very helpful for many of us accessing this computational facility from LLNL. VCDAT and UVCDAT are freely downloadable softwares available at http://uv-cdat.llnl.gov/ . 3. More licenses on Matlab will be very helpful. Thanks. Subarna

As of my last look at this nersc didn't have any capacity to analyze wavefunction data. Realtime calculation of density grids from this would be useful. This is the significant visualization problem that high performance computing could make more interactive that I'm interested in. My understanding is there aren't enough users this would be useful for. Right now I run post processing code on nersc which basically makes 1 grid file per state from the wavefunction data (there can be thousands). Then the files need to be shipped to the local machine.

NX / remote performance comments

I run VisIt on euclid and now on carver and display it through nx to my off-site workstation. VisIt is very good and runs well on carver. However the process size on nx quickly exceeds the size limit and crashes. The nx process size increases quickly when looking at 3-D data in VisIt. I'm trying to render a thousand frames to make a movie. nx tends to crash after about every 60 frames making it impractical to render a movie. Any help on this would be appreciated. Thank you.

NoMachine now makes it practical to do some online analysis from the East Coast ... though the FVWM wm is inconvenient, and the picture quality isn't that great. In any case, NX is a transformative tool. Not much we can do about the 71ms latency from Boston to Berkeley -- speed of light.

I usually collect data needed, download them to local machine, and analyze locally. I tried 'nomachine' before but not very satisfied with the speed and interface. Anyway, I guess it's just a habit.

I prefer to do a visualization on my local computer because X-Window connection is slow. NX helps, but then I cannot use my favorite software for programming (Eclipse)

I have tended to analyze data elsewhere during my heavier usage periods in the past, and this is mainly due to bandwidth issues slowing the tunneling of visualizations.

My main complaint is regarding my inability to use the NX server. This prevents me from having a straight forward way of using UNIX tools such as Molden. It would also help if I were more aware of what other visualization / analysis tools are available that might be useful to me. However, this is not necessarily a flaw with NERSC, it maybe a flaw with my willingness to spend time to research what other tools are available that might be useful to me.

Sometimes running data analysis scripts that require substantial computational resources as well as an X window environment (via X forwarding) and (in my case, at least) python on (Hopper) compute nodes is difficult or doesn't work properly.

sometime we need to transfer the files to the local computer for analysis. we hope we can transfer big files quickly.

Comments on analytics hardware platforms / queues

I find having to submit a normal job through Hopper for visualization, say, with VisIt to be quite annoying. This often takes a long time in the queue. I know alternatively I can run on Euclid, but there are some severe limitations on the size of a job that should be run there. I much rather prefer analyzing data on a dedicated cluster for visualization, such as lens at OLCF. Often I find myself transferring my data using Globus Online (and several TB of data at that) to OLCF for analysis, where things tend to run much smoother.

Need more machines like the (now-retired) Euclid. Not everything is convenient to do with batch jobs

My data analysis needs often require additional jobs submitted to the queue. These are often short jobs, 30-60min., but the waiting time inside the queue on hopper is too long for me for these jobs. Even the debug queue takes often quite a bit of waiting time and not all of the analysis jobs fit into a 30min. time window.

quite often login node on carver is very busy with some file operations and too slow for data analysis.

short or single processor analysis jobs still have to wait in the queue a long time it is hard to install software the scratch directory is deleted often and the home directory is small

Euclid is limited in memory usage and cpu time for interactive data analysis. Hope that the new replacement will be better.

My data analysis and visualization need were mostly met at NERSC, and overall I am satisfied. What could be improved for my purposes is the speed of I/O on the file systems. I had to analyze about 40 TB of data spread over thousands of files, and the bottleneck was reading in the data. Sometimes performance would be good, and other times the entire compute time would be spent trying to read in one file.

The down times for Euclid this past year were a significant "disruption".

recently I had difficulties since I used Euclid for visualization with visit and Euclid was not working well.

Comments on analytics training

Could use some more online courses on viz, such as VISIT and VTK.

There can be more trainings about the available data analysis tools at NERSC.

Possibly unawareness of the possible uses for my application?

I need to learn how to use NERSC better, and make it part of my workflow. Right now, I sometimes start trying to do things on NERSC, and if something doesn't work, I give up and do it my old way.

Other comments

Have not tried to use NERSC analysis and visualization capability because what we have at CMU is adequate for now.

I have been using NERSC for too little time to appreciate the many resources available. Having most of the analysis and visualization tools available at my home institution I haven't explored the opportunities at NERSC.

My needs are being met on my desktop so far, so I simply have not bothered trying doing it at NERSC. But I may need to in the coming year.

My data visualization needs are minimal, except for a few isolated cases.

I will frequently do quick checks and plots at NERSC, but if i have to move any files to do analysis, I will just copy them to a local machine, and do data analysis locally. Our data output is not terribly large, so this is not a major issue.

Future Architectures

HPC architectures in the coming years are expected to rely heavily on processors that contain many light-weight, low-power compute cores. This shift away from traditional "heavy-weight" processing units may require a fundamental change in the way you write your programs and run your jobs. Another trend in HPC is the growing importance of data. In this section, we are gathering information that we hope will help us help you transition to the many-core era and make sure we meet your data needs.

Accelerators and Many-Core

The table below shows which accelerator technologies respondents had used.

Accelerator Architecture | Num. Responses
GPUs 102
Hardware Multi-Threading 63
Intel Phi (MIC) 21
IBM Cell 12

The table below shows which many-core programming models had been used by survey respondents.

Programming Model | Num. Responses
OpenMP 193
CUDA 79
pThreads 48
CUDA Fortran 29
OpenCL 24
Other 18
OpenACC 13
Thrust 12
Coarray Fortran 11
Intel TBB 11
UPC 9
Intel Cilk 5
CAPS HMPP 2

What are your plans for transitioning to a many-core architecture like GPUs or Intel Phi? How much of your code can use vector units or processors?

49 respondents had not yet started to transition; some are waiting for better multi-core support.  45 respondents had codes that use GPUs or were investigating the use of GPUs.  18 were investigating general multi-core code enhancements (usually via OpenMP).  18 respondents were considering the Intel Phi (or MIC) and 6 said they were done.

Our code uses GPUs / we plan to transition to GPUs

Our DFT code (GPAW) already runs on GPUs. Our QMC code, qwalk, needs extensive rewrite to run on GPUs.

Most of the MD codes are or have recently been updated to accommodate GPU-based computing (hopefully Phi soon). For many of the problems my group works on, however, there is not a huge speed advantage to using GPUs (although we could use fewer nodes).

We have plans to port some of our in-house developed code for MP2-MD to CUDA/GPU. We have great hope that we might be able to use NERSC resources for this task in the future.

CP2K is gpu ready and we are working hard to develop this capability.

80% of my code can now use GPUs.

Most of the methods we use already have a fully CUDA-ized or OpenCL-ized implementation, either written by us or imported from Europe. Challenge to gaining more experience/performance/better planning info is that there are too few GPU systems of sufficient size that we can actually do production science. **** Production science is all that counts **** For production science (computational materials/chemistry, with quantum methods based methods) we need a system that gives hopper-like turnaround for jobs of order 100 nodes (where a node is 16 cpu cores+a Xeon phi or GPU for example). i.e. Dirac needs to be >5x bigger.

We are working toward hybrid OpenCL / MPI programming. I have never had much luck with vector units when relying on traditional compilers to find opportunities for vectorization in a C++ code.

Plan to use OpenCL in future.

We have a GPU version code available now.

I am planning to use C++11 and dynamic parallelism with GPUs, but on another machine.

I plan to continue to increase the use of GPUs as code becomes available and as I find algorithms that can port well to the GPU. A subset of my simulations make use of HOOMD-Blue, while the rest utilize LAMMPS (as of yet, GPU performance of lammps for my applications is lower than cpus alone). Several of my analysis codes make use of the GPU, however these are typically run locally.

I plan to use GPUs for chemistry applications

I plan to transition to GPU-like environments. I am still waiting for models/standards to settle. The problems I work on are perfectly parallelizable.

I am waiting for an open standard for GPUs, like OpenMP. Not sure if OpenCL does that or not.

The code I typically use, NAMD, already has some GPU acceleration.

Eventually I would like all my code to run on GPUs; right now the important one for me does, but I hope to port others over.

We have transitioned to all GPUs for our simulations. It would be great to be able to run on more GPUs!

I run exclusively on GPUs.

We are testing GPU optimization on Dirac. If significant improvement can be demonstrated we will move to other many-core architecture.

One of our codes, PSTGF, has already been ported for use on GPUs.

We use LAMMPS for all of our simulations - LAMMPS now runs on GPUs. In the next year I plan to test my applications on GPUs.

Most of my code is translated to GPU code already. I also plan to code for Intel Phi. All my code can easily be parallelized to use vector units/processors (at least by OpenMP).

I mainly use a code (NWChem) that is developed elsewhere (PNNL) and is ported to the NERSC computers. There is an early GPU version of NWChem being developed and I am looking forward to seeing it on the NERSC computers.

My group is making very extensive use of the 64 new nodes, each with 12 Intel cores and a Tesla 2090 GPU, on Pleiades at NASA Ames to run our Sunrise radiative transfer code to make realistic images of galaxy simulations including stellar source spectral energy distribution evolution and the effects of dust in scattering, absorbing, and re-emitting the stellar radiation. The GPUs are especially used to calculate the dust emission self-consistently. If we just use the Intel cores, a typical job takes 48 hours. With the GPUs the same job takes 4 hours, a 12x speedup! We have used at least 2 million cpu hours on this system, and continue to use it heavily.

My code is a new CFD/ocean model, and is currently all MPI. When the code is running and tuned, I plan to migrate portions of the model to GPU (eg the pressure solver). The code will run in a hybrid cpu/gpu mode.

We are presently testing different parts of an electromagnetic field solver (the AORSA full-wave code) on GPU architectures. These tests are being carried out on the Titan supercomputing platform at Oak Ridge National Laboratory. We think the speed-up we can achieve using a GPU architecture even for only parts of this field solver calculation could make it possible to perform 3D simulations of RF antenna structures used to heat reactor size fusion plasmas.

Most of my code uses GPUs.

I plan to try out GPU support (probably via CUDA Fortran) on my desktops and then transition to NERSC.

I mostly use Hopper to run NAMD, which has a CUDA-compiled version with a large gain in performance compared to the CPU version. If CUDA NAMD were available on NERSC machines I would like to transition to it.

I have an OpenCL code that I plan to run extensively at NERSC this year. This code is extremely accelerated by using GPUs.  I am intrigued by Intel Phi because it promises a more transparent programming model, but I have not yet looked into it. If it were available I would begin writing code for it.

We are in the process of porting certain parts of our codes to GPUs using CUDA and OpenACC for large-scale production runs on Titan at ORNL. Several members in our group have used Dirac for this purpose. However, our codes tend to be memory bound, and I expect to get at most a modest speedup from the use of GPUs.

We are actively looking into using GPUs in our MD simulations. The codes we use already have at least some support.

The computational complexity of our current codes makes massive parallelization for the GPUs difficult. Our near-future plans are limited to porting a multigrid electrostatics solver to GPUs.

There are a few cases where the GPUs would be clear winners. However, they will need to prove themselves greatly superior to justify maintaining another stream in a large suite of codes.

The codes I use are currently being developed to take advantage of GPUs.

We have plans to migrate expensive parts of the code to GPUs, but it requires significant rewrites. I believe a lot of this effort will actually end up being wasted, because, currently, it requires maintaining two parallel versions of the code, one in CUDA Fortran and one in Fortran 90, since CUDA Fortran does not currently cross compile to CPU only machines.
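
A common, if clumsy, workaround for the dual-version maintenance problem described above is to keep one source tree and guard the accelerator path behind a build flag, so the same code still builds on CPU-only machines. The C sketch below illustrates the pattern only; the saxpy_gpu wrapper and the USE_GPU macro are hypothetical, not part of any respondent's code.

    #include <stdio.h>

    #define N 1000000
    static double x[N], y[N];

    #ifdef USE_GPU
    /* Hypothetical wrapper around a CUDA (or CUDA Fortran) kernel,
       implemented in a separately compiled accelerator source file. */
    void saxpy_gpu(int n, double a, const double *x, double *y);
    #endif

    static void saxpy_cpu(int n, double a, const double *x, double *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        double a = 2.0;
    #ifdef USE_GPU
        saxpy_gpu(N, a, x, y);   /* accelerator build: link the GPU object */
    #else
        saxpy_cpu(N, a, x, y);   /* default build runs on CPU-only machines */
    #endif
        printf("y[0] = %f\n", y[0]);
        return 0;
    }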

Currently, our code S3D is undergoing a major rewrite, first from an MPI to an MPI/OpenMP version and then to an OpenACC version with accelerators. Considerable portions of S3D can use vector processors, but the parallel performance seems to be a bottleneck.

Our group has a cluster with 64 GPUs and the center I'm working at, PNNL EMSL, has just purchased a new machine with intel MICs. We have several development codes that can use GPUs. We plan to integrate these developments into our main codes during the next year.

We can get performance improvements from GPUs and we expect to be able to get them from Intel Phi but we have no experience with the latter yet. With respect to GPUs the main issue remains having to rewrite code to exploit them effectively. Major kernels of our code will not port to GPUs with the current methods, however expressing the problems on real space grids is likely to produce code that is much more amenable to GPU processing. The major issue here is the scale of undertaking such a rewrite process. This is likely to take years to complete and we have to be careful doing it as many expect GPU style processing to get absorbed in emerging CPU architectures. Hence new approaches have to port to those architectures as well.

Right now I mostly use a hybrid code where nonlinear fluid equations are coupled to kinetic closures calculated via a PIC method. Therefore it is expected that GPUs will be useful for the PIC part of the computation.

We have someone in our group looking at GPUs. I think about 50% of our code would benefit from GPU. The other 50% is spent in PETSc libraries, so if they could make use of GPUs, then up to 100% of our code would benefit.

I plan to perform a lot of simulations using HOOMD-Blue, the powerful molecular simulation engine optimized for GPUs.

I plan to transition all of my codes to GPUs. I don't know much about Intel Phi. If it means that we can modify less code for the transition, it is worth trying.

Code transition to multi-core (including OpenMP and threads) started or being investigated

We are working on using OpenACC and CUDA Fortran. We are investigating whether GPUs or Intel Phi makes more sense for our code.

I'm trying to make this transition, as time permits.

Unclear at the moment but we are looking at this.

I am beginning to explore this. My code is written using the STAPL library (http://parasol.tamu.edu/stapl), which provides a higher level of abstraction for my applications. Within STAPL, we're exploring how to perform data distribution across the memories of the many-cores and how to implement our communications library using their primitives.

Right now my only plan is to thread my programs. I don't know much about coding on GPUs or the Phi system.

We are developing primarily in OpenMP now and would actually like to see more shared-memory systems at NERSC rather than just the multi-core chips on individual nodes. With the loss of Euclid, there will be no good shared-memory environment larger than a single compute node.

As of now, we're not ready. But our colleagues are testing on these new computational environment and we are closely working with them to acquire necessary skills and to modify code if needed.

Our code makes use of MPI and/or OpenMP. There are currently no plans for using GPUs or Intel Phi.

Plans are to start off with OpenACC and OpenMP implementation of loops.
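
For a directive-based start of the kind this respondent describes, a minimal hedged sketch in C is shown below: the same loop carries an OpenACC directive when an OpenACC compiler is used (the standard _OPENACC macro is defined in that case) and falls back to plain OpenMP threading otherwise. The kernel and array names are illustrative only.

    #include <stdio.h>

    #define N 1000000
    static double x[N], y[N];

    int main(void) {
        double a = 2.0;
        for (int i = 0; i < N; i++) { x[i] = 3.0; y[i] = 1.0; }

    #ifdef _OPENACC
        /* offload the loop and manage the two arrays explicitly */
        #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    #else
        /* host-side threading when built without OpenACC */
        #pragma omp parallel for
    #endif
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);   /* expect 7.0 */
        return 0;
    }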

We are currently investigating this, and we have purchased a medium cluster that has both GPUs and Phi's, as well as having allocations on Titan and Blue Waters for GPU testing. Part of this investigation is to determine just exactly how much of our code can leverage such accelerators.

I plan to use OpenACC and OpenMP in the future. The only drawback of OpenACC is that it is commercial. So, I might need CUDA for some time.

Preliminary stages of planning.

50%

I already make extensive use of MPI and OpenMP. It would be nice to retain as much of the programmer interface as possible.

I would like to be able to use a many-core architecture. However, we are currently working on openMP+MPI version of our code. The next step will be then to investigate what parts of our code can be efficiently run on GPU.

Gradual phase-in of accelerators (offloading computation-heavy physics calculations) while the technology is developing. Enter an exploratory phase for testing the feasibility of using many-core architecture for the bulk of the code work.

Most of my code can be readily transitioned to multicore architecture/vector units.

I am already working on many-core architectures, and most of my code is highly parallel.

I found the transition from pure MPI to a hybrid of MPI and OpenMP relatively straightforward, though time consuming. My experience with GPUs has not been very positive, and I am waiting for things to settle a bit before re-engaging.
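
For reference, the pure-MPI-to-hybrid step this respondent describes usually starts with requesting a threading level at initialization. A minimal sketch in C (compile with the system's MPI compiler wrapper plus an OpenMP flag) is shown below; it is illustrative only.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank;

        /* FUNNELED: only the master thread of each rank makes MPI calls,
           which is the simplest and most common hybrid arrangement. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp critical
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }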

Plan to transition to (or investigate) Intel Phi

Spend 1 year playing around with many-Phi machines before transitioning all my production codes to this model.

I do not believe that GPUs are that useful for general-purpose HPC. Our codes are already quite complicated to deal with. We already spend most of our time on debugging and optimizing. Adding GPUs, in my opinion, would make our codes unmanageable. However, we plan to try intel phi because of its relatively simple programming model. Significant part of our code uses vector processors.

Intel Phi, largely because of the Stampede architecture. Also, the Phi seems better suited to standard MPI calculations as compared to GPUs. Being able to run without code restructuring is an enormous benefit.

Looking forward to Intel MIC, memory on GPUs too small.

GPUs so far have not proven to be a great fit; much of what I do involves moving large amounts of data, and that's annoying on them.  I'm very interested in Intel Phi; is there a test system available?

We are developing code for Intel Phi processors and are considering OpenMP.

HPC resource at my site will contain Intel Phi coprocessors. Will build application versions to run as written on Phi, may try re-writing some performance-critical pieces.

We are currently exploring the usefulness of MICs for our primary application. In general, a better integration/correlation of call-path-level performance AND vectorization status would be helpful.

Code transition to multi-core done

My research is on multi-core architectures so most of my code uses such processors.

I currently maintain both an MPI-parallel and OpenCL-parallel version of my software, and am working on a hybrid MPI+OpenCL parallel version.

Our main code framework is designed around the use of multiple options for acceleration/threading. The compute-intensive component of the framework has been completely ported to all available current options.

I have already ported my codes to CUDA and MIC.

100%

Our code has been designed with these new architecture challenges in mind and currently runs on Cell- and GPU-accelerated systems as well as on standard clusters such as Hopper. We identified the most compute-intensive part of our code and ensured that it can be easily swapped in and out and optimized for any architecture available.

Have not yet started to transition, waiting for better multi-core support

As electronic structure programs transition to many-core architectures, we will follow. Others in my group use classical molecular dynamics and monte carlo, which are more trivially migrated to many core.

Adapting our code for vector processing would be very difficult, beyond the basic "linking-to-a-vectorized-BLAS" strategy. Exploiting MIMD many-core architectures is also difficult, but much more practical and feasible for us.

My code right now is serial.

We are taking a tentative approach because we must consider the cost-benefit analysis. From what I have seen, GPUs and the Intel Xeon Phi perform rather poorly for most applications within our domain. Vectorizing complex codes such as ours is rather complicated. We rely on many external libraries which must be updated in order to take advantage of newer hardware. We would prefer to use the Intel Xeon Phi since we do not need to use a different API or special function calls for native execution. However, Intel support is lacking. MKL performs abysmally on the Phi and support for STL and C++ is borderline laughable. The main issue for us is that GPUs within the HPC market are relatively mature, but require special effort to program for unless you want to use OpenACC, which is itself lacking in features. Also, the Intel Xeon Phi is an immature platform, but has the benefit of not relying on an exotic API. Neither is particularly appealing to us. Although we have the necessary expertise within our group, we do not have the funding to update our software to achieve optimal performance. Any changes we would make would necessarily need to be quick and dirty.

I have none. My code is not set up to run like this. I work with a heavily modified version of Gromacs 4.5.4. Rewriting it would be a major time sink with a learning curve.

Not much at this time. Plans to make it more compatible when resources justify it.

Significant re-programming will be needed for my existing code.

Currently, none of my code can use many-core accelerators. I have no funding to rewrite the models I use. So, for the time being I plan to continue using just the main CPUs and not the accelerators.

Currently I do not have any plans to use GPU architectures.

I am primarily a vasp and turbomole user. Improvements have to be made in these codes. My code runs very fast and just calls these codes.

Still not successful in speeding up Fortran MPI codes

None. Don't really see the point.

At this moment I don't have specific plans. Switching to these types of architectures would require significant effort and it is not clear what is the best way forward.

My code cannot use vector units.

No explicit personal plans, although the codes I regularly use are likely to be upgraded by their respective development teams.

We are waiting for compiler directives that will transcend vendor-specific extensions. Hopefully, OpenACC will work.

Most of our code currently does not benefit from high parallelism.

Till now I have exclusively been running VASP on Hopper. If VASP programmers developed an upgrade that made for better scaling or allowed better performance on GPUs I would take advantage. My programming nowadays is pretty minimal.

There is so much happening on the market that it seems prudent to wait for another year.

The codes that are most important to me mostly run very well on vector processors (e.g. NEC-SX9). I have never tried to make them work in CUDA, OpenCL, etc.

I'm familiar with GPUs, but not with Intel Phi. Transitioning would depend on how many resources I can use. In other words, NERSC's policy would affect which language I transition to.

I don't have any immediate plans because I do not want to spend the time to learn how to write e.g. code for GPU architecture, but as front-end languages for many-core architectures continue to develop, I will likely switch to such architectures in the future.

Very little of my code can use GPUs, and I do not plan to work much on GPU coding unless computing resources move in this direction. I am more interested in exploiting a hybrid MPI-OpenMP solution.

Still under study and planning but probably will make a transition this year.

I'm an end-user of code written by others, so I have no useful response here.

Not sure yet. I do tomographic reconstruction, which is a bunch of FFTs, so using GPUs would make sense.

15+ years ago, many of our types of codes relied on vector processing on the older Crays that were prevalent then. It will be interesting to revisit this with modern hardware.

I don't know the answer to this question as I'm not familiar enough with the terminology for GPU, Intel Phi etc.

The code we use was written for MPI some years ago. Transitioning away from MPI will be a big blow.

As molecular dynamics codes transition to this new hardware, I will use them.

I know that some computational biologists are adapting some code to GPU architectures, so I may be interested as a code user first, but it's not in my short-term plans. I don't know about Intel Phi.

I think we could in theory use these things without too much work, but we are a bit hesitant to change our codes to use technologies that may not be the "final" solution, i.e. maybe GPUs are somewhere on the way to what will be used in the future, but it's hard to want to commit a lot of development resources if we are not positive GPUs will have staying power.

I am using a multi-processor software package, 'Quantum Espresso', that is developed by somebody else. Being one of the software packages that NERSC supports, this program works quite smoothly. From what I gather, 'Quantum Espresso' runs on GPUs but I'm not aware of the latest developments.

We do not currently have concrete plans as such. Our main production code is a high-order finite element solver implemented in a matrix-free fashion. Fortunately, this is an area for which there is a considerable body of literature on GPU / coprocessor implementation.

The structure of my main codes is not well suited to GPU-type many-core architecture. Nonlinear fluid models of plasmas require solutions of partial differential equations that are very tightly coupled globally and require fast global communication. This is one of the major weak points of this type of architecture. The relatively small memory/processor allowance of future architectures may become another serious limitation (for present Hopper-type configurations, it's not a problem). The code still retains a great deal of lower-level vector-processing structure from the original transition from vector machines to MPP, but the higher level part of the global solutions produce the bottleneck. My uses of data analysis and visualization are also becoming strongly global and time-dependent, so data access and storage and also the available, working versions of visualization software for larger jobs is rapidly becoming more difficult (even 1000 cores on Hopper).

I believe that the code developers are currently working to take advantage of the GPU architecture. I am not one of the code developers so I cannot comment more.

We don't have any plans yet, since we're not sure this type of architecture will save us time.

For particle-in-cell simulation, memory is more important. GPUs are not attractive for PIC simulation.

No current plans.

none

Unknown for now. Probably in the next couple of years.

No plans.

not sure

Not yet evaluated in detail.

No plans

None

No plans. Most of the codes I use already use multiple processors.

Most of our code cannot use GPUs.

no plan

How can NERSC help you make the transition to many-core?

28 respondents stated that online tutorials, presentations or documents would be helpful; 15 requested testbeds; 14 highlighted the importance of good code examples; 11 said it was important for NERSC to provide software optimized for the new architecture, and 10 requested code porting or consulting support.

Provide testbeds

NERSC can help by making the latest accelerators (from NVIDIA, Intel, AMD) available, as well as giving access to a variety of architectures from different vendors and multiple generations of technology from the same vendor (the NVIDIA Tesla, Fermi, and Kepler architectures, for example, all have different characteristics).

I am excited about NERSC's Dirac cluster, and would like to see as much emphasis in this direction as possible (more, newer GPUs, perhaps a mix of AMD + Nvidia?).

We need resources. We need access to hardware and technical support.

A Dirac-like many-Phi cluster as a testbed would be a great idea.

Get a larger GPU (preferably K20) cluster. Get a new Intel Phi cluster.

Does NERSC have plans to build a test cluster of Intel Phi?

A smaller-scale testbed system like Dirac will probably be most useful for us at this point.

Making these systems open and available for code testing and debugging would be beneficial, though the transition is already in progress.

We have become aware of the Dirac cluster. For now we intend to use it to test the codes we use.

I would like to try Intel Phi, if it is available I will use it.  I have not yet tried the GPU systems but plan to: I can't yet comment.

Tutorials/workshops and perhaps a (more publicized) small testbed to try out.

Increase the availability/accessibility of many-core and special architectures such as GPUs and Intel Phi.

A new test system using the Intel 60-core cluster-on-a-card? Online training would be useful.

Make many-core environments more available to users.

Provide code examples

Keep up-to-date examples on the website.

Examples are crucial. There should be sample code, sample scripts, sample everything. It might be useful to provide simple examples that work badly as well as those that work well.

I would appreciate: a) a side by side comparison of the various multicore methods and their pros/cons. I currently am aware of (and use) a few of the ones listed above, but it's nice to have them all in one place to determine which is best for my application ...

100 line examples for simple operations like matrix multiply.
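
In that spirit, a hedged example of roughly the kind being requested: a deliberately naive OpenMP matrix multiply in C, with no blocking or tuning so that only the directive placement matters. The sizes and initialization are arbitrary.

    #include <stdio.h>

    #define N 256
    static double A[N][N], B[N][N], C[N][N];

    int main(void) {
        /* A is the identity, so the product should simply reproduce B */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (i == j) ? 1.0 : 0.0;
                B[i][j] = i + j;
            }

        /* collapse(2) spreads the (i, j) iteration space across threads */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }

        printf("C[1][2] = %.1f (expected 3.0)\n", C[1][2]);
        return 0;
    }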

One obstacle is inserting CUDA calls into Fortran code. It would be nice to have more examples with MPI/Fortran/CUDA mixing (the one example I know of is lifepackage.tar, which uses PGI).

Working examples to download from the web pages.

Not sure. Providing simple examples to moderately complex examples for the vendor that NERSC chooses could be useful. It depends on whether vendor-specific extensions (e.g., CUDA) are going to be the only option.

Simple tutorials. Like how to do threaded fits to simple datasets but with a shared output resource that needs to be managed. Something that'll get the basics across that people could then cannibalize to make their own multicore programs.

Always, step-by-step presentations with very simple jobs, like sorting, are helpful.

Provide software optimized for the new architecture

If VASP and quantum espresso were compiled for many-core, I would use those.

My codes are written by others. It would be too much to ask for NERSC to make them multi-core.

Nothing that is not already available. Profiling and debugging tools are what we might require and NERSC's repository of these is very good.

We are already using openMP. An openMP-like pragma language for accelerators would be best for us.

Other than writing a cross-compiling CUDA Fortran compiler, I'm not sure...

A solid user environment where the common/key environment variables and paths are defined.

Usually the PNNL developers install the latest version on NERSC computers.

As a user, having a good programming environment in terms of compilers, libraries, etc. helps a lot.  It makes the transition easy.

What I heard was that the programming environment was more complicated, i.e., we needed to rewrite the code. Yet some new tools are much more user-friendly. An introduction to these tools could be very useful.

Compatible libraries and tools, in line with general tendencies in the field, to allow the usage of common tools also developed outside NERSC.

Provide consulting / code porting help

I could use help with GPU strategies.

We will need help developing the code for such type of architecture, and testing it.

Not sure yet. There are a number of possible approaches, both short and long term, some of which I am working on with the aid of various NERSC people, including the VisIt group.

We could probably use advice on how this might be done. I think we have the technical capacity, actually, but are conservative for the reason stated above.

I need help writing the code and porting it to NERSC.

Rewrite application software to use accelerators?

I run climate models, which are complex codes. There are portions of the code suitable for many-core architecture, however the level of effort for the transition requires more than code rewriting. The large memory footprints and manipulation of large data pose challenges for the implementation on GPU.

We are running multicore adaptive mesh refinement hydrodynamic simulations on Hopper nodes now, using OpenMP.  If you plan to install many GPUs or MIC chips on Hopper or another NERSC machine, we would like to work with you to port our adaptive mesh refinement hydrodynamic simulation code ART to this new hardware.

Improving the Support service!

Support and funding to convert community climate modeling code would be very helpful, specifically WRF and CESM. This code is used by a wide range of atmospheric scientists and currently I am unaware of any serious effort to make these codes fully accelerator capable.

NERSC should tell DOE to stop funding legacy applications and invest in new codes.

Provide online tutorials / presentations

Provide introduction tutorials + relevant links.

More tutorials on OpenACC and Cuda

Provide tutorials.

a web-based tutorial on GPU etc would be very helpful.

on-line tutoring

Offering tutorial would be helpful.

Tutorials for various platforms are nice, e.g. a Python tutorial or Fortran, with links to sample CUDA code that works in each of these languages.

Tutorials would be appreciated.

Some basic tutorial information would be helpful: a set of small sample programs and instructions on how to run them.

A set of basic to thorough tutorials that demonstrate the difference between the many-core vs contemporary technologies and examples of what changes must be made to existing codes...

Tutorials beyond "hello world" kind of programs.

Although I ought to watch on-line tutorials, it's usually sufficient to find a set of slides that lays out examples of how to program (or re-program) a code to take advantage e.g. of GPUs. Showing examples in both C/C++ and Fortran is also really helpful.

Provide more online documentation

some fundamental knowledge of CUDA programming.

Providing up to date documentation with extensive examples of available libraries would be the most helpful thing you could do.

Put more information on the web concerning Dirac. There is nothing there about how to run an OpenCL code (last I looked). When I inquired the answer seemed a little nonchalant, although we did get a solution. ...  NERSC has been first-class in the MPP arena for a long time, yet in this area I don't sense much urgency.

 (b) links to/local tutorials/documentation on the use of said multicore methods.

NERSC consulting could do a better job in explaining that converting codes to effectively use many cores will more than likely require significant changes to them. Stop gap measures such as using openmp in a simple way are not likely to be effective for many codes. Unfortunately, the complexity of new HPC systems seems to only be getting worse, and in my opinion for HPC centers to be viable in the future will require that their users try to keep up with the changes in technology (there's no reason for a HPC center to have the latest HPC technology if its users aren't using it). This may need to be done gently, but it needs to be done. Otherwise, NERSC and other HPC centers will not be able to differentiate themselves from cloud computing centers.

It would be helpful if NERSC provided recommended links to websites with relevant information on the use of those languages.

List of simple do's and don'ts based on experience on NERSC machines.

In the first place, explain whether there is a need and a timeline to do this transition. This and the previous question give the impression that a regular MPI-based code will not be supported by future hardware (like the successor of Hopper). Should every user of NERSC have such a plan?

Web pages with help for beginners.

I also rely heavily on the documentation on the website. The more information and presentations available, the easier one can use the system.

Provide workshops / classroom training

I think specialized classes and workshops would help.

Provide more training and more online documents.

I am very interested in this. Is it possible for NERSC to hold some training sessions this year? Teach us some basic techniques. Once I know how this is implemented, I can purposely tune toward these many-core architectures.

Tutorials and training classes.

Workshops can be helpful.

Other

Ultimately in planning to go through these kinds of transitions the crucial thing is to know what will work and what won't. So detailed information on the architecture along with performance data and benchmarks that users might be able to run for themselves to try out usage scenarios might be the best means to inform the code design process.

We are not entirely certain at this time. It would be good to see some profiling for benchmark codes representative of a few kernels and/or communications patterns "common" among users' codes. As a user, seeing the difference in performance and implementation between coprocessor-aware and non-aware versions of codes near to my problem domain would be quite interesting.

Provide good debug queues

Keep debug queues very open.

Make it easy to run smaller test & dev cycles of code.

Provide more memory

Don't impose "artificial" hard (and soft) memory restrictions. Leave it as flexible for the user as possible.

Most importantly, by providing sufficient memory per physical node. Already on hopper I can often enough use only half of the nodes' cores or less, simply because there is not enough memory in the node.

Nothing / don't know

Great question.

Do not have specific answers to this question at this point.

none

by not buying such a beast

The problem is not with NERSC. The problem is for me to find the time.

I just need more time in the day! Seriously, though, I have some fears that in the march to many core architectures, if my code performance suffers before I have the time to make a transition, it would not be great for me personally. I understand NERSC's need to maintain and develop cutting edge resources, though, so this is just something I'll have to deal with.

At this point the work is all on my end in deciding which approaches to attempt first.

already there

Already there!

Importance of Data Resources and Features

Users were asked to rate the importance of various data resources. The two most important resources were disk space and I/O bandwidth.

Importance Ratings: 3=Very important, 2=Somewhat important, 1=Not important

Item                                                                Rated 1   Rated 2   Rated 3   Total Responses   Average Score   Std. Dev.
Live storage space (real-time access)                                    14        63       243               320            2.72        0.54
I/O bandwidth                                                            13        83       229               325            2.66        0.55
Archival storage space                                                   28       113       165               306            2.45        0.66
Adequate bandwidth and space to support checkpointing                    21        86       103               210            2.39        0.66
Metadata performance                                                     40       126       107               273            2.25        0.69
Access to a large shared-memory system for data analysis and vis        43        69        83               195            2.21        0.78
Data management tools                                                    36        79        50               165            2.08        0.72
Science Data Gateways                                                    43        51        42               136            1.99        0.79
Analytics and vis assistance from NERSC                                  52        75        35               162            1.90        0.73
Databases                                                                57        45        32               134            1.81        0.80
Read-Only data files or databases                                        61        52        25               138            1.74        0.75

Comments - Does NERSC provide the full range of systems and services you need to meet your scientific goals? If not, what else do you need?

Respondents' answers can be grouped into broad categories as shown in the table below.

Response Category                    Big MPP Respondents   Medium MPP Respondents   Small MPP Respondents   Total
NERSC meets science needs                             25                       22                      31      78
Need more cycles, HPC resources                        9                       16                       7      32
Need different architectures                           8                       11                       9      28
Need different batch policies                          7                        6                       5      18
Need more software support                             1                        7                       5      13
Need more data resources                               5                        4                       3      12
Need more documentation, training                      1                        2                       5       8

Need more cycles, HPC resources

No, we are never satisfied. We need more compute power and more on-line disk storage, and we need it yesterday.

Yes. The queue waiting time could be improved ideally, but I realize that it would require more resources from DOE to accommodate an increasing number of computational scientists in recent years.

2. Improved turnaround in the second half of the allocation year would be appreciated. I realize that this is more a money/resources problem than a NERSC problem.

Most importantly queue waits are really way too long.

My only other minor complaint is that the wait-times on hopper have substantially increased. I've been running jobs that require ~16384 cores. On Franklin, because this was one of the larger jobs my job would take priority and would start in 2-4 days. On hopper, this sized problem falls into the reg_med class and I now need to wait 5-12 days, sometimes seems like it takes more like 14-20 days. I find it really difficult to remember where I was and what I was doing that far back; especially when I wait that long and then the job dies at initiation due to some hardware glitch.  Otherwise, computing at NERSC has been bliss.

NERSC response: We anticipate that the introduction of Edison phase 2 later in 2013 and the return of Titan to production at OLCF will result in improved turnaround times on Hopper.  Please note that on Hopper, reg_med has the same "big job boost" as reg_big - so your wait times are a result of overall demand being placed on Hopper.

With more nodes, turnaround time would be shorter.

The number of processors at Carver could be increased, and the cost factor of Carver decreased from 1.5 to 1.0.

Bigger computers.

More computational cores in Carver. Longer running time limits in Carver.

Sometimes small jobs on hopper sit for ridiculously long queue times. I spend much more time on the queues than actually running.

I find that the limitations on the number of jobs I can run at a time and the wait time in the queue have forced me to look for other resources instead (namely XSEDE). I can guess why this may happen: the applications process is much easier for NERSC, and you have great resources. I also find the disk limitations on the home directory difficult. Like the other reasons I find NERSC less useful, my guess is that the volume of users requires the limits on the home directory and on how long files can stay on global scratch, but again these limits make the resource less useful.

Yes, although one could always use additional computing hours.

My main frustration is waiting in the Hopper queue for a long time.

I am a little disappointed by the long waiting time for a job to enter the execution phase.

NERSC provides a very good computational resource, but I often feel the allocation is not enough and I have to set the execution queue to "low" instead of "regular". This postpones the progress of our research.

Yes, but perhaps things like throughput or special one-off allocation increases could happen more often.

Yes, the computing facilities at NERSC are key to our program. However, there are cases where big calculations have a very long waiting time to get started.

Yes. The only limitation in my experience is the time to run (queue wait), which can exceed a day for a job that runs for a day. This <50% duty cycle reduces the overall 'speed' of the machine in terms of cycles per wall time. Smaller allocations and concentrating DOE resources to maximize the number of available flops (as opposed to visualization and data analysis tools, for example) would help.

Shorter Queue times.

It seems like the wait time for Hopper regular queue has gone way up. This is a huge barrier to run anything small longer than 30 minutes. Our group has even resorted to running max cores in debug instead of regular just to get around this, which is not ideal and clearly not what debug was meant for (sorry). Otherwise, we have to wait several days for something small. Maybe there is a better way, but we couldn't find one when our data is stored on Hopper scratch.

Yes. But the decommissioning of Franklin has left Hopper as the only truly massively parallel platform, which is needed to launch computations with a large number of cores.

I only need more cpu times...Just a joke.

Mostly. My jobs are very large, so I need much more allocation time.

Overall what NERSC provides is very good. What we need is more of it!

Hopper regular queue jobs are limited to short periods, and have very long queue times (multiple days for a 12 hour job).

NERSC provides the resources that are needed for running my simulations. In my current project I need to run many medium-sized jobs (a few thousand cores). Unfortunately the wait time for my jobs going through the queue is very high. It would be great if these jobs could go through the queue quickly. This would enable me to analyze and publish results even faster.

My main difficulties were with the queuing. I wish the wait times were shorter. I would typically do jobs on 192 cores on hopper that would take 30 hours to run, so they had to be in the regular_small queue, often with 4 day wait times.

Need different architectures / architecture concerns

Yes. My main complaint is that switching between franklin and hopper and the associated memory reduction required a significant time investment. I intentionally did not spend more time than required to simply get myself functional. I'm happy to see that Edison will have more memory/core and that I did not completely re-write my code to specifically target hopper.

For the most part, yes, apart from many-core.

A larger visualization cluster.

NERSC response: We are investigating visualization options as part of our next ("NERSC-8") procurement.

Large memory (at least 512GB, better ~1TB) system would be helpful, if there are enough of them to support a reasonable turn-around time.

I prefer a larger memory on carver to run matlab for visualization.

No, I need big shared memory nodes, something ~ 2 TB.

NERSC response: Please note that we do have two 1 TB memory nodes on Carver.  If these nodes experience a backlog of use, we will consider buying additional ones.  We will track the availability of 2 TB memory nodes. See Carver 1 TB Memory Nodes.

Machines which are optimized for jobs with intermediate levels of parallelism (32-2048) are the most useful. Such machines need high-performance cores and fast stable communications.

Our codes would benefit from systems having more memory per core. 8GB per core would be ideal.

It is unfortunate that Euclid is being retired. I realize that its use cases are being migrated to Carver, but there's something to be said for a fully interactive system that can be used at will with minimal preplanning. Using Carver will slow things down for me. So, my main comment would be to provide more in the way of interactive systems.

Large memory/core systems

Hopper is a capability system optimized for *huge* jobs. It would be nice to have a capacity system that would allow high-throughput of medium size jobs (e.g. of order 1000 cores instead of order 20000 cores).

As the push to more cores per node increases without a push to more RAM per node, my ability to use all cores on a node decreases. While this can be mitigated with threading, it is not worth the human effort to get no speedup.

Need access to latest multi-core and many-core architectures. NERSC should try to acquire test-bed clusters of new architectures.

NERSC response: We have a GPU testbed (Dirac) and an in house MIC testbed being used by staff.  The MIC testbed will be available by invitation later in 2013.

1. We need a heterogeneous multicore system big enough to do production science. e.g. A system capable of supporting 100 node CPU+GPU jobs with decent turnaround would likely be the best in the world (time to solution) for ab initio molecular dynamics. Dirac is too small. The INCITE machines are too large. By supporting "large enough" runs we would also be able to optimize implementations/improve methods on realistically sized datasets.

NERSC response: We will deploy a testbed based on the architecture chosen for the current (NERSC-8) procurement. Whether we deploy a larger GPU cluster, if that is not the NERSC-8 architecture, will depend on "allocation of scarce resources" decisions.  If a substantial portion of NERSC codes perform better on GPUs than other platforms, we will revisit this question.  We first want to see if codes that perform well on GPUs also perform well on other multi-core architectures.

For the most part yes. My only problem is that documentation on Dirac is sparse, and running jobs has an extra layer (going through carver) I could do without.

A large part of my job is to provide and support code for a heterogeneous user community. For this purpose we want to test our code on the widest possible range of platforms, so the more different machines you can provide, the better it is. Obviously this endeavor is quite different from doing science with our code. For doing science, the mix of platforms you have is perfectly adequate.

I feel as if NERSC is always lagging behind other computing centers. You ask how NERSC can help us transition to many-core systems. You need to provide access to experimental systems.

It would be nice to see some MIMD many-core machines sometime in the near future.

It would be useful to have access to a greater variety of accelerator architectures (such as GPU accelerators from NVIDIA and AMD, as well as Xeon Phi from Intel).

Global arrays; more memory flexibility for the user.

For the most part yes. I would like more systems with GPUs!

All of the NERSC platforms are x86 (if one ignores the GPUs on Dirac).

My only additional wish is for more GPUs with longer walltimes and less restrictive queue limits.

I would like to see a new shared memory system at NERSC for development in a pure openMP environment rather than openMP/MPI hybrids. While most codes written this way do not scale past 256 or 512 processes, the software development time is greatly reduced for us.

Future access to hybrid "CPU-GPU" type architectures will probably be important for the types of problems we are doing. These problems involve combining electromagnetic field solvers with 3D antenna codes and Fokker Planck codes.

Yes, I could probably utilize even more NERSC resources for data analysis and visualization.

We would like to be able to run on more GPUs.

yes, but I find it better to use my university (Wisconsin) cluster.

Different batch policies / queue concerns

A ton of small jobs swamp the queues on backfill.

Since my usual allocation does not meet my needs, I have to run many jobs on Hopper using the low queue. Could the queuing policy change a little so as not to penalize low-queue jobs too much (increase the wall clock time limit and avoid excessively long waits of more than 2 weeks)?

NERSC response: NERSC is consistently over-subscribed, so we have no plans to increase the scheduling priority of the low queues.

A lot of the problems we do are still trivially parallelizable. Something NERSC handles really badly is the ability to queue thousands of independent jobs. Sure, I can write a wrapper that asks for 2400 cores at the same time on Hopper, but this is bad usage of resources, because it unnecessarily waits for 2400 cores to be available and doesn't take into account that some of the processes will finish early. Carver has a serial queue but it maxes out at 300 jobs (even though the website wrongly says 150). I'm talking about literally submitting something like 100,000 jobs. PBS can't cope with this, but there surely must be a way of implementing it. Second, I'd love to have a queue that comes with no guarantees attached, to fill the idle parts of the machine. I.e., I ask for 128 cores and I know I have no guarantee how long the slot would be; if I manage to do something and checkpoint well, fine, and if I get killed before then, well, it was free time, so it doesn't matter. Third, I would like to see scalable queuing. Say I know that the time required for my process scales as t0/N_cores^alpha. Instead of specifying 2400 cores for one hour, I should be able to ask for computing resources of N cores for t0/N^alpha hours. I.e., I might get 2400 cores for one hour, or 4800 cores for 40 minutes; it doesn't matter, I know my problem will be done, so I let the queue decide what is the optimum. I actually played this game by submitting several jobs for the same problem with different Ncores/time requirements, and after one got through the queue, I'd kill the others...

Too much reliance on batch jobs and queues. Need at least a few 'free-form' machines for running long scripts, cron jobs, etc.

This year, especially the last 3 months, saw very slow starts (3-4 days) for mid-range-size jobs (1000-8000 processors) in regular, which seems to be the debug size for codes that need to transition to a large number of cores. Can we stop script submission of thousands of jobs just so people have 'place holder' jobs in the queue? Or if people do submit thousands of jobs, let there be an exponential decay in queue priority.

The maximum running time on Hopper can be increased.

NERSC Response: In response to requests for longer wall times on Hopper, we raised the wall limit in 2012 from the 24-hour limit that was in place during 2011 to as much as 168 hours for the throughput queue (users can only use one or two nodes — up to 48 cores — with this queue), 96 hours for the Cluster Compatibility Mode queue (1–16,368 cores), 48 hours for the regular 1–16,368 core queue, and 36 hours for the 16,369–98,304 core queues. However, these longer wall limits have resulted in longer wait times - there is always a trade-off between wall times and queue wait times.

I am sometimes limited by the queue length for Gaussian jobs, as items don't seem to restart properly on Hopper. I have other resources available, though, that can handle the large jobs.

It is incredibly difficult to push jobs with ~3000 cores. Jobs with < 512 cores and much larger than 10000 cores seem to start just fine. Our jobs are mid-size and take more time in the queue than they actually request in walltime. This is very frustrating.

In my opinion, I would also like to see less push towards massively parallel large jobs. It seems to me that national computing facilities are obsessed with the idea that extremely large calculations will lead to dramatically new science. In reality, I see the opposite. I see lots of examples of people scaling up calculations just to demonstrate that they can use large machines. Clearly there is a place for large calculations, but the vast majority of good computational science that I see is done with standard resources. Being able to run many calculations on different systems, as well as run lots of checks for convergence, is in my opinion much more useful than aiming for the few massive simulations that demonstrate some impressive scaling or system size.

faster turnover rate in the queues for moderate size jobs

We tend to run MD simulations that require very long runs. Often a week of CPU time. This is a general issue for MD simulations of molecular systems. The time limit requires that jobs be continued, which slows overall throughput, especially when the system is heavily loaded.

My simulations require moderate CPU resources (i.e. a few nodes) but very long runtimes (1-2 weeks).

I find that the amount of time one has to wait in the queue system is not transparent whatsoever. There are numerous rules, which lead to an unpredictable system. For example, I can look to see how many cpus are being utilized per queue, but it is not obvious what the total number of unused cpus is because there is overlap between the queues, and furthermore not all cpus may be accessible because of memory requirements of current jobs. Also, the queue system is nontransparent. There should be a simple way that I can tell how busy the cluster is, how long I will need to wait until I can access the queue, and how many jobs are ahead of mine in the queue. This seems like a simple piece of information, but it is amazingly difficult to get.

Better real-time and post-job monitoring tools would be of great help.

Yes; queue wait times can be excessive for medium jobs (1000-2000 cores), would be nice if degree of resource utilization were more heavily weighted--infrequent users would get somewhat higher priority.

I am processing data from the Dark Energy Survey. I run tens of thousands of independent jobs, each of which is analyzing a single image. Most of these jobs require a single core and take less than 30 minutes. A smaller number of jobs require high memory, for which we use openmp and use all cores on a node. It is difficult to efficiently use the provided systems in this fashion. I am aware of the serial queue, but one can only run ~150 jobs at a time. Furthermore, users in the serial queue abuse the system. I have often found that users are running single core jobs that use up most of the memory on the machine, which slows down other jobs. Also, the serial queue also does not support jobs that want to grab a single node and use all the memory. I have gotten around these problems by running MPI jobs on carver where the master farms out single core jobs to the workers. But this is problematic. I must wait of order a day in the queue for the job to begin running. But these are single core jobs, which are easy to schedule immediately, so the wait seems unnecessary. I understand that the big queues will have long wait times, and this makes perfect sense for big MPI jobs that need to grab thousands of cores and share memory. But for processing data, where each job takes ~20 minutes, it is frustrating and inefficient. I need a *large scale* queue that supports grabbing a single core or single node for each job. The need to grab a full node is for two reasons: 1) we have openmp jobs 2) we can avoid the abuses of memory seen in the serial queue by taking a full node. Note I have used carver but not hopper. The documentation indicates it would be very difficult to run our jobs on hopper because all jobs require shared libraries. We use a lot of python and other tools that require shared libraries.
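
For readers unfamiliar with the pattern this respondent describes, a minimal master/worker task farm in MPI C is sketched below. process_task() and the task count are placeholders for the real per-image analysis; nothing here reflects the respondent's actual code.

    #include <mpi.h>
    #include <stdio.h>

    #define NTASKS   100
    #define TAG_WORK   1
    #define TAG_STOP   2

    /* Placeholder for the real single-core, per-image job. */
    static void process_task(int task) {
        printf("processed task %d\n", task);
    }

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                       /* master: hand out task IDs */
            int next = 0, active = size - 1, done;
            MPI_Status st;
            for (int w = 1; w < size; w++) {   /* seed each worker once */
                if (next < NTASKS) {
                    MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                    next++;
                } else {
                    MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                    active--;
                }
            }
            while (active > 0) {               /* refill as results come back */
                MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (next < NTASKS) {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next++;
                } else {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                    active--;
                }
            }
        } else {                               /* worker: loop until told to stop */
            int task;
            MPI_Status st;
            while (1) {
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                process_task(task);
                MPI_Send(&task, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }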

I find that the limitations on the number of jobs I can run at a time and the wait time in the queue have forced me to look for other resources instead (namely XSEDE). I can guess why this may happen: the applications process is much easier for NERSC, and you have great resources.

More software support / software concerns

There may be no good solution for this problem, but I thought to mention it anyway. Major software and library upgrades are performed and set to default quite frequently. These often cause my binaries to no longer work. I can always recompile, of course, but then I need to go through all the testing and checking again to make sure everything works correctly and reliably. It can be very annoying to have one's codes stop working from one day to the next because of changes in the environment and having to invest a lot of work to get it running again. The alternative is to write down the exact library versions used for every binary and load them explicitly, which is also a bit annoying. Other HPC centers change the environment far less often. Let me stress that the above is not at all meant as a complaint. The people doing the administration and maintenance work are doing a great job. Any other HPC center I am currently using in the US is far less reliable. I am just wishing for a (default) environment with less changes over time that potentially cause my binaries to stop working. But as I said, there is probably no good solution ;-).

It would be nice if MATLAB could be used in parallel across more than one node at a time.

In the past I would try to use MATLAB (run on Euclid or Carver) to visually examine my data. However, I often had trouble with MATLAB crashing when doing extensive visualization, so now I tend to only process my data at NERSC and do the final visualization locally on my Linux machine. This is not actually a criticism of NERSC, as this method works fine for me; it's more of a general comment.

Mostly, yes! Exceptions to this: I would like to use debugging and performance analysis software more, but I find it is still quite time consuming to get set up with our code. Implementing performance analysis packages requires working with the package developers, and just investing some time in it, and while I am personally inclined to spend time on things like this, I have obligations to others, and have to spend it obtaining science results instead. Any way that this process could be streamlined for big applications (such as GTC) would be great.

More MPI environments on Carver, such as MPICH2 or Intel MPI, not only OpenMPI.

Badly need better 3D and time-dependent analysis and visualization (better software, support for on-demand interactive use of larger jobs, easier data handling for larger quantities of data). My main code uses the commercial AVS/Express viz package, but this is losing support at NERSC. While VisIt can do some things better, it cannot replace it without a great deal of package writing (by experts, not me). If AVS were supported, I believe it would be able to do many of the same things, but without support I find it impossible to learn to program it in a limited amount of time (AVS consultants refuse to answer any questions because it's an old version). Journal articles that I have published using AVS/Express pictures have often appeared on the journal covers; this isn't true for VisIt.

Better scaling for basic DFT codes, (Quantum Espresso, Vasp, Siesta).

Yes it does, although I can see in the future lacking some specialized tools such as for hardware synthesis and placement.

Mostly. It would be nice if there was MathCad available (but it runs only on Windows) and more Mathematica licenses.

By and large, NERSC provides the full range of systems and services I need to meet most of my scientific goals. This would be improved significantly if support of shared library codes and systems (e.g., Python) were improved on the larger systems. Currently, there is support for shared library codes but the default startup scales very poorly as the shared library data isn't distributed by default and must be read from a single location by all the processors involved in the computation. NERSC has been working on a scheme to help with this issue but it is a fairly manual operation and makes the process of running shared library code(s) significantly more difficult than codes that don't use shared libraries. In the case of shared libraries that are part of the system (e.g. Python), it would seem that its shared libraries could be available on all the nodes independently and each core could be set up to read them efficiently.

I think one important aspect of the NX service is to provide a continuously existing working environment, which is handy for users as they move between offices and travel. However, without unified support across popular platforms and NX clients, NX is not very useful in this sense. NX Player vs. NX Client, Windows vs. Linux vs. Mac... It is not helping. It would be very nice if I could continue my NX sessions from a wider choice of platforms and client software. Thanks.

Need more data / data concerns

I would like to see larger home areas, faster data transfer node rates, faster disk response (home, scratch, gscratch).

I also find the disk limitations on the home directory difficult. Like the other reasons I find NERSC less useful, my guess is that the volume of users requires the limits on the home directory and on how long files can stay on global scratch, but again these limits make the resource less useful.

The speed of file access is sometimes very slow, so it would be nice to speed it up. I would also prefer larger home directory disk space.

More SCRATCH disk space would be very useful for our needs

It would be great to have additional disk space for storing the data for analysis.

I like having a fairly large amount of disk space that is directly writable from the compute nodes and available to the data transfer nodes. The allocations for /global/scratch aren't huge, so that constrains me a bit.

Yes, though many of the problems I am working on would benefit from larger allocations on scratch and project space.

Would like better stability and robustness of file systems.

The Lustre file system is not well suited to large volume transfers involving multiple scratch files during computation. For this purpose, having local disks mounted to compute nodes would be ideal.

Not much setup for data management; e.g., how can a user automatically migrate his/her new data from NERSC to a local server every day?
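
One possible approach (a minimal sketch only, not an official NERSC recommendation) is to run a small script from cron on the local server that pulls new files from a NERSC data transfer node with rsync. The login name, data transfer node, and paths below are placeholders, and the sketch assumes password-less SSH keys are already configured:

    #!/usr/bin/env python
    # Pull newly produced files from a NERSC data transfer node to a local server.
    # The host name, remote path, and local path are hypothetical placeholders.
    import subprocess
    import sys

    REMOTE = "username@dtn01.nersc.gov"               # placeholder data transfer node login
    REMOTE_DIR = "/global/scratch/username/results/"  # data written by jobs at NERSC
    LOCAL_DIR = "/data/archive/nersc/"                # destination on the local server

    def pull_new_files():
        # -a preserves timestamps and permissions; rsync itself copies only files
        # that are new or have changed, and -u additionally skips files that are
        # newer on the local side; --partial keeps interrupted transfers so they
        # can be resumed the next day.
        cmd = ["rsync", "-au", "--partial", REMOTE + ":" + REMOTE_DIR, LOCAL_DIR]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        sys.exit(pull_new_files())

Run from a daily cron entry on the local machine, a script like this keeps the local copy in step with what the NERSC jobs produce; Globus Online is an alternative when data volumes are large.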

I think so. It's not clear to me how easy it will be to implement real-time data analysis through a science gateway.

In terms of data analysis, the main issue is that on PDSF I cannot access most of the data for my experiment (STAR @ BNL). I have to create my ntuples in BNL and then transfer them to PDSF.

More documentation, training, communication

You need to teach me what is possible in order for me to want to learn how to use new tools.

One thing I would like to know, but which is probably on the website (I haven't checked), is how to quickly tar up a bunch of files in order to archive them. This is to make sure my file is large enough for good HPSS performance. A standard tar command is very slow for large data sets. What I end up doing is archiving all the large files separately from the tarred-up small files. This is very time consuming since I have to separate large and small files. Again, there is probably already something for this, but if not, it would be good.
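
One possible approach (a minimal sketch only, not an official NERSC recipe) is to bundle just the small files into a single tar archive with Python's standard tarfile module, leaving files above a size threshold to be archived individually. The directory name and the 100 MB threshold below are illustrative:

    #!/usr/bin/env python
    # Bundle small files into one tar archive so HPSS sees a few large members
    # instead of many small ones; larger files are listed for separate archiving.
    import os
    import tarfile

    SRC_DIR = "run_output"             # illustrative directory to be archived
    THRESHOLD = 100 * 1024 * 1024      # 100 MB: files at or above this size stay separate
    large_files = []

    with tarfile.open("small_files.tar", "w") as tar:
        for root, dirs, files in os.walk(SRC_DIR):
            for name in files:
                path = os.path.join(root, name)
                if os.path.getsize(path) < THRESHOLD:
                    tar.add(path)              # small file: add to the bundle
                else:
                    large_files.append(path)   # large file: archive on its own

    print("Bundled the small files into small_files.tar")
    print("Archive these large files individually:")
    for path in large_files:
        print("  " + path)

The resulting small_files.tar and the individual large files can then be written to HPSS with the usual transfer tools (for example hsi or htar); NERSC's HPSS documentation describes the recommended options.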

Yes, although sometimes the interface to its HPC machines (namely Hopper) can be difficult to understand.

Yes. Perhaps more examples from users (with practical, nitty-gritty details) of how they have adapted their codes to changing environments would be useful.

Perhaps more tutorials on parallel programming making hybrid use of CPUs and GPUs.

I've found that I'm quite unaware of the full range of systems and services NERSC provides; however, some of them seem to be quite useful. Higher visibility of these services and their use/practicality would help me greatly.

Also, I occasionally miss announcements about outages. I would like all announcements that affect systems I use to be sent to me via email. Sometimes they are only posted on the MOTD.

NERSC Response: You can get NERSC system status announcements by email by subscribing to the status email list using the instructions at http://www.nersc.gov/users/announcements/email-lists/

Yes, NERSC meets my science needs

Yes, NERSC does an outstanding job in systems and services.

NERSC is 100% behind the user.

Yes. The technical support and documentation of advanced optimization techniques (e.g., changing the page size used by an application) is very good.

YES! Wholeheartedly YES!

As stated above , NERSC is doing a superb job of providing the whole range of systems and services to meet our scientific goals.

Yes. I am now increasingly reliant on NERSC for my computing needs.

For the purposes of my research, the computational resources provided by NERSC are powerful and we can study problems that could not be addressed at a university computer cluster.

Yes, indeed.

Yes, extremely satisfied

Generally, I would say that it does.

I mostly need a big machine for large simulations, which hopper provides. I do my analysis and visualization on local machines.

Yes, it does.

Yes. No need for anything else.

Till now, yes.

Yes, NERSC provide the full range of systems and services I need.

Yes it does at the moment.

For the most part, I just use NERSC to run big simulations, then do all the analysis and visualization in-house. I am very satisfied with the way everything has gone at NERSC.

Yeah, I have not really found anything for which NERSC is not sufficient for my needs.

Yes. I am very happy.

At the moment, Yes.

NERSC does provide the full range of systems and services needed.

Yes, NERSC satisfies all our computing needs

Currently, yes.

Yes it does. Thank you.

At present, I barely use a fraction of what NERSC has to offer, though I expect to use more as I start a new project. The short answer to the question is 'yes'.

I do find the full range of tools, systems, and services that I need at NERSC. This includes HPC resources as well as data management and archival resources. I have been a researcher using HPC tools for 20 years.

... and 54 users responded with "Yes".

Comments - What Does NERSC Do Best?

Many responses fell into multiple categories; they can be broadly grouped as in the table below.

Response Category    Big MPP Respondents    Medium MPP Respondents    Small MPP Respondents    Total
User support, good staff 28 46 34 108
Well managed center, good in many ways, supports science 27 29 20 76
Hardware, HPC resources 19 32 22 73
Uptime, reliability 17 14 12 43
Software support 9 16 8 33
Web, documentation, training 6 14 10 30
Data, I/O, networking 6 9 7 22
Batch structure, policies 6 8 6 20
Communications to users 4 10 2 16
Security, ease of use, account management 4 5 2 11
Allocations 3 3 3 9

Multiple Things Done Well / NERSC is Overall a Good Center

Well managed center, allows science, many things

Super helpful and interactive user support folks. Great communication streams available. Solid up-times. Fast data transfers. Very few hiccups with the large machine. It truly is a pleasure to be able to compute at NERSC.

NERSC is everyman's supercomputer center. Problem solving is top notch.

Overall management of Hopper is extremely good.  Consultants are just superb -- I can't say enough about how helpful and timely they are.  They really make using NERSC resources attractive compared to other centers.  The mass storage is also splendid. The recent upgrade in bandwidth from Hopper to mass storage and back is especially helpful.

NERSC provides reliable massively parallel resources and storage, supported with high quality consulting and software, and provides users the year to year continuity they need to stay productive. The staff accommodates special needs, and I am delighted overall. I only wish more time were available, and queue wait times were shorter. This continuity is very important -- having a safe place to archive data, and not having to recode for a new system every year is essential.

NERSC is the flagship of the DOE. Best people, facilities, policies of fairness.

Basically everything. Computing quality is excellent, available software is excellent. Where NERSC really outdoes the "competition" though, is in user support. NERSC user support somehow manages to be shockingly fast and shockingly competent at solving user problems. Other supercomputer managing entities usually tend to fail with regards to at least one of these or only succeed in doing them sporadically. Elsewhere, you submit a ticket and 'hope for the best'. With NERSC, almost any reasonable problem can be mentally filed away as "solved" as soon as you submit the ticket to the help desk.  Anyway, as you can see, I'm a fan. I think NERSC truly improves the quality and timeliness of the research that it supports.

I have for years found NERSC to be simply an outstanding center for computational resources, and in particular I am impressed by the help desk, which has never failed to diagnose a problem or offer help, is always businesslike and pleasant, and seems very efficient (e.g., it is quick to comprehend and resolve the issue). The website also is well structured to find whatever information is needed, and I also have been impressed with the structure and functioning of the queues. It is really the model for how such a center should be run.  Even the urging and reminding of scratch space hogs is done with courtesy, and overall that reflects the spirit of NERSC.

NERSC truly provides the ideal environment to get your work done, from transferring your data, having the software needed available, computing, and archiving.  Technical support is superb! This includes responsiveness, communication, and also being technically astute.

Up time is a strong point.  The redesign of the NERSC website is also very good. It manages to be very functional: easy to navigate with a number of useful resources, as well as, surprisingly for a computing center, modern and attractive.  In general, I feel that NERSC is a very stable and reliable computing environment. When there are problems, it is not difficult to find out what is happening, and NERSC support is very responsive.  Of three major computing centers that I have access to, NERSC is the easiest to log in to and use, without lag or authentication issues. It is also the most proximate geographically, which may contribute to the connection speed.

Interactive and debug queues, support, connection speed, tutorials. Keep up the good work!

The overall ease of using NERSC resources is very important. We have been using NERSC for 10+ years. The annual allocation process, the queue structure and the help provided by the consultants are all very good.

Excellent service and top-notch professional management. Best of all government supported supercomputer centers.

From my experience, NERSC succeeds in its stated goal, "NERSC strives to be the most user friendly supercomputing center in the world."

Almost every aspect. Sometimes it is too busy and the queue is too long.

Mostly everything; one of the best centers I have used so far (I have used NCSA, SDSC, PSC).

Excellent platforms (e.g., Cray6); great programming environment; very easy to use; reliable; excellent administration and user support.

EVERYTHING ! NERSC is THE model for how a computer user facility should be run.

NERSC provides adequate permanent disk allocations which are accessible to the compute nodes (the /project file systems).  NERSC provides batch queues with adequate wallclock limits.  NERSC enables users to seek supplemental allocations of CPU-time and storage during the allocation year without having to write a new proposal.  NERSC listens to users.

NERSC is one of my favorite supercomputing centers due to the high reliability of the hardware, the outstanding helpdesk services, and the very good documentation through the webpages. In addition, the availability of a large suite of software is very important.

The consulting support, data storage, software support.

I am quite happy with NERSC overall. Keep it up.

Information dissemination is exemplary. The range and sheer variety of resources: computing resources, architectures, programming environments, third-party application software, analysis tools is comprehensive.

Uptime. Software. Storage.

Provides an awesome HPC computational facility. Enables us to run long climate model simulations without many interruptions due to system failures or downtime.

Everything.

It works like clockwork.

Just about everything.

NERSC is undoubtedly one of the best supercomputing facilities in the USA, providing services in various areas to scientists all over the world. Its thousands of users are fortunate to have a facility like NERSC, which is second to none. Not only does it allocate resources very fairly, but its numerous dedicated consultants also provide help and guidance in using state-of-the-art hardware and software. I find that the NERSC facility and its dedicated personnel are the sine qua non for my computational research in the physics and chemistry of superheavy elements. Our sincerest thanks to NERSC for giving us the opportunity to utilize its facility for our scientific research.

NERSC does a very good job of providing reliable HPC resources to a diverse user community. NERSC provides and maintains a set of programming and debugging software and libraries that enable top-level computing for multiple disciplines. NERSC does a good job of providing HPC job information in real time through the web interface, allowing for accurate tracking and archival of run metrics from each job. NERSC provides a very good introduction to its HPC platforms and operations for the uninitiated, with sufficient NERSC-specific information to be a valuable resource for more experienced users of HPC resources, and provides other information that makes migration to and use of NERSC resources straightforward.

Most everything. I am quite pleased with the service.

System uptime and stability; well-maintained scientific computing and visualization applications; amount of computing resources available; amount of disk space available to users.

NERSC allows me to run high resolution numerical model simulations of the Earth's climate that I could not run 'in house' at my institute.  I'm delighted to say that I'm extremely happy with my experience at NERSC. The 'uptime' of the machines (I mainly use Hopper) is excellent, and there is always ample warning of when machines are expected to be down for upgrades etc. I'm also happy with the queue time on Hopper, and I think the debug queue is an excellent tool for fixing bugs when I'm writing new code for my model.  Thank you.

Stay up to date with current technologies (hardware, software, www, etc.)

NERSC is a well run computer center.

This is the first time we've been able to thoroughly test and refine our computing system against real case study data. This is made possible by access to significant and timely computer resources. This is the first time we've been able to produce ground-breaking tropical and severe storm studies at the cloud-scale using the most sophisticated microphysical routines possible.

very well!

We have been generally very pleased with the reliability of the NERSC resources and the flexibility of NERSC in meeting our computational needs.  I work with many students new to high performance computing, and they have found the systems fairly easy to use and were usually able to answer questions by accessing the online resources and tutorials. NERSC was also very supportive of a summer school we ran at UC Berkeley/LBNL in 2011 that introduced ~25 graduate students to HPC, providing the students with training accounts, computing hours, and some hands-on instruction.  One item of great value to us this last year was the initiation of a NERSC science gateway, which will host the output of our simulations, allowing us (and the scientific community) to easily access, search, and interact with our data. I think the ability to communicate our calculations to the outside world will be an increasingly large part of our final scientific impact. David Skinner and his group have been extremely helpful, and very generous with their time as my students begin the process of setting up a database and interface. I hope this sort of service and support will continue to be offered.

NERSC does almost everything well: infrastructure, software, tech support, frequent updates. The only problem is that sometimes the queue wait times are too long.

I am extremely pleased with the NERSC environment, the responsiveness of the consulting staff and self-help tutorials and tools. Training workshops, at least the ones I have attended, were very useful and I would hope that the staff would continue to offer them. I am also satisfied by the informative email messages on events, planned outages or software upgrades.

Pretty much everything. I'm very satisfied with most of my experiences using NERSC's resources.

Respond to questions and problems. Running large parallel jobs. Installing commonly used software.

NERSC has excellent support whenever there is any type of issue, e.g. account issues, code is unexpectedly not working, etc. I am always very happy in working with NERSC staff. The website is also excellent. It almost always has the information I need. The systems purchased are all very high quality, much better than what is at TeraGrid. Hopper is so much easier to use because its data access and connectivity are much better. I also like NERSC's purge policy. It is necessary to force users to back things up and not use up all the scratch space, but I like that NERSC gives me a month or two before my stuff starts getting purged. TeraGrid gives much less time before purging, and in that case I feel that I would have to spend a very large amount of my time just managing my data to make sure I don't lose it.

Help centre (assistance). Availability of resources. Communications. Training.

Yes, of course.

NERSC does way more than what I need. If I ever have a tough computing problem, NERSC would be a good place to try.

Documentation, uptime, diverse platforms, etc.

Maintains a large cluster for general computer use with a range of compiler and software configurations usable by a wide range of the scientific community.

I am very pleased with virtually every aspect of NERSC. I've been a long time user and have been continually impressed with the resources and support. NERSC is much better, in my opinion, than other facilities I have worked with.

Everything.

almost everything

NERSC provides many services which are extremely important to computational scientists.

HPC !

They keep the HPC computers up, running, and available. They provide a variety of useful services (e.g., webpages, videos, training) to help users get on the systems and use them effectively. They provide attentive and useful consulting services and system administration services. They provide a range of computing platforms and data storage options.

NERSC is the best supercomputing center in the country in my view. Their staff is responsive, very knowledgeable, and innovative. I am a manager of one of the science gateways and have been extremely happy with the help and performance I have had from the HPSS staff in helping me develop and maintain this. Recently, this has been the incorporation of grid (globus online) into the gateway.

we can all use it, across our collaboration

Overall I'm satisfied with NERSC

NERSC Provides Good HPC Systems and Staff

The high-performance computing resources at NERSC are fast, reliable, and have short queue wait times. The support provided by the NERSC staff is exceptional.

Provide an excellent, heterogeneous, state-of-the-art computing environment. Provide prompt, timely, and excellent consulting services.

NERSC is very responsive to users' needs and does a great job planning for the future (listening to input from users). The computing system is reliable and easy to use.

Consulting is great! So are the machines.

Providing state of the art high-computing without being on the bleeding edge. NERSC provides first-rate assistance with its consultants. NERSC responds quickly to issues.

User Services is outstanding. The help I have received -- with consulting, accounts, and special requests -- has always been prompt, well informed, and extremely helpful.  NERSC's ability to provide reliable, stable access to HPC resources is also excellent. I could not perform my research, and neither could the teams I am involved with, without these resources.

Service and Support. Maintaining long-term, stable machines.

Provide hardware and support to get projects running quickly

The support services seem quite good whenever I have had to use them, and the availability of the Hopper system is high. For the most part I am satisfied with the performance when I am running, and small errors that have occurred have been addressed quickly in most cases.

I think NERSC is a reliable platform for most of my big computing jobs. Also, information about system changes is provided promptly and in a user-friendly way.

I'm very satisfied with the queue structure on Hopper. Technical support is fast and helpful. It is evident that the people in charge at NERSC are trying hard to continually improve the user experience.

Consulting. Availability of Hopper.

Provides the throughput needed to conduct ab initio molecular dynamics simulations. Provides very well informed and helpful consultants. Provides computer power and data storage that enable large scale studies.

I am very satisfied about NERSC resources.  I am also very satisfied about the support provided by Francesca Verdier and the users support team.

The combination of excellent technical/user support staff, online documentation, and high uptime at NERSC is far superior to other facilities I have used recently (in particular, the NSF XSEDE site at the Pittsburgh Supercomputer Center).

Service, resources, documentation.

Excellent computational services and quick responses for the questions.

I am quite satisfied by the NERSC services and computational environment.

Good maintenance of the computing facility and prompt technical support.

Keeps the machines running, consulting responds quickly to inquiries and usually has the answer.

Prompt and courteous support (except for some very unprofessional interactions with <name deleted>, which I think have been looked into by NERSC); large amount of resources that are kept in good repair and kept live a satisfactory amount of time.

Quality assistance. Knowledgeable staff. Excellent Hardware. Queuing system on Hopper meets (and in some cases exceeds) the needs of a DFT computation environment.

Provision of a nice range of reliable high-end computing systems + consulting.

Thank you a lot for great support and the massive amount of resources I could use during the last year. Special thanks to the great support and Francesca Verdier.

It is very stable, and it can run jobs much quicker than Blue Gene.  The disk quota and the related policy for increasing it also satisfy me.  The NERSC support system is wonderful; it really helped me a lot when I started.

It is relatively easy to use, and very clear to get started. Allows us to get much more work done than we could do on our own. A great resource.

NERSC does very well in the overall user experience and resource management, as well as in the mix of resources that are made available.

Maintain the clusters, write very helpful guides for users and supports users.

Excellent computing capabilities.  Excellent platform for multi-jobs.  Excellent allocation of user memory (~25 TB in my case, which made me move from other clusters for my large data analysis).  Excellent support.

supports scientists, maintains computing facility

Uptime of computing systems and user support

Outstanding systems and services.

Good systems.  Good programming environment.  Good user support.  Good website with useful information.

High availability of the Hopper platform.   Technical assistance.

The help from staff was always great. The system is relatively easy to use, and this was my first experience with being a supercomputer user.

Provide HPC resources and applications with minimal demands on user time for allocations.

Access to large scale computing resources with functioning scientific packages has been valuable for my research. Staff have been helpful and cooperative. Queue waiting times are the biggest negative.

NERSC Provides Good User Support and Staff

Outstanding service and commitment to science --- this is really transparently evident in how people from NERSC interact with me as a user.

Support. I love having 24 hour support by people who actually know what they are talking about.

I think NERSC is a leader in user service. They really think outside the box and leave their comfort zone to reach out to users. This is remarkable, and they have done it over the years, so many times. They provide a user experience that is unmatched by other labs. NERSC knows what users need and think, ahead of time. This takes leadership, time, and commitment. I congratulate them for their remarkable service. In each and every publication, I acknowledge them.

NERSC has fantastic customer service and support.

Very good support, with clear desire to help. Very good people.

To single out one area, the user support is particularly timely and helpful. However, the bottom line is that year after year NERSC has proven to be actively and effectively engaged in helping us deliver science.

Excellent technical support and response time from NERSC personnel

NERSC staff are very good at listening to their users as well as finding new and multiple ways to get feedback from them. They are also very good at making a variety of software available to their users.

Technical support, allocations

I am definitely very satisfied with the technical support.

Services are outstanding, very quick responses, informative web site, etc.

Carver is doing a great, great, and great job for me.

Technical support. Available hours.

The help desk is very helpful and quick at responding. Advanced notice of down time for various machines is appreciated.

Service and help desk tickets are excellent.

Would like to thank Francesca Verdier and the entire users support team for their excellent service!

I recently began using MATLAB at NERSC, and was very pleased with the assistance I received in getting everything running. Thank you!

NERSC provides very good services. Francesca Verdier is very helpful.

Provide environment and consulting support for HPC. I liked a programming course at NERSC.

Responsive support.

The user support is quite impressive.

I usually get good support on questions that I have. It would be good if there was someone who could answer stupid questions that I have.

NERSC certainly excels in having excellent staff to help users, compared to other super computer centers. In my experience, they have been very helpful in resolving any issues or compiling new codes.

Excellent support (and support staff).

NERSC is very willing to work with individual needs, like upping scratch disk quotas. User services is always quick to respond.  I think queue times are in general reasonable (although I'd always prefer my jobs to run right away).

Very helpful and available consultants.

supporting service

Professional and high quality user support. Response time is fast. Getting a new user account is not so painful compared to other centers.

NERSC's consulting staff are very nice and they always reply to my questions very quickly. The solutions they came up with were very helpful.

Fast response on incidents. Helpful team.

Every time I have called to report a bug or issue, I always get a human who can help right away. I have been consistently impressed with this.

Provides newbies with good instructions for getting started and promptly provides assistance.

Technical support is excellent. I am also grateful about the extra allocation I received after my project started.

Support staff are excellent--I just need to use them more.

Access to training, troubleshooting and consulting.

I am new to NERSC, but so far I have used the consult desk and every time they are available when I need them and always very helpful. The user environment seems to be robust and dependable and the paths and libs seem to work fine for me.

Overall, a very impressive set of services and structure is available.  Very good and flexible support, able to adapt the system to specific needs.

NERSC Provides Good HPC Systems

Hopper is the best HPC resource I've used in 2012 due to its large number of cores, fast interconnect, and flexible queuing system. Hopper is well managed and well supported by NERSC.

I have found the computers to run extremely well, and have been very happy with the reliability and performance.

Provide dependable high-performance computing capability with high interconnect speed.

Provide a reliable HPC environment that's easy to access and use.

It is my best HPC resource. Lots of cycles are available and the process is very clear and easy. The availability of resources such as carver on which I can run various codes is also very important for me.

Provides high end computing resources.

Provide a great computing platform

Management of its systems.

I think NERSC provides stable and reliable computing platforms that provide computing at a scale that I cannot do at my home institution. That is important to me both for doing my science as well as testing our coding approaches for the performance that we can achieve.

Very accessible, stable, secure, runs smoothly.

NERSC provides a great platform to get my calculations done on. It's relatively easy to use. I only wish we had more cores so the wait times would be smaller.

Access to large machines, reasonable stability and explanation of such. Data management.

Providing high performance computing resources.  Providing and updating high performance programming environment.

Support for very large MPI jobs seems to be good.

Provide a well-administered environment for performing large scale I/O or CPU-intensive computations for large groups of people

HPSS, gridftp, peak throughput.

NERSC Systems Have Good Availability and Reliability

Of all the huge machines I run on, it is the most reliable - more uptime, fewer failures of cores. Access to and from HPSS is fast and seamless. And NERSC has the best debugging suite of all the machines I work on.

Provide reliable, high-performance computing. HPSS is excellent, too.

The availability/reliability is excellent for me.

Availability and stability of the systems

Reliability. Computing power.

Low downtime/uptime ratio on large machines like hopper. User support.

I have been very impressed with (a) uptime and (b) communications about scheduled maintenance on the NERSC production systems in 2012.

Getting large systems to run stably.

Keep systems up and running.

Provide reliable access to high-quality, easy to program systems like Hopper. Unlike other large Cray installations, I have never had my work on Hopper stalled by downtime or disabled access.

Uptime.

the system is solid and reliable.

NERSC Provides Good Software Support

You have excellent software stacks and I have had great experiences with NERSC technical support.

Providing an enormous number of modules to use, from programming environments to performance and debugging tools.  I like that the queue charge factor (for Hopper) for medium to large jobs is just 0.6.  System availability and reliability in general.

Answers questions, gets the newest versions of existing software very quickly, and responds positively to requests for additional units at the end of the term, since it is sometimes hard to assess how many units one needs for a project.

The codes of interest to me, NAMD and VMD, are well maintained, and newer versions are promptly installed and easily available. Scaling on Hopper is much better than on other machines I have used. NERSC should urge all users to benchmark their runs and use the most suitable number of cores to get the best performance. The data I/O within NERSC and between NERSC and ORNL is absolutely fantastic. With the Globus file transfer protocol, the transfer rates are a sustained 20-200 Mb/s, which is plenty fast for my needs.

I have found the Fortran compiler suites on Hopper to be extremely valuable for code debugging and testing. The debugging and interactive queues have very quick turnaround times even for large numbers of processors, so testing a new piece of code is a breeze for a system as massively parallel as Hopper.

Provides access to rapid processing of computational quantum chemistry problems. Provides access to Gaussian software.

I use NERSC to run VASP code which can be done well.

visualization and scientific software support

Good response time for questions.  Great info on the web pages. Good variety of performance and debugging tools.

NERSC Provides Good Documentation and Training

Up to date and accurate information on the website regarding the systems, software, and queue systems.  Overall up-time.  Quick response in case of problems.

Provide high quality research computing systems.  Provide an effective webpage resource to easily use the computing and storage systems.  Help with user problems.

I can reliably access a lot of cores without too much wait time. The website is the best of the supercomputing facilities I use.

Great resources. Very helpful website.

The NERSC website and amount of helpful information to get started is extremely helpful and frequently updated.

The NERSC website is excellent. Easy to find needed documentation, easy to monitor queues and job status, up to date status and problem notifications. Email notifications about library changes were quite useful. They saved a lot of time when trying to figure out why CESM suddenly stopped running normally, and how to patch the build scripts to use the correct libraries.

superb support from staff and very informative website.

Service, website

NERSC is Easy to Use, has Good Communications and Account Management

Easy to access for users without unnecessary security checks

Great user facility. I really like the ease of creating and managing accounts. Avoiding cryptocards is a major benefit, in my opinion.  The application procedure for users with existing DOE grants is also a great system. We spend a lot of time writing grants, and so the ability to apply for time with an existing project is a great time saver for PIs and reviewers.

1. Allocation of CPU hours. 2. Fast computing capability provided to a huge number of users. 3. Management of user account information.

provide a nice, easy-to-use environment for doing HPC.

Things work!  Ad hoc flexibility in allocations and quotas that allow simulations and projects to be completed.

The NERSC Services is excellent at helping with allocations for jobs as the year progresses.  The NERSC Consultants are excellent at helping to solve problems in running codes.

Communication.  Uptime.  Allocation procedures.

Explaining what is going on when there's a problem, introductory and example web pages.

Communication is always good. User support overall is very good.

NERSC is very good at getting information to their users and helping with any issues. The computing systems themselves are very well run. I have had an incredibly good experience working with NERSC computing.

NERSC has Good Batch Policies and Queues

NERSC gets lots of jobs through the queue.

Reliable, robust computing environment.  I like the 1hour_queue; it bridges the gap between debug and production.

NERSC provides an avenue to carrying out massively parallel scientific computing on all the scales needed, from small scoping and debugging jobs to "heroic" runs requiring hundreds of thousands of processors, and the analysis, visualization, and post-processing tools to extract the most return from those computations.

NERSC does a very good job of providing access to very large HPC systems and implementing a batch system that allows me to run many short-lived jobs to perform scaling studies of parallel algorithms on a large portion of the systems.

Fast super computers, well designed queue system, quick turn around time, easy file sharing with collaborators.

NERSC does a good job with the queuing system allowing multiple requests from a large number of sources and seems to have a fair system in dealing with it.

Other Comments

GPU is a new technology; it can improve matrix calculations significantly. However, PIC simulations require a large amount of particle transfer among processes, and memory is one of the most important issues. Up to now I have not had info that GPUs can improve PIC simulations significantly.  NERSC might be able to provide more info about PIC simulations running on GPUs.

No answer, because I am not sophisticated enough a user to judge.

"hi nersc survey. nersc does everything well, my only real complaint is i/o and general file system performance on hopper. write performance is highly variable, sometimes a checkpoint will take three minutes, then the next one will take 30 minutes (same size, number of files, directory, etc.). one recent example:
checkPoint() time = 153.8045318 secs.
checkPoint() time = 1535.315811 secs.
checkPoint() time = 161.972904 secs.
These should all be about the same (a couple of minutes). These were on /scratch, sometimes doing an ls, du, or editing a small text file is painfully slow. This is very annoying.  Sometimes it takes forever to get through the queue, not much you can fix there short of deleting users.  then there is that pesky disk quota....."

Comments: If there is anything important to you that is not covered in this survey, please tell us about it here.

Queue / Job / Turnaround Comments

queue times are really unpredictable and difficult to work with, especially when my jobs are not using huge numbers of nodes

I would find it useful if the node limit on interactive jobs were increased.

The batch system on Carver needs improving. Sometimes a job crashes for no reason; I run the identical job again and it works. Also, I can only submit a small number of jobs at once. I have to figure out which jobs were not processed and then run my script. It would be good for me to be able to SUBMIT several thousand jobs.

It would be useful (if not already available) to make clear how long current wait times are for jobs of varying sizes, especially for those greater than several thousand cores.

We have concerns about performance variations, presumably associated with the communications fabrics on both machines we use (hopper and carver). These mean that the wallclock time required for identical jobs can vary by a factor of up to 2 on hopper and up to 10 on carver.

My biggest concern is the long queueing times.  Finally, I had issues with some processes taking too much memory and killing nodes because they weren't killed after they exceeded their soft limit, which I had set to make sure the node doesn't go down.

The turnaround time of jobs submitted to the debug or interactive queues could be better. Waiting on these kinds of jobs is often the rate-limiting factor in development work -- how far my effort will get in a day is often bounded by how many jobs I can submit and see results from before getting too tired to continue. On the other hand, a batch job starting an hour later should not make a whole lot of difference in its value. Perhaps jobs in those queues could age faster, in addition to starting with an aging bonus?

Hopper is oversubscribed to the point that I try to avoid using it except when all my other options are unavailable. If I had to rely on Hopper for most of my day-to-day HPC computing, my productivity would suffer greatly.

In my research, the need often arises to run batch jobs with an elapsed time of 48-72 hrs with pvmem=24-30 GB and about 128 processors. I realize that it may be a difficult situation, but I am wondering whether such a batch job can be run at NERSC at all.

I feel the queue structure on Hopper can be improved to reduce the wait time. The current priority system gives too much priority to extremely large jobs. That makes some sense, but smaller jobs (thousands of cores), which are needed for testing, debugging, and parameter tuning, then sit too long in the queue.

NERSC Response: Prioritization of large jobs will likely stop on Hopper when Edison phase 2 is in production in 2014.  If users have suggestions for how the debug queue settings might be changed to better meet their needs, please send them to consult@nersc.gov.

I wish the wait times were less on hopper.

What would be helpful is to have more resources that have a longer wall clock time.

NERSC Response: In response to requests for longer wall times on Hopper, we raised the wall limit in 2012 from the 24-hour limit that was in place during 2011 to as much as 168 hours for the throughput queue (users can only use one or two nodes — up to 48 cores — with this queue), 96 hours for the Cluster Compatibility Mode queue (1–16,368 cores), 48 hours for the regular 1–16,368 core queue, and 36 hours for the 16,369–98,304 core queues. However, these longer wall limits have resulted in longer wait times - there is always a trade-off between wall times and queue wait times.

In the last month, I think the queuing time of the interactive queue has been much longer than before.

I would just like to re-iterate that it seems that batch wait times have gone up a lot over the last 6 months. Perhaps there are more users, or perhaps there have been conferences causing the slowdown, but I'm having to wait longer than I'm used to/find acceptable. I can obviously adapt, but I just don't understand what exactly happened to cause the increased batch wait times.

I think the wait time for jobs using 1000-10000p is very long and our productivity would increase if it was shorter.

As someone who runs a large number of jobs, I am somewhat perplexed by the wait time required for some jobs. I will submit identical jobs within approximately 10 seconds of each other and one might take 2 days longer to run. I understand that this might be because of core availability, but in general the wait time on some jobs seems to be very unpredictable and at certain times of the year, very slow. This is understandably because of increased traffic. Other times of the year (July/August) jobs will fly through the queue. This is not a real complaint, but understanding an approximate wait time for my job would be nice. The showstart command is so far from being correct that it is not useful.

Allocations Comments

I am not a fan of how we lose a percentage of our allocation midway through if we are not meeting intermediate deadlines on its use.

The only irritation was the automatic cuts of allocation every quarter when the allocation was underused. Unfortunately, in our project the model configuration and data were not fully running until more than 6 months into the year. Consequently our allocation was cut twice (but restored later). As a consequence, on one occasion we had to run a "junk" simulation just to use up our allocation. It caused a reasonable amount of stress among our programmers in the lead-up to deadlines.  Although this was irritating, the program managers immediately restored our allocation upon request, so I am happy. We eventually used ~90% of our allocation, thanks.

NERSC Response: Requests for NERSC time are 2 to 3 times more than we can allocate, and most projects do not receive the amount they requested. Some projects do not use the time they were allocated, and the "quarterly allocation reductions" of underused allocations allow adjustments to be made to match actual need.  Projects that have been reduced but need the time should request that the time be returned; only a small fraction do, so overall this process is an effective one.

My lone complaint is about some inconsistency in the allocation of processor-hours to repositories. Despite doing this for a while, I have a hard time anticipating how many hours will be awarded for a given ERCAP request because there have been (natural?) variations in the past. I think the present system is largely fine otherwise.

The allocations process seems to result in the amount of time used last year. Not sure how much attention is paid to our proposals. While there is a fixed total number of MPP hours available to allocate, the monthly usage patterns vary, so some "overbooking" must occur. This is a complicated problem, but we feel constrained in our use, and most often cannot get the time we need. Yet queue wait times are long, indicating possible "overbooking". Not having the whole picture in front of me, I don't have an immediate solution, but there may be ways of auto-allocating a fraction of the time in response to system usage.

NERSC Response: We do not over allocate in terms of charged time - we tend to allocate between 85% and 90% of maximum core hours available (ignoring down times).  However, with the big job discount and the low priority queues, we are a bit over allocated in terms of used time. Usage patterns vary a lot among the 700 or so projects using NERSC, so it would be impractical to "auto allocate" a project's allocation in small time periods.  We are experimenting with fair share scheduling on Edison, which effectively does some auto allocating.  So far the user response to this has not been very positive.

Most of the computing resources made available to my SciDAC Center (the Center for Simulation of Wave-Plasma Interactions) comes from NERSC. It is very important to our SciDAC Center that these resources continue to be made available at the level of 8,000,000 to 12,000,000 mpp hours per year.

Just as a comment, my postdoc has used our entire NERSC allocation so I haven't done any computing myself. Hence I could not answer most of the questions on this survey. I have been satisfied with the proposal/allocation process. Thanks!

I have been using NERSC mainly through a NISE award on Hopper. That NISE award is what made my research possible. My particular project is computationally very expensive (I used more than 5 million CPU hours in 2012) and I could not find sufficient computer time anywhere else. The first results have recently been accepted in Physical Review Letters. I am very grateful to NERSC for making this possible and wanted to say a big thank you. Thanks a lot!

Software Comments

Please install CDAT/VCDAT/UV-CDAT in Carver. It is a freely available software at http://uv-cdat.llnl.gov/ .

I think the number of available precompiled scientific libraries is not sufficient. I had to compile both cfitsio and healpix myself, although they are widespread in the scientific community.

Also, I had an issue with a gcc library in hopper that for some reason didn't exist in carver.

Finally, some of the tools for job monitoring and resource usage assume the application uses MPI, which mine doesn't.

Very minor complaint, but changes to the available PETSc libraries seem quite frequent, and usually break our code for a brief period. The longer a PETSc version can be offered, the better.

For want of a better term, "the lifecycle of a compiler" comes to mind - the timescale on which a new version of the hopper pgi compiler goes from the brand-new-experimental-installation stage to the is-the-default-compiler stage to the this-compiler-is-going-to-be-removed-from-the-system stage is shorter than the timescale on which I make significant code changes, and this is sometimes a frustration (e.g. I want to change two lines of source code, but it has been long enough since I changed source code that I need to go through a several-hour build chain on the new compiler in order to do it).  But I realize you've got tension the other way too, in terms of making improved compilers available to users that need them, so I don't have an easy solution here. More generally, I sometimes feel like NERSC is handing me better and better tools with regularity, but so quickly that the (N+1)-st and (N+2)-nd and (N+3)-rd ones have already come along before I've even figured out everything the Nth one can do or how I can effectively use it. (I could do good science on hopper for probably five or ten years if it just stayed exactly the way it was right now - but that doesn't seem to be the model NERSC is designed for.)

The closed-source nature of CrayMPI is crippling to my research. The interoperability problems between CrayMPI and DMAPP on Hopper are equally catastrophic. It's complete nonsense that Cray can't integrate uGNI and DMAPP into a single low-level messaging layer the way IBM did with LAPI and DCMF and now does with PAMI.

Data Storage Comments

It would be more convenient for the users if they can store files on the scratch space for a longer period of time.

I wish there were always an email notice before the $SCRATCH directory on Hopper is purged. It might be better for the SCRATCH directory not to be purged, but to have a hard limit on how much space one user can use, so that there is always sufficient space on SCRATCH.

The lustre file system on hopper is too slow, not responsive, and keeps hanging or stalling file operations.

Also, disk I/O performance remains a limiting issue when running many jobs on the cluster (PDSF batch in my case) that access a decent amount of data at the same time. On average, CPU time very easily goes as low as 50% of the total time. On the other hand, it's not convenient for the user to "spread" data over various disks; this should be left to the underlying filesystem (yes, I know it's already moving towards this, I just wanted to mention that this is still a limiting factor).

Training / Documentation Comments

Some of the getting started guides could use more detail for people not familiar with HPC.

I wish there were more presentations and tutorials available here. Video presentations would be great, especially for new systems. Maybe there are already, but I am not sure.

I feel that perhaps some outreach can be done in regards to data visualization and the various other services that NERSC provides. As it is, I am aware of the available installed software and the compiling/computing capabilities. I am sure that there are other things NERSC makes available that I simply do not know about, and probably would not know to look for.

Guidance on which Fortran compilers to use would be helpful, as the NERSC documentation is too agnostic on this. We used to use PGI, then for a time were using PathScale, and in the last year have been using Cray. Sadly, we have lately run into inexplicable troubles with our code compiled with the Cray compiler, as our large jobs are reliably crashing with FPEs. The identical code has no problem running the same jobs on other platforms. It would be great if your website ranked the compilers in terms of reliability, though optimization performance might be another ranking that would be helpful.

Security / Password Comments

I find two flaws in your password policy: 1. Three tries, call in. The cost-benefit is bad: this interferes with users too much relative to bad actors. I try never to keep the same password for two different accounts. As a result, I often cycle through more than three, but then I get locked out. This is stupid. The solution is to call in, but there is not always someone there! Bad actors, meanwhile, would likely attempt more than three times in the course of an attack. Locking the account at 12 attempts would be just as effective, without screwing users. Otherwise, I bet people are recycling passwords! 2. Password content: http://xkcd.com/936/.  Pleaseletmeconstructapasswordlikethisipromiseiwon'twriteitdown!

I feel that making users change their password so frequently is leading to a security hole. I'm getting to the point where I'm considering writing down my password every time I change it because otherwise I forget some part of it, get locked out of PDSF due to 6 failed password attempts, and have to go through the whole process again. At the very least, the number of allowed failures should be 10 or more.

Security: The 3-chance password rule followed by a call to support seems a bit draconian. What if someone attempts a DOS by grabbing the userlist and then guessing random passwords until everyone is locked out?

NERSC Response: Users are disabled after 5 failed password attempts (not 3).  Soon users will be able to clear their own login failures by logging into NIM.  Meanwhile, you can call NERSC Operations at 510-486-682, 24x7, to get your login failure cleared.

The question about security made me wonder whether NERSC should have a second security step at login such as the RSA securID passcode?

NERSC Response: We have no plans to introduce "one time passwords", and instead rely upon rigorous intrusion detection systems to protect NERSC systems and users.

NX Comments

Some pressure on the company that makes the perpetually in beta slightly quirky Mac NX client would be appreciated.

While NX represents a great way forward for remote working at NERSC, it often has downtimes, slowdowns, and various limitations if one needs to perform most of one's work with it. I don't have a good solution for this, but I wanted to mention it as a possible area where more improvements (in both stability and speed) and integration with native systems (especially if using a Linux system) would be very appreciated.

NX server: The server is quite fast and a good replacement for VME. Reconnecting to sessions has some minor issues such as scaling the desktop to a new resolution and fonts not found when connecting to a session started under Linux and resumed in Windows.

Other Comments

I would like to see enforcement on the interactive nodes of job limits. It seems that some users get on these nodes (e.g. euclid) and fire off 40 simultaneous jobs that then swamp the system and violate the wall clock standards, as indicated by using the "top" command. I thought there were other systems that would be more appropriate for these larger scale jobs. It could be that the user does not have access to these systems, or has exhausted their quota on them. At any rate, even when the interactive node is quiet, users should not be allowed to eat up all the resources. This should be relatively easy to monitor and enforce.

Maybe NERSC already has plans for this year. One thing that I mentioned previously is the issue that when multiple users use the same nodes, command-line execution is slow. I hope this could be resolved quickly.

This survey is too long and the user just ends up clicking random options. It needs to be streamlined to ask only questions that are important.

The connection to Hopper is severed after a few minutes of inactivity. I have not measured this time, but it seems to be around 3 minutes, which seems awfully small. Simply thinking about the data on screen takes longer than that. Allowing a longer inactivity period would be of great help.

MOTD is not up-to-date. I have called about problems and found that I was the first to report them.

The ticket system is pretty worthless. It's far too restrictive in what others in the NERSC community can see - for instance, it's impossible to see if someone else has already submitted the same ticket, or if similar tickets have been submitted in the past (and what their resolution might have been). Responses from the on-call shifters typically come after at least an hour's delay, often significantly longer, and support appears to be almost non-existent outside of 9-5, Monday-Friday. I find myself emailing PDSF admins directly most of the time because the ticket system introduces such a long delay in communication.

Overall, the experience with NERSC is quite good. During this year, Carver, the system that I've mainly been using, has had a number of hardware issues which occasionally resulted in down-time. However, delays were minor and acceptable.

I am very satisfied with Francesca Verdier's timely responses to all my questions.

I am very happy with my access to NERSC.

None that I can think of. Once again, I want to heartily thank NERSC for their outstanding service.