
2004 User Survey Results

Response Summary

Many thanks to the 209 users who responded to this year's User Survey. The respondents represent all six DOE Science Offices and a variety of home institutions: see Respondent Demographics.

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

You can see the FY 2004 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.

Satisfaction Score    Meaning
7 Very Satisfied
6 Mostly Satisfied
5 Somewhat Satisfied
4 Neutral
3 Somewhat Dissatisfied
2 Mostly Dissatisfied
1 Very Dissatisfied
Importance Score    Meaning
3 Very Important
2 Somewhat Important
1 Not Important
Usefulness Score    Meaning
3 Very Useful
2 Somewhat Useful
1 Not at All Useful

The average satisfaction scores from this year's survey ranged from a high of 6.74 (very satisfied) to a low of 3.84 (neutral). See All Satisfaction Ratings.

For questions that spanned the 2003 and 2004 surveys, the change in rating was tested for significance (using the t test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.
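To make the testing procedure concrete, the sketch below shows one way such a year-to-year comparison could be computed. It is a minimal illustration only: it assumes an unpaired two-sample (Welch's) t test on each year's summary statistics, and the 2003 standard deviation and response count used here are placeholders, not values from the survey report.

```python
# Minimal sketch: is a year-to-year change in mean rating significant at the
# 90% confidence level?  Assumes an unpaired two-sample (Welch's) t test on
# summary statistics; the 2003 std. dev. and N below are illustrative only.
from scipy.stats import ttest_ind_from_stats

# 2004 values for "SP: Batch wait time" (from the tables below)
mean_2004, std_2004, n_2004 = 3.84, 1.90, 161
# 2003 values: mean = 3.84 - (-1.40) = 5.24; std. dev. and N are assumptions
mean_2003, std_2003, n_2003 = 5.24, 1.60, 140

t_stat, p_value = ttest_ind_from_stats(mean_2004, std_2004, n_2004,
                                        mean_2003, std_2003, n_2003,
                                        equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, "
      f"significant at 90%: {p_value < 0.10}")
```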

 

Significance of Change
significant increase
significant decrease
not significant

Areas with the highest user satisfaction include the HPSS mass storage system, HPC consulting, and account support services:

 

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
HPSS: Reliability (data integrity)       5   16 97 118 6.74 0.67 0.13
CONSULT: Timely initial response to consulting questions       1 5 38 125 169 6.70 0.55 0.15
CONSULT: overall       3 4 38 132 177 6.69 0.60 0.35
Account support services   1 2 1 2 38 136 180 6.68 0.72 0.29
OVERALL: Consulting and Support Services       3 8 40 146 197 6.67 0.63 0.30
HPSS: Uptime (Availability)     1 3 1 25 89 119 6.66 0.70 0.12
CONSULT: Followup to initial consulting questions       4 5 34 122 165 6.66 0.66 0.17
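Each average score and standard deviation in these tables follows directly from the row's distribution of 1-7 ratings. As a quick check, the sketch below recomputes the HPSS reliability row, assuming its three non-blank counts (5, 16, and 97) fall under ratings 4, 6, and 7 and that blank cells mean zero responses.

```python
# Recompute the average score and std. dev. for
# "HPSS: Reliability (data integrity)" from its rating distribution.
# Assumption: the row's non-blank counts correspond to ratings 4, 6, and 7.
counts = {4: 5, 6: 16, 7: 97}                      # rating -> respondents
n = sum(counts.values())                           # 118 total responses
mean = sum(r * c for r, c in counts.items()) / n
var = sum(c * (r - mean) ** 2 for r, c in counts.items()) / n
print(f"N = {n}, average = {mean:.2f}, std. dev. = {var ** 0.5:.2f}")
# prints: N = 118, average = 6.74, std. dev. = 0.67 -- matching the table
```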

Areas with the lowest user satisfaction include the IBM SP Seaborg's batch turnaround time and queue structure as well as services used by only small numbers of users (the math and visualization servers, grid services and training classes presented over the Access Grid):

 

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
Live classes on the Access Grid 1     14   11 6 32 5.16 1.44 0.49
Grid services       18 3 5 9 35 5.14 1.31  
Escher SW: visualization software   1 1 9   4 6 21 5.10 1.58 0.35
Math server (Newton)     1 8 1 4 3 17 5.00 1.32 -0.20
Newton SW: application software 1 1 2 8   5 5 22 4.82 1.76  
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40

The largest increases in satisfaction over last year's survey came from training classes attended in person, visualization services, the HPSS and Seaborg web pages, and software bug resolution:

 

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
TRAINING: NERSC classes: in-person       13   5 11 29 5.48 1.40 0.60
SERVICES: Visualization services     2 22 4 12 19 59 5.41 1.37 0.60
WEB: HPSS section       4 11 30 49 94 6.32 0.85 0.58
WEB: Seaborg section       3 10 47 85 145 6.48 0.72 0.48
CONSULT: Software bug resolution     2 11 7 33 47 100 6.12 1.08 0.48

The areas rated significantly lower this year are the IBM SP Seaborg (batch wait time, queue structure, and overall satisfaction) and the available computing hardware:

 

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
SP: Seaborg overall 4 7 7 2 26 62 60 168 5.77 1.47 -0.66
OVERALL: Available Computing Hardware 3 2 14 8 34 87 47 195 5.65 1.29 -0.48

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In 2004 NERSC took a number of actions in response to suggestions from the 2003 user survey.

[Web site] Needs lots of improvement. Most pages cram lots of info in a single page, hard to find what you want, etc. Beyond the home page, the website has an 80's look.

NERSC response: NERSC reorganized its web site last year, merging the previous www.nersc.gov, hpcf.nersc.gov and pdsf.nersc.gov web sites into a newly designed www.nersc.gov web site. The design goals were to have one site that meets the needs of our users, our DOE sponsors, and the general public.

Four of the web interface ratings on this year's survey show increased satisfaction over last year: the HPSS and Seaborg section ratings increased by 0.6 and 0.5 points, and the overall web site and software section ratings increased by about 0.3 points.

IBM's bugfixing is slow!
The compilers and debuggers need to be improved.

NERSC response: In the past year NERSC has established an excellent working relationship with IBM's compiler support group. Representatives of the compiler group now attend the NERSC/IBM quarterly meetings. This has resulted in quicker resolution of bug reports, and we think that IBM is now doing an excellent job of resolving compiler problems. Also, in the past year compiler upgrades have resulted in better runtime performance. There are currently very few outstanding compiler bugs.

This year's rating for software bug resolution increased by one half point.
totalview is basically unusable.

NERSC response: NERSC has established a good working relationship with Etnus, the vendor that supports TotalView. TotalView upgrades this past year have added usability features, and TotalView can now debug a wider range of parallel codes and is more stable.

Last year we received 8 complaints about TotalView and debugging tools; this year we received two such complaints. User satisfaction with performance and debugging tools on Seaborg increased by 0.3 points and on the PDSF by 0.5 points (these increases were not statistically significant, however).
You need to better develop the floating license approach and make it easier to use for remote users.

NERSC response: During 2004, NERSC finalized consolidation of license management for all visualization software hosted at the Center. The new system, which consists of a set of license servers, also supports remote use of licensed visualization software. See Remote License Services at NERSC.

Please offer more on-line video/on-site courses
It would be nice if NERSC can provide more tutorials.

NERSC response: In 2004 NERSC organized 20 user training lectures in 7 separate events. All were presented via the Access Grid and were captured as streaming videos (using Real Media streaming) so that users can replay them at any time. These lectures have been added to the tutorials page for "one stop" access to training materials. See NERSC Tutorials, How-To's, and Lectures.

more interactive nodes on pdsf
Would prefer faster CPUs at PC farm.
make it faster and bigger diskspace

NERSC response: The PDSF support team has made it possible to run interactively on the batch nodes (there is a FAQ that documents these procedures). They also recently purchased replacement login nodes that are being tested now and should go into production in December 2004. They are top-of-the-line Opterons with twice as much memory as the old nodes.

64 3.6 GHz Xeons were added to the PDSF cluster in November 2004. This is 25% more CPUs, and they are almost twice as fast as the older CPUs. We also added about 20 TB of additional disk space.

The queue configuration should be returned to a state where it no longer favours jobs using large numbers of processors.
NERSC should move more aggressively to upgrade its high end computing facilities. It might do well to offer a wider variety of architectures. For example, the large Pentium 4 clusters about to become operational at NCSA provide a highly cost effective resources for some problems, but not for others. If NERSC had a greater variety of machines, it might be able to better serve all its users. However, the most important improvement would be to simply increase the total computing power available to users.

NERSC response: NERSC coordinates its scheduling priorities with the Office of Science to accommodate the Office's goals and priorities. This year, the Office continued to emphasize capability computing, including large jobs and INCITE jobs. Since NERSC is fully subscribed, this means that some other work receives lower priority. In 2004, the Seaborg queue structure still favored large jobs, and the machine was more oversubscribed than in the previous year. However, several measures have been implemented which should help improve turnaround for all jobs:

 

  • Per-user run limits were decreased from six to three, and per-user idle limits (the number of jobs that are eligible for scheduling) from ten to two. This provides fairer access to Seaborg's processors.
  • The OMB (Office of Management and Budget) goal for FY 2005 is that 40 percent of Seaborg's cycles should be delivered to jobs using at least 1/8 of its computational processors (in FY 2004 this goal was 50 percent); a sketch of this metric appears after this list.
  • In early calendar year 2005 NERSC will deploy a new Linux cluster with 640 2.2 GHz Opteron CPUs (in dual-CPU nodes) available for computations. The target workload for the cluster is jobs that do not naturally scale to 1/8th or more of the computational processors on Seaborg.
  • Thanks to additional funding from the Office of Science, NERSC is in the process of procuring additional computational capability for the 2006 and 2007 allocation years.
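The OMB goal above is a statement about how delivered cycles are distributed across job sizes. The sketch below illustrates how that fraction could be computed from job accounting records; the processor count and the job records are hypothetical, since neither appears in this report.

```python
# Hedged sketch of the capability metric described above: the fraction of
# delivered Seaborg CPU-hours that went to jobs using at least 1/8 of the
# machine's computational processors.  TOTAL_PROCS and the job list are
# illustrative assumptions, not figures from the survey report.
TOTAL_PROCS = 6080                 # assumed computational processor count
THRESHOLD = TOTAL_PROCS / 8        # cutoff for a "large" (capability) job

# (processors used, wallclock hours) per job -- made-up accounting records
jobs = [(2048, 12.0), (512, 24.0), (64, 48.0), (1024, 6.0), (16, 96.0)]

delivered = sum(p * h for p, h in jobs)                 # total CPU-hours
large = sum(p * h for p, h in jobs if p >= THRESHOLD)   # large-job CPU-hours
print(f"Large-job share of delivered cycles: {100 * large / delivered:.1f}%")
```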

The majority of this year's respondents expressed dissatisfaction with Seaborg turnaround time, and about one quarter were dissatisfied with Seaborg's queue policies. Ratings in these two areas dropped by 1.4 and 1.0 points, and the rating for available hardware dropped by 0.5 points. In general, users who ran smaller-concurrency jobs were more dissatisfied than users who ran larger codes.

Users were invited to provide overall comments about NERSC:

118 users answered the question What does NERSC do well?   68 respondents stated that NERSC gives them access to powerful computing resources without which they could not do their science; 53 mentioned excellent support services and NERSC's responsive staff; 47 pointed to very reliable and well managed hardware; 30 claimed that NERSC is easy to use and has a good user environment; and 26 said everything. Some representative comments are:

NERSC supplies a lot of FLOPs reliably, and provides very competent consultants. It is a good place to use parallel codes that scale well on the available machines.
NERSC does a truly outstanding job of supporting both a small community of "power" users as well as a large community of mid-range users. Both are important, and, as a result of NERSC's success in supporting both communities, the Center facilitates an extraordinary amount of scientific productivity.
NERSC has excellent people working there. I'm VERY happy with everyone I've come across. People have been knowledgeable and professional. I compute at NERSC because it's really big. Seriously, the number of processors allows us to do research on problems that we simply cannot do anywhere else. In that regard I consider NERSC a national treasure. One really silly request, how about a NERSC T-Shirt! I'd buy one.
Overall, the services and hardware reliability are excellent. I think that NERSC sets the standard in this regard.

94 users responded to What should NERSC do differently? The area of greatest concern is Seaborg job turnaround time and queue management policies. Forty-five users expressed dissatisfaction with turnaround time and 37 requested a change in Seaborg job scheduling policies, of whom 25 expressed concerns with favoring large jobs at the expense of smaller ones. Twenty-five users requested newer processors or more computing resources overall. Fifteen expressed dissatisfaction with the allocations process. Some of the comments from this section are:

Change the batch queues so that ordinary jobs execute in days, not weeks.
The queue wait times have been extremely long (about 2 weeks recently), and this has almost completely stalled my research.
The current focus only on jobs which can exhibit high degrees of parallelism is, in my opinion obviously, misguided. Some problems of great scientific interest do not naturally scale to thousands of processors.
NERSC should return to its original mission of providing the production environment which allowed the scientists to maximize their research. That is NERSC should give satisfying the user priority over satisfying the DOE and the OMB.
Given the amount of computer time that I am allocated, I cannot make use of the large number of processors on Seaborg. Unless everyone is allocated enough time to make use of hundreds of processors, NERSC should give more consideration to providing resources for smaller codes.
Also, the job priority system discriminates against smaller jobs (less than 32 nodes) - i.e. MAJORITY of users!
For the last 24 years NERSC has been the place where "I could get things done". With the initiation of the INCITE program that changed. The machine was effectively taken over by the 3 INCITE groups and work at NERSC stopped. After the upgrade my large calculations no longer run at NERSC and I had to move those computations to a p690 in Hannover, Germany.
The computer code I use becomes more complex from day to day to use the best physics you can. However this increases the computing time. The great management and support at NERSC combined with new hardware would be an irresistible package.
NERSC has done an outstanding job of serving the community. In order for this to continue, NERSC needs continued support from the DOE for its staff and the services they provide, and NERSC needs support for a new high end system to replace seaborg.
Similarly, I've described my problems with the ERCAP proposal process. I feel it gives short-shrift to science, and focuses on code optimization to the exclusion of scientific returns.
I think the INCITE program was ill conceived. Betting that the performance of a tiny subset of the scientific community will payoff enormously better than the community as a whole seems to me like trying to time the stock market. It may work once, but the opportunity costs are enormous.

77 users answered the question How does NERSC compare to other centers you have used?   Thirty-nine users stated that NERSC was an excellent center (no comparison made) or was better than other centers they have used. Reasons given for preferring NERSC include excellent hardware and software management and good user support and services. Seven respondents said that NERSC was not as good as another center they used. The most common reason given for dissatisfaction with NERSC was job turnaround time.

 

Here are the survey results:

  1. Respondent Demographics
  2. Overall Satisfaction and Importance
  3. All Satisfaction, Importance and Usefulness Ratings
  4. Hardware Resources
  5. Software
  6. Security and One Time Passwords
  7. Visualization and Data Analysis
  8. HPC Consulting
  9. Services and Communications
  10. Web Interfaces
  11. Training
  12. Comments about NERSC

Respondent Demographics

Number of respondents to the survey: 209

  • Respondents by DOE Office and User Role
  • Respondents by Organization
  • Which NERSC resources do you use?
  • How long have you used NERSC?
  • What desktop systems do you use to connect to NERSC?
  • Web Browser Used to Take Survey
  • Operating System Used to Take Survey

 

Respondents by DOE Office and User Role:

Office    Respondents    Percent
ASCR 16 7.7%
BER 23 11.0%
BES 49 23.4%
FES 38 18.2%
HEP 31 14.8%
NP 50 23.9%
guests 2 1.0%
User Role    Number    Percent
Principal Investigators 55 26.3%
PI Proxies 43 20.6%
Project Managers 12 5.7%
Users 99 47.4%

 

Respondents by Organization:

Organization Type    Number    Percent
Universities 109 52.2%
DOE Labs 82 39.2%
Other Govt Labs 10 4.8%
Industry 8 3.8%
Organization    Number    Percent
Berkeley Lab 42 20.1%
UC Berkeley 12 5.7%
Livermore 10 4.8%
PPPL 8 3.8%
Argonne 7 3.3%
NCAR 5 2.4%
Stanford 5 2.4%
Oak Ridge 4 1.9%
U. Maryland 4 1.9%
U. Washington 4 1.9%
U. Wisconsin 4 1.9%
UC Santa Cruz 4 1.9%
Organization    Number
General Atomics 3
Georgia IT 3
Johns Hopkins 3
MIT 3
NREL 3
New York U. 3
Ohio State 3
Stony Brook 3
Tech-X 3
U. Chicago 3
U. Colorado 3
UC Davis 3
Ames Lab 2
Auburn U. 2
Cal Tech 2
Hamburg U. 2
Northwestern 2
SLAC 2
U. Kentucky 2
U. Oklahoma 2
U. Tennessee 2
U. Utah 2
Yale 2
Other University 31
Other Gov. Labs 6
Other DOE Labs 4
Other Industry 1

 

Which NERSC resources do you use?

Note that users did not always check all the resources they use -- compare the table below with How Satisfied are you? (sorted by Number of Responses).

 

Resource    Responses
IBM SP (Seaborg) 173
NERSC web site (www.nersc.gov) 138
NIM 132
HPSS 115
Consulting services 93
Account support services 63
PDSF 37
Visualization services 20
Computer and Network Operations 11
Math server (Newton) 9
NERSC CVS server 8
Vis server (Escher) 8
Grid services 7

 

How long have you used NERSC?

 

Time    Number    Percent
less than 6 months 15 7.3%
6 months - 3 years 83 40.3%
more than 3 years 108 52.4%

 

What desktop systems do you use to connect to NERSC?

 

System    Responses
UNIX Total 223
PC Total 116
Mac Total 69
Linux 156
Windows XP 79
OS X 58
Sun Solaris 31
Windows 2000 28
IBM AIX 16
MacOS 11
SGI IRIX 10
Windows ME/98 9
Compaq Tru-64 5
HP HPUX 5

 

Web Browser Used to Take Survey:

 

Browser    Number    Percent
Mozilla 110 52.6%
MSIE 6 41 19.6%
Safari 26 12.4%
Netscape 4 17 8.1%
Firefox 13 6.2%
Galeon 2 1.0%

 

Operating System Used to Take Survey:

 

OS    Number    Percent
UNIX Total 95 45.5%
Windows Total 71 34.0%
Macintosh Total 43 20.6%
Linux 80 38.3%
Windows XP 47 22.5%
Mac OS X 38 18.2%
Windows 2000 19 9.1%
SunOS 10 4.8%
MacOS 5 2.4%
Windows 98 4 1.9%
IRIX 2 1.0%
AIX 2 1.0%
HP-UX 1 0.5%
Windows NT 1 0.5%

 

Overall Satisfaction and Importance

 

  • Legend
  • Overall Satisfaction with NERSC
  • How important to you is?
  • Impact of NERSC's Flexible Work Option

 

Legend:

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Importance    Average Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Significance of Change
significant increase
significant decrease
not significant

 

Overall Satisfaction with NERSC

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
Consulting and Support Services       3 8 40 146 197 6.67 0.63 0.30
NERSC security     2 12 3 45 121 183 6.48 0.90  
Mass storage facilities       14 4 60 93 171 6.36 0.88 0.24
Network connectivity     4 9 13 69 95 190 6.27 0.94 0.04
Available Software     1 11 16 66 83 177 6.24 0.90 0.19
Software management and configuration   1 1 14 11 64 78 169 6.19 0.99 0.15
Overall Satisfaction with NERSC 2 6 6 5 16 70 98 203 6.10 1.28 -0.27
Hardware management and configuration 2 5 11 7 19 62 76 182 5.89 1.39 -0.18
Available Computing Hardware 3 2 14 8 34 87 47 195 5.65 1.29 -0.48
Data analysis and visualization facilities     3 41 6 36 29 115 5.41 1.28  

 

How important to you is?

3=Very, 2=Somewhat, 1=Not important

 

Item    Num who rated this item as 1 / 2 / 3    Responses    Average    Std. Dev.
Available Computing Hardware   24 160 184 2.87 0.34
Overall Satisfaction with NERSC 1 23 164 188 2.87 0.36
Consulting and Support Services 2 42 142 186 2.75 0.46
Network connectivity 2 52 120 174 2.68 0.49
Hardware management and configuration 4 50 115 169 2.66 0.52
Mass storage facilities 15 62 84 161 2.43 0.66
Available Software 15 70 83 168 2.40 0.65
Software management and configuration 16 77 70 163 2.33 0.65
NERSC security 22 80 73 175 2.29 0.68
Data analysis and visualization facilities 49 49 34 132 1.89 0.79

 

NERSC participates in Berkeley Lab's Flexible Work Option (FWO) Pilot. Under the FWO, some staff work nine hours a day and take one day off every two weeks. NERSC always has qualified staff on duty for all areas. Have you noticed any changes specifically due to the FWO participation?

 

Answer    Responses    Percent
No 73 100%
Yes 0 0%

If you have noticed any effect, describe your experience.

Only comments other than a simple "No" or "Not noticed" are shown.

 

Service has always been exceptional.

We have noticed NO changes.

None, everything works perfectly fine.

No. On-duty 24/7 support is very important to our work and is part of the reasons that NERSC, and in particular PDSF, is so useful to us.

I have not noticed any changes. It sounds like a great plan to me.

No problems

No, I find the NERSC staff very responsive and easy to reach.

NO, sounds like a great program to me.

I always find that NERSC staff are available whenever I need assistance. It's a pleasant surprise to find that even at odd hours on the weekends, a telephone call is always answered by a live human being, instead of a machine. Praises to the staff!

I have experienced no negative changes. NERSC consulting staff (Hi, Harsh!) have even made themselves available to us from their home to assist with time-critical interactions.

No problems here - staff seem happier!

I've always been impressed with how hard (and well) NERSC staff work. If this makes them happier, great.

No. All services seem very well (and enthusiastically) supported.

 

All Satisfaction, Importance and Usefulness Ratings

  • Legend
  • All Satisfaction Topics - by Score
  • All Satisfaction Topics - by Number of Responses
  • All Importance Topics
  • All Usefulness Topics

 

Legend

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Neutral 3.50 - 4.49
Importance    Average Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Not Important 1.00 - 1.49
Significance of Change
significant increase
significant decrease
not significant
Usefulness    Average Score
Very Useful 2.50 - 3.00
Somewhat Useful 1.50 - 2.49

 

All Satisfaction Topics - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
HPSS: Reliability (data integrity)       5   16 97 118 6.74 0.67 0.13
CONSULT: Timely initial response to consulting questions       1 5 38 125 169 6.70 0.55 0.15
CONSULT: overall       3 4 38 132 177 6.69 0.60 0.35
Account support services   1 2 1 2 38 136 180 6.68 0.72 0.29
OVERALL: Consulting and Support Services       3 8 40 146 197 6.67 0.63 0.30
HPSS: Uptime (Availability)     1 3 1 25 89 119 6.66 0.70 0.12
CONSULT: Followup to initial consulting questions       4 5 34 122 165 6.66 0.66 0.17
CONSULT: Amount of time to resolve your issue     1 5 6 36 118 166 6.60 0.75 0.24
CONSULT: Quality of technical advice       4 7 42 112 165 6.59 0.69 0.05
HPSS: Overall satisfaction   1   4 2 33 84 124 6.56 0.80 0.10
PDSF: Overall satisfaction       3 1 11 31 46 6.52 0.84 0.11
Computer and Network Operations     1 5 7 31 85 129 6.50 0.83  
NERSC security     2 12 3 45 121 183 6.48 0.90  
WEB: Seaborg section       3 10 47 85 145 6.48 0.72 0.48
Network performance within NERSC (e.g. Seaborg to HPSS) 1     2 8 40 75 126 6.46 0.85 -0.08
SP SW: Fortran compilers     2 3 8 53 81 147 6.41 0.80 0.07
WEB: Accuracy of information 1     6 6 64 91 168 6.40 0.84 0.15
PDSF: Uptime (availability)     1 3 2 11 30 47 6.40 0.99 0.05
HPSS: Data transfer rates   1   3 8 41 66 119 6.40 0.84  
PDSF SW: Software environment       4 1 13 24 42 6.36 0.93 0.03
OVERALL: Mass storage facilities       14 4 60 93 171 6.36 0.88 0.24
SP SW: Software environment       5 9 70 71 155 6.34 0.73 0.10
WEB: HPSS section       4 11 30 49 94 6.32 0.85 0.58
WEB: NERSC web site overall (www.nersc.gov)       2 21 76 83 182 6.32 0.72 0.32
PDSF: Batch queue structure       4 3 13 25 45 6.31 0.95 0.31
WEB: Accounts section 1     6 7 41 55 110 6.28 0.97  
OVERALL: Network connectivity     4 9 13 69 95 190 6.27 0.94 0.04
TRAINING: New User's Guide 1     5 6 36 46 94 6.27 0.99 0.01
SP SW: Programming libraries     1 8 2 60 54 125 6.26 0.84 -0.01
SP: Uptime (Availability) 3 1 4 6 6 54 92 166 6.26 1.20 -0.16
SP SW: C/C++ compilers     2 5 6 35 46 94 6.26 0.95 0.04
HPSS: Data access time 1 1 1 6 8 40 61 118 6.25 1.08 -0.21
NIM     1 7 17 64 74 163 6.25 0.85 0.17
OVERALL: Available Software     1 11 16 66 83 177 6.24 0.90 0.19
PDSF SW: C/C++ compilers       5 1 12 20 38 6.24 1.02 -0.20
OVERALL: Software management and configuration   1 1 14 11 64 78 169 6.19 0.99 0.15
WEB: Timeliness of information     3 5 18 76 65 167 6.17 0.87 0.12
WEB: Software section     3 5 11 33 46 98 6.16 1.02 0.29
On-line help desk       10 9 35 45 99 6.16 0.97 0.14
PDSF SW: Programming libraries       5 1 14 16 36 6.14 1.02 0.14
HPSS: User interface (hsi, pftp, ftp)     3 8 12 41 53 117 6.14 1.02 0.16
SP SW: Applications software       10 6 37 38 91 6.13 0.96 0.13
CONSULT: Software bug resolution     2 11 7 33 47 100 6.12 1.08 0.48
E-mail announcements     2 12 6 25 47 92 6.12 1.14  
Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 6 2 17 57 70 155 6.12 1.15 -0.00
OVERALL: Satisfaction with NERSC 2 6 6 5 16 70 98 203 6.10 1.28 -0.27
TRAINING: Web tutorials 1     8 8 37 38 92 6.10 1.06 0.03
SERVICES: Response to special requests (e.g. disk quota increases, etc.) 2 1 3 8 6 21 52 93 6.08 1.41 -0.27
WEB: Status and Statistics section 1   2 7 11 40 43 104 6.07 1.10 -0.00
SP: Disk configuration and I/O performance 2   8 11 13 53 60 147 5.94 1.28 -0.21
WEB: PDSF section     1 6 5 16 17 45 5.93 1.12 0.10
SERVICES: Allocations process 1 4 5 7 17 57 57 148 5.93 1.27 0.24
SP SW: General tools and utilities     4 8 14 43 33 102 5.91 1.06 -0.07
OVERALL: Hardware management and configuration 2 5 11 7 19 62 76 182 5.89 1.39 -0.18
WEB: Ease of finding information 1   9 6 30 78 54 178 5.89 1.09 0.09
PDSF SW: Fortran compilers       6 1 6 10 23 5.87 1.25 -0.16
PDSF: Batch wait time     1 6 5 19 14 45 5.87 1.08 -0.06
SP SW: Performance and debugging tools   2 5 7 16 54 34 118 5.84 1.14 0.27
PDSF SW: General tools and utilities   1   5 2 17 10 35 5.83 1.18 -0.10
PDSF SW: Applications software     1 6 2 14 10 33 5.79 1.17 -0.08
PDSF SW: Performance and debugging tools   1   5 4 15 10 35 5.77 1.19 0.46
SP: Seaborg overall 4 7 7 2 26 62 60 168 5.77 1.47 -0.66
PDSF: Ability to run interactively   1 5 5 3 16 17 47 5.68 1.45 -0.09
OVERALL: Available Computing Hardware 3 2 14 8 34 87 47 195 5.65 1.29 -0.48
WEB: Searching     4 14 6 27 20 71 5.63 1.24 0.19
WEB: SciDAC       12 4 11 13 40 5.62 1.23  
PDSF: Disk configuration and I/O performance   1 5 4 6 13 15 44 5.59 1.45 -0.10
TRAINING: NERSC classes: in-person       13   5 11 29 5.48 1.40 0.60
Vis server (Escher)       8 1 3 7 19 5.47 1.39 0.24
OVERALL: Data analysis and visualization facilities     3 41 6 36 29 115 5.41 1.28  
SERVICES: Visualization services     2 22 4 12 19 59 5.41 1.37 0.60
SP SW: Visualization software   2 2 18 4 19 17 62 5.40 1.41 0.32
SP: Ability to run interactively 3 5 12 24 22 52 38 156 5.34 1.50 -0.23
NERSC CVS server       21 2 8 14 45 5.33 1.35  
Live classes on the Access Grid 1     14   11 6 32 5.16 1.44 0.49
Grid services       18 3 5 9 35 5.14 1.31  
Escher SW: visualization software   1 1 9   4 6 21 5.10 1.58 0.35
Math server (Newton)     1 8 1 4 3 17 5.00 1.32 -0.20
Newton SW: application software 1 1 2 8   5 5 22 4.82 1.76  
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40

 

All Satisfaction Topics - by Number of Responses

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
OVERALL: Satisfaction with NERSC 2 6 6 5 16 70 98 203 6.10 1.28 -0.27
OVERALL: Consulting and Support Services       3 8 40 146 197 6.67 0.63 0.30
OVERALL: Available Computing Hardware 3 2 14 8 34 87 47 195 5.65 1.29 -0.48
OVERALL: Network connectivity     4 9 13 69 95 190 6.27 0.94 0.04
NERSC security     2 12 3 45 121 183 6.48 0.90  
WEB: NERSC web site overall (www.nersc.gov)       2 21 76 83 182 6.32 0.72 0.32
OVERALL: Hardware management and configuration 2 5 11 7 19 62 76 182 5.89 1.39 -0.18
Account support services   1 2 1 2 38 136 180 6.68 0.72 0.29
WEB: Ease of finding information 1   9 6 30 78 54 178 5.89 1.09 0.09
CONSULT: overall       3 4 38 132 177 6.69 0.60 0.35
OVERALL: Available Software     1 11 16 66 83 177 6.24 0.90 0.19
OVERALL: Mass storage facilities       14 4 60 93 171 6.36 0.88 0.24
CONSULT: Timely initial response to consulting questions       1 5 38 125 169 6.70 0.55 0.15
OVERALL: Software management and configuration   1 1 14 11 64 78 169 6.19 0.99 0.15
WEB: Accuracy of information 1     6 6 64 91 168 6.40 0.84 0.15
SP: Seaborg overall 4 7 7 2 26 62 60 168 5.77 1.47 -0.66
WEB: Timeliness of information     3 5 18 76 65 167 6.17 0.87 0.12
CONSULT: Amount of time to resolve your issue     1 5 6 36 118 166 6.60 0.75 0.24
SP: Uptime (Availability) 3 1 4 6 6 54 92 166 6.26 1.20 -0.16
CONSULT: Followup to initial consulting questions       4 5 34 122 165 6.66 0.66 0.17
CONSULT: Quality of technical advice       4 7 42 112 165 6.59 0.69 0.05
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
NIM     1 7 17 64 74 163 6.25 0.85 0.17
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40
SP: Ability to run interactively 3 5 12 24 22 52 38 156 5.34 1.50 -0.23
SP SW: Software environment       5 9 70 71 155 6.34 0.73 0.10
Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 6 2 17 57 70 155 6.12 1.15 -0.00
SERVICES: Allocations process 1 4 5 7 17 57 57 148 5.93 1.27 0.24
SP SW: Fortran compilers     2 3 8 53 81 147 6.41 0.80 0.07
SP: Disk configuration and I/O performance 2   8 11 13 53 60 147 5.94 1.28 -0.21
WEB: Seaborg section       3 10 47 85 145 6.48 0.72 0.48
Computer and Network Operations     1 5 7 31 85 129 6.50 0.83  
Network performance within NERSC (e.g. Seaborg to HPSS) 1     2 8 40 75 126 6.46 0.85 -0.08
SP SW: Programming libraries     1 8 2 60 54 125 6.26 0.84 -0.01
HPSS: Overall satisfaction   1   4 2 33 84 124 6.56 0.80 0.10
HPSS: Uptime (Availability)     1 3 1 25 89 119 6.66 0.70 0.12
HPSS: Data transfer rates   1   3 8 41 66 119 6.40 0.84  
HPSS: Reliability (data integrity)       5   16 97 118 6.74 0.67 0.13
HPSS: Data access time 1 1 1 6 8 40 61 118 6.25 1.08 -0.21
SP SW: Performance and debugging tools   2 5 7 16 54 34 118 5.84 1.14 0.27
HPSS: User interface (hsi, pftp, ftp)     3 8 12 41 53 117 6.14 1.02 0.16
OVERALL: Data analysis and visualization facilities     3 41 6 36 29 115 5.41 1.28  
WEB: Accounts section 1     6 7 41 55 110 6.28 0.97  
WEB: Status and Statistics section 1   2 7 11 40 43 104 6.07 1.10 -0.00
SP SW: General tools and utilities     4 8 14 43 33 102 5.91 1.06 -0.07
CONSULT: Software bug resolution     2 11 7 33 47 100 6.12 1.08 0.48
On-line help desk       10 9 35 45 99 6.16 0.97 0.14
WEB: Software section     3 5 11 33 46 98 6.16 1.02 0.29
WEB: HPSS section       4 11 30 49 94 6.32 0.85 0.58
TRAINING: New User's Guide 1     5 6 36 46 94 6.27 0.99 0.01
SP SW: C/C++ compilers     2 5 6 35 46 94 6.26 0.95 0.04
SERVICES: Response to special requests (e.g. disk quota increases, etc.) 2 1 3 8 6 21 52 93 6.08 1.41 -0.27
E-mail announcements     2 12 6 25 47 92 6.12 1.14  
TRAINING: Web tutorials 1     8 8 37 38 92 6.10 1.06 0.03
SP SW: Applications software       10 6 37 38 91 6.13 0.96 0.13
WEB: Searching     4 14 6 27 20 71 5.63 1.24 0.19
SP SW: Visualization software   2 2 18 4 19 17 62 5.40 1.41 0.32
SERVICES: Visualization services     2 22 4 12 19 59 5.41 1.37 0.60
PDSF: Uptime (availability)     1 3 2 11 30 47 6.40 0.99 0.05
PDSF: Ability to run interactively   1 5 5 3 16 17 47 5.68 1.45 -0.09
PDSF: Overall satisfaction       3 1 11 31 46 6.52 0.84 0.11
PDSF: Batch queue structure       4 3 13 25 45 6.31 0.95 0.31
WEB: PDSF section     1 6 5 16 17 45 5.93 1.12 0.10
PDSF: Batch wait time     1 6 5 19 14 45 5.87 1.08 -0.06
PDSF: Disk configuration and I/O performance   1 5 4 6 13 15 44 5.59 1.45 -0.10
NERSC CVS server       21 2 8 14 45 5.33 1.35  
PDSF SW: Software environment       4 1 13 24 42 6.36 0.93 0.03
WEB: SciDAC       12 4 11 13 40 5.62 1.23  
PDSF SW: C/C++ compilers       5 1 12 20 38 6.24 1.02 -0.20
PDSF SW: Programming libraries       5 1 14 16 36 6.14 1.02 0.14
PDSF SW: General tools and utilities   1   5 2 17 10 35 5.83 1.18 -0.10
PDSF SW: Performance and debugging tools   1   5 4 15 10 35 5.77 1.19 0.46
Grid services       18 3 5 9 35 5.14 1.31  
PDSF SW: Applications software     1 6 2 14 10 33 5.79 1.17 -0.08
Live classes on the Access Grid 1     14   11 6 32 5.16 1.44 0.49
TRAINING: NERSC classes: in-person       13   5 11 29 5.48 1.40 0.60
PDSF SW: Fortran compilers       6 1 6 10 23 5.87 1.25 -0.16
Newton SW: application software 1 1 2 8   5 5 22 4.82 1.76  
Escher SW: visualization software   1 1 9   4 6 21 5.10 1.58 0.35
Vis server (Escher)       8 1 3 7 19 5.47 1.39 0.24
Math server (Newton)     1 8 1 4 3 17 5.00 1.32 -0.20

 

All Importance Topics

3=Very important, 2=Somewhat important, 1=Not important

 

Item    Num who rated this item as 1 / 2 / 3    Total Responses    Average Score    Std. Dev.
OVERALL: Available Computing Hardware   24 160 184 2.87 0.34
OVERALL: Satisfaction with NERSC 1 23 164 188 2.87 0.36
OVERALL: Consulting and Support Services 2 42 142 186 2.75 0.46
SERVICES: Allocations process 3 32 99 134 2.72 0.50
OVERALL: Network connectivity 2 52 120 174 2.68 0.49
Account support services 2 49 108 159 2.67 0.50
OVERALL: Hardware management and configuration 4 50 115 169 2.66 0.52
SERVICES: Response to special requests (e.g. disk quota increases, etc.) 6 21 65 92 2.64 0.60
Computer and Network Operations 4 43 77 124 2.59 0.56
OVERALL: Mass storage facilities 15 62 84 161 2.43 0.66
OVERALL: Available Software 15 70 83 168 2.40 0.65
OVERALL: Software management and configuration 16 77 70 163 2.33 0.65
NERSC security 22 80 73 175 2.29 0.68
E-mail announcements 30 49 20 99 1.90 0.71
OVERALL: Data analysis and visualization facilities 49 49 34 132 1.89 0.79
SERVICES: Visualization services 33 24 15 72 1.75 0.78
NERSC CVS server 31 14 14 59 1.71 0.83
Grid services 38 6 8 52 1.42 0.75

 

All Usefulness Topics

3=Very useful, 2=Somewhat useful, 1=Not useful

 

Item    Num who rated this item as 1 / 2 / 3    Total Responses    Average Score    Std. Dev.
E-mail announcements 1 46 121 168 2.71 0.47
TRAINING: New User's Guide 6 20 59 85 2.62 0.62
TRAINING: Web tutorials 7 31 50 88 2.49 0.64
MOTD (Message of the Day) 9 72 71 152 2.41 0.60
SERVICES: Announcements web archive 11 70 70 151 2.39 0.62
Phone calls from NERSC 24 35 43 102 2.19 0.79
Live classes on the Access Grid 17 12 10 39 1.82 0.82
TRAINING: NERSC classes: in-person 22 8 9 39 1.67 0.84

Hardware Resources

 

  • Legend
  • Hardware Satisfaction - by Score
  • Hardware Satisfaction - by Platform
  • Max Processors Effectively Used on Seaborg
  • Hardware Comments

 

Legend:

Satisfaction    Average Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Neutral 3.50 - 4.49
Significance of Change
significant decrease
not significant

 

Hardware Satisfaction - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
HPSS: Reliability (data integrity)       5   16 97 118 6.74 0.67 0.13
HPSS: Uptime (Availability)     1 3 1 25 89 119 6.66 0.70 0.12
HPSS: Overall satisfaction   1   4 2 33 84 124 6.56 0.80 0.10
PDSF: Overall satisfaction       3 1 11 31 46 6.52 0.84 0.11
Network performance within NERSC (e.g. Seaborg to HPSS) 1     2 8 40 75 126 6.46 0.85 -0.08
PDSF: Uptime (availability)     1 3 2 11 30 47 6.40 0.99 0.05
HPSS: Data transfer rates   1   3 8 41 66 119 6.40 0.84  
PDSF: Batch queue structure       4 3 13 25 45 6.31 0.95 0.31
SP: Uptime (Availability) 3 1 4 6 6 54 92 166 6.26 1.20 -0.16
HPSS: Data access time 1 1 1 6 8 40 61 118 6.25 1.08 -0.21
HPSS: User interface (hsi, pftp, ftp)     3 8 12 41 53 117 6.14 1.02 0.16
Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 6 2 17 57 70 155 6.12 1.15 -0.00
SP: Disk configuration and I/O performance 2   8 11 13 53 60 147 5.94 1.28 -0.21
PDSF: Batch wait time     1 6 5 19 14 45 5.87 1.08 -0.06
SP: Seaborg overall 4 7 7 2 26 62 60 168 5.77 1.47 -0.66
PDSF: Ability to run interactively   1 5 5 3 16 17 47 5.68 1.45 -0.09
PDSF: Disk configuration and I/O performance   1 5 4 6 13 15 44 5.59 1.45 -0.10
Vis server (Escher)       8 1 3 7 19 5.47 1.39 0.24
SP: Ability to run interactively 3 5 12 24 22 52 38 156 5.34 1.50 -0.23
Math server (Newton)     1 8 1 4 3 17 5.00 1.32 -0.20
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40

 

Hardware Satisfaction - by Platform

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

Item    Num who rated this item as 1 / 2 / 3 / 4 / 5 / 6 / 7    Total Responses    Average Score    Std. Dev.    Change from 2003
SP: Uptime (Availability) 3 1 4 6 6 54 92 166 6.26 1.20 -0.16
SP: Disk configuration and I/O performance 2   8 11 13 53 60 147 5.94 1.28 -0.21
SP: Seaborg overall 4 7 7 2 26 62 60 168 5.77 1.47 -0.66
SP: Ability to run interactively 3 5 12 24 22 52 38 156 5.34 1.50 -0.23
SP: Batch queue structure 17 9 18 17 30 53 20 164 4.66 1.85 -1.03
SP: Batch wait time 26 16 36 14 27 32 10 161 3.84 1.90 -1.40
HPSS: Reliability (data integrity)       5   16 97 118 6.74 0.67 0.13
HPSS: Uptime (Availability)     1 3 1 25 89 119 6.66 0.70 0.12
HPSS: Overall satisfaction   1   4 2 33 84 124 6.56 0.80 0.10
HPSS: Data transfer rates   1   3 8 41 66 119 6.40 0.84  
HPSS: Data access time 1 1 1 6 8 40 61 118 6.25 1.08 -0.21
HPSS: User interface (hsi, pftp, ftp)     3 8 12 41 53 117 6.14 1.02 0.16
PDSF: Overall satisfaction       3 1 11 31 46 6.52 0.84 0.11
PDSF: Uptime (availability)     1 3 2 11 30 47 6.40 0.99 0.05
PDSF: Batch queue structure       4 3 13 25 45 6.31 0.95 0.31
PDSF: Batch wait time     1 6 5 19 14 45 5.87 1.08 -0.06
PDSF: Ability to run interactively   1 5 5 3 16 17 47 5.68 1.45 -0.09
PDSF: Disk configuration and I/O performance   1 5 4 6 13 15 44 5.59 1.45 -0.10
Network performance within NERSC (e.g. Seaborg to HPSS) 1     2 8 40 75 126 6.46 0.85 -0.08
Remote network performance to/from NERSC (e.g. Seaborg to your home institution) 1 2 6 2 17 57 70 155 6.12 1.15 -0.00
Vis server (Escher)       8 1 3 7 19 5.47 1.39 0.24
Math server (Newton)     1 8 1 4 3 17 5.00 1.32 -0.20

 

What is the maximum number of processors your code can effectively use for parallel computations on Seaborg?   140 responses

 

Num Procs    Num Responses
6,000+ 6
4,096+ 4
4,096 6
2,048-4,096 2
2,048 9
1,024-2,048 11
1,024 16
512-1,024 5
512 16
256-512 3
256 11
128-256 5
128 15
64-128 3
64 9
32-64 3
32 5
16 7
<16 4

 

Hardware Comments:   51 responses

One user made the general comment that hardware is "very stable and satisfactory"; 50 other users commented on specific systems.

 

Comments on NERSC's IBM SP and computational requirements

[Read all 37 responses]

 

21   Turnaround too slow
16   Queue /job mix policies should be adjusted
11   Seaborg needs to be upgraded / computational requirements
5   GPFS and upgrade problems
4   Provide more interactive and debugging resources
3   Needs more disk

Comments on NERSC's PDSF Cluster

[Read all 10 responses]

 

6   Problems with login nodes / slow access
3   Disk vault comments
2   Long job waits / need more cycles

Comments on NERSC's HPSS Storage System

[Read all 4 responses]

 

2   Needs better user interfaces
1   Good capacity
1   Will use soon

Comments on Network Performance

[Read all 4 responses]

 

 

Comments on NERSC's IBM SP and computational requirements:   37 responses

 

Turnaround too slow:   21 responses

For some reason, the recent turnaround times on seaborg have been atrocious. This doesn't seem to be a hardware problem, the machine is always up. The turnaround is so bad that my group's computers are now 'faster' than seaborg, which is totally unexpected and I don't understand why this is the case. My seaborg usage dropped for that reason, if I have to wait a week for a job it's faster to run them on PCs. One of my students constantly has this turnaround problem and just gave up on seaborg. I've never seen it like that before.

Seaborg had been a joy to use for several years, much better than other high performance systems. But in the last few months the scheduling system has been much harder to use, and my group has had a hard time getting our main jobs on the machine.

The batch wait times are too long. NERSC needs to provide good turnaround to the majority of its users, not the minority. NERSC needs a mix of platforms to support small and large MPP jobs.

Seaborg has now become too "small" to handle so many users running jobs of 1,024 processors or more. The batch wait time is now much too long.

Seaborg hardware seems to be OK. The problem is the wait time and queue management. ...

The queue wait time is abysmal.

Strong dissatisfaction with Seaborg because it has been difficult to work on computing projects since the end of June, due to the batch queue priorities. My main projects have been on hold since then. ...

Batch wait times are getting quite long. This really diminishes the worth of the resource.

Extraordinarily long delays in the queues have made seaborg almost unusable at times. NERSC seems to have lost sight of its mission of providing a resource that meets _user's needs_ to advance energy research. ...

Only issue to with long wait times for moderate (less than 32 node) runs during 2004. These have been very long (2 weeks) in some cases. Decreasing my run time helped.

Seaborg is very oversubscribed. Queue wait times are long.

The batch wait time is at times very long. I think 2-3 days is reasonable, but 5 is unacceptable. Our group is typically awarded hundreds of thousands of hours and getting all of time used is reliant on the number of jobs we are able to get through the queue. Since we run on smaller numbers of processors, we need to get a larger number of jobs through to meet the quota.

Queue / job mix policies should be adjusted:   16 responses

More capacity calculations for real science. Put more emphasize (higher priority) on medium size jobs where most science are done.

The large number of processors on Seaborg are useless to me. Given the amount of computer time that I am allocated, I cannot run large codes that use more than one node. I think that NERSC should consider the needs of users who are not allocated enough time to make use of the large number of nodes on Seaborg.

NERSC is now completely oversubscribed. The INCITE program has been a disaster for the average user. INCITE had pretty much taken over the machine before the upgrade. ...

The queue structure is so ludicrously biased toward large jobs that it is sometimes impossible to use one's time with a code that is optimum at 128-256 processors. That limit is set by the physics of the problem I'm solving, and no amount of algorithmic tinkering or optimization is going to change it much. NERSC gave my research group time in response to our ERCAP request, but to actually use the time we won, we wind up having to pay extra to use the express queue. Otherwise we would spend a week or more waiting in the regular queue each time we need to restart and job, and we'd never actually be able to use the time we were granted. I understand that NERSC wants to encourage large jobs, but the current queue structure guarantees that anyone who can't scale to 1000 processors is going to have to use the premium queue to get anything done.

I do not understand why the batch structure in seaborg discourages batch jobs using moderate amount of processors. Isn't the queue already long enough?

... Allowing various groups (often doing non-energy related research) to "jump the queue" is frustrating and bewildering. Favoritism towards "embarrassingly parallel" codes employing huge numbers of processors on Seaborg makes sense only if NERSC can provide adequate resources elsewhere (including fast inter-proc communication, not just a beowulf cluster) for smaller jobs. Again, local clusters are not really a solution to this problem, because the codes in question use limited numbers of procs in the first place because they are communication intensive - moving these codes from machines with fast interconnects like seaborg to local clusters using myrinet etc is strongly damaging to performance.

... The summer clog up problem is due to mismanagement (from MICS/OMB constraints): The 50% discount on >512ps jobs plus the "head of the line" priority given to 3 INCITE PI's, blocked use by nearly everyone else. Wait time on 48hr 512ps jobs was more than 3 weeks in Sept. NERSC has persistently over-allocated the machine. A more moderate priority (like 2 day age priority) would have been adequate. Seaborg need a complete rethinking of the batch queue system.

Time spent waiting in Seaborg queues has increased in the past year. Is this from greater use of the system or less efficient queuing? It would be nice to have this issue addressed, with either greater hardware resources or better system configuration. ...

Seaborg is seriously oversubscribed this year and is much less useful to the vast majority of users than in previous years. Policies have been severely distorted in order to meet milestones and this has been a great dis-service to almost all users of seaborg. It is very important to communicate to those who set the milestones that users are not well served by requiring that seaborg mainly run very large jobs. Very few users or projects are served well (even those that can run large jobs) by such policies. Raw allocation (CPU-hours) is the most important thing for most users, and users are in the best position to determine the optimum number of CPUs to use.

I'm getting less research accomplished now that the seaborg queues give much higher priority to larger jobs (512 and higher) because to get better turn-around everyone tries to run jobs with large numbers of processors and so fewer jobs can run simultaneously and so the queue times have become very long, as you surely know. Many projects that involve cutting edge research are attempting to run in parameter regimes never before attempted and so require close monitoring and typically many trial runs. It is not efficient to do this with large jobs that sit in the queue for more than a week. For my current projects I would prefer to run a 12 hour job with, say, 128 processors every day, as was the situation a year ago, than to have to wait more than a week and hope my 512-processor job then runs at least 12 hours. Runs fail either because I've increased the parameters too far or because a processor or node goes down during my run, which happens much more frequently now because of the larger number of processors per job.

The INCITE program should be stopped immediately. The program destroys scientific computing progress of general public for the sake of a few. Until the INCITE is stopped, the program should be managed more strictly. For example, the INCITE awardees should NOT be allowed to occupy SEABORG above a certain percentage (much more restriction of allowed time for an INCITE individual). Users who require a large number of parallel processors should be given priorities to SEABORG without the INCITE program. NERSC should have more computer resources available for users who do not require massively parallel computing.

We perform a large variety of computations with our codes. However, the simulations we need to perform most often 'only' scale well to 100 or so processors on seaborg. I appreciate that NERSC has been under pressure to show usage by 1000-processor jobs, and that has led to the queue structure preferences that we have seen over the past year. However, some large-scale computations run algorithms that need better communication to computation speeds in order to scale well. Devising scheduling policies that favor only the computations that run well on large parts of seaborg discriminates against other applications. Optimally, a supercomputing center should offer a mix of hardware so that a variety of computation needs can be met. Otherwise, seaborg should be partitioned to allow different types of computations without scheduling conflicts.

Seaborg needs to be upgraded / computational requirements:   11 responses

Seaborg, though very well maintained, is getting old and slow. It would be great if you had a machine in the pipeline now, as Seaborg becomes less competitive with other machines. (I'm thinking in specific of SP4 machines like Datastar at SDSC/NPACI.)

The fact that seaborg's single-cpu performance is lagging well behind my desktop machines is making it seem much less attractive. I look forward to being able to run large, parallel-processing jobs on a machine with respectable single-cpu performance.

Seaborg is becoming obsolete. It would be great to upgrade to an SP4 fairly soon.

... SP3 is stable and reliable platform but it is becoming a thing of the past these days. Time to look for another large system?

Need the new computer system sooner. Need access to cluster and vector technology in order to keep stagnant models up to date.

... I typically run a number of 'small' (eg, 32 proc) jobs for multiple days, rather than large number of processors for a short time. While the code will scale efficiently to very large jobs, the computer time and human time is much better spent on smaller jobs, when developing new physics models and algorithms. The NERSC emphasis on massively parallel computing also means that the main code runs only on a limited subset of other machines that have PETSc, although in principle it would run well on many different systems that I have access to. The regular Wed downtimes on Seaborg are very inefficient from this point of view, when the batch queues are drained. Usually only half or less of the usual batch job productivity is possible during the entire maintenance week. The backfill queue is only marginally useful for most of my jobs. ...

The only thing I can say is, that the wallclock-times of my jobs are quite large as soon as I do my most complex computations. Sometimes I have to split one computation in two parts. So more powerful hardware would be the icing on the cake, but even so I'm greatly satisfied with the computational performance of SEABORG.

It's too bad that Alvarez [a linux cluster] is going down; that was a handy system to use for test runs.

My statement of 192 as the number of processors my code can use effectively refers to the 1 code I have been running on it in FY2004. Some codes, which only run effectively on smaller numbers of processors have been moved elsewhere. These include codes which run best on < 16 processors! I object to the penalties for 'small' jobs. Having run out of allocations half way through the current allocation period, clearly I feel that the current compute resources are inadequate. Of course, they would be more adequate if Seaborg were capable of delivering 50% of peak, which is what we used to get out of the old Cray C90. As it is 20% is more typical from a modest number of processors, and by the time we push this to 192 processors the number is more like 10-15%.

GPFS and upgrade problems:   5 responses

I can't use more than 32 seaborg nodes for most models, if I try running with 64 or more I am getting I/O errors from gpfs. No idea why, the code runs fine on a Pwr4 with Colony or Federation switch. The code itself should easily scale up to 128 CPUs for many runs.

Is it now safe to run programs on NERSC's seaborg? Has the IBM bug issue been resolved? Please send us an email about the update.

Remote network seems to drop out without warning. I can press return and get a new line, but can't get anything to execute. Tried ^x^c, ^q escape, but still getting only the new line, nothing executes. Looked at MOTD at nersc home page, but nothing to indicate its gone down. Nothing works. Is anyone watching this?

The new OS has been a bit of a disaster for us. My production code now runs at half of the speed it had before. Also, since poe+ doesn't seem to be working I cannot document the exact speed decrease but I know it's slower since runs that used to take about 8-12 hours now take 2 days and must be split over two runs.

... Now with the degraded status post upgrade it is hard to know what the status will be like.

Provide more interactive and debugging resources:   4 responses

... I do appreciate the relatively good interactive access to Seaborg, since it is crucial to code development. It would be nice to maintain some priority for interactive and debug jobs later in the day.

... Interactive performance continues to be suboptimal. See my comments at the end for greater detail on this problem. [A problem that I bring up every year is the quality of interactive service. Although this has improved since the last survey, the lack of ability to do small debugging runs interactively (at least with any reliability) is a problem. Would it not be possible to set aside a few nodes that could run with IP protocol (rather than US), in order to create a pool of processors where users could simultaneously run in parallel?]

Only using Seaborg at this point. My only real complaint is that the wait times for debug jobs with a large number of processors are rather inconsistent and hard to predict.

turn around time for small jobs ( ~ 4 nodes) is too long. Sometimes, the SP is dominated by large jobs and even debug jobs have to wait for sometime.

Needs more disk:   3 responses

At the upper end of data storage demands, our jobs quickly fill up allocated scratch space. Even 512 Gbytes is too little and I heard scratch space is always a concern. Should more be added? ...

Default scratch space of 250GB is somewhat small for very large jobs that tend to write a lot of data to the disk. To deal with this I asked for temporary allocation of 1TB of scratch space.

Need Terabytes of scratch storage to work with results large simulations. ...

 

Comments on NERSC's PDSF Cluster:   10 responses

 

Problems with login nodes / slow access:   5 responses

Sometimes pdsf login nodes are very slow.

1. I experience frequent (and rather frustrating) connectivity problems from LBL to PDSF (long latency).
2. PDSF's interactive nodes are almost always overburdened.

My only complaint is that sometimes the system (PDSF) slows down. For example, a simple "ls" command will take seconds to execute, probably due to some jobs accessing some directory (such as /home) heavily. This can be really frustrating.

My office is in LBL, Building 50 and I log in to PDSF using FSecure SSH from my PC I am not sure where the problem lies (NERSC, LBL, Building 50 itself...) but on some mornings the network connection is INCREDIBLY slow. It is bad enough that PDSF becomes temporarily useless to me. It can take 5 minutes just to process the "exit" and "logout" commands. Things always improve during the day and by the end of the afternoon the network is so fast it is no longer a factor in how long it takes me to do anything.

There seem to be frequent network problems, on the order of once per week during the day. This can be frustrating.

NERSC response: The PDSF support team has made it possible to run interactively on the batch nodes (there is a FAQ that documents these procedures). They also recently purchased replacement login nodes that are being tested now and should go into production in December 2004. They are top of the line opterons with twice as much memory as the old nodes.

Disk vault comments:   3 responses

The datavault disks are invaluable to me.

Since aztera died I am limited by the disk vault I/O resource bottleneck.

The cluster resources of PDSF have been useful and consistent in disk vaults and computer processes. However, the simultaneous load limit on the disk vaults limits the number of jobs we can run. ...

Long job waits / need more cycles:   2 responses

Re: PDSF concurrency: Analysis of each event is independent, so in principle with millions of events I could use millions of processors. In practice for a typical analysis pass I submit from 20 to 200 jobs with each job taking a few hours at most to finish. Of course, competing with other users, some jobs wait in the queue for a day or two. ...

We submit single jobs (many) to batch queues via Globus Grid 2003 gatekeepers. Need more hardware resources in PDSF.

NERSC response: 64 3.6 GHz Xeons were added to the PDSF cluster in November 2004. This is 25% more CPUs, and they are almost twice as fast as the older CPUs.

 

Comments on NERSC's HPSS Storage System:   4 responses

 

Needs better user interfaces:

... I harp on this annually. Cannot someone in the HPSS consortium spend the couple man-days required to put some standard shell interfaces to hsi? This would improve productivity immeasurably. ...
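By way of illustration only, the following is a minimal sketch of the sort of thin, shell-style wrapper around hsi that this comment asks for. The helper names are hypothetical, and the put/get command syntax shown should be checked against the hsi documentation.

```python
# Minimal sketch of a shell-style wrapper around the HPSS "hsi" client.
# Hypothetical helper names; verify the "put local : remote" / "get local : remote"
# command forms against your hsi documentation before relying on them.
import subprocess

def _hsi(command: str) -> None:
    """Run a single hsi command string and raise if it fails."""
    subprocess.run(["hsi", command], check=True)

def hsi_put(local_path: str, hpss_path: str) -> None:
    """Copy a local file into HPSS."""
    _hsi(f"put {local_path} : {hpss_path}")

def hsi_get(local_path: str, hpss_path: str) -> None:
    """Fetch a file from HPSS to a local path."""
    _hsi(f"get {local_path} : {hpss_path}")

if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    hsi_put("run42.tar", "project/run42.tar")
    hsi_get("run42_copy.tar", "project/run42.tar")
```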

... Need to be able to transfer data between HPSS and LLNL storage directly - at the moment this is very tedious and error prone...

Good capacity:

... The very large storage capacity of HPSS has been key to our work.

Will use soon:

... Re: HPSS: I have not directly used HPSS myself yet. I will need to do so soon though. ...

 

Comments on Network Performance:   4 responses

 

Connection from HPSS to MIT is still rather slow and makes downloading/visualizing large runs a chore, often an overnight chore.

The NERSC connection to the WAN is fine but there is a gap in services for people doing distributed computing in that it is hard to get all the people lined up needed to diagnose application end-to-end performance. We have difficulties sometimes with our BNL-NERSC performance.

... Re: network: Generating graphics on a PDSF computer and displaying in Seattle is noticeably slower than generating the graphics on a local computer. An increase in the network speed would be nice.

Data transfer between NERSC and our institution occurs at only a moderate rate, which sometimes makes it difficult to transfer a large data set for visualization. [Seaborg user at MIT]

Software

 

  • Legend
  • Software Satisfaction - by Score
  • Software Satisfaction - by Platform
  • Software Comments

 

Legend:

SatisfactionAverage Score
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
Significance of Change
not significant

 

Software Satisfaction - by Score

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
SP SW: Fortran compilers     2 3 8 53 81 147 6.41 0.80 0.07
PDSF SW: Software environment       4 1 13 24 42 6.36 0.93 0.03
SP SW: Software environment       5 9 70 71 155 6.34 0.73 0.10
SP SW: Programming libraries     1 8 2 60 54 125 6.26 0.84 -0.01
SP SW: C/C++ compilers     2 5 6 35 46 94 6.26 0.95 0.04
PDSF SW: C/C++ compilers       5 1 12 20 38 6.24 1.02 -0.20
PDSF SW: Programming libraries       5 1 14 16 36 6.14 1.02 0.14
SP SW: Applications software       10 6 37 38 91 6.13 0.96 0.13
SP SW: General tools and utilities     4 8 14 43 33 102 5.91 1.06 -0.07
PDSF SW: Fortran compilers       6 1 6 10 23 5.87 1.25 -0.16
SP SW: Performance and debugging tools   2 5 7 16 54 34 118 5.84 1.14 0.27
PDSF SW: General tools and utilities   1   5 2 17 10 35 5.83 1.18 -0.10
PDSF SW: Applications software     1 6 2 14 10 33 5.79 1.17 -0.08
PDSF SW: Performance and debugging tools   1   5 4 15 10 35 5.77 1.19 0.46
SP SW: Visualization software   2 2 18 4 19 17 62 5.40 1.41 0.32
Escher SW: visualization software   1 1 9   4 6 21 5.10 1.58 0.35
Newton SW: application software 1 1 2 8   5 5 22 4.82 1.76  

 

Software Satisfaction - by Platform

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
SP SW: Fortran compilers     2 3 8 53 81 147 6.41 0.80 0.07
SP SW: Software environment       5 9 70 71 155 6.34 0.73 0.10
SP SW: Programming libraries     1 8 2 60 54 125 6.26 0.84 -0.01
SP SW: C/C++ compilers     2 5 6 35 46 94 6.26 0.95 0.04
SP SW: Applications software       10 6 37 38 91 6.13 0.96 0.13
SP SW: General tools and utilities     4 8 14 43 33 102 5.91 1.06 -0.07
SP SW: Performance and debugging tools   2 5 7 16 54 34 118 5.84 1.14 0.27
SP SW: Visualization software   2 2 18 4 19 17 62 5.40 1.41 0.32
PDSF SW: Software environment       4 1 13 24 42 6.36 0.93 0.03
PDSF SW: C/C++ compilers       5 1 12 20 38 6.24 1.02 -0.20
PDSF SW: Programming libraries       5 1 14 16 36 6.14 1.02 0.14
PDSF SW: Fortran compilers       6 1 6 10 23 5.87 1.25 -0.16
PDSF SW: General tools and utilities   1   5 2 17 10 35 5.83 1.18 -0.10
PDSF SW: Applications software     1 6 2 14 10 33 5.79 1.17 -0.08
PDSF SW: Performance and debugging tools   1   5 4 15 10 35 5.77 1.19 0.46
Escher SW: visualization software   1 1 9   4 6 21 5.10 1.58 0.35
Newton SW: application software 1 1 2 8   5 5 22 4.82 1.76  

 

Comments about Software:   15 responses

 

6   Suggestions / requests for improvements
4   Comments on debugging and performance tools
4   Some dissatisfaction
2   Satisfied
Suggestions / requests for improvements:   6 responses

It would be nice to have uniform support for 64-bit including GNU software. But consultants quickly addressed and solved problems.

LoadLeveler is a mess. Is there nothing better?

... Running multiple processors interactively is such a pain.

I would like to have VASP available.

In general, the NERSC software configuration suits my needs very well.
One anticipated development would be welcome: In order to take advantage of some of the latest enhancements to the Parallel Environment on AIX, one needs to compile with -q64. However, we require HDF5 for our I/O. To my understanding there is no HDF library that has been built to link with 64-bit addressing code. Can this be addressed so that we can make better use of the system?

NERSC response: 64-bit versions of the HDF5 library are available on Seaborg. Refer to the I/O Libraries on Seaborg web page for more information.

My applications involve the discretization of nonlinear operators. To solve these problems I have used the Toolkit for Advanced Optimization (TAO) and PETSc. It's nice to have the latest versions of these packages on the system, but I have installed them myself on a couple of occasions.

Comments on debugging and performance tools:   4 responses

Totalview is such a pain. ...

Need better debugging and performance evaluation tools. TotalView is not so great. xprofiler is great. But we need better performance evaluation tools for MPI communication times, etc.

Part of the problem here is my lack of knowledge of performance, debugging, and general tools. This may be as much my fault as anyone's, but for some reason I have not learned how to find them and learn about them. I do believe that there was an on-line lecture on some debugging tool, but either I was not available at that time or the type of network it was offered on was not available to me.

The loss of poe+ after the OS upgrade in September 2004 is a significant problem.

Some dissatisfaction:   4 responses

I miss Linux's less command when I use Seaborg.

The IBM compilers tend to be annoying, but that's mostly because I'm used to GNU C.

f-secure does not seem to have any graphics capabilities. So use of visualization software seems limited.

Connections to Escher/Newton are too slow. There is a considerable time delay when using AVS/Mathematica remotely from Escher/Newton.

Satisfied:   2 responses

My uses are limited to a few compilers and IDL. Both of these work up to my satisfaction.

We have been very much satisfied with NERSC's software resources so far.

 

Security and One Time Passwords

Have any NERSC security procedures affected the way you work at NERSC?

 

AnswerResponsesPercent
Yes 27 13.4%
No 154 76.2%
Not Sure 21 10.4%

If so, how?

[Read all 27 responses]

 

10   Comments about passwords
4   Comments about access to HPSS
4   Need for shared accounts
4   Comments about network access / file transfers
5   Other comments

Do you have any experience accessing other sites using one-time passwords?

 

AnswerResponsesPercent
Yes 74 36.6%
No 104 51.5%
Not Sure 24 11.9%

Please make any comments that you think may be helpful to NERSC regarding one-time password authentication.

[Read all 52 responses]

 

20   Please don't use them / very inconvenient
9   If you have to, only one key for multiple sites
7   You should do this / works at our site
6   If it is implemented in a convenient way
5   OK / only a minor nuisance
5   Don't know / other

 

Have any NERSC security procedures affected the way you work at NERSC? If so, how?   27 responses

Comments about passwords:   10 responses

Choosing a valid password is close to impossible, the procedure is close to ridiculous and doesn't add security anyway. I use ssh keys to log in, that's far better. Unfortunately I have to change the regular password from time to time and that is most annoying. I always have to write down the new password if I manage to find one that the machine accepts.

I am always forgetting this awful password that I used on no other machines than this one. But that's fine if this is the way it has to be.

... Also, changing passwords under the tight controls is very difficult. The worst thing is that in the end, I almost always need to write it down somewhere because nearly anything memorable is not acceptable. But writing down passwords is almost always a bad idea. It seems ridiculous that the security folks pretend that this is not a problem.

In the past, the requirement to continually come up with new passwords has made accessing NERSC difficult.

Password updates are inconveniently frequent.

I often have problems with expired passwords.

NIM, HPSS, Seaborg, etc. have different passwords as well as different mandatory update periods, which are difficult to keep track of and manage.

Why can't I have one password for NERSC: nim, seaborg, etc.?

Too much of the NERSC web info requires NIM password. Password should not be required except to manage accounts.

I don't know whether this is the right place to make the following comment. I forget my seaborg password every now and then. I called NERSC and they gave me a temporary password [after verifying the caller's identity]. I am glad the procedure is so simple.

Comments about access to HPSS:   4 responses

Access to HPSS from offsite was somewhat awkward, until I used it often enough for it to become routine.

They have made external access to HPSS less convenient.

In the way we remotely access HPSS: having to use proxy passwords. It also affected the way we could use Globus-based applications, such as gridftp. But this was mainly due to the learning process of running Globus on such a big system. It works well now.

Continued existence of secure authenticated FTP into HPSS from remote sites is vital to our high speed data transfer.

Need for shared accounts:   4 responses

The inability to create a second account for the same user, a la a group account, causes some problems. Some of that could be eased by having the possibility to set different access defaults for the project directories: e.g., all users in the group having write permission for all content of the project directory. [PDSF user]

The concept of a project account, while anathema to NERSC policies, maps very well to the actual project work done. Unfortunately, as we go through people's comings and goings it is difficult to have to transition the processing to each pipeline. [PDSF user]

The fact that you can only have one account per user. I would like to have a production account and other accounts for various tasks that people do and share responsibility. I understand that this is however what DOE no longer allows. [PDSF user]

Since we share data with several different collaborations, it would be very useful for us to have a group account for managing the data which is accessible by more than one (authorized) person. This is, of course, against the rules, and so for the past month or two we've been trying to find a way to get our data where it needs to go and accessible to the people who need to be able to access it. The NERSC support staff has been obliging, but it's still a problem which seems needlessly complicated, even though we understand the reasons underlying the rules. [PDSF user]

Comments about network access / file transfers:   4 responses

Difficulty in transferring data from seaborg to local machines.

The move over to SSH and SFTP required new software for my Mac.

After the supercomputer break-ins 6 months ago, we are no longer able to move data back to NCAR automatically. We understand the necessity for closing this hole since it was exploited by the hacker. I do want to say that the security personnel who helped deal with the break-ins, including Steve Lau, were all very understanding and helpful.

The transfer of data between sites can be frustrating. NERSC is probably the best compromise between a secure site and ease of data transfer (ease of use).

Other comments:   5 responses

The requirement that the security banner flash on every file transfer and login is utterly ridiculous, whether or not it is a DOE policy. A person may read this once, but afterwards it is just a waste of bandwidth. ...

I get logged out too readily and have to keep logging back in. It is distracting. [Seaborg user]

New procedures always entail some startup effort, but so far there has not been any long-lasting impact that is significant.

Setting up accounts with the current Grid3 limited user model is often hard/time consuming, and potentially dissuades some use. Hopefully new Grid3 user/VO models will help. [PDSF/HPSS user]

One of my students who is from Iran cannot use NERSC. I understand that this is a DoE policy but wish it were changed.

 

Please make any comments that you think may be helpful to NERSC regarding one-time password authentication:   52 responses

Please don't use them / very inconvenient:   20 responses

One time password authentication will effectively halt all progress we have made in automating our scientific workflow in conjunction with the SciDAC Scientific Data Management ISIC Team. Continued automatic authentication within workflow automation scripts is vital to our efforts. Having to utilize cryptocards for one time authentication is a complete show-stopper for us!

From my understanding of how they work, one-time passwords would have a major negative impact on the way we are trying to automate our workflow. Presently, much of our data handling is manual; we are actively working to automate it. One-time passwords would seem to inhibit automation by setting up numerous roadblocks that require explicit human intervention. Productivity would thus suffer greatly.

Usually a pain and the increase in security is marginal. If we were doing classified work I could see it but not when we're just doing research and we're supposed to back up our stuff.

This was a long time ago using a device which delivered a new number to be added to a 4-digit password each time one accessed the frontend system. Because one had to access the compute system by first logging in to the frontend server, file transfers were a problem. In addition there was the problem that one had to have the device in order to login. I don't think one time passwords are desirable. I favor simpler methods of improving security such as removing the 8 character limit on passwords, using higher levels of encryption and using systems where the password file is itself encrypted, etc.

Relative to the current procedure, going to one-time passwords would not be desirable. It becomes cumbersome to access the machine.

It's a pain!!!!

I think it makes life difficult; I do not like it.

Time consuming and inconvenient.

I have used this type of access at my current place of work (LANL) and, while it adds to security, it might be very inconvenient for users accessing from all across the country. I think the current access procedures seem fine - unless there have been many hacking attempts?

Why use that?

Currently using a keytag random number generator and web authentication for one system. Inconvenient; have to logon several times a day; number generator has failed once.

I would encourage NERSC to avoid the use of one-time passwords if at all possible.

I found it quite cumbersome, I don't like carrying things around when I want to quickly login somewhere. I am an avid user of SSH keys and a long passphrase, I think that this really should be offered as an alternative to the one-time passwords.

It sounds like a pain in the neck.

Not very enthused about the possibility.

Don't know much about this but it seems like a pain to have one-time passwords. I'd give it a try though.

Please no.

It's obviously more secure - but also obviously more of a pain.

I'm not a big fan of this. It should only be considered if there is a pressing need.

Is it really necessary?

If you have to, only one key for multiple sites:   9 responses

It's a pain in the butt. Crypto cards/keys, etc. are never where you need them. They get munged, seem to need to be reset about every 6 weeks, are not made for fat-fingered people; on and on. A universal one-time password authentication system for use at all DOE sites would be the best of a bad thing. Let's face it: multiple passwords for multiple sites requiring frequent change just invite Post-Its on displays, or a pocket notebook page...

I detest one time passwords, but I think that they may be necessary to keep hackers off our very expensive resources. It would be a great help to me if NERSC, LANL, NCAR, and ORNL all used the same CryptoCARD access so that my single CryptoCARD could be used for all the computers I work on.

If you're going to use RSA SecurID tags, please find a way to make them distinctive. I already have one, and having two is just a pain if they look identical. I suspect I'm not the only user in that situation.

I already have two OTP keys to carry around, for LLNL and PNNL. I do not want a third one! If the DOE sites can get together and use a single OTP key for all sites, that would be progress.

One-time password tokens are used at LLNL and work fine. Having multiple one-time password tokens to keep track of might be a hassle.

OTPs complicate access for me since I have to carry the token to wherever I'm working. Also, if I had more than one token, as some people in my group do, there might be some minor problem confusing one for the other.

Important to try to coordinate with "trusted" sites, e.g. other national labs, to accept their OTP tokens - a nuisance to carry around a stack of devices. Note that it might be a one-way agreement; they need not accept NERSC-issued tokens for this to be a big help.

There is only one disadvantage: one more secure card to carry !

If this refers to Kerberos/SecurID-based authentication, I use this to access DoD accounts. It works fine, but since I do a great deal of work at home and in the office, I must remember to carry the ID card back and forth. If I had to carry more than one of these, this would be annoying, and the chances increase tremendously that I couldn't do my job because my card is elsewhere.

You should do this / works at our site:   7 responses

I have no problem using the RSA SecurID for Livermore. If you were to implement it, I would be happy to do it if it makes Seaborg safer.

One-time password authentication provides a more secure connection to NERSC; it is very critical and important.

Increases security at LLNL

I think it is very secure

works fine for me

Connect to fnal.

The PNL SecurID method works ok.

If it is implemented in a convenient way:   6 responses

Getting a key that would last for a good fraction of a day seems to work for me. However, some sites give you a password that expires in a few minutes, which is extremely counter-productive.

Simply to make sure that change is managed in a timely fashion with regard to academic researchers sometimes being away or being more or less active at certain times.

It is very important to have some way for users to log in if they lose their password generator. The machine operators should have some way of helping such users since they are the only group that has someone available around the clock.

It will need to be integrated with single sign-on for grid services.

One-time passwords will only be useful if they are well integrated into Grid usage.

1. Minimize the number of passwords that need to be entered to access the site.
2. Have some procedure that allows code and data to be migrated from other sites to NERSC without requiring use of an OTP.

OK / only a minor nuisance:   5 responses

They are only a minor nuisance.

It will not change the way I work if NERSC introduces one-time password authentication to improve security, it is just a small nuisance.

SecurID tokens work fine but you do have to carry them with you at all times...

Sure, it is a bit onerous, but I think using one-time passwords at NERSC would be a completely understandable step in improving security.

It has been working in my case. There is some inconvenience, namely having to constantly keep in mind to put it in a safe place; I do not dare to take it with me while I travel.

Don't know / other:   5 responses

I don't know how this works, nor how much of a nuisance it'll be.

Don't know what one-time password means. I think constant forced change of password is of little value.

Not sure what one-time password is about.

No comment as no prior experience.

I think NERSC is doing a superb job in this area.

Visualization and Data Analysis

Where do you perform data analysis and visualization of data produced at NERSC?

 

LocationResponsesPercent
All at NERSC 12 5.8%
Most at NERSC 33 16.0%
Half at NERSC, half elsewhere 40 19.4%
Most elsewhere 57 27.7%
All elsewhere 49 23.8%
I don't need data analysis or visualization 15 7.3%

Are your data analysis and visualization needs being met? In what ways do you make use of NERSC data analysis and visualization resources? In what ways should NERSC add to or improve these resources?

[Read all 80 responses]

 

20   Comments about Seaborg use
15   Don't use / don't need
11   Requests for additional services
10   Interactions with Visualization Group members
10   Do data analysis / visualization locally
8   Comments about PDSF use
7   Services meet our needs
1   Comments about the Math Server Newton

 

Comments about Seaborg use:   20 responses

Just need to have GrADs, NCL, NCO, ferret on Seaborg.

NERSC response: ferret and NCAR/NCL are installed on escher. NCAR/NCL is installed on seaborg. For the others, you may formally request that the center obtain and install software by completing the Software Request Form.

Yes-- no significant or unusual needs (mostly using idl)

Data analysis and visualization is fine. We use a custom-written IDL library for analysis, and seaborg has an adequate number of IDL licenses.

... I do some runtime visualization using standard pgn library.

I am a frequent user of the IDL software on seaborg. The service is basically flawless, very satisfactory, reliable and useful. The occasional problem is the lack of available licenses (which have been recently really sporadic). The other issue is the tight run time limit for the interactive jobs which applies also to IDL runs and prevents a full exploitation of the IDL capabilities in terms of the image processing, basic mathematical operations etc.

NERSC response: You may wish to consider using the visualization server escher (to be upgraded to DaVinci, an 8-CPU SGI Altix system, in the first half of 2005) for your IDL processing. Interactive time limits are much longer (essentially unlimited) on the vis server.

For some of our data analysis, we use serial queues just to get access to memory. This seems like a waste of resources. If other options are available, then it would be nice to know about them.

NERSC response: When the new vis server, DaVinci (an 8-CPU SGI Altix system) goes into production in the first half of 2005 it will likely offer batch queues that you may find useful.

Currently, most data analysis is done offsite. However, I do occasionally run serial jobs on Seaborg to perform post-processing of data. We are looking at making greater use of Escher in the future.

yes. I like the serial queues on Seaborg for post-processing.

In practice, I only use gnuplot on NERSC. An easier to use and more powerful visualization tool (e.g., for plotting molecules) would be helpful. It should be on seaborg, so the calculated data can be analyzed immediately. Anything that requires the transfer of data to another machine will not be as useful as an analytical tool.

NERSC response: garlic, vmd and rasmol (molecular viewers) are installed on both escher and seaborg.

I occasionally submit jobs to the NERSC serial queue to process large numbers of data files. Jobs in the serial queue typically started quickly and otherwise ran fine.

I do most of my post-processing off site, so simple visualization with GNUPLOT or GRACE is sufficient.

I don't have any personal ones. My group's concerns have been met through my NUG involvement. [pre and post processing queues on Seaborg]

I am using IDL mostly, and it works fine for me.

IDL was once used. It did not seem very interactive with the required SSH software, which does not support graphics.

NERSC response: Use the -X argument to your ssh client to force X11 traffic to be tunneled from the NERSC Center back to your workstation. See How To Route Graphics Output through SSH.
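As a concrete illustration of the response above, here is a minimal sketch of wrapping an X11-forwarded ssh invocation in a script; the host name and remote command are placeholders, not prescribed NERSC settings.

```python
# Minimal sketch: launch a remote graphical command with X11 forwarded back
# to the local workstation.  Host and command below are placeholders.
import subprocess

def run_remote_x11(host: str, command: str) -> None:
    """Run a remote graphical command with X11 traffic tunneled back locally."""
    # "-X" asks the ssh client to forward X11; the remote program then draws
    # on the local display.  Requires an X server running on the workstation.
    subprocess.run(["ssh", "-X", host, command], check=True)

if __name__ == "__main__":
    run_remote_x11("seaborg.nersc.gov", "xterm")  # placeholder host/command
```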

No. AVS 5.6 has been dead on seaborg for a while and I am told that it may remain dead on seaborg forever. That has been very inconvenient for the instant monitoring of data output. Currently I have to transfer data from seaborg to escher for visualization and analysis. I wish NERSC could restore AVS 5.6 on seaborg soon.

NERSC response: To be solved, this particular issue will require the vendor to provide a version of the application updated for the most recent version of AIX -- there is nothing that NERSC "can fix" to make AVS 5.6 work on seaborg. A trouble ticket was filed with the vendor early in Fall 2004. Note that AVS version 5.5 is available for your use on seaborg, and it is substantially similar in function to AVS version 5.6. Also, the most recent version of AVS/Express is installed and functional on seaborg.

Through routines written for IDL, running on Seaborg.

I only use xmgrace on seaborg as far as visualization goes, because I don't know much about other software. ...

Suggestions: ...
2- Devote a small subset of nodes (2-4) to serial jobs for data analysis.
3- Devote one node to fast data transfer to/from HPSS, using htar, etc.

Please keep IDL.

... I do use Seaborg for some data analysis when the data set is really large and would not fit on our local filesystem or would take too long to transfer.

Don't use / don't need:   15 responses

Not applicable at this time. Could be important for future calculations.

Don't need

I don't use data analysis and visualization at NERSC.

I do not use these NERSC resources regularly.

Do very little data analysis and visualization at NERSC.

I do not use NERSC facilities for data analysis or visualization.

No comment, others in our group carry out the data analysis/visualization.

I do not do any visualization work.

Not used.

Don't use.

I hardly use the visualization facilities at present, so I am unable to make any comments.

I don't do much Visualization and Data Analysis at NERSC.

I have none.

Do not use

We have not yet done substantial analysis and visualization to give you meaningful comment.

Requests for additional services:   11 responses

It would be nice to have GRACE and gnuplot on the pdsf interactive machines, so that we can plot and check some data remotely. [PDSF user]

NERSC response: To request new software, please fill out the Software Request Form.

I use a package developed at LLNL (called VCDAT) to work with NetCDF files. Currently I drag the necessary results from NERSC to LLNL in order to use VCDAT. It would be useful if VCDAT were installed at NERSC. However, I understand that this would require the LLNL developers to put extra effort in too.

NERSC response: NERSC attempted to install VCDAT on Seaborg. However, there is no AIX version available and the installation failed. If VCDAT releases an AIX version we will revisit this issue.

Do you have a license for Tekplot? That would be handy for Nimrod simulation data analysis.

NERSC response: We do not have any Tecplot licenses. Each floating license costs $3200 for the initial purchase, and $640/yr thereafter (per license) in maintenance. You may formally request that the center purchase and install Tecplot by completing the Software Request Form.

Would like to do more analysis/vis on NERSC. Many of my jobs require multiple restarts on a 12 or 24 hour queue. In particular, would like to have some plots done automatically at the end of a (batch) run so no time is wasted shipping data to another computer to analyze before deciding on restarting. My present analysis programs are not designed for MPP.

I don't know if it is my fault, but it seems to me that the visualization tools are not very transparent to users. It would be useful to have access to some tutorials or short on-line courses, widely advertised, for learning the visualization capabilities at NERSC.

I want imagemagick on seaborg. Then I could make movies there, and that would complete my viz needs. I asked and was told that this is not possible.

NERSC response: NERSC will investigate installing ImageMagick on seaborg.

Tools capable of visualization and quantitative analysis of terabytes of data are urgently needed

NERSC response: During 2004, NERSC evaluated visualization software that uses a scalable, distributed architecture. EnSight was installed on both Seaborg and Escher.

... How about sending "Tips of the month" to users by e-mail.

Suggestions:
1- Make getting accounts in Escher easier. ...

I don't know how to use the tools available at NERSC. Maybe NERSC can offer some training.

NERSC response: There is a wealth of information, including "how-to" material on the Visualization Group's website.

The visualization group has provided no useful service to me. I generate very large data sets that require the development of specialized analysis tools for each new problem. I generate the data with a parallel run, then postprocess that data either in parallel or serial to produce a reduced dataset which I then transfer to my local machine for further processing or visualization. Alternatively, my data may require no post-processing, and is visualized directly with our own tools based on demand-driven I/O and X-windows/Motif windows, and run directly on seaborg. For my application, and for many others I'd guess, the role of a viz group directly in the scientific analysis is not clear, for they should not be expected to understand or care about the sort of data derivation and analysis specific to my field.
If it were up to me, the viz group would be creating new and interesting ways to look at my data (new sorts of 2D or 3D vector/scalar/isosurface plots, with interesting new ideas to bring out features I hadn't seen). They'd have a web page full of neat and new interesting viz ideas, and links to software/code written to be applied to simplified datasets as examples in order to demonstrate the technique. Furthermore, they'd be funded to supply some ideas/manpower on how to generalize these neat new ideas into the format of an interested user. Such applications might cover the range from debugging tools, presentation graphics, web-based java/flash animation control, stereo, 2D and 3D line-integral convolution things, isosurface extraction and processing, whatever. Finally, they might even develop data standards on which they or others can build such viz tools primitives, working more with the folks in the NERSC community that must do all this stuff themselves already.
From my perspective, the viz group has access to a tremendous amount of compute hardware, and intellectual access to a huge variety of scientific research groups. However, they have neither the manpower nor the inclination to build such generalized tools or data standards for the NERSC community. The extent to which I've seen their contributions has been limited to demonstrations of viz-data pipe throughputs, multi-lab transfer rates, etc., and from my viewpoint these demonstrations are of very questionable utility.

NERSC response: In some cases, general purpose "hammers" are handy, while in others, more specialized tools are appropriate. In these latter cases, it is crucial that the tool makers understand the ultimate use of the tool. Generally speaking, it is crucial that the visualization community have some level of understanding of the science problem in order to produce relevant and useful technology. It sounds like your project has its own set of tools and techniques that are used for generating and analyzing data. Perhaps your project would be able to take advantage of NERSC's analysis/visualization facilities, which include (as of the time of this writing) a pair of SMP platforms: one has 12 CPUs/24GB RAM/4TB of scratch disk; the other has 8 CPUs/48GB RAM/3 TB of scratch disk. Both have excellent system balance for data analysis, favoring memory size and I/O rates over raw cycles.

You make many good suggestions and the NERSC Visualization group has implemented many of them already as evidenced on our website:

  • Example visualizations
  • Novel use of standard, web-based delivery for interactive 3D visualization (QuickTime VR Object movies); see Leveraging QuickTime VR as a Delivery Vehicle for Remote and Distributed Visualization
  • Data Conversion utilities for using standard visualization tools with AMR data; see AMR Visualization at Berkeley Lab
  • We created an HDF5 "data standard" for use by the 21st Century Accelerator SciDAC; see: Particle Viewer

 

It should be kept in mind that the NERSC Visualization group is more in the deployment business than the "research and development business." As such, it is usually beyond the scope of the NERSC mission to define data standards for projects. The concept of focusing on data management and modeling to form the central implementation core of a computational science simulation and analysis project is sound. As the scope of such central cores continues to diversify, the challenge (for NERSC) is to provide a combination of software tools and infrastructure that are sufficiently flexible to be widely used across many projects. This is a moving target that requires constant input from the NERSC user community as well as constant effort to maintain.

Interactions with Visualization Group members:   10 responses

My group has done both very large cosmological simulations, which we have visualized elsewhere, and the largest program of hydrodynamic simulations of galaxy interactions, some of which we have visualized with the help of NERSC visualization staff. We now have done many more, and higher resolution, hydro simulations, and we look forward to working with the NERSC visualization folks to make state-of-the-art movies based on them. The challenge is to visualize the many dimensions represented by our outputs.

I've used these services only once or twice to produce digital movies

Yes, the Viz group has been very helpful.

Excellently met. Cristina Siegerist has been helping us generate amazing images related to our research.

Work with visualization group.

I've used the viz group once with help producing a movie and would love to use their services again. They were prompt, efficient and helpful. The majority of my data analysis is done elsewhere, but I've been satisfied with what I've done at NERSC.

Yes, our needs are largely being met by NERSC visualization services staff who have devoted substantial time to working on our color images.

I am part of one of the INCITE projects and for us visualization has been a very important tool. We as chemists are not so familiar with visualization programs, so the help we have received at NERSC has been extremely valuable for us. Especially the work done by Cristina Siegerist, who has been of real value to us. She has been an incredible help for this project; she is extremely hard working and knows perfectly what she is doing, so most of the time we have received much better results than what we were expecting. She always goes beyond our expectations and has even proposed new ideas for taking advantage of the information we generate.

My research group has worked with Cristina Siegerist, who has created images that allow us to visualize, for the first time, the walkers in our Monte Carlo simulation. She has also enabled us to visualize aggregate data from the ensemble of walkers. These contributions have been extremely valuable.

Escher -- making 2D movies using IDL codes developed by the visualization group

Do data analysis / visualization locally:   10 responses

We do our analysis at our own sites.

We have our own local viz expert so we do the visualization locally. ...

We don't need these services. All graphics data output is analyzed and visualized on our local LINUX workstations (using mostly XMGRACE and PLOTMTV).

Do this locally.

Most of my visualization needs are done by the visualization group at Argonne National Lab. ...

I do all analysis of the results at local machines.

I generate viz data on NERSC and visualize it on local machine. I am considering using NERSC viz tools.

Postprocessing code written by ourselves treats the data on Seaborg, and we download and visualize it locally on a Linux machine. So far, we have not used the NERSC visualization tools.

I do all my data analysis and visualization on local PCs running linux. Reasons are that I use software probably not installed on NERSC, and don't want to experience delays from the display of postscript graphics from NERSC to a local machine that is at the other end of the country when the connection is heavily used.

NERSC response: You might have better luck using gs to render your PostScript file to a raster image, then displaying the image using ImageMagick or a similar utility. In some cases, using gs to render large PostScript files across a slow network would be quite slow. Refer to One-Step JPEG-from-PS Creation for details on using gs to render your PostScript file to a JPEG file.
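To make the suggestion above concrete, here is a minimal sketch that drives gs from a script to rasterize a PostScript file to JPEG before it crosses the network. The flags are standard Ghostscript options, but the resolution and file names are placeholders, not NERSC-specific settings.

```python
# Minimal sketch: rasterize a PostScript file to JPEG with Ghostscript (gs),
# so only the (smaller) raster image needs to cross a slow network link.
# Standard gs flags; verify against the installed gs version.
import subprocess

def ps_to_jpeg(ps_file: str, jpeg_file: str, dpi: int = 150) -> None:
    subprocess.run(
        [
            "gs",
            "-dBATCH", "-dNOPAUSE", "-dSAFER",  # run non-interactively
            "-sDEVICE=jpeg",                    # JPEG output device
            f"-r{dpi}",                         # rendering resolution
            f"-sOutputFile={jpeg_file}",
            ps_file,
        ],
        check=True,
    )

if __name__ == "__main__":
    ps_to_jpeg("plot.ps", "plot.jpg")  # placeholder file names
```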

In most cases, I copy the data files to local computers and do the visualizations.

Comments about PDSF use:   8 responses

Only use ROOT.

We have our own IDL-based routines for data and visual analysis.

We actually use PDSF for all of these.

Data analysis on PDSF. No real visualization to speak of.

All my needs are met by software which is custom to me and/or my group.

I just use PDSF

I do my data analysis using the batch queues on PDSF. It's an extremely useful system for me.

 

The switch to SGE is great!

Services meet our needs:   7 responses

The present visualization resources are adequate for my needs.

satisfied

Yes, it meets my needs so far.

present service is adequate.

Yes

Yes.

Adequate

Comments about the Math Server Newton:   1 response

Access to Matlab on Newton.

HPC Consulting

Legend:

SatisfactionAverage Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Significance of Change
significant increase
not significant

Satisfaction with HPC Consulting

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
Timely initial response to consulting questions       1 5 38 125 169 6.70 0.55 0.15
HPC Consulting overall       3 4 38 132 177 6.69 0.60 0.35
Followup to initial consulting questions       4 5 34 122 165 6.66 0.66 0.17
Amount of time to resolve your issue     1 5 6 36 118 166 6.60 0.75 0.24
Quality of technical advice       4 7 42 112 165 6.59 0.69 0.05
Software bug resolution     2 11 7 33 47 100 6.12 1.08 0.48

 

Comments about Consulting:   46 responses

35   Good service
4   Mixed evaluation
3   Unhappy
3   Didn't use
Good service:   35 responses

Everyone I have ever spoken with has been great.

Outstanding quality. There is very little I could add to this.

We have had nothing but good experiences. Keep up the good work!

The quality of NERSC's consulting service is what puts it at the top of the supercomputing centers.

Don't use very much since I have local experts, but generally have favorable impression.

NERSC consulting is the strength of the facility

Generally consultants do a very good job and are of great help.

David Skinner's help has been essential to the progress of our INCITE project. He really has been extremely supportive and responsive, and has helped us look at our code in detail, has implemented parts of it, and has fixed bugs and improved performance.

I have only good things to say about NERSC consulting. They are prompt, anticipate problems and bend over backwards even when I've done stupid things.

I'm very pleased with the pdsf team. Iwona is great.

Good job!

The NERSC consulting team has done everything I could ask of it. From a user standpoint, I can't think of a way it needs to be improved.

I have nothing but praises for the PDSF support staff. Great job!

The quality and timeliness of PDSF and NERSC support in general have been a critical factor in making our investment in NERSC worthwhile.

I have been uniformly impressed by the consulting services at NERSC for over 25 years.

When I first applied to NERSC to create my new group (EUSO) I had never really worked with NERSC before. Iwona Sakrejda explained to me what all the procedures were, helped me complete the right forms and came back and sat with me once the group was created to get my user environment set up correctly. She put in a lot of effort to make this all happen fast and smoothly and I really appreciate everything she did. Thank you Iwona. [PDSF user]

always a pleasure to work with [PDSF user]

I was extremely happy with the quickness and effectiveness of NERSC's response to my requests for help. Keep up the good work.

These guys have been great. I'd be lost without them.

They're great.

Every time I've had to contact someone for help, they always came through, and in a very short period of time. [PDSF user]

Support is always timely and professionally handled. It's a pleasure to work with NERSC consulting. This is part of what makes NERSC a valuable resource.

Good Job!

Things DON'T fall through the cracks. Good followup, and pro-activity.

First rate service - please keep up the good work...

I think you have a wonderful system of consulting; I have always found the help I need for any problem I have ever had. I would especially like to comment on the work done by David Skinner, who has been of invaluable help for our project and has always been very willing to help us. He has always helped us and even pointed out improvements we could make to our code that we were not even aware of. I think interacting with him has been of great use for us. He has always been very fast and precise in responding to our needs.

David Skinner has devoted a substantial amount of his time to helping us optimize our code for Seaborg. David constructed the zori module (which makes compiling and using Zori on Seaborg much more convenient), added parallel I/O using the HDF library and helped steer our development with extensive profiling.

David Skinner and Christina have been providing very useful help with the INCITE project. They facilitated the enhancement of the mpi parallelization of our group's code, and the visualization of our results.

The only person I have consulted thus far is Francesca Verdier, who has been very responsive and helpful.

My use of consulting has generally been routed to PDSF people, and has been good.

Great, especially compared to places like NPACI.

For as long as I have been associated with NERSC the quality of its consulting staff has been one of its strongest virtues, even more so than its hardware. Based on my most recent experiences this continues to be the case.

NERSC consulting has been very good and very responsive consistently over time.

The staff are very competent and friendly, and generally a credit to NERSC. [PDSF user]

We have been very much satisfied with the assistance that we have received from the user services group. It has been prompt and quite helpful.

Mixed evaluation:   4 responses

The email responses to consulting questions are prompt. But the overall resolution is not prompt. It takes so long that sometimes the issues are resolved by other means.

Questions concerning passwords were handled very promptly. One question concerning software could have been addressed better by simply referring me to the developers of the package.

While the consultants try to be helpful, and they are for smaller problems, the major software bugs in the past couple years for the M3D code were resolved only by eventual upgrades to the operating system. They involved interaction of MPI or PETSc with C or Fortran code. The consultants were unable to debug them, in part because of the large size of the code, which they had trouble running. Also, the available debug tools are not very effective for very large and complicated codes.

Sometimes, NERSC consultants provide very valuable advice. But they have dropped the ball and don't follow up, especially after a few e-mail exchanges. There is too much readiness to assume the problem has been solved or abandoned.

Unhappy:   3 responses

The NERSC consulting staff should be augmented by a C++ expert!

NERSC consulting staff often do not take me seriously enough at first. In the last year, I needed more disk space to run jobs for the IPCC initiative that our group is working on. I applied for more disk space, and the consultant was incredulous. I asked him to forward it anyway, and when he did, the request was approved. I met a similar response when I encountered a bad node. I saw a serious slowdown on one of my runs, and reported the bad node. The consultants didn't believe me at first, but systems did in fact locate a bad node.

They have resolved (not resolved) about 50% of the questions/problems I have reported.

Didn't use:   3 responses

I never sought advice at HPC, so I cannot comment. However everything worked fine from the beginning.

There is a HPC consulting???????? Glad to hear about this, I will give you my feeling about this next year. [new user]

Have not used any consulting services this year

 

Services and Communications

 

  • Legend
  • Satisfaction with NERSC Services
  • How Important are NERSC Services to You?
  • How useful are these methods for keeping you informed?
  • Are you well informed of changes?
  • Comments about Services and Communications

 

Legend:

SatisfactionAverage Score
Very Satisfied 6.50 - 7.00
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
ImportanceAverage Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Not Important 1.00 - 1.49
Significance of Change
significant increase
not significant
UsefulnessAverage Score
Very Useful 2.50 - 3.00
Somewhat Useful 1.50 - 2.49

 

Satisfaction with NERSC Services

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
Account support services   1 2 1 2 38 136 180 6.68 0.72 0.29
Computer and Network Operations     1 5 7 31 85 129 6.50 0.83  
E-mail lists     2 12 6 25 47 92 6.12 1.14  
Response to special requests (e.g. disk quota increases, etc.) 2 1 3 8 6 21 52 93 6.08 1.41 -0.27
Allocations process 1 4 5 7 17 57 57 148 5.93 1.27 0.24
Visualization services     2 22 4 12 19 59 5.41 1.37 0.60
NERSC CVS server       21 2 8 14 45 5.33 1.35  
Grid services       18 3 5 9 35 5.14 1.31  

 

How Important are NERSC Services to You?

3=Very important, 2=Somewhat important, 1=Not important

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
Allocations process 3 32 99 134 2.72 0.50
Account support services 2 49 108 159 2.67 0.50
Response to special requests (e.g. disk quota increases, etc.) 6 21 65 92 2.64 0.60
Computer and Network Operations 4 43 77 124 2.59 0.56
E-mail lists 30 49 20 99 1.90 0.71
Visualization services 33 24 15 72 1.75 0.78
NERSC CVS server 31 14 14 59 1.71 0.83
Grid services 38 6 8 52 1.42 0.75

 

How useful are these methods for keeping you informed?

3=Very useful, 2=Somewhat useful, 1=Not useful

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
E-mail announcements 1 46 121 168 2.71 0.47
MOTD (Message of the Day) 9 72 71 152 2.41 0.60
Announcements web archive 11 70 70 151 2.39 0.62
Phone calls from NERSC 24 35 43 102 2.19 0.79

 

Are you well informed of changes?

Do you feel you are adequately informed about NERSC changes?

 

AnswerResponsesPercent
Yes 185 97.4%
No 5 2.6%

Are you aware of major changes at least one month in advance?

 

AnswerResponsesPercent
Yes 162 89.5%
No 19 10.5%

Are you aware of software changes at least seven days in advance?

 

AnswerResponsesPercent
Yes 171 95.0%
No 9 5.0%

Are you aware of planned outages 24 hours in advance?

 

AnswerResponsesPercent
Yes 167 91.8%
No 15 8.2%

 

Comments about Services and Communications:   16 responses

8   Comments about e-mail / information services
5   Comments about staff
4   Comments about ERCAP / allocations
Comments about e-mail / information services:   8 responses

I feel that emails are the best way to inform people. This also allows users who want to screen or parse information to do so via their email system.

All of the ways listed above are very useful but I find the email list to be the most effective.

Email has usually been adequate for most communications. I've been caught off guard by scheduled downtimes once or twice, but that's mostly my fault.

I get plenty of emails. I may read occasional emails, but frequent emails simply get deleted.

Did not even know there is an announcement archive.

I am very happy with information services provided by NERSC.

The information from NERSC on computing issues is generally excellent.

The message of the day is often too long, and I don't try to scroll it back. The security warning is too repetitive for regular users, and scrolls away more important information. There are some nersc machines that send out virus e-mails, which bounce occasionally into my regular e-mail. Perhaps nersc addresses have been taken over by spammers, as well. This only happens occasionally, but it's bothersome that it happens at all.

Comments about staff:   5 responses

Staff seems very responsive to users.

... On the other hand, the consultants have been very helpful in responding to special requests. They have made useful suggestions and helped me be more efficient.

I want to take this opportunity to thank the NERSC staff for their great performance.

Francesca has always been very helpful and responsive to my repo and account inquiries.

I would like to commend Francesca Verdier for her anticipation of user needs and her tracking of and dealing with the subaccounts of users in the repo I supervise.

Comments about ERCAP / allocations:   4 responses

The ERCAP allocation process is not very good. At other supercomputer centers, an allocation request is written like a scientific proposal. It includes certain required topics, but the proposers are free to write a proposal that makes sense in the context of their code and their problem. NERSC's proposal form is too much of a one-size-fits-all solution. For example, the form asks for my code's poe+ stats. My code can do hydrodynamics, MHD, gravity, and radiation, and each of these modules gives different numbers. Different problems use different modules, and even runs using identical modules can give different numbers on different problems. My poe+ numbers can even change by an order of magnitude over the course of a single simulation, as the physical conditions within the simulation evolve. With the ERCAP proposal form, there's no place to put in a nuanced discussion of this issue. Instead, I get to submit a single number, which, for my code, is almost meaningless. That's just one example of the problems of trying to reduce a scientific proposal to filling out a form. ...

The allocation process consumes too much time. The forms change every year. Too much technical info is required. We never get what we ask for, so it becomes a bidding game. For example, Fusion allocations should be made by OFES managers following review of contracts/grants. Furthermore, allocation without priority access has come to mean that one cannot use the allocation because of long wait times. NERSC should stop over-allocating Seaborg and should give out fractional allocations on a quarterly basis (with a penalty for non-use the following quarter, and the possibility of getting more time depending on current loading).

Need a quick allocation process; otherwise no login is allowed and data cannot be recovered.

The allocations process gets more time-consuming each year. Hopefully, this trend can be reversed.

 

Web Interfaces

Legend:

SatisfactionAverage Score
Mostly Satisfied 5.50 - 6.49
Significance of Change
significant increase
not significant

Satisfaction with Web Interfaces

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
Seaborg section       3 10 47 85 145 6.48 0.72 0.48
Accuracy of information 1     6 6 64 91 168 6.40 0.84 0.15
HPSS section       4 11 30 49 94 6.32 0.85 0.58
NERSC web site overall (www.nersc.gov)       2 21 76 83 182 6.32 0.72 0.32
Accounts section 1     6 7 41 55 110 6.28 0.97  
NIM     1 7 17 64 74 163 6.25 0.85 0.17
Timeliness of information     3 5 18 76 65 167 6.17 0.87 0.12
Software section     3 5 11 33 46 98 6.16 1.02 0.29
On-line help desk       10 9 35 45 99 6.16 0.97 0.14
Status and Statistics section 1   2 7 11 40 43 104 6.07 1.10 -0.00
PDSF section     1 6 5 16 17 45 5.93 1.12 0.10
Ease of finding information 1   9 6 30 78 54 178 5.89 1.09 0.09
Searching     4 14 6 27 20 71 5.63 1.24 0.19
SciDAC       12 4 11 13 40 5.62 1.23  

 

Comments about web interfaces:   19 responses

11   General Web Comments
4   NIM / ERCAP Web Comments
3   PDSF Web Comments
1   Online HelpDesk Comment
General Web Comments:   11 responses

The web information on the NERSC web site is so great that I rarely have to call the NERSC staff.

This is a big help - makes using the systems at NERSC much more transparent...

They are useful and I use them fairly regularly. Note that I'm a fairly good information digger, and others who I work with are not as successful at finding relevant information on your website.

I don't use the web interface very often. For checking my used CPU-hours I just use the getnim command. However, when I'm looking for some information, I have found it on the webpage without any problems.

I find that searching for information is so time consuming I just normally don't do it.

A more effective search engine of NERSC web pages would be appreciated. For the software, hyperlinks to the home page of each package would be good.

Mostly don't use. Software information is difficult to find.

NERSC response: We hope that you find the organization of software information on the new www.nersc.gov website has improved.

Need to get rid of password access except for account management.

NERSC response: Certain kinds of information need to be protected from general public access, e.g. pages that display usernames. These pages are password protected. There are no longer special web passwords; access is now via your NIM/LDAP password.

I want a single password for all things nersc.

NERSC response: NERSC is moving in the direction of using the NIM/LDAP password for everything. As of January 2005, NIM, the PDSF, and the web pages use this password. As new machines are brought online they will also use the NIM/LDAP password.

It would be nice if it were more oriented to problem solving, instead of serving up huge amounts of only slightly useful information.

Forgot the ISIC centers on the scidac page

NIM/ERCAP Web Comments:   4 responses

Request for allocations is the best! Easy to use, only thing is, hard to get a good print copy or electronic copy for our records.

NERSC response: You can use the Show selected requests in PDF format button located at the bottom of the ERCAP Request List window. For 2006 requests we plan to use PDF for the "Show Complete Request" tab.

The many frames that are part of the NIM interface are sometimes annoying.

The NIM access repeatedly rejects my first access attempt (citing a wrong password); this happens EVERY time I try to log in to NIM. The next access attempt generally works fine. I'm not sure if other users experience this issue, or if it can be resolved.

It would help if the NIM accounting information had a timestamp of when it was updated last. I often ingest the 1-line current repo summary into a spreadsheet to forecast our progress/burnrate vs our remaining allocation. This becomes particularly important as we near the end of our allocation. Not knowing the time that the NIM reports are updated adds uncertainty to my estimates.
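
As a concrete illustration of the burn-rate projection this respondent describes, here is a minimal Python sketch. The repository figures, dates, and field names are hypothetical placeholders, not values taken from NIM; substitute the numbers from your own repo summary.

    from datetime import date

    # Hypothetical repo figures -- replace with the values from your NIM summary.
    allocation_hours = 400000              # total FY award for the repo
    used_hours = 340000                    # MPP hours used so far
    as_of = date(2004, 7, 15)              # date the summary is assumed to be current
    fy_start = date(2003, 10, 1)
    fy_end = date(2004, 9, 30)

    days_elapsed = (as_of - fy_start).days
    days_remaining = (fy_end - as_of).days
    burn_rate = used_hours / float(days_elapsed)           # hours per day so far
    projected_total = used_hours + burn_rate * days_remaining

    print("burn rate: %.0f hours/day" % burn_rate)
    print("projected FY usage: %.0f of %d hours" % (projected_total, allocation_hours))
    if projected_total > allocation_hours:
        days_left = (allocation_hours - used_hours) / burn_rate
        print("allocation exhausted in roughly %.0f days at this rate" % days_left)

As the comment notes, the uncertainty in "as_of" (the time the NIM report was last updated) feeds directly into the uncertainty of such a projection.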

PDSF Web Comments:   3 responses

There are new machines added to the pdsf cluster beyond pdsflx251. It would be nice if their information were listed on the 'hardware -> compute nodes' page so that we can plan our work accordingly.

After the new web sites were introduced some FAQ answers with links in them point to nowhere. Sometimes these other websites are supposed to give the final explanation, which is bad when it's missing.
("The answer to the problem was explained earlier http://..., see http://... also.")

The recent NERSC web site reorganization seems to have left at least the PDSF portion of the website somewhat disorganized...

Online HelpDesk Comment:   1 response

The web tracking system for help desk requests is excellent!

 

Training

 

  • Legend
  • Satisfaction with Training
  • How Important are these Training Methods?
  • What Training Methods should NERSC Offer?
  • Comments about Training

 

Legend

SatisfactionAverage Score
Mostly Satisfied 5.50 - 6.49
Somewhat Satisfied 4.50 - 5.49
ImportanceAverage Score
Very Important 2.50 - 3.00
Somewhat Important 1.50 - 2.49
Significance of Change
significant increase
not significant

 

Satisfaction with Training

7=Very satisfied, 6=Mostly satisfied, 5=Somewhat satisfied, 4=Neutral, 3=Somewhat dissatisfied, 2=Mostly dissatisfied, 1=Very dissatisfied

 

ItemNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.Change from 2003
1234567
New User's Guide 1     5 6 36 46 94 6.27 0.99 0.01
Web tutorials 1     8 8 37 38 92 6.10 1.06 0.03
NERSC classes: in-person       13   5 11 29 5.48 1.40 0.60
Live classes on the Access Grid 1     14   11 6 32 5.16 1.44 0.49

 

How Important are these Training Methods?

3=Very important, 2=Somewhat important, 1=Not important

 

MethodNum who rated this item as:Total ResponsesAverage ScoreStd. Dev.
123
New User's Guide 6 20 59 85 2.62 0.62
Web tutorials 7 31 50 88 2.49 0.64
Live classes on the Access Grid 17 12 10 39 1.82 0.82
NERSC classes: in-person 22 8 9 39 1.67 0.84

 

What Training Methods should NERSC Offer?

 

MethodResponses
Web documentation 121
Web tutorials on specific topics 115
Live web broadcasts with teleconference audio 29
Live classes on the Access Grid 27
In-person classes at your site 21
Live classes at LBNL 18

 

Comments about Training:   19 responses

7   Comments on web documentation / tutorials
5   Comments on video lectures
5   Don't use / rarely use
2   Generally satisfied / ACTS workshops
1   Specialized training desired
Comments on web documentation / tutorials:  7 responses

An easier tutorial for debuggers capable of checking memory usage would definitely be helpful.

I like to see examples of using the hardware and software on the web. They can be much more helpful than a manual or a manual page.

Most of my training has been done via documentation, but the personal attention allowed by a class might be very valuable to improve the efficiency or scalability of my code.

... For me the availability of "old" tutorials on the web is important for refreshing my memory because there are long periods of dis-use of complex topics coupled with frenzied refreshing and short intense (re)use.

*Better organized* web documentation would be the key for me...I learned the most useful things about PDSF from my co-workers who were already using the system. The PDSF FAQ page (on help.nersc.gov) was very helpful. [PDSF user]

I'd love to have more interactive tutorials available on the website, organized on a step-by-step basis. At the moment, it seems like information is usefully organized for anyone who already knows what to look for, but harder to access when doing a "cold" search.

I understand that web tutorials are expensive to put together, but it is rarely the case that me and ten of my buddies all need help on the same thing at the same time. The sort of class I might go to then would either be amazingly well-timed to my work flow issues at the time, or a general enrichment sorta thing.....I'm just not likely to go to a general enrichment kinda class due to my workload and focus.

NERSC response: All training materials, tutorials, quick references, and video lectures are now presented from a single page for "one stop" access to training materials. See NERSC Tutorials, How-To's, and Lectures.

Comments on video lectures:  5 responses

I've used the streaming video lectures, and find them very useful.

Would like to see more video lectures.

I believe you have tried some real broadcasts, which I prefer over the Access Grid because I do not have access to the Access Grid.

As usual, NERSC is the most proactive site in producing training. And most of it is applicable to other sites, too. Maybe this is why they pale in comparison.
GRID based technology still needs the kinks worked out and reliability improved. ...

Presently, I have no great need for training. However, when NERSC eventually moves to a newer generation architecture, training will be very important. I do not have access to the Access Grid and the technology for live web broadcasts leaves something to be desired. Thus, planning to host classes at NERSC is probably a good idea. [Stony Brook user]

Don't use / rarely use:  5 responses

Haven't used, but assume useful for others.

I haven't used these resources.

Rarely use.

I learn on my own.

I never got time to visit those web sites but I will. No opinion.

Generally satisfied / ACTS workshops:  2 responses

I think generally training is very good.

Several colleagues in our SciDAC project have had audiences at the ACTS workshops and a few students from our groups have been through the ACTS workshops. These are popular and useful.

Specialized training desired:  1 response

Maybe some specialized training in certain types of solvers (ODE, etc.) or methods (Ewald, Fast Multipole).

 

Comments about NERSC

What does NERSC do well?

[Read all 118 responses]

 

68 Provides access to powerful computing resources, parallel processing, enables science
53 Excellent support services, good staff
47 Reliable hardware, well managed center
30 Easy to use, well managed software, good user environment
26 Everything, general satisfaction
20 HPSS, disk space, data services
7 Documentation, NERSC web site
6 Allocations process, ERCAP, INCITE
2 Visualization
2 Training

What should NERSC do differently?

[Read all 94 responses]

 

45 Improve Seaborg turnaround time
37 Change Seaborg job scheduling policies - 25 users requested more support for midrange jobs
25 Provide more/new hardware; more computing resources
15 Improve the allocations process / ERCAP / INCITE
8 Other Seaborg improvements
7 PDSF improvements
4 Software improvements
3 More/better training
3 Network improvements
8 Other
3 Don't change, no suggestions

How does NERSC compare to other centers you have used?

[Read all 77 responses]

 

39 NERSC is the best / overall NERSC is better / positive response
20 NERSC is the same as / mixed response
7 NERSC is less good / negative response
11 No comparison made
 

What does NERSC do well?   118 responses

Note: individual responses often include several response categories, but in general appear only once (in the category that best represents the response). A few have been split across response categories (this is indicated by ...).

  Provides access to powerful computing resources, parallel processing, enables science:   68 responses

NERSC is of the utmost importance for our research in theoretical nuclear structure physics funded by DOE. While we can use our local workstations (LINUX, Mac-OSX) to develop and test our Fortran-95 codes, it is not possible to run production jobs on our local machines. Without access to the NERSC supercomputers we would not be able to do our research! Thank you for providing supercomputer resources to us.

NERSC supplies a lot of FLOPs reliably, and provides very competent consultants. It is a good place to use parallel codes that scale well on the available machines.

NERSC does a truly outstanding job of supporting both a small community of "power" users as well as a large community of mid-range users. Both are important, and, as a result of NERSC's success in supporting both communities, the Center facilitates an extraordinary amount of scientific productivity.

NERSC provides the state-of-the-art computing environment for my scientific work. We are simulating ultrafast photo-induced electron transfer processes in the condensed phase and on nano-particles, which cannot be accomplished on a single workstation. NERSC really makes it possible for us to perform accurate quantum simulations for such processes.

NERSC helps me by providing the computing power for RHIC-related simulations. I appreciate very much the good resources and very timely response from the support team. [PDSF user]

NERSC is important to me because it provides a way to perform computer intensive tasks in a reasonable turn-around time. NERSC has a good staff and reliable computer environment.

NERSC provides a very powerful and stable computing environment. We have been pleased with the responsiveness of the staff on both technical and accounting matters. NERSC is important to us because our DOE work is dominantly computational and we are dependent on NERSC for the compute power to carry out our work.

I believe NERSC offers: (1)- excellent computer power; (2)- excellent queuing system; (3)- accounts that are not restricted to US citizens. I am myself not a US citizen, and I collaborate on few projects with "sensitive countries" foreign nationals. Without (1)-(3) these projects on basic research would be impossible to carry out.

It's important to me to have access to high performance machines, so performance information can be published and the work appears very current.

Facilities not available locally.

Generally satisfied with NERSC management. Appreciate that MICS puts too many constraints on NERSC. Applaud the move to add clusters to improve the capacity computing limit. NERSC is at least half my computing resource. Have found ORNL.CCS to be more responsive to my needs of late.

Many processors; ability to access large number of processors in a timely manner.

pdsf

I compute at NERSC because Seaborg is one of the few computers on which I can run the desired calculations. Generally NERSC offers good consulting and a good set of software capabilities.

Historically NERSC has done a good job of providing and maintaining supercomputing resources that meet the needs of the fusion computation community. NERSC is a critical resource to me - without it substantial amounts of my computational fusion research would not be possible.

NERSC (in the form of PDSF) has a good allocation of resources for the STAR experiment. I often use it in preference to the RHIC Computing Facility (RCF) at Brookhaven because there is more adequate disk space etc. [PDSF user]

NERSC has an overall good queueing system for debug jobs. We compute at NERSC primarily for the access to a larger number of processors than is available on our local parallel computing system, permitting the running of larger simulations than would otherwise be possible.

I compute at NERSC because of the large arrays of processors that I am able to use simultaneously.

The ability to run on a large number of processors is most important to me.

The processing ability at NERSC, parallel processing, is very important to my project and is not available at OK State. That part works well and is providing me with what I need.

I use PDSF because without the parallel processing I could not analyze my data. I am very pleased with my ability to do work on the cluster.

The PDSF system at NERSC is powerful and convenient to use. Without this facility, my job would be much harder. [PDSF user]

Most of my parallel computing with NERSC has been for homework assignments and projects for the CS 267 (parallel computing) course at UCB. I also run serial benchmarks on NERSC machines, for performance comparisons between architectures for various sparse matrix operations. In the future, I plan to use NERSC computers for development and testing of parallel numerical algorithms.

NERSC provides a stable computer environment for large number of processors. It does that well. I realize there are many diverse demands for its resources, but my overall comment would be: during core working hours, high priority should be given to developer queues or else the inertia of the production runs (especially encouraged through rebates on > 1024 processor jobs ) swamps the machine. Without development there is a dwindling return for all our efforts.

I perform large-scale calculations that require a huge amount of floating-point operations and a lot of storage, both of which are provided by NERSC.

NERSC is very important for my science projects. The class of problems I solve with NERSC can be done only using supercomputers.

When the queue waiting time was shorter earlier this year (see below), NERSC was an invaluable resource for my research on cosmological simulations. Some research groups at universities can pool together resources to start a Beowulf-type cluster with 10s of processors, but the support staff, subsequent hardware upgrades, and space/cooling have often proven difficult. NERSC provides centralized hardware as well as an excellent support system for individual PIs to focus on the scientific and computing issues.

The great attraction that supercomputer facilities like NERSC have is their powerful computing resources that make computing faster. It seems that NERSC keeps improving on that by adding more nodes etc.

The ATLAS experiment makes use of PDSF and it is an important component of our overall distributed grid based access to computing and storage resources. [PDSF user]

NERSC does well in providing and maintaining a resourceful, reliable, and high-end supercomputing environment. That is why I compute at NERSC and why NERSC is important to me.

We have accounts at NERSC because our applications partners have accounts at NERSC. We are not heavy cycle users on our own.

 

  Excellent support services, good staff:   53 responses

Consulting service is extremely good. The consultants are very knowledgeable and make NERSC a superior computing center. The system administrators are excellent as well since the computers are always up and running. If there is a problem, it is solved very quickly. Of course, the computing power of the NERSC computers is also a major reason why I like to run there.

The staff is always available and very responsive, and the computing facilities are some of the best available anywhere.

I have always been extremely happy with NERSC consulting and support. This allows me to more efficiently use NERSC's hardware and it makes NERSC a tremendous asset even during times when the hardware is showing its age.

I am just beginning work that puts me on NERSC for the first time after several years' hiatus, so I limited my replies in this survey. You can see from those replies what I am pleased with [high scores for consulting support]. I am computing on Seaborg for the parallel processing capability. I have not yet needed more than the 30 minute queue. I expect to be trying more services in the future.

I am mostly satisfied with the turnover of my jobs at NERSC. But of late, the jobs wait for a long time in the queue. The consulting service and the web site of NERSC are really wonderful.

The consulting is doing a great job helping us get our work done quickly. Seaborg is very robust, unlike other systems which have so many crashes.

good consultant services; I compute at NERSC because I have no other place to compute

I really like the ability to use (and develop) my large parallel applications. The NERSC staff has always been very helpful and competent.

NERSC is very supportive of its users. This year is special because we are involved in INCITE, but nevertheless, being involved in the project made us discover all the resources that NERSC has available for its users. I compute at NERSC because my boss gets time at NERSC :) But whenever (if ever and hopefully) I am able to obtain allocations for my own projects, I will choose NERSC for sure. [INCITE user]

Other than this issue [bad turnaround], I have been happy over the years with NERSC's responsiveness to users. Accounts are created quickly; phone assistance with passwords or other issues is always friendly & helpful.

NERSC provides support and true CS people very well. [PDSF user]

PDSF support is wonderful! Both interactive and batch PDSF nodes are crucial to my work. [PDSF user]

Excellent user support (web+consulting staff)

NERSC consultants are great!

See collective previous open ended comments. [Things DON'T fall through the cracks. Good followup, and pro-activity. As usual, NERSC is the most proactive site in producing training. And most of it is applicable to other sites, too. Maybe this is why they pale in comparison.]

Responsiveness of consultants and opportunity for requesting large awards. [INCITE user]

Staff and support.

I was very satisfied with the very kind help from David Skinner and Francesca Verdier. Their help was very important to me. I was also very satisfied with NERSC consulting - people there did a great job. Thanks very much for the help from NERSC.

I am very satisfied with the consultant services and visualization. I have found very useful the storage services. [INCITE user]

The support provided by NERSC has been exceptional this year. Kudos to David Skinner and Christina.

NERSC offers outstanding service. The hardware uptime and the reliability of the production environment are unmatched.

PDSF provides outstanding user support. The staff are extremely responsive to user requests and questions. They both care about users and do a good job of helping us. [PDSF user]

The consulting help is outstanding

  Reliable hardware, well managed center:   47 responses

seaborg is still the best managed and run machine that I know. Uptime and availability are stellar. The hardware starts to show its age, though, and an upgrade to faster processors would be welcome. The fact that seaborg just works is extremely important, it is so reliable and useable. Over the last years, NERSC has provided a large fraction of my total computer time usage and has made large scale model calculation possible.

Excellent facility and support team.

NERSC provides a highly professional, highly reliable computing experience with access to significant amounts of computing power. The NERSC staff are well informed, helpful and pleasant to deal with.

NERSC provides a particularly stable and reliable service, and is relatively free of quirks. This is a big help when developing code.

Seaborg is a rock-solid, dependable, fast platform. It has far fewer outages and system problems than other supercomputer platforms I've used. The consultants are generally more knowledgeable and more available than on other systems. Seaborg just works, in ways many other supercomputers don't.

The IBM SP is a fast, reliable computer and the support is very good.

NERSC machines are almost always available
Consulting staff is great
Large number of nodes allows large memory applications

Seaborg has been very useful for me as it is stable compared with LINUX clusters and its uptime has been very good except for the past month.

I appreciate the large-scale, robust, reliable, stable computing resources provided by NERSC.

Excellent computing resources and consulting service.

Our DOE projects make it possible to use facilities at NERSC. The super-stability is what I am most pleased with, so it is very important to us.

It is good for large-scale problems. The availability of computer is very good but the turnaround time can be somewhat slow. Also seaborg seems to be an aging machine compared to other institutions.

You do a good job keeping the production machines running, and a problem which I had once with my account password was resolved very quickly.

Maintain a reliable computing platform with sufficient capacity for the user base. My group often uses seaborg just for its solid Fortran compilers, complete with (mostly) working bounds checking, and debugging environment.

The management and maintenance of computers are doing well. Many of my research problems rely completely on NERSC computers. These research problems require large memory.

 

  Easy to use, well managed software, good user environment:   30 responses

We have DOE support grants and find that applying for time and using time at NERSC is better and easier than at other facilities.

Most pleased with the efficient ease of use and sensible software configuration and queue structure.

Seaborg has (until the last few months) been the most useful high performance system that my group has used. Its combination of power and ease of use has been great.

Capacity & capability computing. Common computing environment for large international collaborations.

NERSC is invaluable for running the large-scale simulations that are needed for modeling intense ion beams. For me, the best aspect of the system is that most of the complexity remains invisible. The only irritation is that jobs sometimes wait a week or more in the queue.

Machines are stable and good for longer simulations.
Many software resources are available especially for profiling/debugging.
All information is accessible on the web.
It is the most user friendly machine.

I use NERSC for my work on the EUSO project, which at this stage involves software development within a ROOT framework. NERSC provides a nicely setup cluster (PDSF) with all the necessary compilers etc... that have enabled me to download the CERN ROOT package and all the software from my EUSO collaborators and compile it without any problems. The usual selection of editors is also available so I have an environment that I am comfortable with. [PDSF user]

Seaborg is relatively reliable, and shows little down time. High availability and high performance, they are the most critical advantages.

The software compilers are well maintained. Mass storage is very well done. The web site is well organized and looks good. The consultants are very good.

I compute at NERSC because
(1) Seaborg is still a good machine for parallel computations. (This is changing; see my comment on it's getting old, above.)
(2) Interactive and debugging runs are far easier to do here than at SDSC/NPACI
(3) Software resources and User environment is far superior to NPACI.
(4) Consultants are great.
At this stage, if I were NERSC, I would start moving some resources from consulting to procurement and eventually testing of a new machine.

I am most satisfied with NERSC's software environment and the availability of the hardware suitable for large-scale computation. NERSC is important to me because some of our computer simulations have to be finished using a larger computing resource, such as SEABORG.

I find the allocation and queuing procedures at NERSC to be eminently logical. Support for standard libraries is reliable and thorough.

We compute at NERSC because of the large-scale computer facilities and resources easily accessible and easy to use.

  HPSS, disk space, data services:   20 responses

data storage resources

NERSC handles large data sets easily. We use NERSC because of its large processor farms and storage. [PDSF user]

Well, I use NERSC/PDSF because our (KamLAND's) data is stored there. :) NERSC does a good job storing the huge amounts of data we generate in an easily accessible manner. [PDSF user]

PDSF and HPSS (which is all I use) [PDSF user]

Has a large amount of computing power and a huge secure data storage capacity. Especially this latter point is of utmost importance to my current computing needs.

NERSC have tended to respond to users' concerns and tried to create an environment to satisfy the users' needs. Prior to the last couple of years NERSC appeared to give creating a good production environment for ALL their users top priority. NERSC used to be a centre devoted to scientific computing not to computer science. Of course, NERSC is currently the only general purpose DOE supercomputer centre. NERSC's HPSS is reliable enough, and we feel certain enough about continued access that we use it as the central repository for our multi-system multi-site projects.

Fast I/O, access to > 100 processors.

I do 99% of my computations at NERSC, mostly because of large storage available to keep the runs output and the software support at NERSC. I really like NERSC web documentations and queue pages, and NIM.

Big data is easier to deal with at NERSC than the other sites I deal with.

  Everything, general satisfaction:   26 responses

NERSC has excellent people working there. I'm VERY happy with everyone I've come across. People have been knowledgeable and professional. I compute at NERSC because it's really big. Seriously, the number of processors allows us to do research on problems that we simply cannot do anywhere else. In that regard I consider NERSC a national treasure. One really silly request, how about a NERSC T-Shirt! I'd buy one.

Overall, the services and hardware reliability are excellent. I think that NERSC sets the standard in this regard.

NERSC is doing great. The uptime is fantastic and the system configuration is superb. I compute at NERSC because of its great reliability. Whenever there is a lot to compute in a certain time I use NERSC. Also I never experienced any problems while compiling code or running jobs. So I'm greatly satisfied.

I am a very satisfied customer - no complaints here.

I strongly appreciate (1) the well-maintained and well-organized hardware and software, particularly on seaborg; (2) the size of the available resources (e.g., seaborg and HPSS), and (3) the presence of dedicated, knowledgeable consultants. These three factors make it possible for me to do science.

The quality of the hardware available at NERSC is extremely good, the possibility to scale to a very large number of processors is also a big point in favor, as is the large availability of software, especially of parallel libraries like PARPACK and SUPERLU_DIST, very important for me. The quality of the service (like the help desk and the account management) is extremely good. The NIM interface is simply great; this year for the first time I've contributed to the ERCAP proposal for my group and I found it superb in this respect.

I want to praise NERSC staff. Seaborg is by far the best maintained machine I have run on. The support is great, and I can always count on it. I wish other supercomputing centers followed your business model. [INCITE user]

I am very happy with the overall level of service, it is excellent.

Overall everything is fine (except for some hiccups at pdsf). I especially like the very timely response of the NERSC staff to problems/questions. It almost feels like being a part of a big family. Keep on with this good work for us! [PDSF user]

NERSC is very reliable, very well managed. I don't have to worry about whether a job will run there, only about when. There is a good amount of space in home and scratch, though I keep needing more scratch space, which nersc grants me from time to time. The mass storage is good, the rate of transfer from and to mass storage, and the ease of transfer are other plus points. I do most of my work at nersc, and I am very reluctant to work on other machines. I guess that's the highest praise I can give you guys!

1. high performance, reliable computing resource for large applications;
2. the large data set storage: long (HPSS) and short (scratch disc) term;
3. 24/7 accessibility;
4. handy debugging, profiling and visualization software.

I compute at NERSC because:
1) it's easy to transfer files to/from NERSC (unlike RCF at BNL)
2) almost no problems with disk space
3) my jobs move through the queue reasonably fast
4) very friendly, helpful staff [PDSF user]

excellent !

NERSC has been, and continues to be, the best-run HPC center that I have used (and I have used numerous sites over my career). NERSC's management strategy results in a very stable usable facility that optimizes the amount of science that can be accomplished. I appreciate that they do not attempt to be at the "bleeding edge" with respect to production hardware and software. Although there is a place for such systems (even within my own research program), productive science requires stable productive systems. NERSC is excellent at providing this need.

NERSC offers very professional services in three areas: Hardware, Software and Consulting! I am most pleased BECAUSE all three components are provided by NERSC, which is absolutely necessary at the forefront of HPC science!

My research needs intensive computing, that's why NERSC is important to me. I have an overall good impression of NERSC. One can use as much as 2,000 processors or even more at one time and finish jobs very quickly. The changes of allocations are reasonable. And people are doing hard work to improve the performance of SEABORG.

Runs a quality computing service. PDSF along with the research staff developing more advanced grid & data management services are a valuable resource for us. [PDSF user]

The service from NERSC is great, which provides convenient and reliable computing resources for our work.

I am very pleased with HPSS and PDSF - I think that both facilities are extremely valuable and useful. HPSS covers our data storage needs and PDSF provides a great facility for doing our data analysis. The fact that both facilities are housed at NERSC and can be used together make them more than the sum of the parts. [PDSF user]

Overall, I am very satisfied with the support and computing environment at NERSC. While the limited scalability of our code really hurts us on IBM P3 platforms, the support, HPSS and large numbers of processors allows us to get a great deal of research done at NERSC. NERSC has been very responsive to our requests for rapid batch queue access to complete our runs on time and consulting support to increase the numbers of processors that we can apply to the code.

NERSC is a world-class, state-of-the-art supercomputing facility unmatched anywhere in the world, and is doing a superb job of meeting the challenges in solving computational problems in diverse areas of science. I would like to congratulate NERSC wholeheartedly for running such a facility with the utmost efficiency and professional competence. Congratulations and thanks to all at NERSC, especially Horst Simon, Francesca Verdier and David Turner.

NERSC is very well suited to performing computational science through the combination of first-rate hardware, software, mass storage, support and training, and should be a model for other sites to follow.

I am most pleased with the reliability of NERSC, the hardware and software work as described on their website. Important information can easily be found. Problems are dealt with quickly. [PDSF user]

Seaborg machine is great. Even though it is pretty dense and hard to have a spot, this is the only machine where I can run: that long on that much nodes that are dedicated. So that's fine with me. Change nothing!

I am a new user and so far my impression is that NERSC seems to do everything very well except possibly queueing of jobs requiring less than 32 nodes. I am impressed by the available hardware, software, support etc.

Good overall service; good codes available.

  Documentation, NERSC web site:   7 responses

do well: documentation, support, overall maintenance and management.

  Allocations process, ERCAP, INCITE:   6 responses

Large allocation allowing us to address a particularly timely and important science problem; INCITE is a truly great idea! [INCITE user]

 

What should NERSC do differently? (How can NERSC improve?)   94 responses

  Improve Seaborg turnaround time:   45 responses

Change the batch queues so that ordinary jobs execute in days, not weeks.

What I am somewhat dissatisfied with is the batch queue time, which is very long at times, ...

The queue wait times have been extremely long (about 2 weeks recently), and this has almost completely stalled my research. ...

I think seaborg is pretty saturated. Sometimes it takes a long time to get batch jobs started. This has forced me to find other sources of computing power.

(1) Batch queue wait times-- these have become horrible in the past half-year. ...

As mentioned earlier, the time spent sitting in queues is creeping up. I would appreciate seeing this problem addressed. ...

1. To change a bit the queue policy so that one job won't wait too long; ...

Change the queue structure to eliminate two week waits. I'm not sure this is possible but I'm writing it anyway.

Work on shortening queues on seaborg.

The queue has become too crowded recently. We understand that INCITE runs are important, but their priority should not be too different from other regular jobs. We also hope that the new machine will solve part of the problem.

For my needs I would prefer less priority on 512+ processor job requests so more users could use seaborg simultaneously, reducing the long queue times. It's difficult to do cutting edge research when one has to wait for a week or more for each job to run.

there have been too long queuing times that essentially counteract the attraction of supercomputing resources. ...

The waiting time for the batch jobs is way too long. Sometimes, the submitted jobs have been idling for more than 10 days, which just defeats the purpose of the supercomputing. Other than that, I am very satisfied. Thank you.

It would be great if queueing could be improved to wait times of a week or preferably less.

Faster queues.

The turnaround time for batch jobs could be shortened.

Some resources should be devoted to offering cycles with minimal wait time. Wait times in seaborg queue are far too long.

 

  Change Seaborg job scheduling policies:   37 responses

The current focus only on jobs which can exhibit high degrees of parallelism is, in my opinion obviously, misguided. Some problems of great scientific interest do not naturally scale to thousands of processors.

NERSC should return to its original mission of providing the production environment which allowed the scientists to maximize their research. That is, NERSC should give satisfying the user priority over satisfying the DOE and the OMB.

Pay attention to what its users need - resist as best as possible calls from "above" from those who know little about actual scientific research. Provide resources that fit the profile of the jobs your users actually want to run (which I would guess peaks somewhere around 64-128 procs if artificial pressure is not applied). Do not reward users for wasting computational resources by running using very large numbers of procs, even when their codes scale significantly less than perfectly (and yes, this is essentially waste, because any real research involves multiple code runs, so 2 512 proc runs will generally be better than 1 1024 proc run unless the code scales perfectly to 1024 - but your policies encourage users to run higher than they should in order to be able to run at all, wasting already oversubscribed CPU hours).
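
To make the scaling arithmetic behind this comment concrete, the following rough sketch applies Amdahl's law to compare two 512-processor runs with one 1024-processor run. The 2% serial fraction and the notional single-processor cost are invented for illustration; they are not measurements of any NERSC code.

    def wall_time(serial_frac, one_cpu_hours, procs):
        # Amdahl's law: serial part plus a perfectly parallel remainder.
        return one_cpu_hours * (serial_frac + (1.0 - serial_frac) / procs)

    serial_frac = 0.02          # hypothetical fraction of the code that cannot parallelize
    one_cpu_hours = 100000.0    # hypothetical single-processor cost of one run

    for procs in (512, 1024):
        t = wall_time(serial_frac, one_cpu_hours, procs)
        charged = t * procs     # CPU-hours charged for the run, ignoring queue discounts
        print("%5d procs: %7.1f hours wall clock, %9.0f CPU-hours" % (procs, t, charged))

    # With these numbers the 1024-way run finishes only about 5% sooner but is
    # charged nearly twice as many CPU-hours, so two 512-way runs deliver almost
    # twice the science for about the same total charge -- the waste described above.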

... Also, the job priority system discriminates against smaller jobs (less than 32 nodes) - i.e. MAJORITY of users!

the allocation process was very distorted this year by the requirement that most of the computing be done with Ncpu>1000. This is terrible for almost all users. NERSC should do everything it can to get sensible operating requirements - even if management would like to have most users run huge jobs - if they are going to truly serve the needs of the users. In the current situation the users are under severe pressure to serve the needs of the computer center.

For the last 24 years NERSC has been the place where "I could get things done". With the initiation of the INCITE program that changed. The machine was effectively taken over by the 3 INCITE groups and work at NERSC stopped. After the upgrade my large calculations no longer run at NERSC and I had to move those computations to a p690 in Hannover, Germany.

The current queue structure on Seaborg encourages large jobs, but is less efficient for two reasons: (1) there are more idle processors waiting for a big job to start, and (2) it encourages people to submit jobs on more processors than their code can efficiently use.

Better management of the resources. I know that DOE wants to see very large jobs running on Seaborg and also support "big splash" projects, but unfortunately, it has prevented the vast majority of users from running jobs in a timely fashion in the past few months. This is not NERSC's fault. However, I think that the recipients of the INCITE awards should spread their computer usage over the whole year instead of doing all the runs during the last months of the program and thus preventing everybody else from running.

Given the amount of computer time that I am allocated, I cannot make use of the large number of processors on Seaborg. Unless everyone is allocated enough time to make use of hundreds of processors, NERSC should give more consideration to providing resources for smaller codes.

I have experienced a significant increase in the queue waiting time (often days) in reg_1 on seaborg recently, which seems to be correlated with the discount policy on large-processor jobs and the increased number of jobs in the reg_32 queue. Some of my colleagues at UC Berkeley also voiced similar frustrations, and a few have started to look for other computing resources. As much as we would like to use reg_32, the central issue is some scientific problems simply do not seem to scale well beyond 64 or 128 processors. The large number of users waiting in the seaborg reg_1 indicates that I am not in the minority. The wait time was much more reasonable earlier this year, so I would like to see NERSC modify the current queue priority and reduce the wait time.

... it would be good to state more precisely the queueing policies for jobs of the same category based on the number of processors and wall clock time. ...

NERSC should preserve a fraction of its resources for medium size parallel computing. State-of-the-art scientific research oftentimes needs up to 100 CPUs per job. Encouraging people to submit larger and larger jobs (thousands of CPUs/job) puts this distinct class of projects (in need of medium range parallel computing) in a very difficult position, as their jobs wait in the queue for a very extensive period of time, waiting either for these super large jobs to be picked up or to be done.

I already described the problems with the queue structure, so I won't repeat them. That's by far my biggest complaint. [The queue structure is so ludicrously biased toward large jobs that it is sometimes impossible to use one's time with a code that is optimum at 128-256 processors. That limit is set by the physics of the problem I'm solving, and no amount of algorithmic tinkering or optimization is going to change it much. NERSC gave my research group time in response to our ERCAP request, but to actually use the time we won, we wind up having to pay extra to use the express queue. Otherwise we would spend a week or more waiting in the regular queue each time we need to restart a job, and we'd never actually be able to use the time we were granted. I understand that NERSC wants to encourage large jobs, but the current queue structure guarantees that anyone who can't scale to 1000 processors is going to have to use the premium queue to get anything done.]

... The other thing that NERSC needs to do is allow a mix of small and large jobs to get onto Seaborg with relatively little waiting time in the queues, as used to be the case until the last few months.

Improve queue for medium sized jobs (128 proc+)/large sized jobs -- it seems that a few large jobs run and many small jobs (~64 proc or less) fit in the holes, but these jobs are so small that they should really be run elsewhere on small cluster type machines or interactively, so that the seaborg nodes are mainly reserved for larger jobs.

do not just emphasize high-performance computing -- it would be nice for me to feel like my work is welcome by NERSC even if it does not involve massively parallel programs

My only problem is the heavy emphasis on extreme levels of parallelism. Our problem sizes are often just not large enough to suit that. But I understand that is one of the reasons for a facility like this, so you need to emphasize it.

Sort out the queues & adopt a consistent priority policy to support the kind of computing that can only be done at a facility like NERSC. This summer has been a nightmare.

... 2) Reduce/remove the disparity between charge factors for reg_32 and reg_1 queue. I am very dissatisfied with the one-half charge for jobs in reg_32 queue. My jobs are waiting for a very long time. I wonder if this policy indeed results in optimal use of resources anyway - reg_32 and reg_128 queues already had a higher priority than the reg_1 and reg_1l queues, so if users could run their jobs as efficiently in the bigger queues, they would already have been doing it. (Maybe the average wait time was longer for reg_32 queue, but I see no reason for the wait time to improve with the implementation of the new policy, so that is not the factor encouraging users.) As far as I understand, usually if it takes 2t wall-clock time with n processors, using 2n processors wall-clock time is greater than t, unless there is a cache problem. So most efficient use of resources is to use the least number of processors on which you can fit your problem. I suppose with the uniform charge factor, that is why there were fewer reg_32 and reg_128 jobs. With uniform charge factor, I would guess people would only use the bigger queues if they need to, so that their jobs would finish in the allowed wall-clock time. In my opinion, this is how it should be.
Now with the new policy, for 16<=n<32, using 2n processors is a better option even if the code doesn't gain a second in wall-clock time as opposed to using n processors! reg_32 queue has a higher priority, too. A user might even use half the tasks per node with 2n processors; the usual charge penalty is simply compensated for by the new policy. That does not make sense. Moreover, I suppose the loadleveler has a much harder task with all these big jobs, and I am curious if the nodes are waiting longer in between jobs.
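
A small sketch of the incentive this respondent describes, assuming the half-price charge factor on the large-node queue mentioned in the comment; the node counts and wall-clock time below are hypothetical.

    def charged_node_hours(nodes, wall_hours, charge_factor):
        return nodes * wall_hours * charge_factor

    wall_hours = 12.0                                      # assume zero speedup from the extra nodes
    small_job = charged_node_hours(20, wall_hours, 1.0)    # 20 nodes in reg_1 at full charge
    large_job = charged_node_hours(40, wall_hours, 0.5)    # 40 nodes in reg_32 at half charge

    print("reg_1,  20 nodes, full charge: %.0f node-hours" % small_job)
    print("reg_32, 40 nodes, half charge: %.0f node-hours" % large_job)
    # Both come to 240 node-hours, yet the 40-node job occupies twice the
    # hardware and sits in the higher-priority queue -- the distortion described above.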

NERSC should move to supply capacity computing (cycles per year) rather than capability (cycles/sec). Should move from supercomputer center to a "cluster farm center" to supply cycles per year at greater cost benefit. Supercomputers of the parallel type are very costly in a wide time share environment. Their use should be limited to large jobs and underloaded to preserve good turn-around: when the [run time] / [turn around time] is less than 0.5, the supercomputer is running at half nominal speed.
The folly of time shared supercomputers is this: if properly loaded one is likely to wait twice as long to get half the machine as compared to a quarter and the turn-around time is likely to be faster using a quarter of the machine. If the optimal job for Seaborg is 512ps (OMB rule 50% jobs over 512ps), it is being used as 12 512ps clusters....but the unused connectivity is costly.
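
The run-time versus turnaround rule of thumb above can be written as simple arithmetic; the job length and queue waits below are illustrative only.

    def effective_fraction(run_hours, wait_hours):
        # Fraction of the machine's nominal rate actually delivered to the user.
        return run_hours / float(run_hours + wait_hours)

    for wait_hours in (0, 24, 72, 168):
        frac = effective_fraction(24.0, wait_hours)
        print("24 h run, %3d h wait in queue: %3.0f%% of nominal speed" % (wait_hours, 100 * frac))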

... Change queue management in summer, when it seems lots of summer interns put extra charge on the system. Stop queuing privileges for INCITE programs (I agree with giving priority to IAEA and IPCC).

Job queuing on Seaborg: jobs requiring 128 nodes or more should be allowed to have priority over those requiring 32 nodes or less.

different queues for big jobs, medium jobs and small jobs (# cpus). This may cut down wait times, by making sure people's jobs are competing with like-jobs.

... Better queues.

... Also, I hope that one user can only have two jobs running in queue so that no user has to wait for a few days to get one job run.

... One of the causes [of long queue waits] is that some users run multiple huge jobs using more than 128 nodes with greater priorities. I mostly run jobs using less than 10 nodes, but queuing times can reach as long as 3 weeks. I think that NERSC can set a new policy to minimize queuing time for small jobs.

The batch Queue scheduling should be improved.

I would like to see NERSC offer more for medium-scale jobs (ones using < 128 processors). ...

The code I use (Gaussian) is not well tuned on Seaborg. Queues are geared to massively parallel codes. Gaussian can use only 12 procs efficiently. Turnaround time is bad; max wall time is too small.

... 3) Judging by a biased look at the jobs in the seaborg queue at any given time, NERSC caters too much to folks that can do their work on desktop machines or small clusters, particularly given the low price of amazingly powerful small-scale systems. Those needing the larger facility wait in a very long line with these projects...it takes days or weeks to run a job on seaborg unless you happen to have a very specific kind of job---namely the sort that scales well to a large number of processors so that you can get into the "special" big queue. But I repeat that the science done by those codes is not necessarily better than us doing work down at the lowly 512 processor level, and at the 512 level we sit in line with the desktop folks running scaling studies for ERCAP instead of doing real science. ...

The batch queues are configured as to make supercomputing facilities almost unusable, except for the privileged few. It takes 2-3 weeks to run a regular (nowadays even priority) job. It is impossible to debug a code, or do physics research, in this environment. You are on the verge of becoming irrelevant with regard to scientific computing.

I am most interested in rapid turnover of jobs requiring 1-20 SEABORG nodes (16-320 processors), with jobs mostly requiring 5-24 hours of wall clock time. I sometimes require significantly more time than this, but if the turn around time is fairly fast, I can break it down in sections. Until about 6 months ago, the turn-around was fine - occasional delays, but mostly could submit jobs and have them begin within 24 hours. Lately this has not been the case.

... A standby queue would be helpful to enable some activity to continue on such [exhausted] accounts. ...

We hope the max wallclock could be increased.

My only additional expectation is that high-performance computations will be allowed to run for a longer time than the present 24 hours. ...

... max wall time is too small.

 

  Provide more/new hardware; more computing resources:   25 responses

NERSC has done an outstanding job of serving the community. In order for this to continue, NERSC needs continued support from the DOE for its staff and the services they provide, and NERSC needs support for a new high end system to replace seaborg.

... I would especially like to see NERSC maintain hardware with cutting-edge single-cpu performance.

Perhaps NERSC should upgrade its Seaborg processors.

The SP3 processors on Seaborg are very slow, compared to currently available processors. This limits how much cosmologists like my group can do. A more powerful system would allow us to solve problems that cannot be tackled by U.S. users (although our European and Japanese competitors already have much more computing time on their dedicated systems, and they are attacking these important problems). ...

Get some vector computers again or hybrid parallel/vector machines

I welcome your announcement that you plan to provide an additional supercomputer with fewer nodes. I believe this is very important because not all computational problems can effectively use hundreds of processors. It would be nice to have access to CRAY supercomputers again at NERSC.

NERSC needs to move on from the IBM-SP3

The usual: More processors and greater throughput.

Faster hardware, better turnaround times ... the usual requests of users.

Get a larger computer :)

The computer code I use becomes more complex from day to day to use the best physics you can. However this increases the computing time. The great management and support at NERSC combined with new hardware would be an irresistible package.

Even more processors.

Keep adding nodes.

See collective previous open ended comments. [Need the new computer system sooner. Need access to cluster and vector technology in order to keep stagnant models up to date.]

NERSC needs a big upgrade to its hardware to keep up-to-date with current computing technology.

I wish they'd kept the PVP computers longer. It was very frustrating to port all codes to Seaborg from the Crays, and find out that a single processor at Seaborg is much slower than it used to be at the Crays. I've been working at NERSC for ~6 years, and by now I've had to port my codes to a new computer at least 5 times (!). I guess going for a new machine is good, but maybe you should keep the older ones longer? ...

Larger computer ...

NERSC response: In early calendar year 2005 NERSC will deploy a new Linux cluster with 640 dual 2.2 GHz Opteron CPUs available for computations. The target workload for the cluster is jobs that do not scale well on Seaborg.

  Improve the allocations process / ERCAP / INCITE:   15 responses

Create a pdf file from allocation request for our records.

NERSC response: This has already been implemented. At the bottom of the ERCAP request lists page is a button Show selected requests in PDF format. The 2006 ERCAP request form will use this PDF format for the "Show Complete Request" tab.

... Similarly, I've described my problems with the ERCAP proposal process. I feel it gives short-shrift to science, and focuses on code optimization to the exclusion of scientific returns. [The ERCAP allocation process is not very good. At other supercomputer centers, an allocation request is written like a scientific proposal. It includes certain required topics, but the proposers are free to write a proposal that makes sense in the context of their code and their problem. NERSC's proposal form is too much of a one-size-fits-all solution. ...]

NERSC response: DOE does not conduct a science review of the ERCAP request forms. This is because they have already conducted a science review for the DOE grant request. DOE does ask NERSC to conduct a computational review of the requests. See Types of Awards and Review Process.

I would like to see a little bit more forgiveness for projects like INCITE and anticipation that most of the allocation will likely be used near the end of the allocation period. This follows from the fact that scientific application software (code) is constantly being developed and the science result is a function not only of the raw idea and computing resources, but includes the "latest and greatest" element of scientific software development. For that reason, I foresee many INCITE projects being slow at the beginning and consuming most of their resources before the end of November. Extending the data storage allocation might be very important to allow for careful data analysis and maximizing the scientific impact of the project. In essence, INCITE projects are long-term and long-lasting even if the allocation is nominally for one year. [INCITE PI]

I believe a mistake was made in the allocation of resources to the INCITE teams. It was my understanding from the proposals that these groups should have been able to run immediately on Seaborg. Since the majority of them didn't, and they were not docked accordingly at the end of each quarter, we are now in the position of watching them try to burn almost all of their time in the last quarter of the year. This now gives them override on more than a third of the remaining time causing huge backups for everyone else. If they had run smoothly from the start of the award, or were docked when they didn't use the time, we wouldn't be in this situation.
I do believe in the INCITE process and think this is an excellent program, I just have a problem with the implementation.

NERSC response: It is important that the INCITE projects began using their time shortly after their award. As is stated in the call for proposals, Applicant codes must be demonstrably ready to run in a massively parallel manner on Seaborg. In 2005 NERSC will work closely with the INCITE awardees to help them start computing as early as possible.

... Also, the allocations process was not clearly explained, and we consequently lost some of our computer time because we did not use enough of it by a certain date (and the reason we could not use it all was because of the long queue wait times). In addition, our allocated hours were/are constantly being changed (added and subtracted) without any notification given to us.

NERSC response: We apologize for the confusion and hope that the 2005 award letters more clearly state this policy: Repositories (both MPP and HPSS) that haven't used significant amounts of time (or SRUs) each quarter are adjusted by transferring a part of the unused balance to the corresponding DOE Office reserve. See: Allocation Management.

The allocation of one of my accounts (mp169) was exhausted in June. A standby queue would be helpful to enable some activity to continue on such accounts. Alternatively, a better redistribution system would enable some additional allocation to be available when an account is exhausted with six months remaining in the operating year. Wise allocation management is of course a user responsibility, however, in some years circumstances result in shortages whereas in other years surpluses occur.

Given the amount of computer time that I am allocated, I cannot make use of the large number of processors on Seaborg. ...

When account allocations expire, some transition period should follow to allow users to process data. Large projects generate massive amounts of data, and a typical user has no storage resources at home to transfer this data to.

NERSC response: NERSC policy is that when an account expires the user has one month on Seaborg (or on the PDSF) in limited status (cannot submit batch jobs). During that time they can do data cleanup and they have full access to HPSS. For the next 5 months they have no access to Seaborg (or other computational systems) but they do have read/only access to HPSS. After that 6 months their access to HPSS is terminated but their files remain (indefinitely for now, but in the future there may be a policy on how long files will be retained).

I think the INCITE program was ill conceived. Betting that the performance of a tiny subset of the scientific community will payoff enormously better than the community as a whole seems to me like trying to time the stock market. It may work once, but the opportunity costs are enormous.

The NERSC allocation for FY 2004 seemed to be a mess; this may not be NERSC's fault.

A quicker allocation process for short periods, for emergency purposes such as data recovery.

... Also, award all time in one simple, direct process, using application forms that do not require an enormous amount of time to fill out. Treat all users equally. Avoid special targeted initiatives at all costs - these both waste user time by requiring them to fill out multiple applications, and justifiably anger users because they make the machines very difficult to use for "regular" users.

Improve the allocation process to allow a better planning of computational resources. Many projects were put on hold since not enough resources were allocated at the start of the fiscal year.

... and a larger allocation of computation time.

1) In the ERCAP process, the focus on GFlops and scalability is just plain stupid. These numbers are not a measure of a code's ability to do science. I can put a huge useless do-loop in my code and really get the flop count up high. In fact, since it won't have communication, I can look pretty good on scalability too. That loop's going to make my real work come out the end a lot slower, but who cares, because I'll get a big allocation with my flop count and scalability. These statistics are only good because they provide a simple set of numbers that management types can hold onto. Worse though, near the ERCAP submit time, how many jobs are stacked in the queues of seaborg just to get these scaling numbers for the ERCAP submission (trust me, you can see these jobs, clear as day)? This compounds the problem of people rushing in at the end of the year to use their time. I think NERSC should stop focusing on flop counts and raw scalability and find another way to measure scientific relevance.
2) The ERCAP review process is silly. The reviewers are not qualified to comment on the scientific appropriateness of the numerical methods in a code, since every code is application-specific. The projects are funded by DOE, and therefore automatically qualify for time at NERSC. Exactly how are the reviews used in the allocation process? It seems that the review process is a way to make NERSC appear to be doing due diligence in spreading out the hours in a scientifically justifiable way, but too little information is given to under-qualified reviewers, and it is unclear how their reviews are even used, if at all.
... 4) Lose the INCITE thing. Again, why are applications that scale well held in an exalted status? Certainly I can learn as much from 20 512-processor runs of a linear-algebra-bound code as I could from a few 4096-processor runs of a well-scaling, but algorithmically inefficient, explicit algorithm that requires orders of magnitude more time-steps or grid points. But the INCITE jobs are sucking away time along with the scaling studies and desktop-size runs. Also, looking at last year's projects, it is unclear to me that the INCITE jobs have accomplished anything spectacular.

  Other Seaborg improvements:   8 responses

... A problem that I bring up every year is the quality of interactive service. Although this has improved since the last survey, the lack of ability to do small debugging runs interactively (at least with any reliability) is a problem. Would it not be possible to set aside a few nodes that could run with IP protocol (rather than US), in order to create a pool of processors where users could simultaneously run in parallel?

Better interactive access on seaborg for debugging and development jobs. Perhaps a few more dedicated nodes for these sorts of tasks.

Improve interactivity. Why do small jobs have to be so difficult and problematical to run at times?

... (3) Seaborg needs more large-memory nodes. 1GB per processor isn't enough for many of our larger jobs. ...

Improve Seaborg's processors. ...

... 2. To fix the I/O problems so that we always have all the nodes available to us; ...

Access to the home directory source code even when Seaborg is down.

The new OS has really been a bummer. I'm going to have to spend a lot of time trying to figure out why our performance is so degraded. This is unfortunate.

  PDSF improvements:   7 responses

PDSF could have a few more interactive machines. Sometimes they're fairly heavily loaded.

The PDSF interactive nodes are rather sluggish; I am no longer able to link software in a reasonable amount of time on these nodes.

More responsive interactive nodes on PDSF!! I can't stress this enough.
Maybe a bigger user disk quota (500MB can sometimes be frustrating during analysis jobs or when testing new software codes).

Improve I/O problems.

NERSC response: The PDSF support team has made it possible to run interactively on the batch nodes (there is a FAQ that documents these procedures). They also recently purchased replacement login nodes that are being tested now and should go into production in December 2004. They are top-of-the-line Opterons with twice as much memory as the old nodes.
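
As a rough illustration only (the authoritative procedure is in the PDSF FAQ), requesting an interactive shell on a batch node through an LSF-style scheduler might look like the following Python wrapper; the queue name here is a placeholder assumption, not a documented PDSF queue.

    # Hedged sketch: open an interactive shell on a batch node via an LSF-style
    # scheduler ("bsub -Is").  The queue name below is a placeholder; consult
    # the PDSF FAQ for the actual queues and options.
    import subprocess

    def interactive_batch_shell(queue="interactive"):
        """Request an interactive pseudo-terminal session on a batch node."""
        subprocess.call(["bsub", "-Is", "-q", queue, "bash"])

    if __name__ == "__main__":
        interactive_batch_shell()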

more disk space (always good), increase the number of PDSF login nodes.

I am quite worried about the performance of the disk vaults at PDSF. It seems that the combination of very large disks (>2.5 TB per node) and NFS does not scale very well to many jobs. I know that this problem is actively being addressed at PDSF, but it is my single complaint about the current situation.

NERSC response: The PDSF support team has added about 20 Terabytes of additional disk space. As to disk performance, there is unfortunately no technology at this point that the PDSF could afford to deploy to address it.

Consider more carefully whether using 'experimental' systems on trial is always appropriate - i.e., determine a better balance between the increased likelihood of failure and free or better resources.

  Software improvements:   4 responses

... Also the xlf Fortran compiler, which sometimes makes it somewhat difficult to port codes from other platforms (I'm porting mostly from Tru64 and Intel compilers). Also the debugging tool is not so easy to use, and somewhat more extensive documentation on it would be welcome.

NERSC could improve their C++/HPC services to the users.

... 3. To upgrade ESSL if possible.

Please fix AVS5.6 on seaborg.

  More/better training:   3 responses

I should be informed by email at least one month ahead of time if there is a course or tutorial class, no matter whether it's at NERSC or LLNL, so I can plan to go there to use the grid.

Offer more video lectures for remote users.

Better education as to what resources are available, what they are for, and how to use them.

NERSC response: In 2004 NERSC organized 20 user training lectures in 7 separate events. All were presented via the Access Grid and were captured as streaming videos (using Real Media streaming) so that users can replay them at any time. These lectures have been added to the tutorials page for "one stop" access to training materials. See NERSC Tutorials, How-To's, and Lectures.

  Network improvements:   3 responses

... (4) Faster network connectivity to the outside world. I realize that this may well be out of your hands, but it is a minor impediment to our daily usage.

As I mentioned earlier, some improvements in the network access would be nice, but I do not know if the problems I am seeing are anything to do with NERSC/PDSF or if they originate in the network local to my office. [PDSF user]

More storage space and faster access.

  Other suggestions:   8 responses

1) Reduce/remove the HPSS charge for TRANSFERRING files to and from mass storage. The charge for making transfers to and from mass storage does not make sense to me. It sounds too harsh; most of our HPSS allocation is exhausted by the transfers rather than actual storage. We have huge files, huge amounts of data. We cannot keep it in scratch. Sometimes we need to analyze another aspect of the data, and then need to retrieve it. Unfortunately, we are charged every time we access it. ...

... (2) More resources dedicated to consulting: I think the consultants are great and I'm extremely appreciative of their time and thought; but difficult problems seem to take too long to resolve, or are resolved only incompletely-- and it's really only those difficult problems for which we seek NERSC's expert advice in the first place. Perhaps more consulting resources would improve follow-up, allow consultants to spend more time on user problems, etc. ...

NERSC should have project liaisons for each major project to help the very competent technical staff understand the priorities and importance of the scientific work. [PDSF user]

Get more involved in Open Science Grid

To be more user friendly. The complexity of having to run a computer center for a very diverse public makes it difficult to concentrate on creating easy user interfaces that would reach the full potential of the computer center.

One password for all access. ...

To make security measures less boring.

1) It seems that there is still room for improvements in the area of data analysis and visualization.
2) I'd like to run some jobs that would use a few TB of disk space. There could be a disk array for these types of jobs, where files would be removed after 1 month or so to avoid filling up the disk.

 

  Don't change, no suggestions:   3 responses

No suggestions at this time.

Keep on doing the superb job !

No major complaints. (I've only been using NERSC irregularly for the past two years, so I haven't had time to develop very specific complaints.)

 

How does NERSC compare to other centers you have used?   77 responses

  NERSC is the best / overall NERSC is better / positive response:   39 responses

My experience with supercomputing centers (no longer existing) has been spotty at best - user support was often lacking and certainly slow. Thus, I have really appreciated using NERSC (and especially Seaborg and its predecessor) over the years.

NERSC stands head and shoulders above all other centers I have used. (I am comparing it to ORNL, NCSA, PSC, SDSC, NASA (Ames and Goddard), and various defense and commercial sites.)

For me NERSC is the center which sets the standard in the combination of services and hardware (well even if the SP3 is somewhat outdated). Centers I compare are: CSCS, ETH and LANL.

Eagle supercomputer at Oak Ridge National Laboratory
NERSC's more varied queueing system and classes provide greater flexibility than does the ORNL system. For example, some large scale debugging is feasible at NERSC but not at ORNL.

NERSC is the user's heaven compared to RCF at BNL

NERSC is the best computing facility I have ever used in terms of hardware reliability, software availability, consulting help, .... Anyway, I choose to run on seaborg and wait for other computing facilities to mature.

In terms of availability, usability and overall quality, NERSC is unbeatable. I'm using SDSC (very little these days; NERSC is so much better that even SDSC's faster hardware cannot keep up with it) and HLRN (Hoechstleistungsrechenzentrum Nord in Hannover and Berlin); the HLRN has many more configuration, availability and usability issues than NERSC (that's a Pwr4 with Fed. switch). Overall, NERSC is the best computer center I've ever used.

NCAR/NCSA
NERSC is MUCH easier to apply for time to. Also, we can run for a much longer time, NCAR has a 6 hour limit.

I would rank NERSC among the best of all the major compute centers.

NERSC is much better than a computer center at NASA/GSFC. The latter is less powerful and is down very often.

Much better than BNL RCF, which I have stopped using.

Outstanding. My baseline for comparison is Rechenzentrum Garching, Germany, and San Diego Supercomputer Center.

I sometimes use the Oak Ridge computers Cheetah and Eagle. They are also great, but the machines are much smaller than seaborg. So for big jobs, NERSC is the place to go!

Compares very well

I've done a lot of computing at NCAR on their IBM SP systems. I feel the NERSC staff has been more responsive to requests for help. Good job.

NERSC is far superior. I worked on computers at ORNL, ARSC, and Earth Simulator Center in the past few years.

Our group has used NERSC supercomputers for about 10 years. Prior to that, we used supercomputers at NCSA (Urbana, Illinois) funded by NSF. In our experience, NERSC is better than NCSA!

Oh, nersc is by far the best.
Compared to sdx (local UK machine), jlab cluster, PSC.

Superb service! Other centers (SDSC, PSC, big LANL and LLNL machines) do not come even close to NERSC's level.

I have used the supercomputing facilities at ORNL and LANL.
NERSC rates highly compared to these; I would however like to see more interactive access if possible.

I also compute at the HLRN (HoechstLeistungsRechenzentrumNord). I don't want to flatter NERSC; however, HLRN is no match for NERSC. The downtime of Seaborg, for instance, is 1 day (scheduled), whereas the HLRN had a downtime of about 2 months in 2004. So there is nothing the HLRN does that you should start doing.

Probably better than RCF (fewer problems).

Better, more smoothly run than BNL/RCF.
Much more accessible than PNNL EMSL.

Compared to BNL the PDSF system is more user friendly and more reliable.

Very well. The support at PDSF is superlative. I am comparing to CERN (a few years back) and MIT Physics and Stanford

I have also attempted to use a cluster at LSU, but found it unusable due to extremely poor administration, software, batch queuing, etc. PDSF is like heaven in comparison. I have to admit that I have not used the LSU cluster in the last 6 months, however.

I think that NERSC stands head and shoulders above some other centers I have used. I have used most extensively the RHIC Computing Facility besides NERSC. RCF has improved quite a bit in recent years, but I still think that the NERSC personnel is far superior. Our technical problems on both HPSS and PDSF were always addressed rapidly and it was made clear what and why there were certain technical limits.

I've also used the Citris IA-64 and Millennium x86 clusters at UCB. NERSC administers its computing services much more professionally and with much greater attention to the needs of its users. As a result, it is much easier to get maximal effectiveness out of NERSC equipment, even though some of that equipment is a couple years behind the technological edge.

I also compute at NCAR and ORNL. Given the very different support requirements and operating constraints of these different sites, NERSC is very competitive and is an essential part of our research effort.

NERSC is probably the most user friendly site I use. As I said, the people are knowledgeable, friendly, helpful and proactive. The hardware is reliable and productive. I appreciate Jim Craw's and the consultants' proactive and voluntary efforts to keep us informed about situations.
ORNL, NCAR

NERSC is much better managed than NSF centers at UCSD and PSC.

NERSC has tended to provide more of our computing capacity with a more user friendly environment than other centres. They have been effective at bringing new systems into production faster than other centers. The only centre that has been more user friendly than NERSC is the LCRC here at Argonne. However, they serve a much smaller group of users. They also do not provide the archival capabilities of NERSC. We are also comparing NERSC to NPACI, which appears to be less well organized than NERSC. In addition the NSF tends to move us between sites and machines from year to year, which we find annoying. Finally we are comparing NERSC to NCSA. The ongoing problems they have had with their Tungsten Linux cluster makes us doubt their competency in running such a centre. Note that none of these other centers allow remote access to their archival systems.

Of all the centers I have used, NERSC is by far the best. The list of centers I have experience at, and which I am comparing to, includes ORNL CCS, NCSA, SDSC, NASA Ames, NASA Goddard, PSC, and LLNL.

I've used the RHIC computing facility at Brookhaven. NERSC (particularly PDSF) are far, far better in every respect: reliability, uptime, and particularly in user support. NERSC is also far more effective at keeping intruders out, without burdening their users.

Best I have ever used. In the past I have also used centers at LLNL and LANL.

I am not currently using other centers. I used to use LLNL; NERSC is significantly better to use, mainly because of the user support.

I use LLNL computers and I believe that:
1) the queuing system is much better at NERSC.
2) the possibility to debug large jobs does not exist at LLNL. We have found bugs in our code that only show up for large numbers of CPUs (>128). Being able to debug these jobs was crucial for success in a couple of projects.

I've tried running at NAS (at NASA Ames) recently, and your shop is run much better, from reliability to access to documentation.

NERSC computing resource is much more stable, compared to the Grid Parallel Computing system at Tokyo Institute of Technology, Japan.

  NERSC is the same as / mixed response:   20 responses

3 years ago NERSC was by far the best center I had access to. I have the feeling that other supercomputing centers have upgraded their computers faster than NERSC. For instance, I had expected that Seaborg would have been upgraded to an SP4 already, since when seaborg was put together it was among the very first SP3 to become available for scientific computing in the US.

NERSC has a higher level of reliability. It has less "spinning disk" which puts it at a disadvantage. Compared to NCSA, UCSD.

I have used the San Diego supercomputer center and the open computing facility at LLNL. In comparison, NERSC's systems are more stable, reliable, and better run. However, the queue structure is much worse. At LLNL, the queuing system is not so biased to large jobs that smaller ones can't get through. At San Diego, they are experimenting with different queueing algorithms to try to make it possible for smaller and larger jobs to co-exist. The consultants there have tinkered with the queues manually to ensure that it is possible for people to use the time they have been allocated. NERSC should start doing something similar -- if not manual tinkering, then at least experimenting with a queue structure that makes it possible to be a small-node user without having to wait a week in order to get 24 hours of computing.

Compared to NCAR, I think that NERSC has a more reliable system. However, NCAR has significantly faster IBM Power4 processors.

See answer to first question above. [Seaborg has (until the last few months) been the most useful high performance system that my group has used. Its combination of power and ease of use has been great.] Other centers that we have used recently include NCSA and Pittsburgh in the U.S., and several supercomputers in Germany.

Other than the INCITE fiasco it is the best I have run at.

Other than NERSC I use only the jazz cluster at ANL. jazz is worse than NERSC in almost every respect. I guess the only thing that is better on jazz is that its processors (Pentium IV Xeon) are much faster than the Power 3.

I have used LLNL's MCR, ASCI White and ASCI Blue computers. What I like about those systems is that the allocations are not time-based, but fair-use based. Maybe a fair-use policy could be implemented at some levels at NERSC. Other than that, both centers, LLNL's and LBL's, are excellent.

Well NERSC is still better than SDSC, but post-upgrade the machine is in a bad state and is useful only for a small class of my calculations. I have collaborators clamoring for results I can't deliver because I can't run the calculations.

I have only used the Ohio Supercomputing Center's clusters. They offer smaller amounts of resources but have generally been easier to start jobs on in a shorter period of time.

I have used the Ohio Supercomputing Center. Their batch time is less, but they cannot provide as many processors.

Consulting is as good as NCAR, not quite as good as ORNL.
Seaborg is way overloaded. I rarely encounter queues more than a day long at either NCAR or ORNL.
Seaborg is up and available more than computers at either NCAR or ORNL, ditto for NERSC HPSS.

There are a number of differences between NPACI (Datastar) and NERSC. Interactive queues and number of processors are better on Seaborg. However, the batch wait time tends to be a lot longer. The processor speeds are comparable.

It compares well to PPPL and IPP-Garching and the San Diego Supercomputing Center. Small jobs run very well at these other centers, but not so well on Seaborg. I find there are not enough resources available for these small jobs at NERSC, but this is not a problem at these other centers.

The old days of preferential treatment for large MPP runs on Seaborg were wonderful (for us!). There are somewhat long turnaround times for small processor runs.

NERSC does better than San Diego in the quality of people who assist the user but San Diego gets high marks for the speed of the machine (Power 4!) and the turnaround time. NERSC does better for large problems (S.D. tops out around 1000 procs).

- CINES Montpellier France
- CEA Grenoble France
About the same quality; they are good and you are good. Your web site is much better than that of CINES, which gives you superiority for access to information, etc. Also I think you can do better; they are really worse. Regarding CEA, I do not even think there is a web site to get information. I can also compare to CERFACS (France), but let's say that the operation of the machines there is less professional.

Overall, NERSC does better than the ORNL center that I am also using. But the ORNL center is more friendly to serial jobs and small parallel jobs. Also, the ORNL consulting staff sometimes provide better answers to my questions.

Again, SDSC/NPACI has a new machine, Datastar, that in my experience is 2-3 times faster on my problems than Seaborg. Unfortunately, their user environment makes it difficult to do much more than run large batch jobs there. Perhaps I'm not in the loop, but the fact that I've heard nothing of a replacement for Seaborg makes me unsure how useful NERSC will continue to be.

The support is top notch, the file systems and overall reliability are second to none. The allocation process is silly. The queue structure unfairly favors jobs using a large number of processors. The INCITE system is counterproductive. There are way too many small users.

  NERSC is less good / negative response:   7 responses

Everyone is migrating to smaller clusters that have better turn around. Even if the peak performance is worse, it is easier to do physics in such an environment.

I have also worked on NCSC (North Carolina Supercomputing Center, now defunct), ORNL (eagle), LCRC (Argonne National Lab), NCSA (tungsten), and CACR (mulder, now defunct). The queue wait time at every other supercomputer center has been reasonable. This was not true for NERSC. To me the fairest queue system is one that applies uniformly to everyone (and does not assign different priorities so people can manipulate the system to their own advantage; all of us would like to have the highest priority). The fairest system I worked with was one that had a wall clock limit of 12 hours and a limit of 2 queued jobs by any one person at any time. And this applied to everyone. This prevented a few people from shutting everyone else out of the computer.

I have been getting better service from CCS.ORNL. I think it might be that they have a bigger range of computing sizes. Note that before CHEETAH got the new federated switch, the node-to-node communication was so poor that most people just used 1 node = 32 processors... CHEETAH was like a "cluster farm"... after the new switch allowed multi-node use it became a supercomputer... became overloaded, and its usefulness was degraded by long wait times.
ANY QUESTIONS: email: waltz@fusion.gat.com

Compared to other centers I have used, at NERSC the max wallclock time is not enough for our calculations. We use 16 processors, and one job needs about 2-3 days.

Long wait times make it difficult to get timely results and lower my overall productivity. NERSC is therefore not as good as other facilities where the wait times are shorter.

The San Diego Supercomputer Center.
Their turnaround times for batch jobs are shorter.

NCSA, PSC. Queues dedicated to codes like Gaussian which are poorly parallelized.

  No comparison made:   11 responses

The only other center I use is Livermore Computing, which is not comparable in scale.

The only other centers I have used have local users and are much smaller than NERSC; there is really no comparison.

NERSC is the only computing center that I have been using.

Outside of a parallel PC at Oklahoma State University, I have nothing to compare to. The resources at OSU were not adequate to do the research I am doing at NERSC.

Homegrown beowulf computers, and PSC.

The other centers I have used are miniature compared to NERSC and really no comparison would be meaningful.

LLNL, NCAR, ANL

I had used the Illinois Supercomputing Center before but am not a current user.

Upgrade the grid gatekeeper frontends to PDSF. It is fairly easy to saturate them.

I haven't used other centers.

Center for Scientific Computing, Goethe University, Frankfurt/Main, Germany
