NERSCPowering Scientific Discovery for 50 Years

2001 User Survey Results

Response Summary

NERSC extends its thanks to the 237 users who participated in this year's  survey; this compares with 134 respondents last year. The respondents represent all five DOE Science Offices and a variety of home institutions: see User Information.

Your responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve.  Every year we institute changes based on the survey;  some of the changes resulting from the FY 2000 survey are:

  • We increased the SP home inode and disk quotas as well as the SP scratch space.  SP disk configuration satisfaction was higher this year and only one user requested more inodes on this year's survey.
  • Last year one of the two top SP issues was that the "SP is hard to use".    Based on comments we received in last year's survey we wrote more SP web documents and made changes to the user environment..  This year only 12% (compared with 25% last year) of the comments reflected that the SP is hard to use.
  • We added resources to the T3E pe512 queue and created a new long64 queue:          satisfaction with T3E turnaround time improved this year.
  • Last year we moved PVP interactive services from the J90 to the SV1 architecture and provided more disk resources. Overall PVP satisfaction was rated higher in this year's survey.

Users rated us on a 7-point satisfaction scale, with 7 corresponding to  Very Satisfied and 1 to Very Dissatisfied. Based on responses from the Overall Satisfaction with NERSC questions, we are doing as well as or better than last year.  Two areas showed significant improvement:

  • available computing hardware
  • allocations process

The areas of most importance to users are:

  • available computing hardware
  • overall running of the center
  • network access

See Overall Satisfaction and Importance

The average satisfaction scores from the questions about specific NERSC resources ranged from a high of 6.6  to a low of 4.5. Areas with high user satisfaction include

  • HPSS reliability, performance and uptime
  • Consulting responsiveness, quality of technical advice, and follow-up
  • Cray programming environment
  • PVP uptime
  • Account support

Areas with lower user satisfaction include

  • Visualization services
  • Batch wait times on all platforms
  • SP interactive services
  • Training services
  • SP performance and debugging tools

The largest increases in user satisfaction came from the PVP cluster: four PVP ratings increased by 0.3 to 0.8 points. This was true last year as well (where the increase in satisfaction from 1999 was even greater). Other areas showing a significant increase in satisfaction are

  • T3E and SP batch wait times
  • SP disk configuration
  • SP Fortran compilers
  • HPSS
  • allocations process

Several scores were significantly lower this year than last:

  • training scores
  • SP uptime
  • SP interactive resources
  • PVP Fortran compilers

See All Satisfaction Questions     and Changes from Previous Years.

When asked what NERSC does well, 35 respondents pointed to our stable and well managed production environment, and 31 focussed on NERSC's excellent support services. Other areas singled out include well done documentation, good software and tools, and the mass storage environment. When asked what NERSC should do differently the most common responses were to provide more hardware resources, and to enhance our software offerings. Of the 49 users who compared NERSC to other centers, 57% said NERSC is the best or better than other centers.  Several sample responses below give the flavor of these comments; for more details see  Comments about NERSC.

  • "NERSC makes it possible for our group to do simulations on a scale that would otherwise be unaffordable."
  • "The availability of the hardware is highly predictable and appears to be managed in an outstanding way."
  • "Provides computing resources in a manner that makes it easy for the user. NERSC is well run and makes the effort of putting the users first, in stark contrast to many other computer centers."
  • "Consulting by telephone and e-mail. Listens to users, and tries to setup systems to satisfy users and not some managerial idea of how we should compute"
  • "The web page, hpcf.nersc.gov, is well structured and complete. Also, information about scheduled down times is reliable and useful."

Some of the suggestions for improvements:

  • "Don't become oversubscribed. I'm worried that SciDAC will push for oversubscription, please don't go there."
  • "Get more hardware. DOE is falling way behind NSF."
  • "Install zsh, please."
  • "more debugging and optimization support for MPP platforms like seaborg"
  • "I want something 10 times faster than Killeen but not MPP".
  • "More access to capability machines that let long jobs of 32-64 pes go for 8 hours or more. Although many applications can use a lot of processors, science studies often ramp up and down in size as one walks through parameter spaces. Having a complement of smaller parallel machines to match the big one is very useful. These smaller machines do not need to scale much past 64 pes."
  • Better indexing of the sprawling website. Finding, e.g. compiler options or queue limits takes some knowledge."

     

User Information

 

 

Number of responses to the survey: 237

 

 

Respondents by DOE Office and User Role:

   

       
OfficeNumberPercent
ASCR 18    8
BER 37   16
BES 69   29
FES 38   16
HENP 70   30
guests 5    2
 
         
User RoleNumberPercent
Principal Investigators 46 19
Repo Managers 49 21
Users 141 60
DOE Managers 1 -
 

 

 

 

Respondents by Organization:

 

             
Org TypeNumberPercent
Universities 134 57
DOE Labs 92 39
Industry 5 2
Other Govt Labs 3 1
Private Labs 2 -
DOE 1 -
OrganizationNumberPercent
Berkeley Lab 36 15
UC Berkeley 28 12
Livermore 14 6
Argonne 7 3
Brookhaven 7 3
PPPL 6 3
Oak Ridge 5 2
PNNL 5 2
Los Alamos 4 2
U. Maryland 4 2
 
         
OrganizationNumber
Florida State 3
NREL 3
New York Univ 3
Ohio State 3
U Arizona 3
U Colorado 3
U Georgia 3
U Michigan 3
U Montana 3
U Washington 3
UCLA 3
Vanderbilt 3
other universities 69
other gvt labs 8
industry 5
private labs 2
DOE 1
 

 

 

 

Which NERSC resources do you use?

 

         
ResourceNo. of
Responses
Responses to
this Section
Later in Survey
T3E 120 92
SP 115 84
HPSS 94 149
PVP 76 69
NIM 74 159
Consulting 70 177
HPCF Website 57 185
Account Support 50 158
PDSF 19 -
Computer Operations and Network Support 14 -
Newton 8 15
Escher 8 13
Alvarez 2 -
FTG PC cluster 1 -
 

 

 

 

How long have you used NERSC?

 

     
TimeNumberPercent
6 months or less 51 22
6 months - 3 years 92 40
more than 3 years 87 38
 

 

 

 

What desktop systems do you use to connect to NERSC?:

 

       
Operating System TypeNumber
Unix 309
PC 114
MAC 40
Other   3
Individual SystemsNumber
UNIX-linux 134
UNIX-solaris 73
PC-win2000 62
PC-win98 41
UNIX-irix 37
MAC-macos 33
UNIX-tru64 21
         
Individual SystemsNumber
UNIX-aix 18
UNIX-hpux 15
UNIX-other 10
PC-win95 10
MAC-OSX 7
PC-OS/2 1
Digital Alpha 1
Open VMS 1
PalmOS 1
Digital UNIX 1
NCD X terminal 1
 

   

Web Browser Used to Take Survey:

 

       
BrowserNumberPercent
Netscape 4 167 70.8
MS Internet Explorer 5 44 18.6
Mozilla 10 4.2
MS Internet Explorer 6 5 2.1
Netscape 6 4 1.7
MS Internet Explorer 4 2 0.8
Konqueror 2 0.8
Netscape 3 1 0.4
Opera 5 1 0.4
 

 

 

 

Overall Satisfaction and Importance

 

Legend:  

   
SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
 
     
ImportanceAverage Score
Very Important 2.5 - 3
Somewhat Important 1.5 - 2.4
 
 
Significance of Change
significant increase
significant decrease
not significant
 

 

 

Overall Satisfaction with NERSC:

 

           
Topic No. of
Responses
 Average
Score
 Std. Dev.Change
 from 2000 
Account Support Services 185 6.43 0.93 0.04
Consulting Services 177 6.30 1.07 -0.09
Overall satisfaction with NERSC 219 6.25 1.02 0.10
HPCF Website 184 6.18 0.96 0.05
Available Computing Hardware 178 6.11 1.07 0.21
Mass Storage Facilities 148 6.05 1.17 0.02
Network Connectivity 176 6.03 1.19 0.02
Allocation Process 150 6.00 1.10 0.21
SW maintenance and configuration 144 5.92 1.15 -0.16
HW management and configuration 159 5.82 1.25 -0.18
Available software 179 5.81 1.21 -0.17
Software documentation 157 5.60 1.20 -0.02
Training 77 4.92 1.26 -0.21
Visualization Services 68 4.51 1.20 -0.16
 

 

 

 

Importance to Users:

 

           
Topic No. of
Responses
 Average
Score
 Std. Dev.Change
 from 2000 
Overall satisfaction with NERSC 198 2.82 0.41 -0.03
Network Connectivity 165 2.82 0.40 0.02
Available Computing Hardware 165 2.81 0.41 -0.10
Allocation Process 137 2.67 0.50 -0.03
Consulting Services 174 2.64 0.59 -0.03
HW management and configuration 140 2.62 0.53 -0.04
SW maintenance and configuration 136 2.60 0.56 -0.01
Available software 167 2.56 0.59 -0.03
Software documentation 148 2.50 0.57 -0.01
HPCF Website 171 2.49 0.60 0.06
Mass Storage Facilities 140 2.48 0.68 -0.08
Account Support Services 172 2.44 0.56 0.06
Training 89 1.80 0.77 0.05
Visualization Services 86 1.71 0.76 0.06
 

 

All Satisfaction Questions and Changes from Previous Years

 

Legend:  

   
SatisfactionAverage Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
 

 

 

How Satisfied are you?

 

                                                                                                                                                                                                                                                                                                 
TopicNo. of
Responses
Average Score
HPSS: Reliability  83 6.63
 
Consulting: Timely response 155 6.56
 
Software: T3E Fortran Compilers  59 6.53
 
HPSS Overall 101 6.50
 
Software: PVP Fortran Compilers  54 6.48
 
Software: PVP Programming Libraries  36 6.47
 
Consulting: Quality of Technical Advice 151 6.46
 
PVP: Uptime  64 6.45
 
Account Support Overall 185 6.43
 
Consulting: Followup 125 6.37
 
HPSS: Performance  89 6.36
 
HPSS: Uptime  88 6.33
 
Consulting Overall 177 6.30
 
Software: PVP User Environment  57 6.28
 
Software: SP Fortran Compilers  65 6.26
 
Overall satisfaction with NERSC 219 6.25
 
Consulting: Response to Special Requests 113 6.23
 
T3E Overall  92 6.23
 
T3E: Uptime  81 6.22
 
HPCF Web Site Overall 184 6.18
 
Software: SP Programming Libraries  53 6.15
 
Web: Accuracy 142 6.15
 
PVP Overall  69 6.14
 
Available Computing Hardware 178 6.11
 
Software: PVP Local documentation  44 6.09
 
Web: Timeliness 132 6.08
 
Software: PVP Application software  33 6.06
 
Mass Storage Overall 149 6.05
 
Software: T3E Programming Libraries  53 6.04
 
Network Connectivity 176 6.03
 
NIM: User Management 117 6.03
 
HPSS: User Interface  92 6.02
 
NIM: Ease of Obtaining Account Info 138 6.02
 
Allocations Process Overall 150 6.00
 
Software: SP User Environment  71 6.00
 
PVP: Disk configuration and I/O performance  52 6.00
 
Software: T3E User Environment  73 6.00
 
PVP: Ability to Run Interactively  60 5.98
 
Training: Online  Tutorials  75 5.97
 
Software: SP Local documentation  59 5.97
 
Software: SP C/C++ Compilers  35 5.97
 
Software: PVP General tools and utilities  37 5.97
 
NIM: Performance 140 5.96
 
Web: Getting Started Guide 108 5.94
 
Software: PVP Bug resolution  30 5.93
 
NIM: Allocations Request Interface 111 5.92
 
Web: NERSC-specific Info 126 5.92
 
Software Maintenance and Configuration 144 5.92
 
NIM: Ease of Use 159 5.91
 
NIM: Functionality 137 5.91
 
Web: T3E Section  97 5.90
 
Software: T3E Local documentation  57 5.88
 
Web: Ease of navigation 166 5.88
 
Software: PVP C/C++ Compilers  23 5.87
 
Software: T3E C/C++ Compilers  36 5.86
 
Hardware Management and Configuration 159 5.82
 
SP Overall  84 5.82
 
Available software 179 5.81
 
Web: File Storage Section  78 5.79
 
Web: SP Section  96 5.78
 
Software: PVP Performance and Debugging Tools  36 5.78
 
Web: Programming Info 131 5.72
 
NIM: Ability to Find Info 151 5.68
 
SP: Disk configuration and I/O performance  54 5.67
 
Web: PVP Section  69 5.64
 
T3E: Ability to Run Interactively  74 5.64
 
Software: T3E Performance and Debugging Tools  52 5.63
 
Software: PVP Vendor Documentation  32 5.62
 
Software Documentation 157 5.60
 
T3E: Disk Configuration and I/O Performance  63 5.60
 
Software: T3E General tools and utilities  47 5.55
 
Web: Searching 130 5.55
 
Software: SP Application software  38 5.55
 
Training: Classes  24 5.54
 
SP: Uptime  77 5.53
 
Software: T3E Bug resolution  29 5.52
 
Software: SP General tools and utilities  46 5.52
 
Math Server: Newton  15 5.47
 
Software: SP Bug resolution  34 5.44
 
Software: T3E Vendor Documentation  37 5.41
 
PVP: Batch Queue Structure  54 5.41
 
Software: T3E Application software  31 5.39
 
T3E: Batch Queue Structure  75 5.36
 
Software: SP Vendor Documentation  47 5.32
 
SP: Batch Queue Structure  68 5.19
 
Training: Online Class Slides  33 5.15
 
Training: Teleconference Lectures  19 5.11
 
Visualization Server: Escher  13 5.08
 
Software: SP Performance and Debugging Tools  53 5.00
 
T3E: Batch Wait Time  80 4.97
 
Training Overall  77 4.92
 
SP: Batch Wait Time  76 4.92
 
SP: Ability to Run Interactively  68 4.71
 
PVP: Batch Wait Time  59 4.56
 
Visualization services  68 4.51
 

 

 

 

 

2000 to 2001 Changes:

  The following are statistically significant changes for responses to questions common to the 2000 and 2001 user surveys.

                                                                                                                                                                                                                                                                 
Topic2001
Satisfaction
2000
Satisfaction
 Change
PVP Software Bug Resolution 5.93 5.10 +0.83
PVP Programming Libraries 6.47 5.81 +0.66
T3E Batch Wait Time 4.97 4.33 +0.64
SP Disk Configuration and I/O Performance 5.67 5.20 +0.47
PVP Queue Structure 5.41 5.03 +0.38
SP Batch Wait Time 4.92 4.54 +0.38
PVP Overall 6.14 5.86 +0.28
SP Fortran Compilers 6.26 5.96 +0.30
HPSS Reliability 6.63 6.39 +0.24
HPSS Overall 6.50 6.26 +0.24
Allocations Process 6.00 5.79 +0.21
T3E Overall 6.23 6.01 +0.22
Available Computing Hardware 6.11 5.90 +0.21
PVP Fortran Compilers 6.48 6.66 -0.18
NERSC Online Tutorials 5.97 6.22 -0.25
SP Ability to Run Interactively 4.71 5.51 -0.80
NERSC Teleconference Lectures 5.11 6.00 -0.89
SP Uptime 5.53 6.52 -0.99
NERSC Training Class Slides 5.15 6.13 -0.98
NERSC Training Classes 5.54 6.71 -1.17
 

 

1999 to 2001 Changes:

  The following are statistically significant changes for responses to questions common to the 1999 and 2001 user surveys.

                                                                                                                                                                                                                                                                                                                 
Topic2001
Satisfaction
1999
Satisfaction
 Change
PVP Overall 6.14 5.05 +1.09
PVP Ability to Run Interactively 5.98 5.18 +0.80
PVP Batch Wait Time 4.56 3.95 +0.61
PVP Vendor Documentation 5.62 5.03 +0.59
PVP Programming Libraries 6.47 5.94 +0.53
HPSS Performance 6.36 5.90 +0.46
PVP Application Software 6.06 5.54 +0.52
PVP Disk Configuration and I/O Performance 6.00 5.56 +0.44
PVP Fortran Compilers 6.48 6.02 +0.44
PVP Local Documentation 6.09 5.68 +0.41
HPSS Overall 6.50 6.12 +0.38
PVP Batch Queue Structure 5.41 5.03 +0.38
T3E Disk Configuration and I/O Performance 5.60 6.23 +0.37
HPCF Website Overall 6.18 5.87 +0.31
T3E Fortran Compilers 6.53 6.20 +0.33
HPSS Reliability 6.63 6.46 +0.17
Web: Ease of Finding Information 5.88 5.70 +0.18
Available Software 5.81 5.99 -0.18
Consulting Overall 6.30 6.58 -0.28
T3E General Tools and Utilities 5.55 5.91 -0.36
T3E Programming Libraries 6.04 6.42 -0.38
NERSC Teleconference Lectures 5.11 5.78 -0.67
NERSC Training Classes 5.54 6.19 -0.65
NERSC Training Class Slides 5.15 5.95 -0.80
 

NERSC Information Management (NIM) System

 

Legend:  

   
SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
 

 

Satisfaction with NIM:

 

       
Topic No. of
Responses
 Average
Score
 Std. Dev.
User management 116 6.02 1.01
Ease of obtaining account info 136 6.01 1.20
Performance 138 5.96 1.27
Allocations request interface 109 5.91 1.08
Ease of use 157 5.90 1.16
Functionality 136 5.90 0.95
Ability to find desired info 149 5.67 1.18
 

 

 

 

What are the most useful features of NIM for you?   56 responses

     

26   Usage information
12   Account management
11   Web interface / easy to access / complete info
8   Allocations process
8   Don't use
1   Don't like

   

Comments and suggestions regarding NIM   37 responses

     

14   Would like additional functionality
9   Web interface is complex - hard to understand at first
7   Problems with logging in / passwords
6   NIM is fine / no suggestions
5   Poor performance

 

Comments and suggestions regarding GETNIM   19 responses

   

12   Don't use it (much)
4   Improve / expand it
2   It's useful

 

 


What are the most useful features of NIM for you?   56 responses

 

 

 

Usage information:

Check daily CPU time usage

  daily usage

figuring out remaining allocation time

Checking my resource allocations and those of the other members of my group... to see what we have left so we can use them effectively.

Checking usage and remaining time.

Ability to se how many hours I've left in my account; also how much others on my allocation have used

Seeing how many hours we've used and how many we have left.

get information about how much time left

[...] find out CPU time balance

I use mostly the information on my use and that of all users in my group. In particular, I want to see total time remaining.  

To find out from time to time the usage of CPU , so that I do not go over my allocation, if possible. Ms. Francesca Verdier, Dr. Dave Goodwin of HENP have rescued my jobs because of timely infusion of CPU . I am most grateful to Dr. Goodwin and Ms. Verdier for their help, and I would try my best in FY 2002 to use no more than my allocated CPU.

Monitoring allocation usage

allocation usage and computing resources

account information

tracking my usage

Information available on usage to date [...]

usage per user in my project and total

[...] Reviewing repository usage.

  Monitor user repo usage

Account information and management [...]

Checking account status, primarily usage.

Obtaining up to date account information

account status

Access to the number of used computing hours.

checking on SRU usage

[...] SRU accounting  

 

Account management:

being able to add or delete users easily being able to manage CPU time allocations for users in an easy fashion (much better than setcub)

Account management is good.

[...] Ease of managing user accounts within my repo

Basic account management

[...] Adding/deleting users.

[...] and account management

  [...] user management

Account information and management [...]

[...] and account management

Management of accounts and users. [...]

add/delete users; [...]

changing user allocations  

 

Web interface / easy to access / complete info:
 

access from anywhere, clear instructions

The online web interface

Completeness of the information

It lists all my accounts!

Easy to use

allocation for all the [accounts] is shown at one glance. And everything can be done on one panel.

In general, it is very easy to use, and makes it easy to see all of the relevant account information easily.

Web interface

  Accessability, ease of use

I can use it even if I am traveling.

It's less aggravating than its predecessor setcub.  

 

Allocations process:

ERCAP interface is improved.

Making the allocations.

ERCAP and [...]  

[...] Reviewing proposals

ERCAP and [...]

[...] ERCAP Reviews and Allocation Awards

I like the combined PVP/SP allocations

ERCAP  

 

Don't use:

  I have never used it.

I don't know what NIM is, and I am not sure that I care.

That I rarely use it.

I will let my account manager answer this.

Have no interaction with it! Am only a user of HPSS and PPDG.

never used so far.

the visualization server  

 

None:
 

None really. But since I don't know what came before, it is hard to compare.  

 

 


Comments and suggestions regarding NIM   37 responses

 

 

 

Would like additional functionality:

Better documentation. Better report generation  

  NERSC response: We have updated the NIM Users Manual since the survey opened.  We will provide a monthly usage report by January 2002.  If you have specific requests or suggestions please send them to consult@nersc.gov.

  I would like to have the ability to upload proposals for allocations as a single file (pdf, postscript). It would make life much easier for proposers and reviewers.  

  NERSC response: This will be considered but no commitment to do this can be made at this time.

I resent NIM insisting on setting cookies, which I normally don't allow  

  NERSC response: Cookies are needed for the interface to work.  For Netscape 4 under Edit/Preferences/Advanced, check "Only accept cookies originating from the same server as the bage being viewed."

[...] Allow for multiple logins so as to display windows side by side  

  NERSC response: With Netscae 4 you can bring up the lower NIM "result frame" in a new window by right clicking your mouse and selecting "Open Frame in New_Window".  

If it is possible how much allocation has been used per running job, it will be great to trace cpu time per job to benchmark the program or find the errors (for example, if there is leakage in dynamic array allocation, and you have very long time running jobs.  

  NERSC response:  This will not be incorporated in the near future but will be considered as a possible future (at least a year away) enhancement.

I would like a more complete NIM interface on the compute platforms, SP and SV1's, so that I don't have to go back to a local workstation launch a netscape window and surf to NIM.  

  NERSC response: This is unlikely to happen.  The main purpose of getnim is to provide an interface that scripts can call.  It is not intended to duplicate the web interface.

[...] it would be nice to get a month-by-month usage list; [...]    

  NERSC response: This is coming, hopefully by January, 2002.

I have no idea what NIM is (maybe it doesn't apply to PDSF?)  

  NERSC response: True - PDSF has not yet been integrated with NIM, but will be.  You should be able to see the status of PDSF login names by June 2002.

HPSS accounts are not always mapped to NIM

Some of my group is not in the system. If you look, there are significant fractions of people that were pre-NIM and have not been added to they system retroactively.  

  NERSC response:   These comments pertain to HPSS users -- HPSS has not yet been integrated with NIM but will be. You should be able to see the status of HPSS login names by March 2002, and of Storage Resource Units (SRUs) by May 2002.  Meanwhile see: About SRUs.

Ability to transfer resources to and from different accounts  

  NERSC response: This already exists for account managers.  Select Transfer Resources from the Actions menu.

I may just have not realized how to use it, but I sometimes would like to see broken down by user and time period time use.  

  NERSC response:   This already exists.  Select Rept: Daily Usage from the Search & Reports menu.

I would like to have access to the usage history of my allocation time, as it was before NIM took place.  

  NERSC response:
[NERSC comment: this already exists but is currently available only to account managers.  We will make this info available to all users.]

[...] Also, when time is added or removed from accounts at the end of quarters, NIM does not seem to provide the repo manager an explicit record of this activity.  

  NERSC response:   This already exists but is currently available only to account managers.  We will make this info available to all users by December 2001.

 

 

Web interface is complex - hard to understand at first:

could be simpler to get basic info

[...] the two top windows, especially the top right window is a bit confusing (too many options)

The user manual and the menu choices are not as easy to deal with as a lot of other documentation and modern software. The functionality seems quite complete, but it isn't as easy as it could be to figure out for the first time how to use NIM.

The menu structure is too complex with too much overlap in functionality (i.e., to check the time used in my repository, I can probably find 3-4 different ways of getting there from the initial NIM page). It would seem like the menus could be greatly simplified. [...]  For those of us in the Eastern US the update time of NIM information is somewhat of a mystery. It often seems that when I arrive in the morning, NIM does not reflect charges due to jobs that ran overnight. [...]

  the visual interface is not very appealing on the Web.

I found NIM very confusing when I first started using it. It took me some time to learn my way around. An option giving an overview of the functions and when you would be likely to need to use them would be helpful. For example, it took me quite some time to figure out that I needed to use NIM to request an account for myself once my allocation was approved.

The interface is not particularly slick. The "My Stuff" pulldown and attached frame are not particularly intuitive, but are OK once familiar.

When validating user lists, it's a bit hard to tell whether any changes made have actually taken effect. It would be nice to get a message saying, "Ok, your user list is now validated for next year." or something of the kind.

The interface for applying for new allocations is just a mess. As is the actual application form! Why is it so fragmented?  

 

Problems with logging in / passwords:

It would be nice to relax the constraint of having at least one non-alphanumeric character for the NIM password so that I can Have the same password for NIM and the machine I'm using (IBM SP).  

Hmm. I don't seem to have a .nimpw in my $HOME, as promised. Perhaps this is something that only shows up after a while.  

  NERSC response: .nimpw was only used when NIM first opened in September 2000 (it was a mechanism to distribute many new passwords at once to all users).  It is now obsolete.  To obtain a NIM password, contact the Account Support group at Support@nersc.gov.

Automatic access with the main loginname and password would be desirable

I always have to login twice, don't know why but it always doesn't work the first time and does work the second time.  

  NERSC response: This problem is being actively investigated.  As of November 2002 we do not know what causes it.

  I had difficulty to "login" sometimes, as I get a message "Login Failed". Upon second attempt , it seems to work OK. I do wait for "Document done" message , but some how I am still facing login problems at first attempt. I had pointed this out to the consultants, and I guess in the new FY 2002, I should request them to look into this again, if they have spare time, since the consultants are really very hard workers and may be they are hard pressed for time. It is not an issue, and I do not mind trying twice to "login".

There is a bug in the login; I always have to login twice.

I'm not that familiar with NIM yet, as I was unable to log in the one time I tried.  

 

NIM is fine / no suggestions:

Seems fine

Nothing I can think of...

everything is perfect  

Satisfied.

Looks great to me!

No comments  

 

Poor performance:

[...] Also, the performance is not too good from remote sites. It often takes a lot of patience to access even fairly simple account information. It would seem like there should be some way for you to supply "coarse-grained" account information without having to invoke the full machinery behind NIM that seems to take so long. Either that, or put NIM on a faster server. If NIM were an e-commerce site, it would have long since gone out of business - it's too slow. [...]

Off-site speed slow.

  Make it faster. [...]

Improvement of speed (updates, lookups)

it takes a long time to load on my Mac; [...]  

  NERSC response: We are working on performance issues.

 

 

   


Comments and suggestions regarding GETNIM   19 responses

 

 

 

Don't use it (much):

only use it through the Web

slow, but I usually use the web interface (used getnim 2-3 times).

  I didn't even know this existed

Never used it.

haven't used it

I don't use it

Unused

no idea

have not used getnim

Haven't used that....

I use this very little.  

N/A  

 

Improve / expand it:

Essentially useless. But, a command line interface is nice, so getnim should be improved.

Primitive. It needs at least the functionality of setcub -- far superior to NIM in all regards.

I don't find it as easy to use as the old system

First of all, using capital letters for commonly used command flags (i.e., getnim -R) should be a no-no. This makes it more awkward than typing getnim -r and I see no reason why you need to be using capital letters here. Okay, I could set up an alias to this, but I already have quite a few of those to deal with. Second, it would seem like you should have "getnim -R repo" provide a little more information than just an unadorned number. You should also add text stating what the number is, what its units are, and when it was last updated. While you're at it, you might as well also add the number of hours that have been used in addition to the hours remaining. For users that need only a single number (such as for a script) from getnim, just give them a different switch (or let them keep -R and use -r for the more user-friendly output). The updating of getnim information is strange; for example, it's now Oct. 3 and I've gotten several batch runs through since Oct.1, but yet when I just now typed getnim -R, it still shows the amount of time I was initially allocated on Oct. 1. Why is this?  

  NERSC response:  

  1. Major enhancements to getnim are unlikely.  The main purpose of getnim is to provide an interface that scripts can call.  It is not intended to duplicate the web interface.
  2. It took several days after the new fiscal year for bugs related to the fiscal year changeover to be corrected.  By October 4, 2001 all usage information was available and retroactive to October 1.

     

     

     

It's useful:

I mostly use it, the web interface is flashy but not so convenient

None. Satisfied.  

 

Other:

The ercap review process is very painful with nim as it is.  

   

Web and Communications

 

Legend:  

   
SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
 
   
Significance of Change
not significant
 

 

Satisfaction with the Web:

 

         
TopicNo. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Accuracy 142 6.15 0.94 -0.07
Timeliness of info 132 6.08 0.97 0.08
New Users Guide 108 5.94 1.10 -0.02
NERSC-specific info 127 5.93 1.05 0.02
T3E Pages 97 5.90 0.97 -0.10
Ease of finding info on web 166 5.88 0.96 0.10
File Storage Pages 78 5.79 1.10 0.01
SP Pages 96 5.78 1.11 0.03
General programming info 131 5.72 1.05 0.09
PVP Pages 69 5.64 1.08 0.08
Search facilities 130 5.55 1.31 -0.06
 

 

Comments and suggestions concerning the HPCF web site:   29 responses

   

11   Good website
7   Provide additional or clearer information
7   Problems navigating / better organization
3   Improve searching

 

 

 

How useful are these for keeping you informed?

   

     
Question No. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Email 145 2.43 0.64 -0.01
MOTD 138 2.18 0.80 -0.09
Announcement web archive 126 2.02 0.81 -0.03
Phone calls 112 1.79 0.91 -0.02
 

 

       
Question No. of Yes
Responses
No. of No
Responses
Percent Who
Said Yes
Do you feel you are adequately informed? 160 10 94
Are you aware of major changes at least 1 month in advance? 119 28 81
Are you aware of software changes at least 7 days in advance? 110 26 81
Are you aware of planned outages 24 hours in advance? 135 14 91
 

Comments regarding how NERSC keeps you informed of changes 13 responses

 

 


Comments concerning the HPCF web site   29 responses

 

 

 

Good website:

Just what it should be -- informative, easy, and not butt-ugly.

Your website is great! Many times, I get all my answers from there

I like it. It's very useful and helpful to me.

The information on the web pages is very thorough and informative.

The web sites are GREAT! They are the standard benchmark for comparing other supercomputer website. And, in my opinion, the NERSC web pages are the best. Just keep the load light (no Java and crazy graphics which as just silly and useless). The information is up to date and very well organized. Go NERSC!  

Very good in general. Unfortunately, PDSF specific information sometimes isn't easy to find. Often, it's hidden in archived emails.

Hey, within a day or two I was up and running. Without much hassle I might add. [...]

Great web site! The information is always current!

I use this little, but my experience has always been satisfactory.

It works pretty well.

I will let my group members who specifically use T3E or SP answer these questions.  All I can say is that they don't complain much about NERSC in comparison with NSF operation.  

 

Provide additional or clearer information:
 

I would like to see more in-depth documents about NERSC's experience with the computers, especially Seaborg. How about a document with tests results (performance, scalability, etc.)?

More documentation on software needed and more tutorials needed on general topics.

The introduction of MPI is extremely well written and easy to understand, but that for OpenMP is not so much readable in my opinion (it does not seem more than the original manual of OpenMP). For example, I finally needed to read full original manual to write a production code with OpenMP, though I didn't have to do that in case of MPI. This part (i.e., webpage for OpenMP) is what I want the improvement. In particular, please add more explanation about the concept of 'thread', what is the difference between MPI and OpenMP, and more detailed example to mix MPI and OpenMP (i.e., combine fine and coarse-grain parallelism).  Thank you for your consideration.

Maybe it's my fault, but I seemed to have a lot of trouble getting the batch queues to work, despite reading the webpages very carefully

I am new to the supercomputing game, and am having difficulties getting going. I have not yet been able to port a fairly simple fortran program over and run it in parallel. Most of my difficulties are related to OpenMP (and then MPI), So I appreciate the large amount of documentation. This is exactly the right idea. I still have had some problems finding what I was looking for (special options needed for ssh, the fact that my ssh on the PC (version 2) doesn't seem to work with seaborg, and OpenMP and MPI tutorials not being in the same place, so I didn't know the latter existed, etc.)  Am getting off the ground, however.

More documentation on parallelization

I hardly ever use them, they don't get to the point quickly enough for my taste.  

 

Problems navigating / better organization:

The informations regarding the queues on the IBM SP should be more apparent (maybe with an explicit "queues" link). They are somewhat hidden in the current web site.

For some reason my brain has refused to learn how to get the uptime status and scheduled downtime pages easily.

Better overall organization. [...]

sometimes had trouble finding nim quickly, and queue descriptions quickly (cray t3e pvp)

Documentation as it is is useful only if you know what specifically you are looking for.

There is such a large volume of information there that it's hard to weed through it and find what I needed. As a new user, a separate section, tutorial, perhaps just having basic information on how to get a feel for the resources available for different types of tasks, would be helpful.

  [...] and OpenMP and MPI tutorials not being in the same place [...]  

 

Improve searching:

In general, there is a lot of good information. However, I occasionally run into fairly common things that are very hard to find. For example, in trying to find out the meaning IBM compiler and Loadleveler error numbers, I've had to ask the consultants, and getting to that page took quite a few clicks and I would never have found it myself. One would think that many users might need to look up these error numbers and it would be easy information to get to. Doing open-ended searches on the NERSC web site are not usually too successful, but this a problem with many web-sites. Part of the problem is that the links from the search don't really take you directly into the place on the linked page that you need to go to; also the text provided on the links from the search is often not very useful. I don't know what the solution is other that look for a better search engine.

Improved search capacity.

[...] I answered dissatisfied with the search because it returns too much information to pour over, this probably isn't bad, but it has caused me to look for the information elsewhere rather than go through the entire list.  

 

Don't use:
 

I have never accessed the HPCF web site

I have never used the HPCF web site.

I have not used it recently enough to have a ready comment.  

 

Other:

How come PDSF did not make it to this form?  

 

 


Comments regarding how NERSC keeps you informed of changes and issues   19 responses

 

 

 

Comments on using email:

  I don't really care about any of this, but if Killeen was to go down for more than a week or two, then I would be unhappy until I found an alternate cpu. If something major is going to happen I would prefer to get an e-mail instead of having to read some message flying by my screen.

Changes should be sent as e-mail to the machine to which they are relevant. This means that mail should be enabled on all login platforms.

I think mailing list might be better to inform users about any changes. Also, it may be useful as a place to discuss technical (programming )problems in a public manner.

The one common comment - get email that system is underutilized but, upon checking, that does not seem the case.

I guess I need to be on the mail list.... MOTD and web site on useful after I have noticed a failure.. Not good coverage of PDSF outages ... they have separate mail list!  

 

Comments on the MOTD:

Updating the MOTD on a shorter time scale and telling us a little more about what is happening would be helpful.  

Customizing the info presented would be helpful

Thank God the MOTD was finally changed to be machine based (one section per machine). Please keep the new format!

Great job! The MOTD at login is really helpful.  

 

Comments on software changes:

A change in the MPI libraries on the T3E broke my code. I had to debug this myself -- consulting services was unable to help me. Now my code is broken again (works fine on Linux cluster), possibly due to another change in the T3E configuration -- I don't know.

I had some trouble when the NCAR library changed... it came as a surprise

  I would like to be informed directly when bug reports I write are resolved. I typically file 4-5 bug reports per year. (some of them are user error)  

 

Satisfied:

It is fine

I am satisfied with this  

 

Other:
 

The users are generally well informed about the changes but not about the reasons why. For example, why did Seaborg get all the extra nodes all of a sudden? I never got any information about this. Users should be kept informed about the performance tests that NERSC performs on the computers...

I use PDSF, questions ar not relevant. Information flow to users from PDSF team is uniformly excellent.

Can't recall about the last two.

I'm a too new user to provide suggestions on this.

How do I use these services?  

 

Hardware Resources

 

Legend:  

   
SatisfactionAverage Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
 
     
Significance of Change
significant increase
significant decrease
not significant
 

 

 

Satisfaction - Compute Platforms:

 

             
Topic No. of
Responses
Average
Score
 Std. Dev.Change
from 2000
PVP Uptime 64 6.45 0.83 0.04
T3E Overall 92 6.23 0.96 0.22
T3E Uptime 81 6.22 1.11 0.13
PVP Overall 69 6.14 1.06 0.28
PVP Disk Configuration and I/O Performance 52 6.00 1.03 0.23
PVP Ability to Run Interactively 60 5.98 1.05 -0.13
SP Overall 84 5.82 1.39 -0.06
SP Disk Configuration and I/O Performance 54 5.67 1.35 0.47
T3E Ability to Run Interactively 74 5.64 1.30 -0.07
T3E Disk Configuration and I/O Performance 63 5.60 1.25 0.25
SP Uptime 77 5.53 1.71 -0.99
PVP Queue Structure 54 5.41 1.38 0.38
T3E Queue Structure 75 5.36 1.31 0.09
SP Queue Structure 68 5.19 1.41 -0.03
T3E Batch Wait Time 80 4.97 1.48 0.64
SP Batch Wait Time 76 4.92 1.65 0.38
SP Ability to Run Interactively 68 4.71 1.85 -0.80
PVP Batch Wait Time 59 4.56 1.65 0.30
 

 

 

 

Max Processors Used and Max Code Can Effectively Use:

 

           
Processor TypeNo. of
Responses
Average No.
of Processors
 Std. Dev.Change
from 2000
Max SP Processors Used 72 202 326 +61
Max SP Processors Can Use 56 751 994 +160
Max T3E Processors Used 73 133 151 -13
Max T3E Processors Can Use 54 356 529 +56
Max PVP Processors Used 46 10 21 +0.8
Max PVP Processors Can Use 36 30 93 +20
 

 

 

 

Satisfaction - HPSS:

 

     
Topic No. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Reliability 83 6.63 0.62 0.24
HPSS Overall 101 6.50 0.74 0.24
Performance 89 6.36 0.92 0.16
Uptime 88 6.33 0.91 0.02
User Interface 92 6.02 1.22 -0.12
 

 

 

 

Satisfaction - Servers:

 

   
Topic No. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Newton 15 5.47 1.19 -0.08
Escher 13 5.08 1.04 -0.17
 

 

 


Summary of Hardware Comments

 

Comments on NERSC's IBM SP: 43 responses

       

14   provide more interactive services
13   good machine / useful features
7   stability problems
6   provide longer queues
5   hard to use/software problems
4   don't like charging structure
3   improve turnaround time
3   slow communications
1   more inodes
1   remove small jobs

   

Comments on NERSC's Cray T3E:   26 responses

     

11   good machine / useful features
4   provide longer queues
3   improve turnaround time / obsolete
2   hard to use/software problems
2   more inodes / more disk
2   stability problems
1   provide better interactive services
1   remove small jobs

   

Comments on NERSC's Cray PVP Cluster:   18 responses

     

6   too slow / obsolete
5   good machine / useful features
3   provide longer queues / fewer checkpoints
2   improve service for big memory jobs
2   improve turnaround time
2   more inodes / more disk

 

Comments on NERSC's HPSS Storage System:   29 responses

     

10   good system
4   hard to use / software problems
3   authentication / password issues
3   need expanded functionality
3   don't like the down times
1   don't like the SRU accounting
1   performance improvements

   

Comments about NERSC's auxiliary servers:   5 responses

 

 


Comments on NERSC's IBM SP:   43 responses

 

 

 

Good machine:

Seaborg has been a terrific resource for my group. We couldn't get 90% of the work done without it.

[...] Individual processor performance is quite good.

Generally it works quite well and I prefer it to the T3E (batch turnaround seems faster)

Great machine! Perfect for my work that requires computing power

The best machine ever, no comparison to anything else that I can access.  

This is a great machine ! I think it\'s really powerful ...

Great Improvement over previous machine, now that increased memory per node is available.

Have only tested small problems on seaborg; know from ASCI blue pacific that this architecture is useful for domain decomposed PDEs.

Like it a lot save for a few modifications that would be self serving. I realize that what I would like probably wouldn't be so great for the rest of the community.

The number of nodes and processors available makes this facility ideal for our purposes. [...]

The machine is great; the queues are fine; it is very speedy and responsive. [...]

The machine is excellent. The turn around time is sometimes longer than I would like. I hope that the computing power available at NERSC will continue to grow rapidly.

Fun, [...]  

 

Provide more interactive services:

MORE/BETTER SUPPORT FOR INTERACTIVE JOBS!!! Debugging parallel programs on this machine is a total pain because you're lucky if you can get 2 interactive nodes, and even then you have to wait forever to get them. [...]

Running parallel debugging (Totalview) and profiling (Vampyr) tools interactively has not been working well for me. Totalview usually complains about not getting enough resources even for a small (4) number of processors.

I was trying to debug a program on the SP. It was an utterly frustrating and tedious process. Either seaborg was down completely, or the loadleveler would not respond, or the loadleveler would not let me run my demanding one-node-one-CPU-five-seconds-of-CPU-time job, or the debugger would not start because its license server was down. It took me days to complete a trivial debugging job which would have been a matter of hours on a workstation.

My postdoc has been getting our application up and running and I have not yet had a chance to do large runs myself. For learning the system, it is always easier to try things interactively. Certainly this was the case on T3E. It is so easy to get batch scripts wrong.

Debugging environment is atrocious.

Would like to see more resource allocation for interactive use.  

When I submit an interactive job with poe and there are no processors available, it bounces back, and I have to submit it again. I find it more convenient to have the job put in a "dormant" status, and start once it gets enough processors available. This is what happens in the cray machine with mpprun. [...]

The interactive runtime limits are too short. Debugging large runs is painful. [...]

Running interactively is almost impossible since there is no separate interactive queue. Waiting around for an interactive processor to come free --- or getting lucky --- is not a good use of time. Using the debug queue is the only alternative.

[...] Running interactive is hard sometimes. There should be more processors allocated to running interactive files. [...]

The 20 minute interactive single processor limit will kill my ability to do in-situ data manipulation and analysis effectively.

Currently, running small jobs interactively to debug can be nearly impossible because no nodes are available. Thus, there is no way to debug a code using totalview. [...]

The fact that there is no interactive queue is sometime a problem, especially for totalview.

My only difficulty in running here was to try to get a serial run as a benchmark for code performance  

 

Stability problems:

NERSC seems like they take a longer time to bring up new hardware that other centers I use and often it's starts out flakier than it should be. [...]

Improved stability. [...]

[...] It is just down a lot of time (or was in recent past). Is there a way to cut down on the down time?

[...] Seaborg is down a lot. Gseaborg was never down that much. Other than that, I have no complaints

IBM switch doesn't really seem to be reliable

I am VERY discouraged with the amount of down time lately on SEABORG. This is a very nice configuration which does NOT live up to its potential because of the hardware problems. I hope it improves. [...]  

[...] uptime also needs work [...]  

 

Provide longer queues:

The 8 hour real time limit makes my project totally impossible. It is the worse regulation imaginable.

Maximum of 8 hrs per job could be increased.

[...] Also, there are no queues with long CPU time (more than 8 hours). Sometimes, they are useful.

a longer running queue (say, 24 hours maximum) with limited number of nodes on seaborg may be useful.

  a maximum cpu time greater that 8 hrs. would sometimes be helpful.

The Clock limit on SP of 8 hrs is set assuming, I guess, that all the users can run parallel code across a large # of processors. There might be cases like mine, where i have a serial code which when compiled with SMP options can run on 4-8 proc in the same node. But it takes lots of hours (~180 clock hrs), a lot more than the limit set for SP users. [...]  

 

Hard to use / software problems:

[...] Improved documentation on queue status.

[...] The xlf compiler isn't very good, IMHO.

[...] Also, the ksh shell doesn't seem to work properly.

[...] Also, it will be of great help for people like me, who are not used to parallelizing the code, to have a consulting person who can work one on one with us to give us suggestions on how to go about doing the parallelization for my problem effectively. I know it may be too much to ask for, but i\'m sure a lot more people will come forward to use these resources in a more efficient way.  

I only just started using seaborg. I can only log in from mcurie -- not from my home machine. [...]  

 

Don't like charging structure:

[...] I also strongly disagree with the charge factor

[...] The charging of 2.5X for processor hours, however causes us to use up all of our time very quickly. We could go through 100,000 hours easily within weeks and we limit our jobs because of this. The wait in the queue is satisfactory EXCEPT at the end of a fiscal year as everyone used "premium" to get their jobs going. We had to use "premium" because "regular" left us with a six day wait time (compared with a 24 hour wait time a month earlier).

I don't like the new way the hours are charged. I don't always use 16, 32, etc processors. [...]

[...] In addition, although I get charged 2.5 times as much to run on the SP, for jobs with internode communication it really only gets slightly better performance than the T3E. What a waste.  

 

Improve turnaround time:

[...]  For a new machine (Seaborg) the queues seemed to become slow very quickly.

[...] The turn around time is sometimes longer than I would like. [...]

[...] Also, it is again, like all other NERSC machines, oversubscribed, so at first it is convenient to use, but then it becomes so slow that turn around times go out the roof!  

 

Slow communications:
 

Fun, even with nasty latencies.

Limited by communications both on and off node. Not only does it need higher bandwidth and lower latency and/or truly asynchronous communications, but it also needs the ability to transfer data directly between the L2 caches of different CPU's (on- and off-node). [...]

[...] for jobs with internode communication it really only gets slightly better performance than the T3E.  

 

More inodes:

[...] abolish inode quotas!!!!  

 

Remove small jobs:

It seems like a lot of people are not using it effectively. That is, it's a world-class machine being used for a lot of smallish jobs that would be more effective elsewhere. NERSC in general needs a mid-class machine to take the load off of the high-end machines. [...]  

 

Just starting / don't use:

I would like to try it out

still coming up on learning curve

Just started to utilize IBM SP3 at NERSC, so I am unable to make specific comments.  

I haven't extensively used the SP since it moved into phase II although I anticipate this increasing the number of processors that some of my codes can use.

Group members will answer detailed questions. My answers to top two questions reflect the complaint level I hear.

 

 


Comments on NERSC's Cray T3E:   26 responses

 

 

Good machine / useful features:

sad if has to go

Good communication bandwidth.

One can get 4 hours a day in on the 256 pe queue, which is really good for production. The smaller queues are not very effective, in that a 16 beowulf system running 24 hours a day on 1 gigahz processors gives me a factor of three improvement in speed over the 64 pe queue running 4 hours a day and 1.5 improvement over the 128 pe queue. As this beowulf has 512 mb per pe, the 64 pe queue can still do a problem twice as large and the 128 queue one four times as large.

Fun. [...]

Have made good use of this resource for published data on parallel scaling of domain decomposed PDE solvers in the past (mostly through junior collaborators, not personally)

Its great, and is worthy rival to the SP considering the charge factor and good inter-processor comms.

  The Cray compiler and Cray totalview debugger are fantastic for development. The SP3 totalview is not even in the same league. I will be extremely sad to see this machine go. The throughput is currently much faster on this machine as well.

Getting old, but still good. [...]

It's been great so far, but I must admit I'm on the steep side of the learning curve. It's easy to use and my jobs seem to go quickly. Note that I'm constantly reevaluating how effective my code is, so this answer may change in two days or two months or never.

A very nice machine. Too bad it's obsolete, and I wish they built a successor.

I am happy with the T3E.  

 

Provide longer queues:

Is the 4-hour CPU limit extendable?  

The time in the queue is far too short for today's applications - this is I know similar to other facilities. It means jobs must often be stopped and started and while most information can be stored in is somewhat frustrating that a complete calculation must take many many submits.

The small amount of time allotted for the largest jobs limits what I can get done on the T3E.

4 hours is far not enough  

 

Improve turnaround time / obsolete:

Beginning to show its age. Batch turnaround is often slow

that thing should go into a museum.

  queue much too long    

 

Hard to use / better software / better docs:

debugging support is lousy - aren\'t there some decent debuggers for C code out there that run on Unicos? [...]

Several I/O and particularly default variable issues (I*4 vs I*8) that hindered the porting of my code, and in fact never got completely ported.  

 

More inodes / more disk:
 

[...] abolish inode quotas!!!!

Disk space and inode availability is becoming a significant headache here, almost preventing useful work.  

 

Stability problems:

[...] uptime also needs work

[...] Down too often.  

 

Provide better interactive services:

Interactive response time is terrible.  

 

Remove small jobs:

[...] The smaller queues are not very effective, in that a 16 beowulf system running 24 hours a day on 1 gigahz processors gives me a factor of three improvement in speed over the 64 pe queue running 4 hours a day and 1.5 improvement over the 128 pe queue. [...]  

 

Just starting / don't use much:

only used for testing

still not into production running yet

I am not using the t3e very much anymore  

 

Other:

i used the t3e for benchmarking some numerical application. The batch queue structure seems to make this benchmarking difficult. Benchmarking may not be a major issue for applications that are run in production mode.  

Fun. I wish there were more thorough low-level docs available, but that's difficult.

 

 


Comments on NERSC's Cray PVP Cluster: 18 responses

 

 

 

Too slow / obsolete:
 

The J90/SV1 cluster has never provided the performance of the C90. It was obsolete before it was purchased. A replacement needs to be purchased. I have projects running on this system that should have been finished 3 years ago. [...]

I find interactivity (e.g. compiling) on Killeen is 3-5X slower than on the MPP platforms. I generally now avoid running on Killeen if possible

It seems that it just isn't all that fast compared to my desktop Linux box; the IMSL & NCAR libraries are the main thing.

Faster (clock time) to run PIC code on desktop machine. No support for Python problems.

Not using; no real reason to use since desktop machines are now powerful and cheap. Having it go away at the end of this FY will not be a big loss.

What is Killeen? Whether it is a T3E or a PVP Cluster, it seems to do what I want, but I wish it were 10 times faster.  

 

Good machine / useful features:
 

They are essential to run some of the invaluable legacy codes I need. native double precision is a big help. some i/o cray features are also essential, along with some libraries

[...] he IMSL & NCAR libraries are the main thing.

Great cluster! Always seems to be space and runs efficiently.

Mr. David Turner and others in the USERS GROUP are most helpful and sympathetic to the needs of PVP users and I personally wish to thank them for their excellent support and cooperation which has made NERSC the most user-friendly supercomputing facility for superior scientific research in areas related to the mission of the DOE, USA.

I am pretty uninformed. the PVP Cluster is like a black box into which I drop problems. I am quite satisfied with the way it provides results. I actually am quite satisfied with the batch job wait time. Except for the time before the end of the fiscal year. With 400-500 jobs in the queue the wait can be pretty awesome.  

 

Provide longer queues / fewer checkpoints:
 

The time in the queue is far too short for today's applications - this is I know similar to other facilities. It means jobs must often be stopped and started and while most information can be stored in is somewhat frustrating that a complete calculation must take many many submits.

Better real time limit.

To many system checkpoints which results in Gaussian failures.  

 

Improve service for big memory jobs:

My annual mantra BIG MEMORY jobs. Now, more than ever, these are the codes the PVP cluster should be targeted for.

I run mostly highly vectorized large memory production runs (230 or 450MW). According to the CPU time limit, the full time evolution for one model requires a sequence ~10-20 single jobs, each depending on the results of the previous one. However, the batch job wait time for large memory jobs is highly unpredictable. If there is not accidentally a job of the same size that quits in the right moment, the job appears to be held in most cases for more than a week, while later submitted smaller jobs continuously refill the machines. cqstatl -f gives detailed information about submitted jobs. Sometimes, it does not list long-pending jobs anymore that are listed in cqstatl -a.    

 

Improve turnaround time:

The batch queue waiting time is intolerably long - I only use this machine as a last resort, and it's a pleasant surprise when anything finishes.

 I actually am quite satisfied with the batch job wait time. Except for the time before the end of the fiscal year. With 400-500 jobs in the queue the wait can be pretty awesome.  

 

More inodes / more disk:
 

[...] File system inode quotas are ridiculous. More disk space is also needed.

Disk space and inode availability is becoming a significant headache here, almost preventing useful work.  

 

Don't use much:

only used for testing

 

 


 

Comments on NERSC's HPSS Storage System:   29 responses

 

Good system:

extremely good system. Much faster than SDSC's version (I don't know why).

things were great until recent problems using hsi from seaborg. I like the unix like interface on hsi. have not used pftp

Great. I hope hsi gets fixed soon

This is the main system that I access so that I can retrieve the NCEP Reanalysis II weather products. I then run some scripts and programs on killeen to cut the files down to the variables I need. Finally I ftp the data to our Sun. Overall I'm happy with the response!  

Apart from occasional times when it goes down, it is usually excellent.

The system is excellent

easy and reliable.

It is a superb storage system, and is managed by exceptionally qualified professionals. Congratulations!

PCMDI is a large user of the HPSS to distribute climate data to a wider community. Performance has been excellent. I am especially pleased with HSI.

Great connection - super fast AND the ftp back 'home' speed is speed racer.  

 

Hard to use / software problems:
 

HSI is somewhat awkward, but does the job.

We have found that LINUX ftp does not generate the information that HPSS needs to properly archive the data... but only after we stored ~5 TBytes of data. We have trouble accessing our data. [...]

I am storing and retrieving larger and larger files as MPP hardware evolves and this is not becoming easier.

I could use a tutorial about which user interface to use in various circumstances.  

 

Authentication / password issues:

It's annoying that this uses a different password to seaborg, mcurie ......  

I don't understand why I have to use a separate login password to get to hsi. Other computer centers I work at don't seem to require this.

I'd be more satisfied if I reliably remembered by password, or if it were automounted.  

 

Need expanded functionality:

I would like to see hsi available on linux arch.

Need high performance interface from outside NERSC and LBNL, ie ESnet sites! Need support for Globus Grid tools and authentication!

[...] These days ftp on linux is a security risk so more and more systems do not run the server... In a year or two, we need to convert to a secure file transfer system.  

 

Don't like the down times:

I don't like the Tuesday 10-12 a.m. downtimes. It's generally just when I've gotten settled in and started to work for the day. An hour at lunchtime would be better.

Having the weekly downtime in the middle of a work day, although understandable from a staffing perspective, can be annoying. If most of NERSC's users are based in the US perhaps having it in the late afternoon on the West Coast would affect fewer users.

It always seems like I need to access data on Tuesdays when the storage system is down. Is there any way that the weekly maintenance on HPSS could be moved to the evening?  

 

Don't like the SRU accounting:
 

The SRU system, which includes transfer charges, appears redundant for projects with small IO requirements [say 100GB]  

 

Performance improvements:

maybe should be faster

 

 

Don't use / don't need:
 

I don't know what this is.

This is not a major concern for us.

We are not production users, so we do not have huge data sets.

We don't need it

I have tended to not use HPSS with hsi, pftp or ftp. Not because of any problem with these interfaces. I just have not informed myself or felt the need for them.

 

 


Comments about NERSC's auxiliary servers:   5 responses

 

 
Slowness of network connections still limits the usefulness of these servers (especially Escher) to remote users. Those who can still rely on local workstations for most of this type of work.

i use escher mainly for access to software like IDL

the application "mediaconvert" takes over 5 minutes to load and it is painfully slow to use. Could this be remedied?

The IDL program that I use to view results from simulations run on the IBM SP works poorly at NERSC due to the older version available there (5.3 vs the current 5.4). Keeping software current would be helpful.

 

 

 

Software

 

Legend:  

   
SatisfactionAverage Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
 
     
Significance of Change
significant increase
significant decrease
not significant
 

 

Satisfaction with Software:

 

                       
TopicNo. of
Responses
Average
Score
 Std. Dev.Change
from 2000
T3E Fortran Compilers 59 6.53 0.57 0.13
PVP Fortran Compilers 54 6.48 0.61 -0.18
PVP Libraries 36 6.47 0.77 0.66
PVP User Environment 57 6.28 0.88 0.03
SP Fortran Compilers 65 6.26 0.99 0.30
SP Libraries 53 6.15 0.99 0.15
PVP Local Documentation 44 6.09 1.03 -0.29
PVP Applications 33 6.06 0.97 0.23
T3E Libraries 53 6.04 1.07 -0.14
SP User Environment 71 6.00 1.08 -0.07
T3E User Environment 73 6.00 1.14 -0.18
PVP General Tools and Utilities 37 5.97 1.04 0.04
SP C/C++ Compilers 35 5.97 1.10 0.25
SP Local Documentation 59 5.97 1.00 -0.08
PVP Bug Resolution 30 5.93 1.28 0.83
T3E Local Documentation 57 5.88 0.87 -0.12
PVP C/C++ Compilers 23 5.87 1.22 -0.13
T3E C/C++ Compilers 36 5.86 1.22 -0.07
PVP Performance and Debugging Tools 36 5.78 1.05 -0.28
T3E Performance and Debugging Tools 52 5.63 1.31 0.07
PVP Vendor Documentation 32 5.62 1.21 -0.07
T3E General Tools and Utilities 47 5.55 1.10 -0.10
SP Applications 38 5.55 1.25 -0.12
SP General Tools and Utilities 46 5.52 1.13 -0.20
T3E Bug Resolution 29 5.52 1.33 -0.18
SP Bug Resolution 34 5.44 1.21 -0.01
T3E Vendor Documentation 37 5.41 1.32 -0.18
T3E Applications 31 5.39 1.43 -0.39
SP Vendor Documentation 47 5.32 1.20 -0.18
SP Performance and Debugging Tools 53 5.00 1.69 0.31
 

 

 

 

Comments about Software:   13 responses

   

17   Tools, utilities, and bug resolution
6   Compilers
5   Libraries
4   General software comments

   

 

 

Tools, utilities, and debugging:

It would be nice to have some info on debugging on the web site (not sure this is the case at the moment).

  Need a good debugger on the Cray - totalview sucks (very flaky, gives wrong answers, crashes frequently, etc) [...]

The totalview debugger is great when it works, but it is unreliable (at least on the T3E and PVP).

better debuggers for Seaborg needed!!!!!!

NERSC response:  We try to maintain the best debuggers available.  Totalview is, despite its failings, the best debugger we have.  Software requests may be made via our online form. NERSC consultants try to keep abreast of the latest software developments,  including new debuggers. Feel free to contact us with suggestions for new debuggers or bug reports  for our current ones.

The lack of an interactive queue on seaborg makes debugging, via, for instance, totalview, impossible.  

NERSC response:   Indeed, this was a problem until recently. We have expanded the resources directed at debug and interactive needs. There are now resources dedicated to interactive/debug work during the day.  See    NERSC Queue Policies on the SP    for more information.  

   

bug resolution usually requires help from consultants

I want to learn to use more performance analysis tools, but just haven't had the chance. The debugging has been fairly straightforward, at least for the simple things I ask it to do.

All of my dissatisfaction is due to the lack of proactive acquisition, user training, and the availability of a suitable partition for the in-depth performance analysis of large production codes on seaborg. I would like to learn and be able to get hardware performance counter information and MPI performance information for PCM. I have tried to do this at the NCAR site using the available IBM documentation. It has been an extremely frustrating and daunting task. I think that NERSC has the resources to provide this capability and training......

 

NERSC response:  Hardware performance monitoring on the SP is somewhat underdeveloped.  Two API's exists for gathering HPM,  PAPI and  PMAPI.  Neither of these APIs provide HPM  data for parallel applications without code modification and recompiling.  Also, neither provides a mechanism for aggregating HPM data  from all of the tasks in the parallel application.  We will continue to search for ways to make instrumenting code for HPM easier.

A tar with the -z option (like GNU\'s) would be good. Some of the GNU tools seem to run (extremely* slowly for no apparent reason. [...] Using Xwindows from here is so slow as to be nearly useless. I really need a tool that will run anywhere to translate Cray binary files into more conventional formats.

 

NERSC response:    The GNU version of tar is available in the GNU module.  NERSC is working with IBM's xlf compiler group to enable reading of Cray unformatted binary data on the SP.  When this functionality is available it will be announced to users.

    NERSC supports a lot of neat DOE software, and also serves the function of identifying what part of that software is ready to be supported big-time.

The support work you do is outstanding. Some improvements could be made in how quickly new releases of supported software are installed, e.g. petsc.

The default SP account configuration leaves all customization *and usability creation* to the user. Please, give users default shells/configurations with command line histories etc. This is the 21st century. NERSC also appears to be continuing its unofficial shell war - no/little bash support - and while there are many technical reasons for not liking it, the shell is the primary interface through which users use the machines. Please, at least provide some example "comfortable" configurations on the web page. Keeping load on interactive nodes down is presumably a consideration, but this should not be at the expense of usability. Sorry for the rant - but by default the SP has an interface from the 1980s.

NERSC response:   Good suggestions! Sorry if it seems like we are playing favorites with shells, but the issue is really software support. We have recently allowed users to choose tcsh (and soon bash!) as their default login shell. While there is still no official software support available, we will make our best effort to accommodate shells requested by users.
We try to make the default configurations for all shells reasonably comfortable. Specific suggestions are welcome. Realize also that we prefer to use modules to provide functionalities which not everyone might be interested in. E.g., the GNU module provides color ls under bash, something that not all bash users might want.

 

More general tools could be helpful. Compression utility on IBMSP would allow large file transfer faster.

It would be very useful for me to have some simple visualization software on the IBM SP (seaborg), for example, gnuplot, because it is very helpful for me to be able to take a quick look at the data, without having to transfer large files out.

NERSC response:  A gnuplot module has been added.  

I would love to see vim installed. It's the best programming editor around.

 

NERSC response:  The vim and gvim editors have been on  seaborg for some time.  Everyone should have these in their path.  NERSC has added a vim  module as well to increase its visibility. Likewise you can alias  vi to vim.  

I need some support when there are Python problems.

 

NERSC response:  Feel free to contact NERSC consultants with Python questions.

update emacs to xemacs.

NERSC response:    Looking into this.

 

 

Compilers:

[...] Cray cc also has many annoying quirks (e.g. no support for 16-bit data types) - it would be nice to have other options gcc has just been ported to the T3E - put it on mcurie!!!

The Cray C++ compiler is not very good, and KCC is too slow. The IBM xlC compiler seems to be better.

I wish some vendor had a compiling Common Lisp for either of the useful machines. sigh... Other high-level languages would be useful (Haskell, OCaml, etc), as well.

The only thing that really matters to me is ANSI C++ compiler with KAI preprocessor. NERSC does reasonable well in this regard. My group my have specific comments.  

[...] linking my F90 code seems to take a very, very long time (up to 8 min.); linking the same code here takes ~8 seconds The Cray f90's compilers idiosyncrasies are such that having the GNU gcc/g77 compiler would be handy. [...]

NERSC response:  We have not been able to compile gcc under unicos, but will investigate the tip above about a recent port. We agree that having multiple sets of compilers is  useful. Requests for new languages can be made through the online software request form. Python was made available by such a request.  

   

 

Libraries:
 

Well... IBM is known for its great hardware, not software. 64-bit MPI has been part of SGI software for several years by now and it works very well. Same thing for the profiling tools. Why can't IBM figure it out? NERSC should be more daring at getting beta-level software that the users could try out... as long as that software doesn't crash the computer though.

I will like to do compiling with 64-bit addressing for my high resolution simulation of climate. I hope it will be available on IBM SP soon

 

NERSC response:  NERSC has added the mpi64 module which provides this beta software.  Sorry for the delay, our highly unique dual adapter switch topology conflicted  with this software.  This conflict was recently resolved and the module is now available.  We look forward to the non-beta release of this software.  

more documentation online about fortran libraries ..

NERSC response:    NERSC maintains local copies of most IBM documentation Fortran, C,  SP related subjects from    usgibm.nersc.gov  . Our main site hpcf.nersc.gov provides a great deal of  information and where appropriate links to usgibm. Please make the most of  these resources and feel free to  request additions or changes.

I'm using all the time the BSP library and the ARPACK package. It would be nice if they were installed at NERSC

 

NERSC response:  We are looking into this. It would help us to have more details a  written into a software request.

It would be nice to install LAPACK95 on the Cray PVP. Right now there is only the ancient LAPACK77. Because we like to have PORTABLE codes, we had to install and compile LAPACK95 ourselves in our user directory. The consultants did not seem to understand why we did not want to make calls to LAPACK77 directly (which would destroy code PORTABILITY).

NCAR/NCL never adequately replaced DISSPLA, after several years of bad experience  

 

General software comments:

Again, I would like to be directly informed when a bug I report is resolved. It usually doesn't happen or I have to crawl through a long doc to see if it is addressed

Again, my postdoc has had to grapple with software problems.

Documentation lags reality.

I need training! I am mostly learning from the previous user, but I feel I could be a lot more productive with better training

NERSC response:  We try our best to maintain both software and documentation.  Please  let us know    if you are have trouble or find out of date information.  Look for announcements regarding NERSC training coming soon!

 

 

Other:

 

I use PDSF. Software maintenance is excellent, we have what we need and upgrades and bug fixes are done intelligently, carefully, and in a timely way.

Never used! Is PDSF a part of NERSC or not!

PDSF isn't even included in this portion of the survey. By now PDSF has a fairly large user community; it should at least be on the survey.

 

NERSC response:  PDSF is part of NERSC. User support for PDSF and HPCF are in the  process of being integrated. PDSF is being considered for next year's survey.

Keep the SV's around. Find a suitable replacement soon

 

NERSC response:  We are carefully looking into our forward direction in regard to the PVPs.  The NERSC PVP roadmap is aiming towards as smooth a transition as possible to appropriate hardware when the PVPs are retired.  

 

 

Satisfied:

Adequate software resources for my purpose

I don't use too many. I am satisfied.  

 

Don't use:

I have only used software services to a bare minimum.

  not using the resources; just plugging away!  

Training

 

Legend:  

   
SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
 
     
ImportanceAverage Score
Very Important 2.5 - 3
Somewhat Important 1.5 - 2.4
 
 
Significance of Change
significant decrease
not significant
 

 

Satisfaction with Training:

 

     
TopicNo. of
Responses
Average
Score
 Std. Dev.Change
from 2000
NERSC Online Tutorials 75 5.97 1.04 -0.25
NERSC Classes 24 5.54 1.32 -1.17
Slides from classes 33 5.15 1.37 -0.98
NERSC Teleconferences 19 5.11 1.24 -0.89
 

 

How Useful are these resources for training in HPC?

 

     
TopicNo. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Online Tutorials 66 2.55 0.68 -0.07
Slides from classes 40 2.10 0.81 -0.33
Classes 41 2.07 0.82 -0.60
Teleconference classes 31 1.61 0.76 -0.52
 

   

Comments about training:   25 responses

   

11   Haven't used
5   Offer more training in general
4   Offer more online tutorials
4   Satisfied / online documentation is adequate

 

 

 

Haven't used:

You are too far away for us to use your training.

I haven't used. Information about these is not well advertised.

haven't taken advantage of this at nersc but I think it's great the options are there!

Never went to these... what type of audience are they for?

Have never used, since they do not cover HPSS and PDSF!!!!

  not used

Haven't participated in training.

I don't use any of this.

i didn't know

I have not used the training services.

I haven't done any.  

 

Offer more training in general:
 

Slides from presentations are usually not very useful for somebody who didn't attend the course. I would like to see more teleconference lectures since my lab would probably not send me to Berkeley more than once or twice a year... I do agree that attending a class in person and talking directly to the instructor is MUCH better than teleconference though.

All of my dissatisfaction is due to the lack of proactive acquisition, user training, and the availability of a suitable partition for the in-depth performance analysis of large production codes on seaborg. I would like to learn and be able to get hardware performance counter information and MPI performance information for PCM. I have tried to do this at the NCAR site using the available IBM documentation. It has been an extremely frustrating and daunting task. I think that NERSC has the resources to provide this capability and training......

I need info on how to get trained so I can be better productive

Parallelization (to enable the transition for users from serial to parallel versions, more easily). Visualization (handling huge amounts of data)

Offer more training. Similar to what is going on right now (Oct. 10-13), but at a time when I can make it!  

 

Offer more online tutorials:

  Online web tutorials should be increased and made available to public

perhaps more online tutorials on using MPP's more effectively

Improve tutorials. Focus on the scientist who isn't a trained Computer Scientist

Web tutorials!  

 

Satisfied / online documentation is adequate:

Have not participated in training. We did try one teleconference but reading online documents seemed to suit us better.

I've found your web tutorials are very useful. FAQ's are especially useful, as many users make the same mistakes  

Nothing beats a well written and easy to find manual

Two of my students have attended ACTS 2000 and found it very useful, so I am happy though I have not attended.  

 

Other:

Again PDSF is a special case. Web-based documentation could improve but that's always true, the key people are too busy making the system work well.  

   

User Services

 

Legend:  

   
SatisfactionAverage Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
 
     
Significance of Change
not significant
 

 

Satisfaction with User Services:

 

     
TopicNo. of
Responses
Average
Score
 Std. Dev.Change
from 2000
Timely response 155 6.56 0.81 -0.07
Account Support services 158 6.53 0.89 0.00
Technical advice 151 6.46 0.81 -0.03
Followup to initial questions 125 6.37 1.00 -0.05
Response to special requests 113 6.23 1.10 0.07
 

 

 

 

Comments about Consulting and Account Support:   24 responses

     

15   Good service
4   Improve follow-up / speed of response
1   Mixed evaluation

   

 

 

Good service:

  Mostly NERSC consulting and Account support stands head and shoulders above that of the other Supercomputer Centers.

keep them happy, both pay and workplace environment-wise!

What a great group! Everyone should get a week in Hawaii!

The consultants are outstanding. I have computed at many centers, and the NERSC consultants are the best. They are very knowledgeable and very responsive.

Great team of consultants at NERSC!!

Best of any of the centers I have used

Your consultants have been great. They had the patience to go sift through my nasty old code.

Surprisingly good and patient responsiveness to users.

These professionals are doing excellent! Sincerest thanks and heartiests congratulations on excellent job done.  

I had some teething problems regarding how to use killeen and the mass storage system when I first opened my account. The NERSC support people helped me in an efficient and timely manner.

This service - primarily the consulting service - has been absolutely invaluable and top notch. Thank you very much. For 20 years this has been outstanding help.

The quality of your consulting and account services has always been and continues to be extremely good.

very good

The support is great

I have had nothing but good experiences with consulting and support staffs.  

 

Improve follow-up / speed of response:
 

I know, most of our questions don't have an easy answer and usually require some serious work and testing. They can be easily put on the back burner when more serious day to day problems arise. However, it would be nice to know once in a while that these questions are still being worked on without us having to harass the consultants (although the users themselves usually forget the original question that they asked...).

I work in a large collaboration (STAR). It has happened that a week or more has passed between a new request for a STAR-related account (new user) and the granting of this request. Seems to me that this could be sped up, though in general it is not a big deal.

All of the problems I've encountered were eventually resolved thank to the consultants, but not always quickly.

Account support service is a little busy.

I'm very satisfied overall with the user services, though sometimes there is a little lag between requesting an account an getting it started.  

 

Mixed evaluation:

Not always the most friendly people, but always helpful and responsive.    

 

Other:

Can not change HPSS password, except by telephone!

Abolish inode quotas!!!

The unwillingness of the administrators to remove the 8-hour real time limit on gseaborg is highly unreasonable. Let users use their CPU time as they please in order to do their job.  

   

Comments about NERSC

 

 

What does NERSC do well?   69 responses

     

35   Stable, well managed production environment / good hardware
31   User support
10   Documentation announcements
9   Everything
6   Software, tools
4   Storage environment
3   Other

 

What should NERSC do differently?   50 responses

       

10   Provide more cycles / improve turnaround time
9   Software enhancements
8   Provide different hardware / keep PVP cluster
6   Better documentation
5   Provide more training
4   File storage improvements
3   Manage systems differently
3   Better interactive services
3   Longer batch queues
3   No need for change
2   Accounting / allocations improvements
2   Authentication / password improvements
2   Networking improvements

 

How does NERSC compare to other centers you have used?   49 responses

   

28   NERSC is the best / better than
8   NERSC is the same as / mixed response
5   NERSC is good / only use NERSC
4   NERSC is less good
4   No comparison made

 

 


What does NERSC do well?   67 responses

 

 

 

Stable, well managed production environment / good hardware:

The availability of the hardware is highly predictable and appears to be managed in an outstanding way.

  It provides a reliable professionaly managed computing resource, which is greatly appreciated. I have had little problems with machines not working properly, which given the flakey nature of parallel machines is very impressive.

Processors provided are reliable. Job scheduling is fair.

provides stable computing environment. [...] uninterrupted service.

NERSC makes it possible for our group to do simulations on a scale that would otherwise be unaffordable.

provide stable and reliable hardware and software for supercomputing.

Provide both capacity and capability. [...]

Delivering capacity and capability. [...]

Having a focal point for DOE computational research in the university community is useful. Of course, having the raw resources is very useful. Having them available to non-US citizens is special among the DOE large-scale facilities.

I have only good experience with running series of jobs on the vector machines. The system is very reliable, fortran compilers cause no problems. Very good place to produce results. In the future I would also like to use NERSC T3E machine.  

NERSC have moved agressively to provide high end computing services to users. I hope that it will continue to do so, as I expect user needs to continue to grow rapidly. I know mine will. NERSC runs its machines very well, [...]

Lots of Computers and Disk

Provide me with access to Crays.

Keep the machines going [...]

Keep good access to the fastest boxes available. Being a little bit undersubscribed is good. It means real work can be done, not just the background stuff.

Provides high-performance computing hardware, [...]

Changes are occurring rapidly in all areas. It sometimes is difficult to keep up with everything and still do research! But certainly the advances in the last few years have been substantial. With a few years of stability, much can be achieved.

You operate faster computers than exist at GA

  [...] provides state-of-the-art hardware

Running machines. ;)

lots of cpu cycles; [...]

providing tremendous computing power

Supercomputer support! But useless to me! [PDSF user]

Hardware, [...]

NERSC has significant computing and data-storage. [...]

Provides exellent computational resources

[...] and efficient use of resources.  

usability, uptime, [...]

Provides a reliable production service.

Have an ideal configuration for "seaborg" and [...]

Provides a reliable and very useable high performance computing power.

Building and maintenance of high performance computers.

Massive computing [...]

Good system support

Keep the computers running smoothly. [...]  

 

User support:

Consulting Service. Excellent!

Consulting by telephone and e-mail. Listens to users, and tries to setup systems to satisfy users and not some managerial idea of how we should compute

[...] I have also found the user support staff to be very helpful and responsive (I particularly appreciate how rapidly they respond to both e-mails and phone calls.)

[...] This combined with strong consulting has been a tremendous resource for my research.

Provides computing resources in a manner that makes it easy for the user. NERSC is well run and makes the effort of putting the users first, in stark contrast to many other computer centers.

[...] and respond to our needs promptly and in a fully satisfactory manner.  

They are always there to help you out.

[...] is responsive to users, and provides outstanding consulting services.

[...] consultant help

[...] excellent support. [...]

[...] consulting.

Your consultants and [...]  is very, very good.  [...]

Provides exellent [...] and support

Consulting and [...]

  support is very well done. [...]

Accounting operations and support is superb.

[...] user support

account support is exception, with prompt responses.

Phone support.

[...] Timely and effective help [...]

[...] Providing first rate consulting services. Providing first rate training.

Supports a large community

Interacting with and helping its users.  

Good [...] and consulting. NERSC responds well to users needs.

Consulting and account services are very helpful whenever I have a problem.

[...] Generally responsive and proactive to user needs.

consulting

Consulting support is excellent and offered in a timely manner.

Communicate!

nterface with users is excellent. Francesca and her people do a great job. Account support is also great. Senior management (Simon and Kramer are excellent)

Consulting  

 

Documentation / announcements:

[...] Very good web site with well arranged information

The web page, hpcf.nersc.gov, is well structured and complete. Also, information about scheduled down times is reliable and useful.

Training and information on web pages are excellent.

Web management, especially NIM is great advantage for the users.

[...] I find the web documentation very good too.  Incidentally, these are the only recourses I use, so I cannot comment on anything else.

[...] good documentation, [...]  

the NERSC web pages [...]

[...] Their web site is especially a great place.

[...] your web site is informative.

[...] Maintain a very good and useful web site.  

 

Everything:

It is great that NERSC does everything related to super computing great! [...]

  As to me - NERSC all does well.

Most parts, including hardware and software capabilities, online documentation and information.

NERSC is great at support, web pages, and keeping well equipped machines running efficiently and with good software that are responsive. When machines are up, it is quite pleasant to work on NERSC facilities.

PDSF is a close to a perfectly run facility as I have ever experienced. Clear strategic planning, highly competent technical support, intelligent management. Don't change it.

Pretty much everything I need. In particular, compilers (f90), running interactive and in batch. Storage works very well for me and seem very reliable.

Nearly every thing.

Most things are fine.

NERSC is state-of the art second to none supercomputing facility available to thousands of scientists all over the world. To run such a complex operation smoothly and efficiently requires at least dedication, intelligence and adeptness in public relation management. Horst Simon and his colleagues at NERSC deserve heartfelt thanks from thousands of scientists all over the world who not only have used the supercomputing facilities at NERSC but enjoyed using this most user friendly supercomputing facility . Need anymore be said? I feel very assured that the management of NERSC in the best of hands and I look forward to continue my research with great joy in the FY 2002 at NERSC.  

 

Software, tools:

[...] Keep up to date the all the software. [...]

[...] excellent software maintenance

[...] and choice of libraries is very, very good. [...]

[...] reasonable software support

[...] generally good software support

[...] software, [...]  

 

Storage environment:

The mass store system is fast and easy to use. [...]

[...] Your archiving system is pretty good, but somewhat slow.

reliable file storage on hpss, ease of access to stored data

[...] allows mass storage  

 

Other:
 

This is too long. Anyway, I have only recently gotten an account and haven't used it much yet. I will have more opinions once I do.

Have only just gotten my account, can't really comment on any of the services/facilities at this point.

I only got my account about 2 weeks ago and have no experience using the machine yet. I believe that it is going to be great but I lack the information to complete a meaningful response to your survey and have thus left most questions blank.

   


 

What should NERSC do differently?   48 responses

 

Provide more cycles / improve turnaround time:

Don't become oversubscribed. I'm worried that SciDAC will push for oversubscription, please don't go there.

Get more hardware. DOE is falling way behind NSF.

My only partial dissatisfaction is caused by the long batch job wait time for large memory jobs on the PVP cluster.

more resources

Batch job waiting period sometimes is too long.

Increase the capability of its computing facilities even more rapidly than it is presently doing.

  Stop oversubscribing machines. If this means limiting quotas, so be it.

No complaints. It would be nice if the batch queues moved faster.

More hours available ... (I'm joking)

Keep the machines up longer.  

 

Software enhancements:

Push Cray harder to support certain software: g++, C++ STL, gdb (totalview is very good, but sometimes has problems.)

I would appreciate having netCDF tools (nco) available on killeen.  

[...]Should support commercial packages such as Code Warrior.

[...] better debuggers, more compilers to offer choices

Install zsh, please.

more debugging and optimization support for MPP platforms like seaborg

[...]  also better debug tools. It might also be useful to have access to applications such as electronic structure codes, MD codes, quantum chemistry codes, etc.

[...] I would like the GNU gcc/g77 compiler on the Cray PVP's; if only to test against some of the Cray f90's idiosyncracies. I've found that Cray binary files are *very* difficult to read elsewhere; a standalone utility (whose source can be exported anywhere - not run only on the Cray) to translate them into more common forms is needed. I was unable to find any useful documentation on the *detailed* structure of the Cray binary files.

This may be something that you already have. I find totalview to be slow since I am always passing graphic information back an forth from here in Michigan. If there is a text only debugger this would be useful to me.  

 

Provide different hardware / keep PVP cluster:

I want something 10 times faster than Killeen but not MPP

[...] Find a replacement for the PVP systems.

More access to capability machines that let long jobs of 32-64 pes go for 8 hours or more. Although many applications can use a lot of processors, science studies often ramp up and down in size as one walks through parameter spaces. Having a complement of smaller parallel machines to match the big one is very useful. These smaller machines do not need to scale much past 64 pes.

It just seems that your (Cray PVP) CPU power hasn't stayed up there w.r.t. PC's. [...]

Supply mid-level computing resources to people to take the load off of the SP3 and T3E. I get the sense many people only need something which could be handled by a Beowolf cluster.

Keep capacity engines around

The PVP cluster should not go away.  

Provide both capacity and capability. [...]  

 

Better documentation:

Better maintained NERSC online consulting answers page - perhaps a more comprehensive FAQ type page. [...]

[...] I've had lots of odd problems with the batch system; I've tried to follow the web directions, but still can't seem to get it right. [...]

Batch queue structure improved or explained in more detail

Better indexing of the sprawling website. Finding, e.g. compiler options or queue limits takes some knowledge.

  Better organization of web based documentation and tutorials

orient webpage more towards beginners to supercomputing with detailed discussion of issues such as optimization etc.

 

 

Provide more training:

Training for the users far from the site would be benefitial.

SAS should have the online tutorial installed; need better training in nersc usage

I'd like to see better online training tools, [...]

  More training classes so that I can effectively use my NERSC time. I'm constantly worried that I'm wasting MPP time with memory leaks, code inefficiencies and the like.

How about having some of NERSC's people involved in profiling and performance tuning of some of the major codes running on Seaborg? I really liked the workshop with the ACTC guys a year and a half ago, where we learned many important details about performance tuning on the SP. I think it would be time for an updated version of this workshop...  

 

File storage improvements:

[...]  Sort out the mess of different home user space on the SP3 and mcurie....

Be more liberal with disk allocations. Chose file systems with better ratios of inodes to disk space -- 1 inode per 5-10 Kbytes would be ideal. [...]

[...]  Some system of notification of when temporary space is to be cleared.

Get rid of some weird limits (number of files?)    

 

Manage systems differently:

NERSC needs to improve their productivity at accepting new hardware; they take way too long at doing this. In spite of the time they take, the hardware is often unstable after it is released (for example, the experiences with early Seaborg use and the addition of the extra 50 nodes have not been good, i.e., the whole machine has been down too much, users have been able to log in, but find that they cannot access their files, etc.). [...]

NERSC appears to be chasing big money and large initiatives. Where these are consistent with the support of a big center, they do not necessarily lead to an efficient, flexible computing environment. Strategic investment in small/mid sized efforts can be as important to the success of NERSC as a few large projects. I would increase the LDRD budget for serious scientific pursuit.

Provide more support for using PDSF and HPSS for outside users.  

 

Better interactive services:
 

better debugging & development support - interactive jobs with fast response time, [...]

Improve interactive response of the machines. Even better, provide some facility for large-scale interactive jobs that would allow direct user control during the run. [...]

[...] Your [PVP] interactive stuff is pretty good, but trying to actually use it from here (on the east coast) is not really practical- it is just too slow to use e asily. I now try to do all code development locally. [...]  

 

Longer batch queues:

[...]  Lengthen the wall-clock time-limit for jobs.

  Expand real time limits.

Work aggressively in getting a checkpointing scheduler on the SP similar to that on the T3e. This should allow for longer queues to be run and make more efficient use of the machine.  

 

No need for change:

Can't think of anything.

Nothing I can think of now.

More of the same.  

 

Accounting / allocations improvements:

[...] The NERSC allocation process needs to be streamlined. The ERCAP form asks for too much overlapping information. It's often difficult for reviewers to evaluate what is written because of too much detail, and information overload over too many proposals. Perhaps all of this is needed to justify NERSC resources to the outside world, but it seems like it uses up more human time (which is also costly) than is productive.

Not charge 2.5X on seaborg for a "regular" job.  

 

Authentication / password improvements:

Sort out the mess where I need to rememeber 3 or 4 different passwords and change them at different times on different machines. [...]

  The expiry of passwords did create some problem.  

 

Networking improvements:

Does not interface PC's and Mac's with Nersc. Too concerned with perfect security to support network access. Need to have a windows interface to NERSC. [...]

Offer a 6to4 gateway? Not much NERSC specific can do... I'm rather disatisfied with the general state of high-performance computing, but that's why I'm in research.  

 

Other:
 

[...] Have a spell checker for the comments boxes.

 A bug was discovered in the CRAY FFT subroutines (scfft2d/csfft2d), this is not very good. The bug is not fixed up till now (i.e. 11/1/2001, it's more than 6 months), this is bad. As far as I know, there seems to be no announcement to warn users about the bug, this is ridiculous!

   


How does NERSC compare to other centers you have used?   49 responses

 

NERSC is the best / better than:
 

The best. Rarely do I bother with others anymore.

I'm using sdsc, texac acc, ornl (ccs) and they are all top-notch but I think the nersc nim is a great tool and I think the nersc web site is more timely. also, the motd on the nersc machines is more informative.

Top of the heap. SDSC is the main other, but I've used other NPACI centers.

NERSC is better than NPACI in the following regards.  Better access to more competent consultants. Listens more to users in making policies. Supplies more permanent disk space on its systems. Allows remote access to HPSS.

More up-to-date webpages than the ASCI platform webpages.

You guys are infinitely better than [name omitted]. Everytime I visit [that center], I want to send you flowers.

In recent years I have computed at the San Diego Supercomputer Center, The National Center for Supercomputer Applications, The Pittsburgh Supercomputer Center, the Cornell Theory Center, Oak Ridge National Laboratory, Los Alamos National Laboratory, and smaller centers at Boston University and the University of New Mexico. I rate NERSC at the very top of this list.

Well  

The mass store system is excellent compared to NCAR's.

It is the best of all centers I have used so far.

Allocation process and utility in NERSC are better than NCSC.

NERSC is more egalitarian than LLNL as one has to have the right connections to get any sizeable allotment, whereas NERSC will throw you in with everybody else and if you are persistent you will get enough resources.

Compared to [name omitted], NERSC is a superior place.

It's the best. My group is using SDSC (BH), UGA's UCNS (IBM SP and O2000) and several french centers (e.g., PSMN). NERSC compares very well with all of those. NERSC also has improved in all aspects during the last few years by a HUGE factor, it is now by far the best supercomputer center that we are using. This is in part due to the hardware it has, but also in other respects it has dramatically improved comapred to, say, 5 years ago.

The best I have experienced in 20 years of experimental physics work (am I that old?). The CERN Computer Center is a close second, however.

Los Alamos (1985) NERSC is much more user friendly. I can actually talk to a consultant. Fermilab (1988-1998) NERSC is much more user friendly. The consultants contact me before terminating my jobs or stalling them.

  The up time seems to be greater than at RCF, and the network is more reliable.

NERSC compares well with respect to NPACI centers we have interacted with, and substantially better compared to DoD supercomputing Centers.

The npaci center (www.npaci.edu) has a rather limited web site compared to yours.

Much better than any I've used in the past. SDSC, Cornell, OU

For quantum Monte Carlo, NERSC is the absolute ideal center . It is much more efficient than any of the unclassified machines at Lawrence Livermore National Labs. Both the wait time and efficiency are much better. The charging of time, though is too aggressive

I work previously only at one large computer center (IDRIS in France -) I found that NERSC is much more confortable for computing.

The best! Compared to [name omitted] (a true horror) and NPACI (medium to okay). LLNL has good machines but documentation can be highly elusive.

# 1. I have used supercomputing facilities at Eagan Falls Cray Centre, Mn, USA, IBM, Kingston, N.Y, U.S.A,etc.

by far the best  

NERSC compares well to most other centers

Best all-round. NCAR has capacity (now), but capability has been slipping. ORNL is informal and responsive, but unreliable.

My initial impression is that working at NERSC is going to be much better - more user friendly, better documentation, etc. - than using the ASCI machines at LLNL. Those were my prior experience with large parallel platforms.  

 

NERSC is the same as / mixed response:

User services are of comparable excellence to those provided by DoD supercomputing centers.

I can compare NERSC only to the NIC in Juelich, Germany. NERSC allows me to work on the same systems and use practically the same type of resources. Maybe only the elapsed time for jobs is longer.

  Seems more powerful, but also more complicated, more waiting time, slower interactive response, than the local supercomputer center (at University of Texas at Austin, "TACC") that I also use.

NERSC generally does a good job of serving computational users. Clearly better than [name omitted], probably better than NCSA, and maybe not quite as good as PSC.

NERSC and LLNL LC compare favorably.

NERSC has very powerful hardware compared to other centers, such as NCSA and NPACI San Diego. But the very long wallclock limit in ncsa is useful for some of our calculations.

ARSC is good because until recently it had no restriction on computation time.

I worked for a little while on the computers at SDSC. I would say that both centers are comparable, although some people at SDSC may be doing more exploratory work (for example, they tried out the beta version of the 64-bit MPI on the SP more than a year ago).  

 

NERSC is good / only use NERSC:
 

NERSC seems very well organised!

i only use NERSC so I can't comment

NERSC is a unique resource and provides an essential environment of MPP simulation.

Sorry, you're the only one!

Really good!  

 

NERSC is less good:

  Acceptance of new hardware seems to be a slower process; even when it is accepted it is still sometimes not too stable. Allocation process is too complicated. Takes up too much human time.

i didn't use the maspar much when it was at lbl, but it least it was simple to use if one just wanted the default fortran parallelization. we need something like that for nersc in at least fortran and c/c++

UCSD has the same IBM SPII and it has better communication between nodes. Our program runs faster there beyond 16 nodes (faster by ~ 50% on 64 nodes.)

The resources at NPACI center (SDSC) was much more useful than the ones at NERSC. Its not about how powerful the resources are, but about how much a wide variety is available, from vector computers to massively parallel. The reason i switched from vector computers was the limitation on the # of CPU hrs i could get, but now with SP seaborg, i'm facing a bigger problem of time limit.  

 

No comparison made:

BNL, CERN, JINR, IHEP, LBNL

Jlab's HPC  

EPCC MSCF @ PNNL

BNL/RCF

 

Show Pagination