NERSC: Powering Scientific Discovery Since 1974

2002 User Survey Results

Response Summary

Many thanks to the 300 users who responded to this year's User Survey -- this represents the highest response level in the five years we have conducted the survey. The respondents represent all five DOE Science Offices and a variety of home institutions: see User Information.

You can see the FY 2002 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale.

Satisfaction Score   Meaning
7 Very Satisfied
6 Mostly Satisfied
5 Somewhat Satisfied
4 Neutral
3 Somewhat Dissatisfied
2 Mostly Dissatisfied
1 Very Dissatisfied
 
Importance Score   Meaning
3 Very Important
2 Somewhat Important
1 Not Important

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

Every year we institute changes based on the survey; this past year's efforts include:

  1. With the NERSC User Group we established a queue committee whose task was to investigate queue issues and recommend improvements. This year's rating for SP: queue structure went up by 0.7 points. Based on the committee's recommendations NERSC did the following:
    • Improved debug and interactive turnaround during prime time by setting aside 5% of the SP compute pool for interactive and debug jobs from 5:00 AM to 6:00 PM Pacific Time Monday to Friday. This year's rating for SP: ability to run interactively went up by 0.8 points.
    • Implemented priority aging for regular class jobs: jobs in the regular class for more than 36 hours will not be preempted by new premium jobs.
    • Provided a new regular_long class with a connect time limit of 24 hours for jobs using 32 nodes or fewer. Such jobs are not drained for system outages, so self-checkpointing is essential for regular_long jobs.
    The NUG queue committee recommended against implementing serial queues on Seaborg.

     

  2. NERSC provided more performance analysis tools on the SP, along with documentation and training on how to use them. See Programming Tools. This year's rating for SP: performance and debugging tools went up by 0.8 points.

     

  3. NERSC installed new visualization tools on the Vis Server, Escher, as well as on Seaborg, and streamlined visualization documentation. See Visualization Packages. This year's rating for Visualization Services went up by 0.3 points.

     

  4. NERSC wrote a number of scripts to improve SP management procedures. This year's rating for SP: uptime went up by 1 point, the largest satisfaction increase in the entire survey.

     

  5. NERSC began conducting monthly training sessions over the Internet using Access Grid Node technology. This technology is not yet completely mature, and there have been a few rough spots along the way. Satisfaction with training remains at last year's level, and we will work to improve our training program in the upcoming year.

The average satisfaction scores from this year's survey ranged from a high of 6.6 to a low of 4.8. Areas with the highest user satisfaction were:

  1. SP: uptime
  2. Consulting: timely response
  3. HPSS: reliability
  4. PDSF: uptime

Areas with the lowest user satisfaction were:

  1. PVP: batch wait time
  2. Visualization services
  3. Training

The largest increases in satisfaction came from the SP: 9 of the 18 ratings that were significantly higher this year than last year were SP ratings. Other areas showing significant improvements were the T3E (queue structure, tools and utilities, uptime), visualization services, hardware and software configuration, and the New Users Guide.

Only two areas were rated significantly lower this year: PVP performance and debugging tools, and the allocations process.

92 users answered the question "What does NERSC do well?" 71 respondents pointed out that NERSC is a well run center with good hardware, 42 singled out User Support and NERSC's staff, 16 praised NERSC's documentation, and 13 cited job scheduling and batch throughput. Some representative comments are:

    Among the supercomputing facilities I tried until now, NERSC excells in most aspects. I am most satisfied with the overall stability of the system. This must come from the outstanding competence of the technicians.

    I really appreciate the job fron consult. They always did their best to help me to resolve my technique problems, especially at starting to use seaborg.

    The available hardware and software is very good. It meets my needs well. There is an abundance of documentation I have benefited from. Account support has also been very good. I also appreciate the seeming concern about security.

66 users responded to "What should NERSC do differently?" The following issues were raised and will be addressed in the upcoming year:

  • SP scheduling:
    • Could more resources be devoted to the regular_long class (more nodes, a longer run time, better throughput)?
    • Could longer run time limits be implemented across the board?
    • Could more resources be devoted to interactive jobs?
    • Could there be a serial queue?
  • SP software:
    • Could the Unix environment be more user-friendly (e.g. more editors and shells in the default path)?
    • Could there be more data analysis software, including matlab?
  • Computing resources:
    • NERSC needs more computational power overall
    • Could a PVP resource be provided?
    • Could mid-range computing or cluster resources be provided?
  • Documentation:
    • Provide better searching, navigation, organization of the information.
    • Enhance SP documentation.
  • Training:
    • Provide more training on performance analysis, optimization and debugging.
    • Provide more information in the New Users Guide.

  Here are the survey results:

  1. User Information
  2. Overall Satisfaction and Importance
  3. All Satisfaction Questions and Changes from Previous Years
  4. Visualization and Grid Computing
  5. Web, NIM, and Communications
  6. Hardware Resources
  7. Software Resources
  8. Training
  9. User Services
  10. Comments about NERSC

User Information

Number of respondents to the survey: 300

  • Respondents by DOE Office and User Role
  • Respondents by Organization
  • Which NERSC resources do you use?
  • How long have you used NERSC?
  • What desktop systems do you use to connect to NERSC?
  • Web Browser Used to Take Survey
  • Operating System Used to Take Survey

 

Respondents by DOE Office and User Role:

Office Number Percent
ASCR 22 7
BER 43 14
BES 65 22
FES 55 18
HENP 112 37
guests 3 1
 
User Role Number Percent
Principal Investigators 48 16
PI Proxies 34 11
Project Managers 10 3
Users 208 69

 

 

 

Respondents by Organization:

Organization Type Number Percent
Universities 169 56
DOE Labs 107 36
Industry 9 3
Other Govt Labs 10 3
Private Labs 5 2
Organization Number Percent
Berkeley Lab 42 14
UC Berkeley 17 6
Argonne 12 4
Livermore 10 3
PPPL 9 3
Oak Ridge 8 3
U Wisconsin 7 2
UC Santa Cruz 7 2
Brookhaven 6 2
PNNL 6 2
U. Maryland 6 2
NCAR 5 2
Ohio State 5 2
SLAC 5 2
U Colorado 5 2
Auburn U 4 1
Iowa State 4 1
Los Alamos 4 1
Max Planck Inst. Astro. 4 1
Northwestern U. 4 1
U Texas 4 1
U Washington 4 1
UC Davis 4 1
UCLA 4 1
Organization Number
Ames Lab 3
General Atomics 3
NREL 3
U Georgia 3
U Michigan 3
U Minnesota 3
U Utah 3
UC San Diego 3
Vanderbilt 3
Yale 3
Harvard 2
Inst. Nat. de Physique 2
Inst. Nat. di Fisica Nucleare 2
JGI 2
MIT 2
N.C. State 2
New Mexico St. 2
New York Univ 2
Omega-P 2
Purdue 2
Stanford 2
SUNY Stony Brook 2
U. Illinois 2
U. South Carolina 2
UC Irvine 2
UC Santa Barbara 2
Wayne State U. 2
Other University 44
Other Gov. Labs 4
Other Industry 3
Private Labs 1
Other DOE Labs 1

 

 

 

Which NERSC resources do you use?

Note that users did not always check all the resources they use -- compare the table below with How Satisfied are you? (sorted by Number of Responses).

Resource   No. of Responses
SP 210
HPSS 118
T3E 92
NIM 88
Consulting 80
HPCF Website 66
Account Support 55
PVP 53
PDSF 49
Computer Operations and Network Support 18
Alvarez 11
Escher 8
Newton 8

 

 

How long have you used NERSC?

Time   Number   Percent
less than 6 months 53 18
6 months - 3 years 135 46
more than 3 years 104 36

 

 

 

What desktop systems do you use to connect to NERSC?

Operating System Type   Number
Unix 350
PC 138
MAC 66
Other   3
Individual Systems   Number
UNIX-linux 189
PC-win2000 74
UNIX-solaris 72
PC-winXP 39
MAC-OSX 36
MAC-macos 30
UNIX-irix 25
PC-win98 21
UNIX-aix 20
UNIX-hpux 20
 
Individual Systems   Number
UNIX-tru64 14
UNIX-other 12
PC-win95 3
OSF1 2
PC-windows3.0 1
PC-OS/2 1
Digital Alpha 2
Open VMS 1
PalmOS 1
Digital UNIX 1
NCD X terminal 1
FreeBSD 1
Alpha DS10 1
Putty 1
Ultrix 1

 

 

 

Web Browser Used to Take Survey:

Browser   Number   Percent   Change in Percent from '01   Annual Growth Rate
Netscape 4 129 42.7 -28.1 -40%
MSIE 6 46 15.2 +13.1 +624%
MSIE 5 46 15.2 -3.4 -18%
Mozilla 35 11.6 +7.4 +176%
Netscape 6 25 8.3 +6.6 +388%
Konqueror 8 2.6 +1.8 +225%
Galeon 6 2.0 +2.0  
MSIE 4 2 0.7 -0.1 -12%
Netscape 7 2 0.7    
OmniWeb 2 0.7    
Netscape 3 1 0.3    

All Netscape 3-4 130 43.0 -28.2 -40%
All MSIE 94 31.1 +9.6 +45%
All Mozilla-based 68 22.6 +16.7 +383%
All Other 10 3.3 +2.1 +175%
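The growth-rate column above follows from the two share columns. As a hedged sketch (this reconstruction is assumed, not NERSC's stated method, and the function name is ours): last year's share is this year's share minus the change, and the annual growth rate is the ratio of the two shares minus one.

```python
# Assumed reconstruction of the "Annual Growth Rate" column:
# last year's share = this year's share - change;
# growth rate = (this year's share / last year's share - 1) * 100.

def annual_growth(percent_now, change_from_last_year):
    percent_last_year = percent_now - change_from_last_year
    return round((percent_now / percent_last_year - 1) * 100)

# Netscape 4: 42.7% of respondents now, change -28.1 -> 70.8% last year
print(annual_growth(42.7, -28.1))  # -40
# MSIE 6: 15.2% now, change +13.1 -> 2.1% last year
print(annual_growth(15.2, 13.1))   # 624
```

This reproduces the published -40% and +624% figures, which suggests the table was computed this way.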

 

 

 

Operating System Used to Take Survey:

OS   Number   Percent
Linux 95 31.5
Windows NT 88 29.1
Macintosh 41 13.6
SunOS 31 10.3
Windows 98 22 7.3
DEC OSF 12 4.0
SGI IRIX 6 2.0
HP-UX 2 0.7
IBM AIX 2 0.7
Unknown X11 1 0.3
OS/2 1 0.3
Windows 95 1 0.3

All UNIX 149 49.3
All Windows 111 36.8
All Macintosh 41 13.6

Overall Satisfaction and Importance

Legend:

Satisfaction   Average Score
Mostly Satisfied   5.5 - 6.4
Somewhat Satisfied   4.5 - 5.4

Importance   Average Score
Very Important   2.5 - 3
Somewhat Important   1.5 - 2.4

Significance of Change
significant increase
significant decrease
not significant

Overall Satisfaction with NERSC:

Topic   No. of Responses   Average Score   Std. Dev.   Change from 2001
Account Support Services 240 6.38 0.95 -0.05
Overall satisfaction with NERSC 285 6.32 0.93 0.07
Consulting Services 229 6.30 1.03 0.00
SW maintenance and configuration 202 6.17 1.03 0.25
HW management and configuration 210 6.10 1.04 0.28
HPCF Website 224 6.09 0.85 -0.09
Network Connectivity 230 6.07 1.15 0.04
Mass Storage Facilities 205 6.04 1.11 -0.01
Available software 243 5.98 1.02 0.17
Available Computing Hardware 242 5.97 1.20 -0.14
Allocation Process 194 5.84 1.16 -0.16
Training 103 4.99 1.23 0.07
Visualization Services 94 4.83 1.24 0.32

 

 

 

Importance to Users:

Topic   No. of Responses   Average Score   Std. Dev.   Change from 2001
Available Computing Hardware 228 2.89 0.36 0.08
Overall satisfaction with NERSC 261 2.84 0.37 0.02
Network Connectivity 214 2.75 0.48 -0.07
HW management and configuration 198 2.71 0.53 0.09
Consulting Services 222 2.58 0.63 -0.06
Allocation Process 185 2.57 0.65 -0.10
SW maintenance and configuration 187 2.54 0.58 -0.06
Available software 220 2.53 0.56 -0.03
HPCF Website 215 2.52 0.63 0.03
Mass Storage Facilities 191 2.51 0.67 0.03
Account Support Services 219 2.47 0.62 0.03
Training 118 1.75 0.76 -0.05
Visualization Services 112 1.62 0.74 -0.09

 

 

All Satisfaction Questions and Changes from Previous Years

  • How Satisfied are you? (sorted by Average Score)
  • How Satisfied are you? (sorted by Number of Responses)
  • 2001 to 2002 Changes
  • 1999 to 2002 Changes

 

Legend:

 

Satisfaction   Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4

 

 

How Satisfied are you? (sorted by Average Score)

Topics from the Overall Satisfaction with NERSC section are indicated in bold; they tend to be answered by more users and rated lower than the topics in the ensuing sections that rate specific areas in more detail.

 

Topic   No. of Responses   Average Score
SP: Uptime 176 6.56
Consulting: Timely Response 207 6.51
HPSS: Reliability 127 6.51
PDSF: Uptime 41 6.51
T3E: Uptime 62 6.48
Consulting: Technical Advice 205 6.47
PVP: Fortran Compilers 34 6.47
PDSF: C/C++ Compilers 35 6.46
PDSF: Fortran Compilers 22 6.45
T3E: Fortran Compilers 49 6.45
Consulting: Time to Solve Problems 198 6.40
Consulting: Response to Special Requests 136 6.40
Consulting: Followup 188 6.39
HPSS: Overall 142 6.39
Account Support Services 240 6.38
PDSF: User Environment 39 6.38
SP: Overall 187 6.38
HPSS: Uptime 125 6.37
SP: Fortran Compilers 125 6.37
HPSS: Performance 126 6.35
Overall Satisfaction with NERSC 285 6.32
PVP: Uptime 39 6.31
Consulting Services 229 6.30
PDSF: Overall 46 6.26
Web: Accuracy 202 6.25
PDSF: Software Libraries 29 6.24
T3E: User Environment 54 6.24
Training: New User's Guide 138 6.21
PDSF: Applications 29 6.21
Web: Timeliness 181 6.20
PDSF: Running Interactively 39 6.18
Software Maintenance & Configuration 202 6.17
Web: Running Jobs 198 6.15
PVP: User Environment 34 6.15
T3E: Software Libraries 39 6.13
SP: User Environment 165 6.12
T3E: Overall 72 6.11
SP: C/C++ Compilers 76 6.11
Hardware Management & Configuration 210 6.10
HPCF Website 224 6.09
SP: Software Libraries 119 6.09
Network Connectivity 230 6.07
PVP: Overall 47 6.06
Mass Storage Facilities 205 6.04
T3E: C/C++ Compilers 25 6.04
PDSF: Tools & Utilities 28 6.04
PDSF: SW Bug Resolution 24 6.00
NIM Interface 91 6.00
PVP: I/O 27 6.00
Available Software 243 5.98
Training: Online Tutorials 108 5.97
Available Computing Hardware 242 5.97
T3E: Tools & Utilities 35 5.97
PDSF: Queue Structure 39 5.97
SP: I/O 143 5.97
Web: SP Pages 181 5.96
Web: Software Pages 168 5.96
HPSS: User Interface 130 5.95
Consulting: Online Web Interface 109 5.94
PVP: C/C++ Compilers 14 5.93
SP: Queue Structure 165 5.92
Web: T3E Pages 84 5.89
Training: Web Pages 99 5.89
Web: Programming Info 178 5.85
T3E: SW Bug Resolution 26 5.85
Web: File Storage Pages 114 5.84
Allocations Process 194 5.84
T3E: SW Applications 31 5.84
PVP: Ability to Run Interactively 33 5.82
Web: PDSF 37 5.81
Web: Finding Info 229 5.80
SP: Tools & Utilities 113 5.80
T3E: Perf. & Debugging Tools 32 5.78
T3E: Queue Structure 56 5.76
SP: SW Applications 90 5.70
PVP: Queue Structure 36 5.69
T3E: I/O 47 5.68
Web: Searching 175 5.66
PDSF: I/O 38 5.63
PVP: SW Libraries 28 5.61
PVP: SW Bug Resolution 15 5.60
SP: SW Bug Resolution 78 5.59
PVP: Tools & Utilities 21 5.57
Web: PVP Pages 55 5.55
SP: Perf. & Debug Tools 112 5.49
SP: Ability to Run Interactively 150 5.47
PVP: SW Applications 16 5.44
Viz Server: Escher 9 5.44
SP: Batch Wait Time 175 5.41
Math Server: Newton 8 5.38
PVP: Perf. & Debug Tools 18 5.33
Training: Slides from Classes 60 5.33
PDSF: Perf. & Debug Tools 28 5.25
T3E: Batch Wait Time 61 5.23
Training: Classes 40 5.12
Training: Overall 103 4.99
Visualization Services 94 4.83
PVP: Batch Wait Time 35 4.77

 

 

How Satisfied are you? (sorted by Number of Responses)

This ordering helps indicate which services are used most by users (and is probably a better indicator than the resources checked for the question "Which NERSC resources do you use?").

 

Topic   No. of Responses   Average Score
Overall Satisfaction with NERSC 285 6.32
Available Software 243 5.98
Available Computing Hardware 242 5.97
Account Support Services 240 6.38
Network Connectivity 230 6.07
Consulting Services 229 6.30
Web: Finding Info 229 5.80
HPCF Website 224 6.09
Hardware Management & Configuration 210 6.10
Consulting: Timely Response 207 6.51
Consulting: Technical Advice 205 6.47
Mass Storage Facilities 205 6.04
Web: Accuracy 202 6.25
Software Maintenance & Configuration 202 6.17
Consulting: Time to Solve Problems 198 6.40
Web: Running Jobs 198 6.15
Allocations Process 194 5.84
Consulting: Followup 188 6.39
SP: Overall 187 6.38
Web: Timeliness 181 6.20
Web: SP Pages 181 5.96
Web: Programming Info 178 5.85
SP: Uptime 176 6.56
Web: Searching 175 5.66
SP: Batch Wait Time 175 5.41
Web: Software Pages 168 5.96
SP: User Environment 165 6.12
SP: Queue Structure 165 5.92
SP: Ability to Run Interactively 150 5.47
SP: I/O 143 5.97
HPSS: Overall 142 6.39
Training: New User's Guide 138 6.21
Consulting: Response to Special Requests 136 6.40
HPSS: User Interface 130 5.95
HPSS: Reliability 127 6.51
HPSS: Performance 126 6.35
HPSS: Uptime 125 6.37
SP: Fortran Compilers 125 6.37
SP: Software Libraries 119 6.09
Web: File Storage Pages 114 5.84
SP: Tools & Utilities 113 5.80
SP: Perf. & Debug Tools 112 5.49
Consulting: Online Web Interface 109 5.94
Training: Online Tutorials 108 5.97
Training: Overall 103 4.99
Training: Web Pages 99 5.89
Visualization Services 94 4.83
NIM Interface 91 6.00
SP: SW Applications 90 5.70
Web: T3E Pages 84 5.89
SP: SW Bug Resolution 78 5.59
SP: C/C++ Compilers 76 6.11
T3E: Overall 72 6.11
T3E: Uptime 62 6.48
T3E: Batch Wait Time 61 5.23
Training: Slides from Classes 60 5.33
T3E: Queue Structure 56 5.76
Web: PVP Pages 55 5.55
T3E: User Environment 54 6.24
T3E: Fortran Compilers 49 6.45
PVP: Overall 47 6.06
T3E: I/O 47 5.68
PDSF: Overall 46 6.26
PDSF: Uptime 41 6.51
Training: Classes 40 5.12
PDSF: User Environment 39 6.38
PVP: Uptime 39 6.31
PDSF: Running Interactively 39 6.18
T3E: Software Libraries 39 6.13
PDSF: Queue Structure 39 5.97
PDSF: I/O 38 5.63
Web: PDSF 37 5.81
PVP: Queue Structure 36 5.69
PDSF: C/C++ Compilers 35 6.46
T3E: Tools & Utilities 35 5.97
PVP: Batch Wait Time 35 4.77
PVP: Fortran Compilers 34 6.47
PVP: User Environment 34 6.15
PVP: Ability to Run Interactively 33 5.82
T3E: Perf. & Debugging Tools 32 5.78
T3E: SW Applications 31 5.84
PDSF: Software Libraries 29 6.24
PDSF: Applications 29 6.21
PDSF: Tools & Utilities 28 6.04
PVP: SW Libraries 28 5.61
PDSF: Perf. & Debug Tools 28 5.25
PVP: I/O 27 6.00
T3E: SW Bug Resolution 26 5.85
T3E: C/C++ Compilers 25 6.04
PDSF: SW Bug Resolution 24 6.00
PDSF: Fortran Compilers 22 6.45
PVP: Tools & Utilities 21 5.57
PVP: Perf. & Debug Tools 18 5.33
PVP: SW Applications 16 5.44
PVP: SW Bug Resolution 15 5.60
PVP: C/C++ Compilers 14 5.93
Viz Server: Escher 9 5.44
Math Server: Newton 8 5.38

 

2001 to 2002 Changes:

The following are statistically significant changes for responses to questions common to the 2001 and 2002 user surveys.

 

Topic 2002 Score 2001 Score Change
SP: Uptime 6.56 5.53 +1.03
SP: Perf. & Debugging Tools 5.49 4.69 +0.80
SP: Ability to Run Interactively 5.47 4.71 +0.76
SP: Queue Structure 5.92 5.19 +0.73
SP: Overall 6.38 5.82 +0.56
SP: Batch Wait Time 5.41 4.92 +0.49
SP: Fortran Compilers 6.36 5.96 +0.40
T3E: Queue Structure 5.76 5.36 +0.40
SP: C/C++ Compilers 6.11 5.72 +0.39
T3E: Tools & Utilities 5.97 5.65 +0.32
Visualization Services 4.83 4.51 +0.32
SP: I/O 5.97 5.67 +0.30
Hardware Maintenance & Config. 6.10 5.82 +0.28
Training: New User's Guide 6.21 5.94 +0.27
T3E: Uptime 6.48 6.22 +0.26
Software Maintenance & Config. 6.17 5.92 +0.25
Web: SP Pages 5.96 5.78 +0.18
Available Software 5.98 5.81 +0.17
Allocations Process 5.84 6.00 -0.16
PVP: Perf. & Debugging Tools 5.33 6.06 -0.73
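The report does not state which statistical test was used to label a change "significant." One plausible approach, sketched here with illustrative numbers (the 2001 standard deviation and response count below are assumed, not taken from the report), is a two-sample Welch t test on the summary statistics:

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent samples,
    computed from summary statistics only."""
    std_err = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / std_err

# SP: Uptime -- 2002: mean 6.56, n 176, sd 0.80 (sd assumed)
#               2001: mean 5.53, sd 1.50 and n 150 (both assumed)
t = welch_t(6.56, 0.80, 176, 5.53, 1.50, 150)
print(abs(t) > 2.0)  # True: well past a rough 5%-level cutoff of |t| ~ 2
```

Even with generous assumptions about the 2001 spread, a +1.03 shift on a 7-point scale with these sample sizes is far beyond the noise, consistent with its listing as the survey's largest significant change.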

 


 

1999 to 2002 Changes:

The following are statistically significant changes for responses to questions common to the 1999 and 2002 user surveys (1999 was prior to the SP's arrival at NERSC).

 

Topic 2002 Score 1999 Score Change
PVP: Overall 6.06 5.05 +1.01
PVP: Batch Wait Times 4.77 3.95 +0.82
PVP: Queue Structure 5.69 5.03 +0.66
PVP: Ability to Run Interactively 5.82 5.18 +0.64
Visualization Services 4.83 4.37 +0.46
HPSS: Performance 6.35 5.90 +0.45
T3E: I/O 5.68 5.23 +0.45
PVP: I/O 6.00 5.56 +0.44
PVP: Fortran Compilers 6.47 6.04 +0.43
Hardware Maintenance & Config. 6.10 5.71 +0.39
T3E: Queue Structure 5.76 5.47 +0.29
Software Maintenance & Config. 6.17 5.89 +0.28
HPSS: Overall 6.39 6.12 +0.27
T3E: Fortran Compilers 6.45 6.20 +0.25
HPCF Website 6.09 5.87 +0.22
T3E: Uptime 6.48 6.26 +0.22
Web: Timeliness of Info 6.20 5.99 +0.21
Consulting: Timely Response 6.51 6.64 -0.13
Consulting: Overall 6.30 6.58 -0.28
Training: Slides from Classes 5.33 5.95 -0.62
PVP: Software Libraries 5.61 6.40 -0.81
Training: Classes 5.12 6.19 -1.07

 

Visualization and Grid Computing

  • Are you a current user of visualization services at NERSC?
  • If not, what are your reasons for not using visualization services at NERSC?
  • On which of the following platforms do you prefer to use viz tools?
  • Do you plan to use any Grid applications this year? Which ones?   55 responses
  • Other reasons given for not using NERSC visualization services   40 responses
  • Which Grid applications do you plan to use this year?   14 responses
  • What services, features, or functionality would you like to see available on the DOE Science Grid?   16 responses

 

 

Are you a current user of visualization services at NERSC?

Answer No. of Responses
No 238
Yes 10

 

 

If not, what are your reasons for not using visualization services at NERSC?

Reason No. of Responses
I don't need visualization for my project 127
Don't know about services or can't find information 87
Network is too slow 35
Other 32
Software not applicable to my domain or software doesn't do what I want it to do 10

Other reasons given for not using NERSC visualization services:   40 responses

27   I do visualization at my local institution / remote user
7   Project isn't yet ready to do visualizations
3   Network is too slow
3   I don't know what is available at NERSC
2   Software not available at NERSC
2   NERSC software is too hard to use
2   I don't do visualizations

 

 

On which of the following platforms do you prefer to use viz tools?

Platform No. of Responses
Your desktop 88
Seaborg 50
Newton 10
Escher 9
Other 4

 

 

Do you plan to use any Grid applications this year? Which ones?   55 responses

Use Grid?  No. of Responses
No 36
Maybe 11
Yes 8

 

 

7   Globus Toolkit
7   Data services
4   Processor / queuing services
4   General interest in the Grid
2   Network support and open ports

 

 

What services, features, or functionality would you like to see available on the DOE Science Grid?   16 responses

4   Data services
4   Processor / queuing services
4   Network support
2   Globus Toolkit
1   Ease of use

 

 


Other reasons given for not using NERSC visualization services:   40 responses

 

I do visualization at my local institution / remote user:

I usually use the visualization tools on my desktop

I do visualization on local computers

... Easeier to transfer numbers and use local workstation to plot. ...

We use the visualization tools in our own laboratory systems.

Remote user (BNL)

Just used to use my PC for visualization

We have our own home grown viz tools

I did visualization on my local computer.

I have all my own visualization software on my local machine

Prefer storing, visualizing data locally. (Typical dataset sizes < 25 MB).

We have pretty adequate visualization capabilities on our local cluster.

My viz needs will be handled elsewhere

I use local visualization tools at pppl.

We use our specific visualization program for investigation of cardiac processes

have my own visualization, albeit not great...

Simply used to my own tools I use on other machines

Whatever visualization we need to do for our project, we do on our local resources.

We use our own program for viusualization and I was not aware of the visualization services at NERSC.

I use locally developed visual and post-processing tools. ...

... Currently I use perl/gnuplot scripts that I've written rathen than NERSC visulization services. ...

My visualization needs are largely met from other sources.

We have excellent visualization software on our local LINUX workstations. No need for visualization at NERSC (we transfer visualization output files back to local workstations).

Have own facilities available

I rely on our own team's local expertise.

As a member of a large computational project, I have access to that project's visualization resources outside of NERSC. ...

prefer local visualization

I do visualization locally.

 

Project isn't yet ready to do visualizations:

Project is just developing need for visualization

So far I did not look for visualization services at NERSC, but I am planning to do so soon....

I began to use the computer less than a year. But I still have many problems to run my programs. Most my programs are running at Lawrence Livermore National Laboratory but I am having problem even to make an executable file.

We are just not at the stage of our research were we need it. but when we do, we may use them, if our local resources do not suffice

Just haven't gotten to that point. We'll try to use.

I don't have time to investigate what softwares at NERSC may be useful to my projects.

Haven't felt the need yet for my project. But this will change soon.

 

Network is too slow:

... I do use gnuplot and sometimes xv in seaborg, for simple plotting, but I would not call that visualization, and the network/soft is too slow when the plot involves above of few hundred thousand points to do things interactively.

The network connection is too slow to allow for remote X connections. In the future I would like to try running the OpenDX server on seaborg and the client on my laptop. ...

I am interested in visualization, but in my experience rendering over the network is too slow.

 

I don't know what is available at NERSC:

Don't know what's available. ... I should take some time to investigate the services available.

Didn't know NERSC had any

... For other, smaller projects in which I am involved, it would be very useful to be able to use NERSC services and packages, since our group is too small to maintain our own computer for this kind of work. Quick and simple plotting packages on Seaborg are needed. NCAR graphics is a start. Packages should not be changed frequently as they have been in past years, since there are substantial learning curves and conversion costs.

 

Software not available at NERSC:

I may ask for an account in escher to do some vis of my atomistic simulations in the near future, but one of the main packages I use, rasmol, is not available there. ...

... However, for remote visualization it would be more useful to use gridftp to access the data on seaborg directly from my desktop. We didn't have the technology to do this during our last simulations but I would like to try it for the next one.

 

NERSC software is too hard to use:

Encountered difficulties getting graphics to display on local PC workstation. Easeier to transfer numbers and use local workstation to plot. On line documentation difficult to read and understand; hard to make high level decisions about which to use when there is so much minutia to wade through.

... Using viz at NERSC would be more attractive if the viz platforms mounted the Seaborg disks (home and scratch).

 

I don't do visualizations:

Others in my group do use the vis servers - I've concentrated on other matters.

My colleague does the visualization for our project.

 

 


Which Grid applications do you plan to use this year?   14 responses

 

Globus Toolkit:

Yes. Globus, GDMP, Condor, and tools from the SRM group.

Globus

gridftp, condor-g, globus

We are making increasing use of Grid technologies in our production physics code, and would like to be able to use - interactive monitoring and steering (requires open ports to the client tools) - real time visualization (requires open ports to client tools and GridFTP) - simulation compilation, management, tracking, etc through our Cactus grid portal (requires Globus GRAM, GSISSH, GSIFTP, open ports to portal, standard queues available to globus etc). - access to machine though Globus GSI tools (GSISSH, GSIFTP). - network monitoring using Network Weather Service (NWS)

I will probably use globus for a MHD code I have developed.

I use MDS+ which is part of the fusion grid. [MDS - Monitoring & Discovery Service - is a part of Globus]

Yes. Globus, EUDataGrid (WP1-JSS, WP2-Replica Service, GDMP,etc) GRAPPA, GANGA, Pacman, Netlogger

 

Data services:

Distributed data analysis seems to be what we're moving towards, but I haven't seen it happen yet. ...

I would like to access data for visualization on my desktop via gridftp.

... GDMP ... tools from the SRM group [GDMP: Grid Data Mirroring Package; the SRM group is the Storage Resource Management Middleware Project at LBNL; their tools are used to transfer files from Brookhaven to NERSC]

Maybe - use for transferring bulk data volumes to RCF [RCF: RHIC Computing Facility at Brookhaven]

gridftp ...

If my data analysis benefits more, why not give it a try.

... EUDataGrid

 

Processor / queuing services:

... I would like to see examples of how the Grid might free up the analysis being tied to a given linux farm, of course the hope is to save time, but I haven't seen actual implementations.

... Condor ...

... condor-g ...

... GRAPPA, GANGA ... [GRAPPA: Grid Access Portal for Physics Applications - a portal for ATLAS physicists to easily submit jobs; part of GriPhyN - the Grid Physics Network Project; GANGA is another interface for managing jobs]

 

General interest in the Grid:

No definite plans at this point, but I'll try to follow what is available and see if it is relevant to my work.

We are interested in working on grid computing applications, but we are not sure if they will be suitable for our application.

Probably at some point for ALICE computing.

No. But I professionally want to get Grid applications enabled at NERSC. In the future, I plan to use SciDAC collabortory tools (CMCS) which is a grid application. [CMCS: Collaboratory for the Multi-scale Chemical Sciences]

 

Network support and open ports:

... interactive monitoring and steering (requires open ports to the client tools) - real time visualization (requires open ports to client tools and GridFTP) ... network monitoring using Network Weather Service (NWS)

... Netlogger

 

 

What services, features, or functionality would you like to see available on the DOE Science Grid?   16 responses

 

Data services:

 

My impression of the grid is that it's mostly for large collaborative projects. At this point, I, probably like many NERSC users work more in the mode as an inidividual researcher, only ocassionally accessing data from larger groups. If the grid allows me to more easily access this data, it would be of use. Also, there is the potential in the future that I may become more involved in larger collaborative code efforts (e.g., ISOFS, SCIDAC) and would be more involved in grid computing.

Mostly gridftp as we could use it to access data for visualization

VDT ... [Griphyn Virtual Data Toolkit]

Minimal security interferance with file and data movement.

 

Processor / queuing services:

Very fast interprocessor communication

Queing system, disc space, fortran77, perl

I would like to have access to a linux (intel) based cluster of reasonable size as NERSC resource. The prospect of having only an IBM cluster is somewhat dissatisfactory.

... Finally, if the grid gave me access to a wider range of parallel computing resources for my physics projects that would be quite useful.

 

Network support:

... Network support is also important and has been good.

... and a contact for dealing with any firewall issues which arise.

Direct, fast access to NERSC HPSS from ORNL. Direct, fast access to ORNL HPSS from NERSC. Fast connectivity between NERSC and ORNL HPSS.

... connectivity

 

Globus Toolkit:

metadata catalog, replica catalog, virtual organization (collaboration) membership servers, gram interface to PDSF, SRB

We would like to have the Globus toolkit and the NWS installed and supported,

 

Ease of use:

Ease of use, connectivity

 

Don't know / no need:

No comment

No requests at this time.

I do not know.

no need up to now

 

Other:

would like to see use of xmgr (xmgrace) on NERSC machines [xmgr is a plotting program; this response might have been intended for the Vis section]

 

Web, NIM, and Communications

  • Legend
  • Satisfaction with the Web
  • How useful are these methods for keeping you informed?
  • Are you well informed of changes?
  • Summary of Information Technology Comments
  • Comments concerning the HPCF web site:   22 responses
  • Comments concerning the PDSF web site:   6 responses
  • Comments concerning NIM:   20 responses
  • Comments concerning how NERSC keeps you informed of changes:   19 responses

 

Legend:

  Satisfaction          Average Score
  Mostly Satisfied      5.5 - 6.4

  Importance            Average Score
  Somewhat Important    1.5 - 2.4

  Significance of Change: significant increase / significant decrease / not significant

Satisfaction with the Web:

Topic                          No. of Responses   Average Score   Std. Dev.   Change from 2001
Accuracy                             202              6.25          0.90            0.10
Timeliness of info                   181              6.20          0.90            0.12
Running Jobs Section                 198              6.15          0.98
NIM                                   91              6.00          1.10
SP Pages                             181              5.96          1.18            0.18
Software Section                     168              5.96          0.94
T3E Pages                             84              5.89          1.08           -0.01
General programming info             178              5.85          1.08            0.13
File Storage Pages                   114              5.84          1.11            0.05
PDSF                                  37              5.81          1.10
Ease of finding info on web          229              5.80          0.99           -0.08
Search facilities                    175              5.66          1.10            0.11
PVP Pages                             55              5.55          1.15           -0.09

 

 

 

How useful are these methods for keeping you informed?

Method                         No. of Responses   Average Score   Std. Dev.   Change from 2001
Email                                201              2.46          0.68            0.02
Announcement web archive             183              2.34          0.75            0.29
MOTD                                 185              2.09          0.82           -0.18
Phone calls                          147              1.68          0.87           -0.13

 

 

 

Are you well informed of changes?

Question                                                        Yes   No   % Yes   % Yes in 2001
Do you feel you are adequately informed?                        222    9    96          94
Are you aware of major changes at least 1 month in advance?     184   18    91          81
Are you aware of software changes at least 7 days in advance?   163   27    86          81
Are you aware of planned outages 24 hours in advance?           176   26    87          91

 


Summary of Information Technology Comments

Comments concerning the HPCF web site:   22 responses

  8   Good website
  6   Provide additional or clearer information
  5   Improve searching
  3   Problems navigating / better organization
  3   Remove outdated info / correct errors

Comments concerning the PDSF web site:   6 responses

  4   Provide additional or clearer information
  2   Good website

Comments concerning NIM:   20 responses

  8   Issues with ERCAP or the allocations process
  8   Good interface
  3   Problems with reliability
  1   Difficult to use
  1   Violates privacy

Comments concerning how NERSC keeps you informed of changes:   19 responses

  6   Satisfied / well informed
  2   Comments on using email
  2   Comments on the MOTD
  2   Comments on system outages
  1   Violates privacy

 


Comments concerning the HPCF web site:   22 responses

Good website:

Web site is complete and informative.

Generally good, easy to get information from. ...

It is great!

Very useful, and I was able to get up and running reading them, without needing much other help.

It's one of the best places to find information about HPC.

Generally it is excellent.

The HPCF web site is easy to use and is full of pertinent information.

Very complete. ...

Provide additional or clearer information:

... Could never find out how to get names, owners of currently running jobs.

Providing more dynamic information about the statistics of usage.

For a long time, the IBM web pages lacked details on selecting the number of PEs within a node.

It would be great if movies of the latest training sessions could be made available for the multimedia classes. It is great that the presentation files are available, but a video would be helpful too; I was unable to find a video dealing with the IBM SP and debugging/development tools, for example. Training sessions have already been held and their presentation files are available, but I feel that actually watching the presentations would help.

It took me a week to get a job script to work on the IBM SP, first relying on the web information, then also on the consultants (who were generally very helpful). It is not clear which POE parameters need to be specified, such as when you run on just 8 processors on 1 node. What is on the web pages led me to set the wrong parameters. ...
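Comments like this one hinge on where the node/task split is declared. The sketch below shows a minimal LoadLeveler job script for the 8-tasks-on-1-node case the commenter describes — the keywords are standard LoadLeveler, but the class name, limits, and executable are illustrative, not NERSC's actual settings:

```
#!/usr/bin/csh
#@ job_type         = parallel
#@ class            = regular        # illustrative class name
#@ node             = 1              # number of nodes requested
#@ tasks_per_node   = 8              # MPI tasks placed on that node
#@ wall_clock_limit = 00:30:00
#@ output           = job.out
#@ error            = job.err
#@ queue

poe ./a.out
```

With `node` and `tasks_per_node` set explicitly, POE does not need additional processor-count arguments on the `poe` line.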

More examples of everything would be helpful.

Improve searching:

I rely on the consultants to tell me which parts are relevant, and which parts are out of date. It's tedious to search through it for answers to a specific problem. Unless you know the answer, the searches don't yield solutions. For example, I know about .login and .cshrc from the Crays, but not about .login.ext and .cshrc.ext on the IBM. If I search for .login or .cshrc, the results don't lead easily to the .login.ext or .cshrc.ext. [Improve information about the Unix environment on the SP.]

search by date does not behave usefully since it seems all files have a recent date

The search facility uses a port which is blocked by my company's firewall. This is very inconvenient.

The search tool I haven't had much success with, but my own searching has been very effective. I have always found the information I'm looking for with pertinent examples and clear explanations. ...

It is very difficult to find specific information on the web site. The introductory writeups are mostly good as far as they go, but do not provide enough information at the level that I usually need. For most questions that require timely answers I have to consult other people or call the NERSC consultants. I'm not sure what the best solution is. Better search and links to the manual pages would help, but often it is difficult to identify a simple search topic.

Problems navigating / better organization:

... The information is scattered about and it takes quite a while to look thru all of it to find what you need. It is not terrible, it could be much worse. But it could also be much better.

... Perhaps a grand index would be helpful as well.

... A bit complicated to navigate, however.

Remove outdated information / correct errors:

Some key pages are still Cray specific, for example the makefile examples page refers to the Cray machines when the IBM SP machines would be better.

Occasionally, the consultants and the web pages together are needed because there are small errors on the web pages.

Quite a few things are outdated

Don't use:

I have not looked at the web pages extensively yet.

 


Comments concerning the PDSF web site:   6 responses

Provide additional or clearer information:

For the most part I like the PDSF web site, but in the section on running jobs, the page on LSF queuing commands gives bsub with a few options, but I know that there are other options out there (such as -w to wait for another job to finish) which are not mentioned. If you don't want to include all of the options to keep the page simple, then maybe you should point to another source for more detailed info.

NERSC response: The page that you mentioned is simple by design, so that a new user gets enough information to get started without being overwhelmed with details. We also provide access to the LSF manuals from our Software page; if you follow the LSF link on that page you get access to all the manuals with good indexing. PDSF also offers its users man pages that cover all the options of the available commands. These facts should be mentioned at the end of the introductory page, and we will do that.
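As a concrete illustration of the `-w` dependency option the commenter mentions, LSF lets a job wait on a named predecessor — a hypothetical sketch (job names, queue, and commands are illustrative; the LSF manuals cover the full dependency expression syntax):

```
# Submit a first job with an explicit name
bsub -J stage1 -o stage1.out ./produce_data

# Submit a second job that starts only after stage1 finishes successfully
bsub -J stage2 -w "done(stage1)" -o stage2.out ./analyze_data
```

Dependency expressions such as `done(...)`, `ended(...)`, and `exit(...)` can also be combined with logical operators for more elaborate pipelines.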

Not enough information available. More FAQ would be useful.

NERSC response: The general FAQ page: http://pdsf.nersc.gov/help/faq.html and the experiment specific pages: http://pdsf.nersc.gov/help/Atlas_at_PDSF.html and http://www-rnc.lbl.gov/PDSF/ cover the questions submitted so far. Users are most welcome to submit suggestions via the support request form: http://pdsf.nersc.gov/help/sendpr_form.html . We will also solicit more FAQ candidates from users within the next couple of weeks.

Needs more details. There should be lots of examples of how to use the software. Also, there should be links to things like the CERN Library, tutorials, etc.

NERSC response: PDSF works under the assumption that all the participating groups have their main computing resources at other locations and that users get help on how to use the experiment software there. We do not have the resources to provide experiment specific help, but users can find links to web pages for their experiments at: http://pdsf.nersc.gov/research.html . We also provide links (and space if needed) to PDSF specific group pages: http://pdsf.nersc.gov/help/project_help.html, but the pages have to be maintained by the groups themselves (see ATLAS and STAR).
We provide links from our Software page ( http://pdsf.nersc.gov/software/software.html) to documentation for general purpose packages. That's where a link to the CERN Library page can be found: http://pdsf.nersc.gov/software/software.html#CERNlib. In fact, the entry for each package includes a link to its home page. Right now we are updating the software pages following the system upgrade to RH 7.2, during which lots of software had to be reinstalled. We welcome specific suggestions as to what kind of tutorials or documentation users would like to see.

I have passed along some recent suggestions to Iwona & Shane on ATLAS documentation on the PDSF web site. As always, they have been very responsive and helpful.

Good website:

Love it

Very good, I have no problems with it.

 


Comments concerning NIM:   20 responses

Issues with ERCAP or the allocations process:

Had trouble saving ERCAP in a format I could send around to others in my department using IE. The consultants were very helpful, but it took them a while to figure out how to do it. Also, there should be a spell checker built in to the ERCAP interface, as well as in this survey form.

... In general, it is far too difficult to obtain run time, and the time is not divided equally or fairly within repositories. This may be endemic to marrying time allocations with research proposals. It is also very difficult to obtain time mid-year if one changes jobs and there is no time available to his/her new institution.

I have never liked the ERCAP interface. I would prefer a system where the user prepares a proposal and submits it by email. The size limits on the current interface are much too small. The changes this year which deemphasized science and emphasized computing should be reversed.

It was awkward to prepare the ERCAP because jumping between pages was very difficult. I ended up doing it locally and pasting the pieces in when I was done.

Last year I had trouble submitting a proposal via ERCAP, until a consultant suggested that the size limit had been exceeded in one case. An error message would have helped. This year I prepared detailed information on timing, in response to NERSC's request. But it turned out that the length limit had not been increased, so there was no way to answer the questions in the detail requested via the form -- and we had to put much of the information on a web page. Please be consistent!

... ERCAP is somewhat inconvenient, but manageable.

... Sometimes it would be nice to actively enforce the 4000 character limit, so you don't have to delete things after you enter them, but this is pretty minor.

Kind of cumbersome to answer some of the questions about working on the SP when we are just working on porting things there from the Cray.

Good interface:

Very useful interface!

Very convenient for PI's usage

In my opinion the NIM/ERCAP web site has been improved significantly in the last year, and I now find it much easier to use than I did when I first started.

NIM is great for managing accounts. ...

This is a great system

This is one of the best features of the web site, and is an easy and quick tool for both daily monitoring and the entire allocation process. I especially like how easy it is in ERCAP to roll over previous allocation requests and save them while you continue working on them. ...

It works. NERSC staff is responsive on helping and making improvements.

Good for what it is supposed to do.

Problems with reliability:

Sometimes the getnim command is broken (gives error message)

NIM sometimes seems buggy, unreliable.

There were a few glitches, but the Nersc support staff were wonderful and things ran pretty smoothly.

Difficult to use:

The NIM login screen never accepts my password the first time. I always need to "try again".

Violates privacy:

Violates privacy and facilitates spying on others, which has been noticeable and had negative effects in my opinion. Having proposals available may be OK. ...

Don't use:

I have yet to make the transition to really using the NIM interface...

I have not currently asked for a NIM password.

 


Comments concerning how NERSC keeps you informed of changes:   19 responses

Comments on using email:

I find the emails useful, however I would like to receive multiple reminders on the same issue. I accept that maybe other people don't want as much traffic, but if a major change is about to happen in a month, then from about 10 days in advance I would like to get an email about this every day. Maybe it is possible to configure this on a per user or per mailing list basis so that users can define how often they want to get this kind of information. ...

Email announcements are the most effective way for me to learn about changes.

Comments on the MOTD:

The message of the day is too long; I only recently got an ssh client that allowed me to scroll back to see the first part of it. Otherwise it just goes by too fast. ...

MOTD is quite long. I basically ignore it when I log in. If I have a problem, occasionally I think to "more" it, but not very often. Just seems like a pain. But I'm sure someone finds it useful, so I wouldn't get rid of it.

Comments on system outages:

Outages should be scheduled immediately following major conferences, rather than just before them, as often happens.

I think NERSC should do more system changes on weekends. This would affect the users less.

Satisfied / well informed:

I am very satisfied with the information policy. Iwona does a great job.

... In general, however, I feel well informed by NERSC about changes to their systems, and during the time I used it the system (IBM SP) was mostly extremely usable with little change.

I feel I am well informed.

You're doing very well in this regard!

They are doing a great job.

I am very satisfied with the current information.

Other:

Also the warning message from hsi is too long and isn't worth seeing thousands of times over and over again. It also effectively deletes the useful information obtained during a run since many scripts invoke hsi to save results, and that lengthy message just scrolls it right off screen and out of memory.

It would be nice to have some stability for a while. Things change too rapidly.

Unfortunately, sometimes the "cry wolf" effect applies here, and I don't always pay attention to things until I notice they change or don't work. But this is my own problem; in all cases it was my hastiness rather than lack of warning which caused an issue.

 

Hardware Resources

  • Legend
  • Satisfaction - Compute Platforms (sorted by Average Score)
  • Satisfaction - Compute Platforms (sorted by Platform)
  • Max Processors Used and Max Code Can Effectively Use
  • Satisfaction - HPSS
  • Satisfaction - Servers
  • Summary of Hardware Comments
  • Comments on NERSC's IBM SP:   54 responses
  • Comments on NERSC's Cray T3E:   18 responses
  • Comments on NERSC's Cray PVP Cluster:   14 responses
  • Comments on NERSC's PDSF Cluster:   9 responses
  • Comments on NERSC's HPSS Storage System: 31 responses
  • Comments about NERSC's auxiliary servers:   3 responses

 

Legend:

  Satisfaction          Average Score
  Very Satisfied        6.5 - 7
  Mostly Satisfied      5.5 - 6.4
  Somewhat Satisfied    4.5 - 5.4

  Significance of Change: significant increase / significant decrease / not significant

Satisfaction - Compute Platforms (sorted by Average Score):

Topic                                         No. of Responses   Average Score   Std. Dev.   Change from 2001
SP Uptime                                           176              6.56          0.81            1.03
PDSF Uptime                                          41              6.51          0.78
T3E Uptime                                           62              6.48          0.84            0.26
SP Overall                                          187              6.38          0.87            0.56
PVP Uptime                                           39              6.31          0.89           -0.14
PDSF Overall                                         46              6.26          0.93
PDSF Ability to Run Interactively                    39              6.18          0.97
T3E Overall                                          72              6.11          1.03           -0.12
PVP Overall                                          47              6.06          0.94           -0.08
PVP Disk Configuration and I/O Performance           27              6.00          1.11            0.00
PDSF Queue Structure                                 39              5.97          1.14
SP Disk Configuration and I/O Performance           143              5.97          1.25            0.30
SP Queue Structure                                  165              5.92          1.07            0.73
PVP Ability to Run Interactively                     33              5.82          1.26           -0.16
T3E Queue Structure                                  55              5.76          1.09            0.40
PDSF Batch Wait Time                                 39              5.74          1.21
T3E Ability to Run Interactively                     56              5.73          1.37            0.09
PVP Queue Structure                                  36              5.69          1.09            0.28
T3E Disk Configuration and I/O Performance           47              5.68          1.12            0.08
PDSF Disk Configuration and I/O Performance          38              5.63          1.30
SP Ability to Run Interactively                     150              5.47          1.58            0.76
SP Batch Wait Time                                  175              5.41          1.47            0.49
T3E Batch Wait Time                                  61              5.23          1.52            0.26
PVP Batch Wait Time                                  35              4.77          1.44            0.21

 

 

 

Satisfaction - Compute Platforms (sorted by Platform):

Topic                                         No. of Responses   Average Score   Std. Dev.   Change from 2001
SP Uptime                                           176              6.56          0.81            1.03
SP Overall                                          187              6.38          0.87            0.56
SP Disk Configuration and I/O Performance           143              5.97          1.25            0.30
SP Queue Structure                                  165              5.92          1.07            0.73
SP Ability to Run Interactively                     150              5.47          1.58            0.76
SP Batch Wait Time                                  175              5.41          1.47            0.49

PDSF Uptime                                          41              6.51          0.78
PDSF Overall                                         46              6.26          0.93
PDSF Ability to Run Interactively                    39              6.18          0.97
PDSF Queue Structure                                 39              5.97          1.14
PDSF Batch Wait Time                                 39              5.74          1.21
PDSF Disk Configuration and I/O Performance          38              5.63          1.30

T3E Uptime                                           62              6.48          0.84            0.26
T3E Overall                                          72              6.11          1.03           -0.12
T3E Queue Structure                                  55              5.76          1.09            0.40
T3E Ability to Run Interactively                     56              5.73          1.37            0.09
T3E Disk Configuration and I/O Performance           47              5.68          1.12            0.08
T3E Batch Wait Time                                  61              5.23          1.52            0.26

PVP Uptime                                           39              6.31          0.89           -0.14
PVP Overall                                          47              6.06          0.94           -0.08
PVP Disk Configuration and I/O Performance           27              6.00          1.11            0.00
PVP Ability to Run Interactively                     33              5.82          1.26           -0.16
PVP Queue Structure                                  36              5.69          1.09            0.28
PVP Batch Wait Time                                  35              4.77          1.44            0.21

 

 

 

Max Processors Used and Max Code Can Effectively Use:

Topic                       No. of Responses   Average    Std. Dev.   Change from 2001
SP Processors Can Use             115           546.25      948.16        -204.75
T3E Processors Can Use             36           520.00     1179.09         164.00
Max SP Processors Used            151           171.38      272.76         -30.62
Max T3E Processors Used            48           149.35      160.08          16.35
PDSF Processors Can Use            22            97.50      241.58
Max PDSF Processors Used           22            35.14       81.22
PVP Processors Can Use             19             6.95        7.90         -23.05
Max PVP Processors Used            22             5.45        5.18          -4.55

 

 

 

Satisfaction - HPSS:

Topic            No. of Responses   Average   Std. Dev.   Change from 2001
Reliability            127           6.51       0.94           -0.12
HPSS Overall           142           6.39       0.91           -0.11
Uptime                 125           6.37       1.04            0.04
Performance            126           6.35       1.06           -0.01
User Interface         130           5.95       1.30           -0.07

 

 

 

Satisfaction - Servers:

Server           No. of Responses   Average   Std. Dev.   Change from 2001
Escher (viz)             9           5.44       1.13            0.36
Newton (math)            8           5.38       1.30           -0.09

 


Summary of Hardware Comments

Comments on NERSC's IBM SP:   54 responses

  21   Good machine
  15   Queue issues
   9   Needs more resources / too slow
   7   Provide more interactive services
   6   Hard to use / would like additional features
   2   Stability issues
   2   Disk issues
   1   Need cluster computing at NERSC

Comments on NERSC's Cray T3E:   18 responses

  11   Good machine / sorry to see it go
   2   Mixed evaluation
   2   Provide better interactive services
   1   Queue issues

Comments on NERSC's Cray PVP Cluster:   14 responses

  11   Good machine / sorry to see it go / need PVP resource
   2   Improve batch turnaround time

Comments on NERSC's PDSF Cluster:   9 responses

   4   Good system
   2   Queue and priority issues
   2   Disk issues
   1   Would like new functionality
   1   Needs more resources

Comments on NERSC's HPSS Storage System:   31 responses

  16   Good system
   6   Don't like the down times / downs need to be handled more gracefully
   4   Performance improvements needed
   4   Would like new functionality
   1   Hard to use
   1   Authentication issues

Comments about NERSC's auxiliary servers:   3 responses

 


Comments on NERSC's IBM SP:   54 responses

Good machine:

Excellent platform for efficient parallel computing. Among the best managed supercomputers, if not the best, we have pursued our work on!

Excellent support. We've gotten some custom mods to the system for our use which has been very helpful. Consultants are always available and helpful. Excellent collaboration.

A truly great machine. Extremely well run. ... Worldwide the BEST machine my group uses.

It is a very good machine. But too many people are using it.

Very good machine as setup, my research relies heavily on it.

This has been a very productive machine for us and the support of our efforts has been excellent.

Always has been up when I have wanted to test my code. I like that I can get jobs in for debugging purposes easily.

I've been incredibly happy with the SP. Batch queue turnaround times are very quick, and one can usually get away with the low queue on weekends. We've investigated efficiency quite extensively and found that we can run on 2-4 nodes fairly effectively and have run on up to 16 nodes (16 processors per node, in all cases).

We are rewriting our code to effectively use 64+ processors and then we will see if we are able to get our jobs through in a timely manner. So far, using one node, we have been happy.

I am very happy with using IBM SP

... The system had fantastic uptime, I got a lot of jobs done. The system was very stable and had high production quality unlike some other systems, in particular clusters. The maximum number of processors I used on seaborg is fairly low, since I did mostly parameter studies of smaller runs and other members of the group did large simulations. The code has been running on more than 1500 processors already.

I think the machine is great. I plan to do more performance analysis in the future.

I think seaborg provides a very friendly user interface.

Great machine - many processors and sufficient memory for each.

Everything is good ...

The best experience I have had with any system! ...

It works very well. ...

Great machine. ...

The system is excellent. ...

Perfect for my needs (testing scalability of numerical algorithms).

very efficient system with well provided service

Queue issues:

... Also, it would be great to have a slightly faster queue time for regular_long jobs (current wait is about 3 days).

... though I did put in a request under the Big Splash initiative to get another regular_long job to go through (2 at a time) and it hasn't been carried out yet.

My code parallelizes well for many processors, but I have only used up to 80 processors in order to decrease the waiting time.

... (1) one really long queue would be handy (*) ...

A 48-hr queue would be desirable for some large jobs.

Job running is great, but the walltime hard limit is too "hard". I do not know if there are techniques to flush the memory data to disk when jobs are killed. That's very important for my time-consuming project. ...

The 8 hour limit on the regular queue is too short.

Queue waits for jobs larger than 32 procs have been horrible (up to 7 days for a 128 processor job). ...

The queues have gotten really bad, especially if you have something on low priority. ...

Allocation of processors in chunks smaller than 16 would be useful. More and longer regular_long time should be allocated.

I was very impressed by the short times I spent on the queue, but the short maximum run-time limits really limit the applicability of the SP for my systems of interest.

Checkpoint/restart is needed if the system goes down. Longer time limits are needed. I have been trying to use Gaussian 98, which typically runs for 4 days, so the 24 hr limit is not enough.

My jobs run most efficiently on 32 processors (or 16) over several days rather than for short periods of time on a large number of processors. When the job is interrupted data is lost, so when I restart I lose information. It would be most efficient if I could access the nodes for extended periods with a low number of CPUs.

It would be nice to have a queue for serial jobs, where they could share a node without being charged for the entire node.

... It is a handicap to be charged for all 16 processors even if you use only 1.

Needs more resources / too slow:

The individual processors are really slow - slow compared to every other chip I use, Athlon, P4, etc. This machine is a real dog.

The system is excellent. However, I wish that NERSC had a much larger and more powerful computer. The problems I would most like to attack require two to three orders of magnitude more computing power than is currently available at NERSC. (In responding to the last question, I indicated the maximum number of processors my code can use per job on Seaborg. On a more powerful machine it could effectively use thousands of processors.)

... Individual processor speed is relatively slow compared with other parallel systems I use (mostly PC based LINUX clusters with 2.2 GHz procs.) However, the stability of the system is better than most computers that I have used.

Processor by processor, it is much slower than Intel P4.

... But the processors are getting slow -- my code runs 30% faster on Athlon 1.4 Ghz.

CPU performance is fine. Communication performance restricts the number of nodes our codes can use effectively. At the number of processors we use for production, the codes typically spend 70% of their time communicating and only 30% of their time calculating.

I have a code which uses a lot of FFTs and thus has a decent amount of global communication. The only problem that I have with the IBM SP is that communication between the nodes is too slow. Thus, it takes the same amount of time to run my code on the Cray T3E as it does to run on the IBM SP.

Try to get more memory

... For future improvements, please increase I/O speed, it's a limiter in many of my jobs (or increase memory to 16GB/CPU, which is obviously too expensive). ...

Provide more interactive services:

Interactive jobs are effectively impossible to run, even small serial jobs will not launch during the day.

Debugging code is currently VERY FRUSTRATING because of LACK OF INTERACTIVE ACCESS. You can't run totalview unless the job clears the loadleveler, which has become dramatically more difficult in the last couple months.

I find it very difficult to run interactively and debug. There seems to be only a few hours per day when I can get any interactive work done.

I wish it was easier to get a node when running interactively. I realize that most of the nodes are running batch jobs, but it might make sense to allocate more nodes for interactive use.

... (4) sometimes the interactive jobs are rejected... why? can some rationale be given to the user other than the blanket error message? (*) ...

A few more processors for interactive use would be helpful.

Need to run (debug) interactive jobs on > 16 processors. ...

Hard to use / would like additional features:

I just don't like IBM.

Need a mechanism to inquire on the remaining wall clock time for a job. When jobs are terminated by the system for exceeding the wall clock limit, a signal should be sent to the job with some time remaining so it can prepare.
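In the absence of such a facility, a job can arrange its own early warning by trapping a termination signal and checkpointing in the handler. The POSIX-shell sketch below is illustrative only — the `checkpoint.dat` file, the `cleanup` function, and the self-sent signal are hypothetical stand-ins; a real batch job would trap whatever signal its scheduler actually delivers:

```shell
#!/bin/sh
# Hypothetical checkpoint handler: save state, then exit cleanly.
cleanup() {
    echo "state saved at step $i" > checkpoint.dat
    exit 0
}
trap cleanup TERM

# Stand-in for the main computation loop.
i=0
while [ "$i" -lt 100000 ]; do
    i=$((i + 1))
    # Simulate the scheduler's warning by signalling ourselves mid-run.
    if [ "$i" -eq 500 ]; then
        kill -TERM $$
    fi
done
```

The same pattern works inside a batch script: the shell delivers the signal between commands, the handler writes out restart data, and the job exits before the hard kill arrives.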

... I would like the default shell to be C shell.

I need to understand POE better in order to optimize my code. ...

... Nice machine overall, but I miss some of the software that was on the T3E (Cray Totalview and the superior f90 compiler).

Great machine. It would be nice to have (not in order of importance; * means more important than the rest): (1) one really long queue would be handy (*); (2) a program that tries to estimate when a queued job will run (very hard, I realize, but still useful); (3) when submitting a job, have llsubmit tell you how many units it will use (max) so you know what you're getting into; (4) sometimes interactive jobs are rejected... why? Can some rationale be given to the user other than the blanket error message? (*); (5) PLEASE REFORM THE ACCOUNTING SYSTEM OF MPP HOURS WITH ALL THE CRAZY FACTORS (2.5, is it or not?) (**)

Stability issues:

We have had some problems with jobs with high I/O demands crashing; I don't know if the stability of these operations could be improved or not. ...

When can we get the new version of operating system installed in order to run MPI 32 without hanging the code on large number of processors?

Disk issues:

... The big drawback is there is no back-up for user home directory.

The number of inodes per user was too small. ...

Need cluster computing at NERSC:

My main complaint is that this 2000+ processor supercomputer is being used in a broad user-base time-share environment as 10-20 clusters. The fractional use with > 512 (or even > 256) processors is too small. We are paying dearly for unused connectivity and the system is not being used properly. For the same hardware price, NERSC could be a "cluster center" offering 3-5x more FLOPs per year. The users need a computer center with more capacity (not capability). If we had a "cluster center", a machine like seaborg could be freed up for its best use... codes that need and can use > 1024 (or 2048) processors. The expansion ratio (turn-around time / actual run time) has much improved this year (generally below 2; it used to be often > 10), but next year seaborg is going to be overloaded again.

Other:

Max processors depends on the job. [comment about the survey question]

I would welcome examples on how to improve application performance on the SP with increasing numbers of processors.

Just starting / don't use:

Usage limited so far, not much to say.

Have not used it yet.

We are still working on putting MPI into our model. Once we have that completed we can use 32, maybe 64, processors to do our model runs.

 


Comments on NERSC's Cray T3E:   18 responses

Good machine / sorry to see it go:

I understand that it has to go, but I am sad about it.

Better balance of communication and computation performance than the IBM SP.

Good machine. Shame to see it go.

Though cache limitations were a problem, overall an excellent machine -- sorry to see it go.

Saying goodbye to an excellent resource.

Sad to see it go. (Especially now that everyone has switched over to seaborg).

I like the T3E.

With the demise of the T3E seaborg loading is going to get worse.

I wish it had more memory per node, faster nodes and lived forever.

Operating environment is easier than SP.

good system, more memory is needed.

Mixed evaluation:

Single-processor speeds on this machine have fallen behind the curve; we have ceased development on this platform. Communication speeds are excellent, however.

Not competitive with the SP; otherwise fine for parallel computing.

Provide better interactive services:

Too little time is available for interactive mode.

Even though interactive jobs have top priority, MCurie's list of started jobs would get filled with long batch jobs, and interactive jobs couldn't get in. In the end I gave up and moved to Seaborg.

Queue issues:

128 processor queue can be very slow -- up to 1 week.

Don't use:

I do not use T3E nowadays.

Other:

I wish the default shell would be C shell

 


Comments on NERSC's Cray PVP Cluster: 14 responses

Good machine / sorry to see it go / need PVP resource:

It is unfortunate that no viable replacement for the vector machines is planned. By a viable replacement I mean a machine which can run straight fortran or C codes without having to resort to something like MPI to use it effectively. The current PVP cluster is far too slow, which has effectively slowed down those projects that we have assigned to it.

Not competitive with the SP; otherwise fine for parallel computing.

interactive use is very important

Saying goodbye to a useful and excellent resource.

I'm sorry to see the Crays go. One of my codes is optimized for these vector machines and runs quite efficiently on them. Now we will have to run it on the IBM SP, where it will not run efficiently at all unless parallelized.

I wish that there were going to be a replacement for the PVP cluster.

Wish it were around a bit longer

Hope it can continue to exist in nersc.

My codes are legacy codes from ancient NERSC (MFENET) days. They now run on local UNIX machines but the larger mesh jobs run only on the NERSC PVP cluster.

Overall I like everything about the PVP Cluster ...

With the demise of the PVP, the loading of seaborg is going to get worse ... there is now no place to run the vector codes.

Improve turnaround time:

Batch turnaround can be slow; killeen turnaround is usually good.

Overall I like everything about the PVP Cluster except for the long wait times in the queue, and the bad presentation of queue status information so I can gauge how long my wait will be.

Don't use:

I do not use

Never used it.

 


Comments on NERSC's PDSF Cluster:   9 responses

Good system:

runs great. ...

Generally excellent support and service. What issues do arise are dealt with promptly and professionally.

Keep up the good work

beautiful! but sometimes misused/stalled by infinitely stupid users.

Queue and priority issues:

*** Obscure priority setting within STAR - Not clear why individual people get higher priority. - Not clear what is going on in starofl. Embedding always gets top priority over any kind of analysis made by an individual user. + intervention of LBL staff in setting embedding/simulation priority should be stopped. ...

NERSC response: Rules for calculating dynamic priorities are explained in the STAR tutorial   (05/03/02). They are based on the user's group share, the number of jobs the user currently has in execution and the total time used by the running jobs. Shares for all the STAR users (with the exception of starofl and kaneta) are equal (see bhpart -r). starofl is designated to run production that is used by the whole experiment and kaneta does DST filtering. The justification is that no analysis could be completed without embedding and many users run on pico-DST's produced by Masashi. STAR users should direct their comments regarding share settings within STAR to its computing leader (Jerome Laurent - jeromel@bnl.gov). NERSC staff does not set policies on the subdivision of shares within the STAR group.

The short queue is 1 hour, the medium queue is 24 hours, and the long queue is 120 hours. There's only a factor of five difference between the long and medium queues, while there's a factor of 24 between the short and medium queues. An intermediate queue of 2 or 3 hours would be useful, as it is short enough to be completed in a day but can encompass jobs that take a bit more than an hour.

NERSC response: To answer that question we have to ask another one: how and under what circumstances would users benefit from the introduction of this additional queue? The guess is that the user hopes for a shorter waiting time if his or her job were submitted to such a queue.

The PDSF cluster works under fair share settings at the cluster level. This model allows groups of varying size and "wealth" to share the facility while minimizing the amount of unhappiness among the users. In this model each user is assigned a dynamic priority based on the user's group share, the subdivision of shares within a group (decided by the groups), the number of jobs the user currently has executing, and the total time used by the user's running jobs. Jobs go into execution based on that dynamic priority, and only if two users have identical dynamic priority is the queue priority taken into account. So the queue priority is of secondary importance in this model unless a pool of nodes is dedicated to run a given queue.
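The fair-share idea in the response above can be sketched in a few lines; the formula and weights below are illustrative assumptions for exposition only, not LSF's actual fair-share algorithm:

```python
# Illustrative sketch of cluster-level fair share: priority grows with a
# user's effective share and shrinks as the user consumes the cluster.
# The weights here are hypothetical, NOT the real LSF formula.

def dynamic_priority(group_share, user_share_in_group,
                     running_jobs, run_time_used):
    """Higher value -> the user's pending jobs are dispatched sooner."""
    effective_share = group_share * user_share_in_group
    # Penalize users who already have jobs running and time consumed
    # (0.01 is an arbitrary illustrative weight on accumulated run time).
    usage_penalty = 1.0 + running_jobs + 0.01 * run_time_used
    return effective_share / usage_penalty

# A user with nothing running outranks an otherwise identical user who
# already has four jobs executing -- queue priority never enters into it.
idle_user = dynamic_priority(0.5, 0.2, running_jobs=0, run_time_used=0)
busy_user = dynamic_priority(0.5, 0.2, running_jobs=4, run_time_used=300)
assert idle_user > busy_user
```

The point of the sketch is that dispatch order is driven by share and current usage, which is why adding another queue (with its own queue priority) changes little unless nodes are dedicated to it.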

We use the queue length to manage the frequency with which job slots open. The short queue runs exclusively on 30 CPUs (as well as on all the other CPUs, sharing with medium and low). This means that on average a slot for a short job opens every 2 minutes. These settings provide for a reasonable debugging frequency, at the expense of those 30 nodes being idle when there are no short jobs.

We created the medium queue based on an analysis of average job length, and in order to provide a reasonable waiting time for the "High Bandwidth" nodes, which only run the short and the medium queues. We have 84 such CPUs, so on average (if those nodes were running only medium and no short jobs) a slot opens every 15 minutes. In practice a fair number of those nodes run short jobs too, so the frequency is even better. But then again, in the absence of short and medium jobs, those nodes idle even if we have a long "long" queue.
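The slot-opening rates quoted above follow from dividing a queue's wall-clock limit by the number of CPUs serving it; a quick sketch of that arithmetic (the helper function is hypothetical, not a NERSC tool):

```python
def minutes_between_slots(queue_limit_hours, cpus):
    """Worst-case average gap between freed job slots: if every CPU runs
    one maximum-length job back to back, a slot opens every
    (limit / cpus) of wall-clock time."""
    return queue_limit_hours * 60.0 / cpus

# Short queue: 1-hour limit on 30 dedicated CPUs -> a slot every 2 minutes.
print(minutes_between_slots(1, 30))              # 2.0

# Medium queue: 24-hour limit on 84 "High Bandwidth" CPUs -> roughly every
# 17 minutes in the worst case; the ~15 minutes quoted above reflects that
# many jobs finish well before the limit.
print(round(minutes_between_slots(24, 84), 1))   # 17.1
```

This is also why a dedicated semi-medium queue would need its own pool of nodes to change waiting times appreciably.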

Introducing one more queue would have a real effect only if we allocated a group of nodes to run that semi-medium or short queue exclusively. That would only further increase resource fragmentation and encourage users to game the system by subdividing jobs, which increases the LSF overhead and wastes resources. We closely monitor the cluster load and job length, and if a need shows up we will adjust queue lengths and node assignments, but we do not plan on adding more queues.

Disk issues:

... Not clear how the disk space is managed ***

NERSC response: Disk vaults are assigned to experiments based on their financial contribution. STAR subdivided their disk space between the physics working groups and Doug Olson (DLOlson@lbl.gov) should be contacted for more details. A list of disk vaults with their current assignments is available at: http://pdsf.nersc.gov/hardware/machines/pdsfdv.html. PDSF staff (by request from the experiments) does not interfere with how the disk space is subdivided between the users but if experiments wish we can run a cleanup script (details set by the requester). Currently this is in place on pdsfdv15.

Data vaults are not practical to use => the IO requirements are just a patch, not a true solution.

NERSC response: Indeed, disk vaults have poor performance while serving multiple clients. Very recently we converted them from software to hardware RAID, which improved their bandwidth. We also brought in an alternative solution for testing, the so-called "beta" system. PDSF users loved the performance (instead of a couple of tens, it could serve a couple of hundred clients without loss in performance), but such systems are much more expensive (currently a factor of 4 at least), and in the end the experiments decided to go with the cheaper hardware RAID solution. We are watching the market all the time and bring in various solutions for testing (like "beta" and "Blue Arc"); if anything comes along that the experiments can afford, we will purchase it.

*** High bandwidth node usage limited to very specific tasks.

NERSC response: High Bandwidth nodes (compute nodes with a large local disk - 280GB in addition to 10GB /scratch) are all in the medium queue, and access is governed by the same set of rules as for any other compute node in the medium queue. The only restriction is the allocation of the 280GB disk, where only the starofl account has write privileges. That is necessitated by the type of analysis STAR does, and the experiment financed the purchase of this disk space. If you are from STAR and do not agree with this policy, please contact Doug Olson (DLOlson@lbl.gov).

Need more disk space on AFS. AFS should be more reliable. I find AFS is very reliable from my laptop. It is much less so from PDSF.

NERSC response: PDSF AFS problems result from AFS cache corruption. It is much easier to preserve cache integrity when there is only one user (as on a laptop) than tens of users (as on PDSF). Heavy system use exposes obscure bugs not touched during single-user access. To improve AFS performance on PDSF we are moving away from the knfs gateway model for the interactive nodes. The AFS software for Linux has matured enough that we just recently (the week of 10/14/02) installed local AFS clients on the individual pdsfint nodes. This should greatly reduce AFS problems and boost its performance.

Would like new functionality:

... however, using the new Intel Fortran compiler might be useful, since it increases speed by a factor of 2 (or more), at least on Intel CPUs.

NERSC response: The two largest PDSF user groups require the pgf compiler, so we can look at an Intel Fortran license as an addition and not a replacement. Additionally, Intel Fortran does not work with the Totalview debugger, currently the only decent option for debugging jobs that are a mixture of Fortran and C++ on Linux. Also, Intel licenses are pricey, but we will check what kind of user base there is for this compiler and see whether this is something we can afford.

Needs more resources:

Buy more hardware! When STAR is running a large number of jobs (which is almost all the time!) it's a pain for other users.

NERSC response: PDSF is a form of cooperative; NERSC helps to finance it (~15%). All the groups get access that is proportional to their financial contribution. These "shares" can be checked by issuing a bhpart command on any of the interactive nodes. STAR is the most significant contributor, thus it gets a high share. However, the system is not quite full all the time - please check our record for the past year at: http://www-pdsf.nersc.gov/stats/showgraph.shtml?merged-grpadmin.gif .
We did purchase 60 compute nodes (120 CPUs) recently and we are introducing them into production right now (a step up on the magenta line in http://www-pdsf.nersc.gov/stats/showgraph.shtml?lsfstats.gif ).
It also helps to look at this issue in a different way: in times of high demand everybody is getting what they paid for (their share), and when STAR is not running, other groups can use the resource "for free".

Don't use:

What is it?

NERSC response: PDSF is a networked distributed computing environment used to meet the detector simulation and data analysis requirements of large-scale High Energy Physics (HEP) and Nuclear Science (NS) investigations. For updated information about the facility, check out the PDSF Home Page.


Comments on NERSC's HPSS Storage System:   31 responses

Good system:

Generous space, quick network.

I really like hsi. ...

Gets the job done.

Flawless. Extremely useful.

Excellent performance for the most part, keep it up!

Not very useful to me at present, but it works just fine.

NERSC has one of the more stable HPSS systems, compared to LANL/ORNL/NCAR

I really love being able to use hsi, extremely user friendly.

This is a big improvement over the old system. I'm impressed with its speed.

Really nice and fast. ...

HPSS is terrific

Easy to use and very efficient. We went over our allotted HPSS allocation, but thank you for being flexible about this. We plan to use this storage for all our simulations for years to come.

Everything is great, ...

It works well.

After the PVP cluster disappears the HPSS will still be my primary storage and archive resource.

very useful system with high reliability

Don't like the down times / downs need to be handled more gracefully:

... Downtime is at an inconvenient time of day, at least in EST.

The weekly downtime on Tuesdays from 9-noon PST is an annoyance as it occurs during the workday for users throughout the US. It would seem to make more sense to schedule it for a time that takes advantage of time differences --- e.g., afternoon on the west coast --- to minimize the disruptions to users.

... except that it goes down for 3 hours right in the middle of my Eastern Time day every Tuesday. How annoying.

It is unfortunate HPSS requires weekly maintenance while other systems are up. This comment is not specific to NERSC.

I don't like the downtimes during working hours.

The storage system does not always respond. This is fatal when the data for batch jobs is too large to fit on the local disk. I had several of my batch jobs hang while trying to copy the data from the HPSS system to temporary disk space.

Performance improvements needed:

cput for recursively moving directories needs to be improved both in speed and in reliability for large directories.

scp can be very slow for large files

Commands that do not require file access or transfer are pretty slow, e.g. listing files or moving files to a different directory

Large files (~10 GB) are hard to get from a local desktop

Would like new functionality:

It would be very helpful if HPSS worked in the background, buffered by a huge hard disk

Would like to try SRB as a combined interface to HPSS and a metadata catalog

... What would be nice is a command that updates entire directory structures by comparing file times and writes the newer ones to disk (like a synchronization). Currently I use the cput command, but it doesn't quite do this. Having such a command would be a great help (maybe it can already be done with a fancy option which I don't know).

It would be nice to navigate in the HPSS file system directly from PDSF via NFS (of course not to copy files but to look at the directory structure). This is done at CERN.

Hard to use:

too cumbersome to use effectively

Authentication issues:

The new authentication system for HPSS seems to be incompatible with some Windows secure shell software. Since the change was made I have not been able to connect using my laptop. I am still trying to get this fixed with the help of the support people here at LLNL, but no good news so far.

Other:

When questions arise and NERSC is contacted for guidance the consultants always come across as condescending. Is this intentional and for what purpose?

Don't use / don't need:

We are not using this at this time.

 


Comments about NERSC's auxiliary servers:   3 responses

Is there a quick way on Escher to convert a PowerPoint picture into a ps file without using "xv"?

I have never used these servers, would I need a separate account on these?

Never used.

 

Software

  • Legend
  • Satisfaction with Software (sorted by Average Score)
  • Satisfaction with Software (sorted by Platform)
  • Comments about Software:   26 responses

 

Legend:

Satisfaction  Average Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4

Significance of Change
significant increase
significant decrease
not significant

 

 

Satisfaction with Software (sorted by Average Score):

Software Category  No. of Responses  Average Score  Std. Dev.  Change from 2001
PVP Fortran Compilers 34 6.47 0.71 -0.19
PDSF C/C++ Compilers 35 6.46 0.70
PDSF Fortran Compilers 22 6.45 0.74
T3E Fortran Compilers 49 6.45 0.84 0.05
PDSF User Environment 39 6.38 0.78
SP Fortran Compilers 141 6.36 1.10 0.40
PDSF Libraries 29 6.24 0.95
T3E User Environment 54 6.24 0.87 0.06
PDSF Applications 29 6.21 0.98
PVP User Environment 34 6.15 0.78 -0.10
T3E Libraries 39 6.13 0.92 -0.05
SP User Environment 165 6.12 1.13 0.05
SP C/C++ Compilers 76 6.11 1.09 0.39
SP Libraries 119 6.09 1.03 0.09
T3E C/C++ Compilers 25 6.04 1.10 0.11
PDSF General Tools and Utilities 28 6.04 1.04
PDSF Bug Resolution 24 6.00 1.14
T3E General Tools and Utilities 35 5.97 0.86 0.32
PVP C/C++ Compilers 14 5.93 1.21 -0.07
T3E Bug Resolution 26 5.85 1.08 0.15
T3E Applications 31 5.84 1.04 0.06
SP General Tools and Utilities 113 5.80 1.09 0.08
T3E Performance and Debugging Tools 32 5.78 1.07 0.22
SP Applications 90 5.70 1.18 0.03
PVP Libraries 28 5.61 1.37 -0.20
PVP Bug Resolution 15 5.60 1.24 0.50
SP Bug Resolution 78 5.59 1.31 0.14
PVP General Tools and Utilities 21 5.57 1.25 -0.36
SP Performance and Debugging Tools 112 5.49 1.28 0.80
PVP Applications 16 5.44 1.59 -0.39
PVP Performance and Debugging Tools 18 5.33 1.57 -0.73
PDSF Performance and Debugging Tools 28 5.25 1.65


Satisfaction with Software (sorted by Platform):

Software Category  No. of Responses  Average Score  Std. Dev.  Change from 2001
SP Fortran Compilers 141 6.36 1.10 0.40
SP User Environment 165 6.12 1.13 0.05
SP C/C++ Compilers 76 6.11 1.09 0.39
SP Libraries 119 6.09 1.03 0.09
SP General Tools and Utilities 113 5.80 1.09 0.08
SP Applications 90 5.70 1.18 0.03
SP Bug Resolution 78 5.59 1.31 0.14
SP Performance and Debugging Tools 112 5.49 1.28 0.80

PDSF C/C++ Compilers 35 6.46 0.70
PDSF Fortran Compilers 22 6.45 0.74
PDSF User Environment 39 6.38 0.78
PDSF Libraries 29 6.24 0.95
PDSF Applications 29 6.21 0.98
PDSF General Tools and Utilities 28 6.04 1.04
PDSF Bug Resolution 24 6.00 1.14
PDSF Performance and Debugging Tools 28 5.25 1.65

T3E Fortran Compilers 49 6.45 0.84 0.05
T3E User Environment 54 6.24 0.87 0.06
T3E Libraries 39 6.13 0.92 -0.05
T3E C/C++ Compilers 25 6.04 1.10 0.11
T3E General Tools and Utilities 35 5.97 0.86 0.32
T3E Bug Resolution 26 5.85 1.08 0.15
T3E Applications 31 5.84 1.04 0.06
T3E Performance and Debugging Tools 32 5.78 1.07 0.22

PVP Fortran Compilers 34 6.47 0.71 -0.19
PVP User Environment 34 6.15 0.78 -0.10
PVP C/C++ Compilers 14 5.93 1.21 -0.07
PVP Libraries 28 5.61 1.37 -0.20
PVP Bug Resolution 15 5.60 1.24 0.50
PVP General Tools and Utilities 21 5.57 1.25 -0.36
PVP Applications 16 5.44 1.59 -0.39
PVP Performance and Debugging Tools 18 5.33 1.57 -0.73

 

 

 

Comments about Software:   26 responses

9   Debuggers / performance analysis and other tools
9   Unix environment (shells, editors, GNU, modules)
6   Compilers
6   Applications (math, viz, chemistry)
3   Satisfied
1   Libraries

Debuggers / performance analysis and other tools:

The present default version of Totalview on the IBM SP does not work correctly with our code, making debugging very difficult. We had been using the previous version, which did work but is no longer available. We would be very grateful if it were returned to service.

Losing Cray's totalview and debuggers is a real loss as they were very superior to what's available on the Cray's. I also don't like XProfiler too much. ...

... The debugging tool was a bit tricky to use, and I should probably come to one of the classes/training sessions. This would allow me to really use the debugger as a tool.

Totalview runs waaaaay too slow when I run it on seaborg. I think it is slow because my network connection is not fast enough to run such a windows intensive piece of software. I would love it if there was some kind of text based debugger I could use like dbx.

More software tools to analyze results (particularly in the netCDF format) would be very helpful. Tools for performing quick visual checks of model outputs in netCDF format (simple stuff like ncview) are missing. Having some of these tools available on seaborg would cut down on ftping in and out of HPSS.

My dissatisfaction with the IBM SP is mostly related to tools and compilers provided by IBM. The ACTS group is making good progress putting together a useful tool set for applications.

I think the software could be expanded to include more modules, including other text editors, graphics programs, debuggers, etc. Of course, this should not compromise the performance of the main machines. If I have to choose, I still prefer performance.

Need to have real debuggers that are GUI. [PDSF user]

I am not aware of debugging utilities on PDSF other than gdb, and I find gdb very unsatisfying. I had looked for some alternative on the PDSF web pages, but I haven't found any documentation indicating a different way to debug FORTRAN and/or C/C++ code.

NERSC response: There are not many alternatives to gdb on Linux. STAR is looking into purchasing Totalview (which has a GUI interface and works well with both gcc and pgf). The licenses are expensive ($5700.00 for 8 floating licenses + a 20%/year maintenance fee - that is the cost of 1TB of disk vault), but we will consult with all the group representatives to find out whether they are interested in financing it.

Unix environment (shells, editors, GNU, modules):

It would help me enormously if you had emacs or some text editor other than vi on the login node. I hate vi, so I have to do editing locally and do more file transfers than I would like. When debugging, it is useful to make quick changes on the local computer.

... A simple editor would be useful for interacting with users who don't know vi or emacs; e.g., pico is pretty good.

... including other text editors, ...

It would be nice if the default environment setup was a little more user friendly (i.e. given the ubiquity of emacs as an editor, I was shocked that I had to load the module manually on startup, etc.)

... I am happy you have made the GNU packages available and I use them every time I log in.

The environment is extremely difficult. No command history, and the termcaps aren't set for common keyboards (no arrow keys for command-line editing). I think that these problems can be fixed if I fiddle with the settings, but it seems it would make more sense to just include them, especially since most UNIX OSs use them as standard (Linux, Irix, DEC).

hard to know which modules to load -- guessing game

I cannot get a reasonably modern ksh. ... [seaborg user]

The Cray T3E needs a better shell, like tcsh

Compilers:

The fortran compiler on seaborg is a bit slow, but overall it is good. ...

Our application seems to perform particularly badly on SP machines, where we typically get around 6% peak floating point performance. We've worked at applying various compiler optimisations, but have not succeeded in improving the performance beyond this limit. This is in contrast to other platforms (ia32, compaq, hitachi) where we typically run at 15-25%.

Current version of IBM FORTRAN compiler has a bug. I reported this, but they don't fix it.

My dissatisfaction with the IBM SP is mostly related to ... and compilers provided by IBM. ...

Compiling and running was really very easy. There are lots of little issues to be uncovered as time has passed, but each has been straightforward. ...

... How about being able to use g++ in parallel? ...

Applications (math, viz, chemistry):

It would be very helpful to have matlab since most of my analysis routines are written in matlab and then I wouldn't have to transfer the data to another machine to analyze it.

matlab on seaborg?

Get matlab (is it already there?)

Some ab initio molecular dynamics code (such as VASP) would be helpful

... Need a better DX with more modules.

... other graphics programs ...

Satisfied:

I use mostly IBM and I am very satisfied with software performance on this machine

All software I need is available.

My needs are limited to a C compiler and MPI. Everything I have used has been excellent.

Libraries:

Need lapack and scalapack 64 bit libraries!

Training

  • Legend
  • Satisfaction with Training
  • How Useful are these resources for training in HPC?
  • What training methods would you like NERSC to offer?   161 responses
  • Comments about training:   22 responses

 

  Legend:

SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4

ImportanceAverage Score
Very Important 2.5 - 3
Somewhat Important 1.5 - 2.4
Significance of Change
significant increase
not significant

 

 

Satisfaction with Training:

Training Resource  No. of Responses Average Score Std. Dev. Change from 2001
New User's Guide 138 6.21 0.95 0.27
NERSC Online Tutorials 108 5.97 1.06 0.00
NERSC Training Web Pages 99 5.89 1.15  
Slides from classes 60 5.33 1.35 0.18
NERSC Training Classes 40 5.12 1.45 -0.42

 

 

How Useful are these resources for training in HPC?

Training Resource  No. of Responses  Average Score  Std. Dev.Change from 2001
Online Tutorials 60 2.52 0.68 -0.03
Slides from classes 45 2.13 0.79 0.03
Classes 41 2.00 0.89 -0.07

 

 

What training methods would you like NERSC to offer?   161 responses

Training Method No. of Responses Percentage
General online web documentation 144 89
Online web tutorials on specific topics 126 78
Live DOE Access Grid Classes 21 13
Live web broadcasts with teleconference audio 19 12
Live in-person classes at your site 19 12
Live in-person classes at LBNL 17 11

 

Comments about training:   22 responses

The training topics users requested most frequently are optimization, debugging, performance analysis, and parallel programming.

 

11   Suggested topics
4   Didn't use / didn't know about training offerings
4   Use online documentation for training
2   New user concerns
2   Problems with training offerings

 

Suggested topics:

performance analysis, optimization, and debugging

system specific debugging and performance optimizations

machine specific issues, i.e. performance, use of software

As much debugging and compiler (eg. useful flags, specific optimisations) information as possible is always useful.

Profiling and optimization classes would be very useful.

Optimization of applications.

Parallel programming

At the present I don't have any needs, but in the future (5+ years) I intend to write software and will need training to program parallel codes.

The NERSC webpage is a great place to find training material. I would like to learn more about good programming (something in the style of your "good programming practices" section) and about code optimization.

pages on porting are important

I always find it useful to see simple, specific examples of how to run jobs, use software packages, etc.

 

Haven't used / didn't know about training offerings:

I was not aware of all these useful sources of information. Will use them more in the future.

We have not yet had experience with this, but will do so soon.

We are perhaps not the best folks to ask about this as we have not actually used the training domain very much.

I was not aware that some of the remote training options were available, or how to get to them

 

Uses online documentation for training:

I don't need this. Generally I learn what I need from the web.

For various reasons, I never go to actual classes at LBNL for all the training stuff. One reason is that one wants to avoid repeat material. Having the class stuff on the web as html (not just powerpoint which linux machines can't read) is great and more useful in the long run.

I wish I would come to some of the training classes, but it always seems as if I've already got too many things going on. I'm sure I could be using the NERSC facilities more effectively with a little guidance. But the web site has been great at making the training an after thought rather than a necessary step.

 

New user concerns:

It would be useful to new users if they were sent an email containing information about the resources available - for instance, which machines are available to them and for which machines their usernames and passwords are valid, and maybe a summary of what the different machines are generally used for (development/visualisation..). For example, I was not aware that visualisation services are available at nersc - I'm sure this information is located on the website but if you're not looking for it you're not going to see it - even though visualisation services would be of interest to me.

Make sure new users know what aids are available

 

Problems with training offerings:

The one class I tried to follow via teleconferencing was finally cancelled 30 minutes after its supposed start, when Access Grid could not be made to work. Perhaps attention should be paid to ensuring that such facilities work before classes are scheduled with them.

Remote training classes would help those not being able to travel.

 

 

User Services

 

Legend:

Satisfaction  Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4

Significance of Change
not significant

 

 

Satisfaction with User Services:

Topic  No. of Responses  Average Score  Std. Dev.  Change from 2001
Timely response 207 6.51 0.89 -0.05
Technical advice 205 6.47 0.85 0.01
Time to solve problems 198 6.40 0.95
Response to special requests 136 6.40 1.03 0.17
Followup to initial questions 188 6.39 0.97 0.02
RightNowWeb interface 109 5.94 1.14

 

 

 

Comments about Consulting and Account Support:   21 responses

14   Good service
5   Mostly happy, mixed evaluation
2   Unhappy

Good service:

Consultants are wonderful. On one occasion I was sent email about a bug in my batch job file before I even knew there was a bug. Very professional service.

Consultants at NERSC do a wonderful job!

Getting better and better with time. Resolution of obscure or hard to detect problems is now outstanding. We had a strange segfault problem that was solved by going to 64bit address mode, I'd never have guessed that.

Everything fine.

Excellent support services from everyone. I'm especially happy to have Iwona as a resource at NERSC. She knows her stuff, is always helpful and tries to respond quickly. Thanks!

Account Support Services were very helpful. Thanks.

Wonderful!

NERSC User Services are BY FAR the BEST we have worked with, compared to a large number of other supercomputing sites we have worked with. Superb work, terrific responsiveness, in all aspects of our interaction with the team. In particular, F. Verdier is doing an exemplary job as the Leader of the team we are interacting with.

My one-on-one interactions with NERSC staff have been great! Thank you very much.

I have been very happy with the technical support for the ACTS toolkits. My requests have been fielded in a prompt and accurate manner.

The consulting are outstanding. I have worked at many centers. NERSC consulting services are the best I have encountered.

For what I use, NERSC gives me great service. I appreciate it.

It seems to me that there's been a great improvement in the services in the last year or two. Keep up the good work!

very good service

Mostly happy, mixed evaluation:

For the most part, the consultants have been great. I hoped to get more help on code optimization, but this seems to be very far down the list.

Consulting questions tend to be tough. So, even though I would like better answers I believe the consultants do as well as can be expected.

The staff is very prompt and very fast and as helpful as they can be. My only problem is sometimes they can't provide the answers I wanted (which may be more a function of the question but I lack sufficient expertise to judge).

Only in one instance, and this was a desire to be pointed in the right direction for debugging, did the technical consultant not help much, but in all other cases I've been very satisfied.

It sometimes takes a while to resolve our detailed technical issues, but this is probably inevitable with a system as complex as NERSC.

Unhappy:

I work in France, so my working time corresponds to night for you. Therefore, it is more difficult to consult.

I reported a bug in IBM FORTRAN compiler, but they don't fix it. This is not good.

 

Comments about NERSC

 

What does NERSC do well?   92 responses

49   User support, good staff
42   Well run center, good hardware (specific systems not specified)
16   Documentation
13   Job scheduling / batch throughput
10   Seaborg
8   HPSS / data storage
7   PDSF
7   Software / user environment
2   Training

 

What should NERSC do differently?   66 responses

10: Job Scheduling
7: Allocations / accounting
7: Software
7: Documentation
7: Provide more computing resources
6: Provide better hardware
5: Keep the PVP Cluster / maintain a PVP resource
5: Training / new user services
4: No need for change / no suggestions
3: PDSF
3: More outreach / collaborations
2: More interactive services
2: Networking
1: HPSS
1: No inode quotas

 

How does NERSC compare to other centers you have used?   62 responses

36: NERSC is the best / very good / overall NERSC is better
11: NERSC is the same as / mixed response
8: No comparison made
7: NERSC is less good


Notes:

  • Comments displayed in green are repeated in multiple categories.
  • Comments with ... have been split and only part of the response is displayed in a given category.

What does NERSC do well?   92 responses

User support, good staff:

High level of technical expertise!

... The staff is friendly, helpful and proactive in developing training on new hardware and making debugging and performance tools easier to use.

People at NERSC that I interact with are great! They take pride in their work.

your consultants are your best feature; ...

User Services are great. The people are extremely competent and dependable, and a pleasure to work with.

Technical and general consulting services are fantastic.

The overall job scheduling (load), user support, and documentation are the best in the world.

Support users and applications. Willingness to work with specific applications that need special access or help. Availability of support personnel.

Your consultants and account folks have been very responsive, helpful, informative, and easy to work with.

NERSC's response to users' needs and consulting are the best.

Provide information (as on web sites), accounts management, consulting.

Consulting, web-site, provision of a lot of good relevant information.

Support. Hardware availability.

Consulting and web resources.

Among the supercomputing facilities I tried until now, NERSC excels in most aspects. I am most satisfied with the overall stability of the system. This must come from the outstanding competence of the technicians.

Good response and professional service.

... Information services, training, account management, and consulting are all excellent.

Accuracy of information, platforms available and support.

Excellent user support, good hardware selection.

I really appreciate the job from the consultants. They always did their best to help me resolve my technical problems, especially when starting to use seaborg.

User support is great.

Usually can add users quickly, get problems resolved quickly. Seems very "customer oriented". ...

... Account support has also been very good. I also appreciate the seeming concern about security.

hardware management and consulting services

I only use PDSF and HPSS but am very happy with all aspects, uptime, disk volume, batch processing, expert help ....

1. Excellent User services and support. ...

Consulting and supporting services.

... consulting service ...

... After a few initial problems getting the code compiled and running (the support was very helpful in this) we haven't seen many problems. Our project seems to have been given an allocation outside of the usual procedure, as far as I understand. We also had some special requests for larger disk allocations, and I think a problem with the number of inodes which was limiting us. These things were resolved, and we were very impressed by your ability to make accommodations for these special requests.

The support we get from the PDSF staffer is superb. Keep up with the good work!

... account creation ...

Good consulting services.

NERSC user services have been very helpful. ...

Very good hardware and support.

NERSC functions amazingly well and seems to have a user-oriented focus that is very refreshing to deal with. My colleagues and I generally feel that we are not taking full advantage (due to time constraints) of all that NERSC has to offer.

... And the consulting and help system is fantastic. Keep it up!

... Its consulting services are outstanding.

Consultant service is great.

... Good consulting support.

consulting and account support services excellent.

Web site, consulting, account queries

Human service is great.

I have always found consultants helpful. ...

... Account support

Consulting is generally good. NERSC seems to be better at not "fixing things not broken" than the previous LLNL management

User support and interaction.

Consulting.

Support. ...

Excellent user support, ...

... Good consulting support.

Well run center, good hardware (specific systems not specified):

This is one of the best centers I ever used and I've used many in different countries.

Very reliable high-performance computing environment. Thanks!

Among the supercomputing facilities I tried until now, NERSC excels in most aspects. I am most satisfied with the overall stability of the system. This must come from the outstanding competence of the technicians.

NERSC provides access to very high performance computing facilities on a platform with a simple interface, making development easy. ...

The computers work well and consistently. ...

Machines are available for use most of the time.

In the little time that I have been working with NERSC, everything has run very smoothly.

Availability of resources

provide good facilities that are useable after some learning

Provides state-of-the art parallel computing platforms, and a batch environment which allows us to use our allocations in a timely fashion.

Facilities are fairly easy to use. Computation speed is very good. System reliability is good.

I have the best experience working at NERSC.

I think that all around NERSC is the most well run easy to use computer center that I have encountered so I really have no complaints (except that I would like to see emacs on the login nodes)

The machine is powerful

I just run software that other guys have developed, but I think NERSC has been very reliable and consistent over the past few years that I have been a user.

Support. Hardware availability.

The machine stays up and I can connect from anywhere with ssh.

As an all-round scientific computing center, NERSC does an excellent job.

high performance supercomputing and data storage.

Accuracy of information, platforms available and support.

Excellent user support, good hardware selection.

It has very good processors.

keeps machine up & running for a smooth working environment

The available hardware and software is very good. It meets my needs well. ...

In general, I'm very happy with NERSC services.

hardware management and consulting services

Keeps the systems running.

... and less downtime of the hardware

The overall management is excellent. ...

The system runs well and seems well-maintained

... I'm happy with the quality of the hardware.

NERSC has given me access to computing resources not available at my own institution.

Very good hardware and support.

NERSC manages its computers and mass storage system in excellent fashion.

supporting large computers.

Provide a lot of computing power, fast, secure, and always up. Really, I'm bottlenecked at the analysis side rather than the simulation side. ...

The flexibility and availability of computing resources. ...

New computing capabilities. Large scale capacity production computing. ...

I think overall NERSC provides very good support.

NERSC continues to provide consistent, high quality MPP capability and access. It is one of our primary production resources. ...

Lots of processors. Fast disk I/O on /beta disk. All disks should have this type of I/O. I do not know why you continue to use dv27-type disks when /beta is far superior.

Big fast machines with large memory and excellent fortran environment. NERSC should continue to increase the speed and memory of its machines as the technology evolves.

Documentation:

The overall job scheduling (load), user support, and documentation are the best in the world.

Provide information (as on web sites), accounts management, consulting.

Consulting, web-site, provision of a lot of good relevant information.

Consulting and web resources.

... Information services, training, account management, and consulting are all excellent.

Accuracy of information, platforms available and support.

... There is an abundance of documentation I have benefited from. ...

... online documentation pretty good

Once you get the hang of them (see below), the Web pages are generally informative and the tutorials are well written.

Information available on web pages, ...

... I have also found the online web information to be useful and clearly presented.

... Once you get the hang of them (see below), the Web pages are generally informative and the tutorials are well written.

... NERSC web pages are really, really good and well organized (except for the IBM web pages which are a disaster, but that is not your fault). ...

... I appreciate the good information on the web site as well, this is by far the most useful supercomputer web site I've dealt with.

Web site, consulting, account queries

Accuracy of information, platforms available and support.

Job scheduling / batch throughput:

The overall job scheduling (load), user support, and documentation are the best in the world.

HPSS (connection speed & space), interactive queues, hsi

I am very pleased with the performance of seaborg and the queuing system.

Provides state-of-the art parallel computing platforms, and a batch environment which allows us to use our allocations in a timely fashion.

... Reasonably quick throughput on batch jobs.

Turn-around time is very good, especially for "heavy-duty", very massively parallel jobs.

... 2. Medium to large scale parallel computing (# of CPUs per job) are both feasible at NERSC, in contrast to other supercomputing centers, where only extremely large scale computing is practically possible (queue policy is a key factor here).

Can do a lot of jobs at one time.

... The queues work efficiently, and the queue choices are well balanced. ...

... The way Seaborg is now managed makes it a pleasure to use it -- jobs usually run smoothly with less baby-sitting than at any other powerful system that my group has used. The queue structure works well. And the filesystem is also excellent.

... The priority configuration in Seaborg is very reasonable.

On Seaborg I find the job queues much faster and better managed than the system used on for instance Killeen in previous years. Right now I am mostly running small jobs, I am hoping this remains true when I start submitting long jobs.

The queuing system, performance of seaborg is what I am pleased with the most. ...

Seaborg:

I am very pleased with the performance of seaborg and the queuing system.

... The stability of the IBM SP in particular if compared to some large clusters is great.

I'm very pleased with the new SP machine.

The IBM-SP is a very useable machine. It is reliable, it performs well and is well configured. ...

The SP seems a good solid machine on which to do reliable runs over many processors. We've had much worse experience with large PC clusters, and it's been great to run on seaborg. ...

... The way Seaborg is now managed makes it a pleasure to use it -- jobs usually run smoothly with less baby-sitting than at any other powerful system that my group has used. The queue structure works well. And the filesystem is also excellent.

I am only a recent user of seaborg and to date I have been very happy with the system hardware/software and its performance. ...

The seaborg SP is a GREAT machine! It is well run, responsive, fast, with lots of storage and good run time. ...

... In general pvp cluster and IBM SP seem to be run well and effectively.

The queuing system, performance of seaborg is what I am pleased with the most. ...

HPSS / data storage:

HPSS (connection speed & space), interactive queues, hsi

high performance supercomputing and data storage.

I primarily use HPSS and PDSF. They both have performed very well.

I only use PDSF and HPSS but am very happy with all aspects, uptime, disk volume, batch processing, expert help ....

Every aspect concerning my STAR data analysis, from account creation to data archive and getting analysis result especially submitted batch jobs to the PDSF linux cluster were processed timely and successfully.

NERSC manages its computers and mass storage system in excellent fashion.

File storage systems ...

... The HPSS storage system is also an excellent aspect of NERSC.

PDSF:

I primarily use HPSS and PDSF. They both have performed very well.

NERSC has a wonderful set of high performance linux clusters. Building, maintaining, upgrading and user policies make my work infinitely easier than working elsewhere.

I only use PDSF and HPSS but am very happy with all aspects, uptime, disk volume, batch processing, expert help ....

I mostly use PDSF to do batch analysis of STAR data, then I transfer the reduced volume set to a local computer for detailed analysis.

PDSF

Every aspect concerning my STAR data analysis, from account creation to data archive and getting analysis result especially submitted batch jobs to the PDSF linux cluster were processed timely and successfully.

well working pdsf cluster with almost 100% uptime, no crashes

Software / user environment:

The consistent use of the "modules load" program to be able to switch between different programs/versions of a program is great.

NERSC provides access to very high performance computing facilities on a platform with a simple interface, making development easy. ...

The available hardware and software is very good. It meets my needs well. ...

... libraries ...

I am only a recent user of seaborg and to date I have been very happy with the system hardware/software and its performance. ...

Support users and applications. Willingness to work with specific applications that need special access or help. Availability of support personel.

... excellent fortran environment ...
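The "modules" facility praised above works roughly as sketched below. The package names are illustrative examples only, not an inventory of what NERSC actually had installed:

```shell
# Illustrative use of the Modules environment; package/version
# names here are hypothetical examples.
module avail              # list the packages and versions available on the system
module load totalview     # add a package to the current shell environment
module list               # show which modules are currently loaded
module switch pgi ibm     # swap one package or version for another
module unload totalview   # remove a package from the environment
```

Because each `module load` adjusts PATH, MANPATH, and library search paths in the current shell, users can move between compiler or tool versions without editing dotfiles by hand, which is the convenience the comment above is pointing at.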

Other:

Make up long surveys

No advice to give, we're still beginning.


What should NERSC do differently?   66 responses

Job Scheduling:

Reduce wait time for long queues.

Large jobs will sit in queue too long and are effectively impossible to run. Thus from a user perspective NERSC is not a large machine. It is a small machine! This seems contrary to the intent of NERSC.

Need longer time limit for big jobs (>1600 processors)

The 8 hours queue limit on the regular queue is restrictive. The regular_long with 24 hours sounds useful, but has only 32 nodes available.

install a longer than 24 hr queue on seaborg. much longer time slots (~1 week) should be available in the queues.

Add longer queues that will allow for longer runs on fewer processors. This will cut down on waste. Perhaps require special permission for these queues if it is a problem.

I guess it would help us if the 8 hour queue would change to a 10 hour queue, as long as throughput was not too adversely affected.

... You need to offer a long queue (at least 4-8 hours) for serial jobs. Much post-processing of large parallel calculations cannot or has not been parallelized. There are also still many linear algebra/math library operations for which parallel codes have not been developed. With the demise of the Cray SV1's, there is also a need for a serial queue to run these applications on. Even when parallelization is possible, the human time involved in making all of these codes immediately parallel is expensive; computer time is not. Although some of these calculations can be done on local platforms (i.e., Linux) this is of limited use since local resources can quickly become overloaded and we may not have all of the needed libraries installed locally.

allocation and structure of queue waiting

Walltime limit ?

... and it would be nice to have a better grasp of anticipated wait times for submitted jobs.

The operators or consultants should monitor the status of the batch queues more closely. You have a wonderful visual tool at qstat/llq_seaborg - you should use it. On two occasions this spring I noticed problems that you should have caught.

Allocations / accounting:

Streamline the allocations process and make it more equitable. ERCAP proposals are nearly as complex and elaborate as contract proposals. Respect user privacy regarding allocations and time used. Balance privacy and public scrutiny?

The allocation process could be less opaque.

... The allocation process could be further streamlined. Find some way to allow more mid-year flexibility in the resource allocation process. Get away from the current allocation model (which still tends to encourage a "use-it-or-lose-it" mentality). Perhaps you should consider under-allocating and then letting some fraction of the remaining time be made available in sort of a free-for-all mode. ...

... Also the allocation process is not very transparent. Fusion (the heavy users in nonlinear MHD and gyrokinetics) have run completely out of time in July.....even though we have got about 50% more than initially allocated. We are sitting on our hands until October. We should have got more time....but this also ties in to

allocation and structure of queue waiting

The application to obtain a startup allocation was a bit lengthy. ...

The charging of the accounts is rather artificial; especially on the IBM SP, the real computing time is usually much smaller than the charged time.

Software:

Improve the UNIX environment to be more user friendly -- more like the standard Linux bash command line.

Provide better UNIX shell

How about allowing tcsh?

Add more functionality via grid services.

... It might be useful to have a number of frequently used application codes up and running, e.g., some large electronic structure codes, Car-Parrinello codes, etc. Some of these applications are now work-horses and accessible to many users.

Maybe more application software although I should continue to explore to make sure that I just can't find some things I would like to use.

Application software for analyzing model outputs is lacking. Tools for quick analysis of model outputs will be really useful.

Documentation:

information on how to use libraries

... The more online documentation, including specific examples, the better.

The technical documentation on the NERSC web site is not always easy to find or to parse. Often one is directed to the IBM web pages which are generally far less useful than the NERSC originated pages. This was particularly an issue for me in initially porting my code to the IBM-SP2 and trying to understand options/flags/limits etc for the MPI C compiler.

I think the web pages are very good in some spots, but incomplete and difficult to locate for other topics. Perhaps the web pages need to be reorganized or linked together differently.

improve web documentation of IBM SP.

My suggestions are pretty minor: (1) web pages are sometimes a bit convoluted and the information is hard to find; this is relatively rare, but does happen so it comes to mind. Mind you, on the whole

The information webpages could be organized in a better fashion, ...

Provide more computing resources:

Expansion of IBM SP system, more hours allocated

Add more computing resources (more CPU's, faster CPU's, more disk space, more memory, etc.) ...

The system is generally fairly crowded; perhaps adding more CPU's would help.

more computation power, as always!

Increase hardware capacity at least 10-fold and compare with the total capacity of the NSF centers.

The major way in which NERSC could improve would be to upgrade its computing facilities more aggressively. It is surprising to me that NERSC trails the NSF centers in total computing power.

Just continue to buy the processors and disks (/beta type). The more the better.

Provide better hardware:

Processors on the IBM SP are getting relatively slow.

Machines with higher memory bandwidth, shorter memory latency would greatly improve performance on my codes.

faster machines (as usual!). A good balance is important, presently the CPUs are too fast for their I/O or memory bandwidth.

The present class of machines at NERSC is geared to embarrassingly parallel codes that can be run in short increments. To provide useful supercomputing capability for less parallelizable codes (10s to 100s of processors), more emphasis should be placed on single-processor speeds (vector nodes), and much longer time slots (~1 week) should be available in the queues.

Availability of large-memory not-too-massively parallel machines would continue to be a useful resource.

Big picture: maximize overall research productivity in a time-share environment....this means providing more capacity (FLOPs per year) via a "cluster center" (many not-connected identical clusters of 128-256 processors) and only offering "supercomputers" (2048 processors) to the limited use (> 256 processors) for which they were intended. ...

Keep the PVP Cluster / maintain a PVP resource:

NERSC needs a replacement for the PVP cluster. ...

Once you get rid of Killeen, I may never use NERSC again.

Provide access to some PVP systems elsewhere. Maybe DOE should do this. The older systems are useful for some particular applications.

The PVP cluster is a useful resource that should not disappear.

Maintain a PVP capability.

Training / new user services:

See previous comment about sending more information to new users. [It would be useful to new users if they were sent an email containing information about the resources available - for instance, which machines are available to them and for which machines their usernames and passwords are valid, and maybe a summary of what the different machines are generally used for (development/visualisation..). For example, I was not aware that visualisation services are available at nersc - I'm sure this information is located on the website but if you're not looking for it you're not going to see it - even though visualisation services would be of interest to me.]

If there is a way to let the new user know how to set up the scripts to run batch jobs, know how to understand the error messages during compiling.

NERSC should go almost exclusively to web tutorials and web based training. Courses on site are just out of the question, since it is much too expensive to fly to the west coast.

Could they give us a tip once a week on using seaborg or programming, or a little message once a week indicating what seaborg can do?

Training activity could be increased.

NERSC response: NERSC is offering monthly training classes on the Access Grid Node. These are announced in advance on the hpcf.nersc.gov home page.
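Several comments above ask how batch job scripts are set up for new users. The sketch below shows a minimal LoadLeveler submission script of the kind used on the SP; the keywords are standard LoadLeveler syntax, but the class name, node counts, and time limit are illustrative assumptions based on the queue classes mentioned elsewhere in this survey, not NERSC's actual settings:

```shell
#!/usr/bin/csh
# Minimal LoadLeveler job script (illustrative; class/limits are assumed)
#@ job_name         = my_first_job
#@ output           = $(job_name).$(jobid).out
#@ error            = $(job_name).$(jobid).err
#@ job_type         = parallel
#@ class            = regular        # assumed class name
#@ node             = 2              # number of 16-way SP nodes requested
#@ tasks_per_node   = 16             # MPI tasks per node
#@ wall_clock_limit = 00:30:00
#@ notification     = complete
#@ queue

poe ./my_program                     # launch the MPI executable across all tasks
```

Such a script would be submitted with `llsubmit script.ll` and monitored with `llq`; compile-time error messages are a separate topic covered by the compiler documentation.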

No need for change / no suggestions:

Excellent center, I wouldn't change anything

Nothing specific comes to mind. When problems do arise they are handled promptly and professionally.

No suggestions

Nothing comes to mind.

PDSF:

Pdsf=> batch queue and IO priorities need to be clearly defined.

NERSC response: There is a section in the tutorial for new users (http://pdsf.nersc.gov/talks/startut050302/PDSFstarTutorial3_files/v3_document.htm) that explains how the system calculates priorities. That section will be linked from the batch page, as the issue of priorities is of great interest to many users.

There are no I/O priorities.

There is a terrible interface with desktop computers. I cannot access my PDSF files with a GUI interface. There are ways to do it, such as NFS or SAMBA, but NERSC does not implement them because of security reasons. There needs to be a way for me to access my NERSC files as if they are on my laptop. AFS works well, but NERSC only supports that access for a small amount of disk space. Modern computing uses a GUI interface. NERSC needs to support that interface.

NERSC response: Security threats indeed disrupted development of software that was unifying the desktop and compute environments. A new generation of such software is growing, however, based on the Globus toolkit. We are monitoring this development, but it is not ready for production yet. Nevertheless, it is important for users to follow those developments so that the projects develop to their liking (http://www.ppdg.net/pa/ppdg-pa/idat/ and http://mrccs.man.ac.uk/research/grenade/ ). The Grenade project in particular closely matches the user's requirements:
"The initial prototype will use Globus to extend the functionality of the popular KDE desktop for Linux. KDE is open source and has an architecture ideal for our purposes, featuring an XML-based framework for componentisation. KDE is layered on Qt, a cross platform C++ GUI framework featuring an elegant, powerful signal/slot mechanism for communication between components. The first demonstrator will feature single sign-on, resource discovery, and a file browser for remote file systems (we will teach KDE's browser Konqueror to speak the grid-ftp protocol). An RSL GUI and drag-and-drop job submission will follow."

The frequency of disk failure at pdsf should be improved, if possible.

NERSC response: The failure rate for our hard drives is comparable to the average failure rate for commodity hard drives. It may appear artificially high because we have a large number (>>600) of them. However, in order to minimize the impact of those failures on our users, we closely monitor the market and adopt solutions that allow for transparent drive replacement. Right now (mid-October 2002) we are finishing a conversion of the disk vaults from software to hardware RAID (~33 done and ~5 more to go). That not only improves their performance, but also makes them more reliable and allows hard drives to be replaced without affecting users.

More outreach / collaborations:

Possibly more could be done to help remote users make the best use of NERSC. Possibly many users would benefit from periodic visits, as opposed to classes. ...

It is expensive to come as a university researcher and allocate time. I think NERSC needs more partnerships on the best science areas.

communicate with other HPC closely. Sometime, solutions to problems on the same type of machines could be copied from other HPC. It will save time and human resources.

More interactive services:

See below. [Consider dedicating a few nodes to interactive shared use for debugging purposes. SDSC does this; performance on the shared nodes is assuredly terrible, but you can at least fire up totalview on a parallel job without loadleveler rejecting it due to lack of free nodes. This situation has become the bane of my existence at NERSC in recent months.]

My only real complaint is about not being able to run interactively often on Seaborg. Perhaps this congestion could be mitigated by (if possible) allowing different jobs to be run on the same nodes. Instead of occupying all 16 processors on a node when I am only using 4 processors, could the other 12 be used by someone else?

Networking:

faster connection w/ mcurie (T3E)

need better Xwindows access; too slow to actually use it from here

HPSS:

Improve the reliability of the HPSS storage system. This might have been done already as I have not used the storage system since I had problems several months ago.

No inode quotas:

... NERSC should remove i-node quotas on all machines. (More disk space would also help).

Other:

Make shorter surveys

kick out stupid users, they never learn

Most of my suggestions have been made already and are really quite minor.


How does NERSC compare to other centers you have used?   62 responses

NERSC is the best / very good / overall NERSC is better:

NERSC is the best I have used.

Very favorably.

NERSC has a better SP batch environment than NPACI, where it is difficult to use one's allocation. NERSC, in theory, provides much more permanent disk space than NPACI. NERSC gives users better access to consulting staff than NPACI. NPACI has unlimited allocations on their HPSS. However, they do not allow external access to their HPSS, unlike NERSC.

Excellent.

You provide the best computational and users services. I use NCSA, SDSC and PSC.

Very well!

Nersc is certainly better than Idris (the center of CNRS-France), although consulting there is easier (hours and language)

Nersc has better support (because it is dedicated support), and a lot of batch machines available (compared to rcf). rcf (Brookhaven), cern, in2p3 (Lyon, France)

Compared to Livermore Computing at LLNL, NERSC is doing a great job.

NERSC is the best I know of.

I also use SDSC and many local (university) centers. NERSC has a huge lead on all of them. SDSC is a distant second.

I also use RCF at BNL. Compared to RCF, NERSC performs much better. RCF tends to be overloaded and they often have problems with their mass storage.

I think NERSC compares very well to other centers that usually do not have such a high standard of service.

NERSC compares very well with other centers. As I said, the major problem with NERSC is system crowding; this is likely the result of the high demand caused by the quality services you provide.

PDSF beats RCF for uptime, usability, stability, support. I much prefer to work at pdsf, even though I work on RHIC experiments and RCF is the main computing hub.

the only other similar system I've used is the Jlab High Performance Cluster; NERSC (PVP cluster) is far better in every way except pure horsepower

We have extensive experience with several NPACI and some DOD supercomputer centers. In all honesty, I think these centers have many things to learn from NERSC, whereas I cannot think of anything these other centers implement that could make NERSC better than it currently is. One thing that NERSC should NOT do, as unfortunately many NPACI centers do, is to keep increasing its "favoritism" towards jobs requiring an ever-increasing number of CPUs. These other centers have failed to understand that there are specific areas of excellent science where linear scaling with a huge # of CPUs is simply impossible, because of the nature of the problem dealt with, not because of code deficiencies. I strongly encourage NERSC to stay away from that logic. Excellent science needs both super-large but also medium range parallel computing.

Other computing services I have used have been operated by non-professional staff or pretty small in size. The NERSC does an outstanding job of maintaining the system.

NERSC systems operations are much more reliable than BNL computing facilities.

It is definitely more reliable than RCF at BNL.

Far superior to Brookhaven.

The availability of software at NERSC and OSC is superior to NCSA.

I think the quality of service is very high here at NERSC

terrific! look at the poorly working rcf at BNL.

Of all the ones I used, it is the best. The old Pittsburgh supercomputer center had equally good (if not better) consulting/help, but no good web pages and the computers weren't as good. The other center I use a lot was the SDSC bluehorizon. What

NERSC gave a larger environment on which to test my software than NCSA(UIUC) did.

I've used the SDSC, and a local beowulf. I must admit beowulfs equal the computational speed of NERSC, but the storage and queue system make NERSC much easier and efficient to use. I think I used the NERSC web site just as much as the SDSC one when starting to use the SDSC facilities, a testament to the wealth of information you provide.

NERSC is the top high performance computing center in my opinion. I also compute at NCAR, ORNL, PSC, and LANL. No other center provides the response or support that NERSC does.

You have done much better in Seaborg than the San Diego Supercomputer Center does in BlueHorizon (IBM/SP) in configuring the priority of the jobs in the queue. SDSC highly favors the big jobs using many nodes while those using a single node sometimes have to wait days even in the high priority queue.

Compares very well. (San Diego)

Compared to TACC, SDSC, ORNL, and PNNL, NERSC has a clearer web site and a team that responds most readily to any problems.

NERSC is one of my favorite centers to use and work with. ORNL, NCAR, PSC, LLNL, LANL

Superior to RCF

Overall NERSC does a good job. I have used machines at Argonne, NCSA, and Cornell, but NERSC has been the most professional operation. However the great number of users makes it difficult to get things done at times.

Better than SDSC in consulting proficiency.

Much better; for example, compared with the ABCC of the National Cancer Institute.

NERSC is the same as / mixed response:

In comparison to LANL's ACL, NERSC is a full-support operation but far less flexible to the external user.

Consulting service is better than that of PSC and San Diego, but PSC has more cycles available and each processor is faster.

I feel that NERSC is superior to RCF, which has a lot of trouble with uptime and disk storage, and is on a par with CERN, from my somewhat limited experience of CERN.

I found "startup" was better handled at NCSA. I was mailed a packet with a "quick reference" sheet which answered almost all of my questions for getting up and running on the Origin. It took considerably longer and was more work for me to port my work to the IBM SP2. This may be because I started when the machine was still new to NERSC. NERSC has handled all aspects better than my experience with the Cray at SDSC.

We currently work with a number of centers in the US and Europe (NCSA, PSC, Leibniz-Rechenzentrum Garching, Max-Planck Rechenzentrum). We haven't worked as much at NERSC as some of the others, but we've been pleased with the services at NERSC and they seem to compare well. It is often the case that the software or projects we are running require some special considerations, such as open ports for communication, or availability of large disks for short amounts of time, and we've always found NCSA to be very helpful in working with us on things like this. We really appreciate that type of flexibility towards individual projects.

Compared to NCAR SCD, the consultants at NERSC are more knowledgeable and able to resolve questions, but the online resources are not as helpful. In terms of computers, NERSC is comparable to NCAR.

I am using all three NSF lead centers (NCSA, PSC, SDSC). The professionalism with which NERSC manages its facilities exceeds theirs, as does the responsiveness of NERSC's staff. This is high praise because the NSF centers are very good in these areas too. NERSC's mass storage system is as good as or better than those at the NSF centers. Where NERSC lags is in overall compute power. Seaborg has less capability than the PSC Compaq machine, Lemieux, without even taking into account the very considerable resources at NCSA and SDSC. With the new DTS system coming on line at NCSA/SDSC in the summer of 2003, NERSC is in danger of falling seriously behind. More importantly, there are major problems squarely within the DOE mission which require several orders of magnitude more computing power than NERSC currently has.

I have only used LLNL and NERSC. Both do a great job and share many personality traits.

I have used GSFC (Goddard Space Flight Center) Cray computers and have found NERSC computers to be as easy to use as the GSFC computers. We didn't have SP allocations at GSFC, so I can't compare Seaborg to theirs.

The other center I used extensively is the NCSA at UI, Urbana-Champaign. I started using NERSC again after a gap of a few years (mainly as a result of NCSA doing all the things I need much better than NERSC) and have to say that it has vastly improved. It now compares quite favorably with NCSA. My only problem (complaint) now is the lack of plotting software at NERSC (at NCSA I can use AVS, Tecplot, etc., on the Origin machines). The available software such as DX seems to be poorly configured (DX, for instance, doesn't support netCDF as it is presently configured on Seaborg).

I think you are on par with others like SDSC, particularly this year with your improvements in expansion ratio (turn-around time / run time) on Seaborg. You should focus on "research productivity"; then you would be the best.

NERSC is less good:

FZ Jülich, Germany: They provide a single home directory for all Cray computers. The access to the archive file system is easier, as migration and demigration are managed by the operating system.

At LRZ München the maximum runtime is 32 hours for a job, and the system places no other restrictions on number of CPUs, memory, etc. I use that regularly for production runs and it is great. At Lemieux at PSC there is an option to specify qsub -I, with which one gets a collection of nodes for interactive use. This allows for very rapid debugging, since one bypasses the queue.

Consider dedicating a few nodes to interactive shared use for debugging purposes. SDSC does this; performance on the shared nodes is assuredly terrible, but you can at least fire up TotalView on a parallel job without LoadLeveler rejecting it due to lack of free nodes. This situation has become the bane of my existence at NERSC in recent months.

Consulting response time is longer and training courses are less frequent compared to the Pittsburgh Supercomputing Center (PSC).

I prefer to take classes locally in person, and was glad to be able to take a few classes at the North Carolina Supercomputing Center on parallel programming and use of an IBM SP (which is a half-hour drive from my home institution).

Compared to the ORNL CCS, NERSC is not as flexible in terms of accommodating jobs that may require the use of more processors or larger CPU time.

Dealing with HPSS credentials is a problem. Changing your password at NERSC is messy. At ORNL, they run DCE natively, making password changes easy. The down side is all the extra baggage that comes with DCE. It would be nice if DOE were to standardize on an authentication/authorization infrastructure, whatever it may be.

No comparison made:

I haven't used other centers.

NERSC is the only center I've tried.

SDSC, local, NCSA

FNAL computing center and university clusters.

Too early to tell

NCSA is the other center that I have the most experience with, and earlier with Cornell. I have also had some experience with NPACI and Pittsburgh.

RCF, but I used PDSF more often than RCF in the past year.

RCF


NERSC provides access to very high performance computing facilities on a platform with a simple interface, making development easy. Information services, training, account management, and consulting are all excellent.
