NERSC: Powering Scientific Discovery Since 1974

2000 User Survey Results

Response Summary

NERSC extends its thanks to all the users who participated in this year's survey. Your responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. Every year we institute changes based on the survey; the FY 1999 survey resulted in the following changes:

  • We created a long-running queue (12-hour maximum) for jobs using up to 256 PEs on the Cray T3E. Last year seven users asked for longer T3E queues; this year only one did.
  • We opened a Cray SV1, Seymour, for interactive use. As highlighted below, this change was well received.
  • We created new email lists to keep users better informed of NERSC announcements and changes. This change wasn't reflected in this year's survey results.
  • We enhanced the HPCF website; overall satisfaction with the website was higher this year.
    • Made it easier to find information on the web about running batch and interactive jobs.
    • Added a NERSC Glossary and Acronym List.
    • Placed automated machine up/down information at the bottom of the HPCF home page.

In FY 2000, 134 users responded to our survey. The respondents represent all five DOE Science Offices and a variety of home institutions: see User Information below.

On a 7-point scale, with 7 corresponding to Very Satisfied and 1 to Very Dissatisfied, the average scores ranged from a high of 6.7 for our training classes and the PVP Fortran compilers to a low of 4.3 for PVP and T3E batch job wait times. Other areas with very high user satisfaction are consulting advice and SP availability (uptime). The areas of most importance to users are the available computing hardware (the number of cycles), the overall running of the center, and network connectivity. See the Overall Satisfaction and Importance summary table.
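The averages reported throughout this survey are simple means of the 1-7 responses. As a minimal sketch (using made-up counts, not actual survey data), a score and its standard deviation can be computed from a histogram of ratings like this:

```python
import math

def summarize(hist):
    """Mean and sample standard deviation of a 1-7 rating histogram.

    hist maps a rating (1-7) to the number of respondents who chose it.
    (Illustrative helper; not part of the survey's actual tooling.)
    """
    n = sum(hist.values())
    mean = sum(score * count for score, count in hist.items()) / n
    var = sum(count * (score - mean) ** 2 for score, count in hist.items()) / (n - 1)
    return n, mean, math.sqrt(var)

# Hypothetical counts: 3 "Very Satisfied" (7) and 1 "Mostly Satisfied" (6)
n, mean, sd = summarize({7: 3, 6: 1})  # mean 6.75, std dev 0.5
```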

This year, the largest increases in user satisfaction came from the PVP cluster. Following the conversion of Seymour last year to an interactive machine, user satisfaction for the ability to run interactively on the PVP increased by almost one point. Five other PVP ratings increased by 0.6 to 0.8 points. See the hardware and software sections. Other areas showing a significant increase in satisfaction are HPSS performance and response time, hardware management and configuration, the HPCF website, and the T3E Fortran compilers. Only two scores were significantly lower this year than last: T3E batch wait time and consulting services (the latter still received high scores overall).

When asked what NERSC does well, 34 respondents focused on NERSC's excellent support staff and 29 pointed to our stable and well-managed production environment. Other areas singled out include well-done documentation, good software and tools, a very useful storage environment, and well-managed migrations and upgrades that "make supercomputing easy". When asked what NERSC should do differently, the most common responses were to provide more resources, especially more cycles and inodes. Of the 47 users who compared NERSC to other centers, 53% said NERSC is the best or better than other centers. Several sample responses below give the flavor of these comments; for more details see User Comments.

  • "Very responsive consulting staff that makes the user feel that his problem, and its solution, is important to NERSC"
  • "Provide excellent computing resources with high reliability and ease of use."
  • "The announcement managing and web-support is very professional."
  • "Manages large simulations and data. The oodles of scratch space on mcurie and gseaborg help me process large amounts of data in one go."
  • "NERSC has been the most stable supercomputer center in the country particularly with the migration from the T3E to the IBM SP".
  • "Makes supercomputing easy."

Below are the survey results. You can also see the survey text.

  1. User Information
  2. Overall Satisfaction and Importance
  3. All Satisfaction Questions Ranked and FY 1999 to FY 2000 Changes
  4. Consulting and Account Support
  5. Web and Communications
  6. Hardware Resources
  7. Software Resources
  8. Training
  9. Comments about NERSC


User Information

Number of responses to the survey: 134

Respondents by DOE Office and User Role:

DOE Office  Respondents  Percent
ASCR 9 7
BER 29 22
BES 28 21
FES 28 21
HENP 37 28
guests 3 2

User Role  Respondents  Percent
Principal Investigators 35 26
Repo managers 22 16
Users 77 57

Respondents by Organization:

Berkeley Lab 19
Livermore 9
Los Alamos 8
Argonne 7
UC Berkeley 7
Oak Ridge 6
General Atomics 4
New York Univ. 4
UC Los Angeles 4
U. Maryland 3
UC San Diego 3
Ames Lab 2
City U. of New York 2
Florida State 2
Ohio State 2
U. Texas 2
William & Mary 2
other universities 31
other labs 9

What NERSC resources do you use?:

Resource  Responses  Percent  Responses to Corresponding Section
IBM SP 64 48   56
Cray T3E 86 64   70
Cray PVP 66 49   44
HPSS 70 52   70
Visualization Server 6 4   8
Math Server 10 7   11
PDSF 5 4    
NERSC web site 43 32   92
Consulting services 57 43   100
Account support services 46 34   83
Operations 14 10    
Other 2 1    

Other resources listed: ACTS, AFS, Three machine Linux network for development, Workstation, PC support.

How long have you used NERSC?

6 months or less 18
6 months - 3 years 45
more than 3 years 68

What desktop systems do you use to connect to NERSC?

Operating System Type  Number
UNIX 175
PC 73
MAC 31

Individual Systems  Number
UNIX-linux 60
UNIX-solaris 50
MAC-macos 31
PC-win98 29
UNIX-irix 27
PC-winNT 21
UNIX-osf 16
UNIX-aix 15
PC-win95 12
PC-win2000 8
UNIX-hpux 7
X-windows 1
LinuxPPC 1
Ultrix 1
OS/2 1
PC-win3 1

What type of connection do you most often use to connect to NERSC?

Connection Type  Number
Ethernet 116
Cable Modem 17
Modem 30
Other 4

Browser Used to Take Survey:

Netscape 4 110
Internet Explorer 5 16
Internet Explorer 4 6
Netscape 3 2


Overall Satisfaction and Importance


Satisfaction  Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4

Importance  Average Score
Very Important 2.5 - 3
Somewhat Important 1.5 - 2.4

Overall Satisfaction with NERSC Frequency Histogram Plots

Topic  No. of Responses  Avg. (1-7)  Std. Dev.  Change from '99  No. of Responses (Importance)  Avg. (1-3)
Consulting services 111 6.39 0.82 -0.19 114 2.67
Account support 106 6.39 1.01 0.00 106 2.38
Overall satisfaction 128 6.15 1.03 -0.10 119 2.85
HPCF web site 103 6.13 0.96 0.26 101 2.43
Software maintenance and configuration 88 6.08 1.04 0.19 83 2.61
Mass storage facilities 90 6.03 1.12 -0.03 85 2.56
Network connectivity 104 6.01 1.18 -0.16 98 2.80
Hardware management and configuration 98 6.00 1.18 0.29 92 2.66
Available software 107 5.98 1.00 -0.01 102 2.59
Available computing hardware 109 5.90 1.25 -0.06 104 2.91
Allocations process 98 5.79 1.19 -0.08 91 2.70
Software documentation 95 5.62 1.13 0.16 90 2.51
Web-based training 61 5.23 1.24 0.04 64 1.97
Training classes 47 5.13 1.33 0.28 57 1.75
Visualization services 45 4.67 1.17 0.30 46 1.65


All Satisfaction Questions and FY 1999 to FY 2000 Changes


Very Satisfied 7
Mostly Satisfied 6
Somewhat Satisfied 5
Neutral 4

How Satisfied are you?

Topic  No. of Responses  Avg. (1-7)
Training: classes (attendees) 14 6.71
Software: PVP Fortran Compilers 32 6.66
Consulting: Timely response 100 6.63
SP: Uptime 50 6.52
Consulting: Quality of technical advice 99 6.49
Consulting: Followup 84 6.42
PVP: Uptime 39 6.41
Software: T3E Fortran Compilers 60 6.40
Consulting overall 111 6.39
Account support 106 6.39
HPSS: Reliability 62 6.39
Software: PVP Local documentation 21 6.38
Account support: Ease of obtaining account info 83 6.34
HPSS: Uptime 62 6.31
HPSS: Overall 70 6.26
Software: PVP User Environment 32 6.25
Web: Accuracy 81 6.22
Training: Online Tutorials 32 6.22
HPSS: Performance 64 6.20
Software: T3E User Environment 57 6.18
Software: T3E Programming Libraries 39 6.18
Consulting: Response to special requests 74 6.16
Overall satisfaction with NERSC 128 6.15
HPSS: User interface 63 6.14
HPCF web site overall 103 6.13
Training: Online class slides 19 6.13
PVP: Ability to run interactively 35 6.11
T3E: Uptime 65 6.09
Software maintenance and configuration 88 6.08
Software: SP User Environment 46 6.07
Software: SP Fortran Compilers 46 6.07
Software: PVP Performance and Debugging Tools 16 6.06
Software: SP Local documentation 39 6.05
Mass storage overall 90 6.03
HPSS: Response Time 75 6.03
Network connectivity 104 6.01
T3E: Overall 70 6.01
Hardware management and configuration 98 6.00
Web: Timeliness 76 6.00
Web: T3E Section 66 6.00
Software: T3E Local documentation 42 6.00
Software: SP Programming Libraries 30 6.00
Software: PVP C/C++ Compilers 13 6.00
Training: Teleconference lectures 13 6.00
Available software 107 5.98
Web: Getting Started Guide 55 5.96
Account support: Ease of modifying account info 68 5.93
Software: T3E C/C++ Compilers 30 5.93
Software: PVP General tools and utilities 14 5.93
Web: NERSC-specific info 67 5.91
Available computing hardware 109 5.90
Software: PVP Accounting tools 20 5.90
SP: Overall 56 5.88
PVP: Overall 44 5.86
Software: PVP Application software 18 5.83
Software: PVP Programming Libraries 16 5.81
Allocations process 98 5.79
Web: File Storage Section 46 5.78
Web: Ease of navigation 92 5.78
Software: T3E Application software 23 5.78
PVP: Disk configuration and I/O performance 31 5.77
Web: SP Section 57 5.75
Software: T3E Accounting tools 36 5.75
Software: SP General tools and utilities 25 5.72
Software: SP C/C++ Compilers 25 5.72
T3E: Ability to run interactively 58 5.71
Software: T3E Bug resolution 30 5.70
Software: PVP Vendor Documentation 16 5.69
Software: SP Application software 18 5.67
Software: T3E General tools and utilities 37 5.65
Web: Programming Info 71 5.63
Software documentation 95 5.62
Web: Searching 70 5.61
Software: T3E Vendor Documentation 29 5.59
Web: PVP Section 43 5.56
Software: T3E Performance and Debugging Tools 39 5.56
Math Server: Newton 11 5.55
SP: Ability to run interactively 41 5.51
Software: SP Vendor Documentation 26 5.50
Software: SP Bug resolution 22 5.45
T3E: Disk configuration and I/O performance 71 5.35
Software: SP Accounting tools 26 5.31
T3E: Batch queue structure 56 5.27
Visualization Server: Escher 8 5.25
Web-based training 61 5.23
SP: Batch queue structure 41 5.22
SP: Disk configuration and I/O performance 40 5.20
Training classes (all responses) 47 5.13
Software: PVP Bug resolution 10 5.10
PVP: Batch queue structure 34 5.03
Software: SP Performance and Debugging Tools 29 4.69
Visualization services 45 4.67
SP: Batch wait time 46 4.54
T3E: Batch wait time 63 4.33
PVP: Batch wait time 38 4.26


FY 1999 to FY 2000 Changes

The following are statistically significant changes in responses to questions common to the FY 1999 and FY 2000 user surveys.
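The report does not say which statistical test was used to judge significance. One plausible check, sketched below, is a two-sample Welch t-statistic computed from each year's mean, standard deviation, and response count; the FY 1999 standard deviation and count used here are assumed for illustration, not taken from the report.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic for the difference of two sample means,
    computed from summary statistics (mean, sample std dev, count)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se

# T3E batch wait time: FY 2000 mean 4.33 (sd 1.58, n = 63) vs. the FY 1999
# mean 5.04 with a *hypothetical* sd of 1.6 and n of 60 (not in the report).
t = welch_t(4.33, 1.58, 63, 5.04, 1.6, 60)
# |t| above roughly 2 would mark the -0.71 drop as statistically significant
```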

Topic  FY 2000 Satisfaction  FY 1999 Satisfaction  Change
PVP Cluster: Ability to run interactively 6.11 5.18 +0.93
PVP Cluster: Overall 5.86 5.05 +0.81
PVP NERSC Documentation 6.38 5.68 +0.70
PVP Vendor Documentation 5.69 5.03 +0.66
PVP Fortran Compilers 6.66 6.04 +0.62
PVP Performance and Debugging Tools 6.06 5.46 +0.60
HPSS Response Time 6.03 5.68 +0.35
HPSS Performance 6.20 5.90 +0.30
Hardware management and configuration 6.00 5.71 +0.29
HPCF Website 6.13 5.87 +0.26
T3E Fortran Compilers 6.40 6.20 +0.20
Consulting Services 6.39 6.58 -0.19
T3E Batch Job Wait Time 4.33 5.04 -0.71


Consulting and Account Support


Satisfaction  Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4

Significance of Change
not significant

Consulting Services / Account Support Satisfaction Frequency Histogram Plots

Question  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Timely response to consulting questions 100 6.63 0.73 -0.01
Quality of technical advice from consultants 99 6.49 0.80 -0.03
Followup to initial consulting questions 84 6.42 0.96 -0.01
Ease of obtaining account information 83 6.34 0.95 0.08
Response to special requests 74 6.16 1.19 -0.12
Ease of modifying account information 68 5.93 1.27 -0.22

Summary of Comments

Comments and suggestions regarding NERSC Consulting Services: 23 responses

15 good service
4 improve follow-up
4 improve quality of response

Comments and suggestions regarding NERSC Account Support Services: 16 responses

8 good service
4 suggested enhancements
3 needs improvement


Individual Comments and suggestions regarding NERSC Consulting Services: 23 responses

good service

Your web pages on how to do things are great. [...]

I am especially pleased with NERSC consulting services. Every technical problem I have encountered has been remedied rather quickly and in a professional manner.

excellent jobs, even better than my Lab.

You guys have always been a pleasure to deal with.

good people, good attitude, responsive

I receive lots of very helpful information from the consultants.

The consulting people are always available and very helpful. Thank you for your effort.

Keep up the good work!

Keep up the excellent and knowledgeable work

Consulting services are good.

very good services

keep up the good work

I have been most satisfied with the help (but this category is not in the questionnaire) I got from NERSC consulting, as without the help and advice of David Turner, Harsh Anand, Tom DeBoni, Majdi, Drs. Jonathan Carter and Richard Gerber, I would not have been able to perform the "most gargantuan" calculations I have ever performed. These calculations would have been unthinkable on any supercomputer facility anywhere in the world, as they required about 70-80 GB of disk, 256-512 RAM and about 200 CPU hours per run on a Cray J90! My sincerest thanks to all in Consulting, and especially to Ms. Francesca Verdier for her excellent guidance and advice throughout my usage of NERSC facilities for over 5 years.

improve follow-up

Generally excellent although I had to call twice for the same question and nobody got back to me with an answer or even saying that they were still working on it.

[...] Response times could be better.

Would be nice to have a list of consultant's email addresses available. Will be helpful especially for follow-up question(s).

As a result of last year's survey, I was contacted by a consultant to help port a PVP code to the T3E. She was very friendly but over the months that passed, nothing was ever done. I contacted her every few weeks, and she said she would soon get to it; then the SP came, and she was overwhelmed. Eventually, I learned that she no longer had a valid email address at NERSC/LBL, and thus I assume she left the company. I do not expect NERSC to carry the load for me of learning to work with the T3E, but once the contact was made and help offered, it seems poor form not to carry out what was offered. Throughout last year I delayed investigating certain aspects of porting my code because of the continued assumption of help from her. Once again, I do not assume that NERSC is responsible for helping me, but the way in which the promised help never materialized contributed negatively to my needed migration of codes to the MPP environment.

improve quality of response

In general, 90% of my questions are answered quickly and accurately. My only "complaint" is it would be nice to have every single consultant versed in every possible oddity and detail of F90 and debuggers on every single platform (unfortunately, there seem to be C/C++ users requiring support too!).

[...] Your consultants often don't have answers to my questions and seem unable to get them.

Consultants had the tendency to blame problems on the code, and not help with problems that lie in the NERSC hardware and software. For example, we had trouble porting a code to the NERSC SP3 that ran successfully on the NIST SP3.

As always, it depends on who answers the phone. [...]


It would be nice to have after hours consulting available, especially for those in other time zones

Please give more default disk space

I have not used the consulting services in a long time.


Individual Comments and suggestions regarding NERSC Account Support Services: 16 responses

good service

Also of very high quality.

I have had the best possible help if I ran into problems, and I am grateful to the personnel in the Account Support Services.

keep up the excellent and timely work

I am very satisfied with the NERSC Account Support Services

very good.

keep up the good work

Account support people are always there for us too. Thanks.

suggested enhancements

The robustness of the allocation process could be enhanced by allowing for the submission of non-native documents (PDF, Word, txt) to the allocations committee. With the current web-only interface one faces the difficult task of discussing the theoretical framework of the calculations with no equations to support the text. It would seem reasonable to expect that PDF files be used as an alternative standard to the web-only interface.

The new web site seems to be a good idea. Why not give summary information on default repo on login that way you immediately see your repo status.

More aggressively notify abnormally rapid decrease of resources. I used up IBMSP resources in very short time using wrong priority by mistake. If some warning can be made, this may not have occurred.

I have not yet learned how to use the full functionality of the replacement to setcub

needs improvement

Getting an account set up used to take 24 hours. Lately, it has been taking much longer and has always required us calling NERSC to get the account information. We also have to deal with multiple people at NERSC when asking for both T3E and SP accounts.

Try not to forget to get back to your users even if you cannot find the answer to their initial question.

Account support services are poor.


Web and Communications


Satisfaction  Average Score
Mostly Satisfied 5.5 - 6.4

Usefulness  Average Score
Very Useful 2.5 - 3
Somewhat Useful 1.5 - 2.4

Frequency Histogram Plots

HPCF Web Site

How satisfied are you?  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Accuracy of information 81 6.22 0.92 0.00
Timeliness of information 76 6.00 0.89 0.01
Getting Started Guide 55 5.96 1.17 -0.12
T3E section 66 6.00 0.89 0.01
Info on using NERSC-specific resources 67 5.91 1.01 -0.02
Ease of finding information on web site 92 5.78 1.07 0.08
File Storage section 46 5.78 1.07 -0.04
SP section 57 5.75 1.11  
General programming information 71 5.63 1.17 -0.11
Search facilities 70 5.61 1.07 -0.08
PVP Cluster section 43 5.56 1.14 -0.13

Keeping Informed

How useful are these?  Responses  Avg. (1-3)  Std. Dev.  Change from '99
NERSC Announcements Email Lists 88 2.45 0.66 -0.18
MOTD on computers 78 2.27 0.73 0.18
Announcements web archives 79 2.05 0.81 -0.11
Phone calls from NERSC 70 1.81 0.87 -0.08

Summary of Comments

Comments and suggestions concerning the HPCF web site: 13 responses

5 better navigation/searching
3 content improvements
3 good service

How would you like to keep informed of changes and issues at NERSC? 13 responses

10 satisfied
2 don't send too many emails
Do you feel you are adequately informed about NERSC changes? Yes: 98 No: 5
Are you aware of major changes at least 1 month in advance? Yes: 86 No: 9
Are you aware of software changes at least 7 days in advance? Yes: 66 No: 16
Are you aware of planned outages 24 hours in advance? Yes: 77 No: 10


Individual Comments and suggestions concerning the HPCF web site: 13 responses

Better Navigation/Searching

Sometimes it seems one has to go through 2-4 levels to find things. Also, details are often not at NERSC but at Cray or IBM.

Don't use it all that much as it usually takes me more time to find what I am looking for than it's worth.

Please add a table of contents/site overview if it is not there already. It might also be helpful if people could build their own personal custom interface to the NERSC site that includes links to commonly used web pages, relevant messages, account information etc.

NERSC response: The Website Outline and the Index of Titles are both linked to from the HPCF home page, but perhaps we should make these links more prominent. The suggestion to provide a way for users to build customized interfaces into the NERSC website is a good one: we will do it!

One problem I often run into is that I'll remember there was a class or training session on some topic that I have a question about. However, to find the particular presentation I'm interested in I have to remember the date and/or place where the session was given. This is usually hard to recall so it can take a lot of time to go back and find what I'm looking for. It would be very helpful if all the training sessions, tutorials, classes, etc. could be cross-referenced based on the topics covered in addition to the chronological/location style organization that is currently used. Of course, I can always do a search, but this tends to give too many references that take time to sort through.

NERSC response: See Index of Web-Based Lectures, by Topic. We will make this document easier to find.

The search engine within NERSC is not effective enough.

Content Improvements

As a new user to this system, I found it difficult to determine how to get a job up and running on the system. The examples in the getting started guide were not sufficient. The example of how to run a batch script is a good start, but it is trivially simple. On the same page, it would be nice to have an example of how to actually run a piece of Fortran or C code, including how to move your files back to your home directory.

Would like to see more detailed MPI I/O information for the SP.

NERSC response: See Introduction to MPI I/O.

The web site explains just the bare minimum of a specific topic. As a user, it would be very helpful if the web explained both the bare necessities and the finer details of topics of users' interest.

good service

Your web site has answered the vast majority of my questions quickly, and completely.

Great Job

Overall, the NERSC web sites are outstanding. Besides being useful to me personally, they save me time because the first thing I tell new users on my project is, "Look at the NERSC web pages. They contain a lot of useful information."


The IBM documentation is hard to use, and has been unavailable at times. I don't know which link it was, but I was asked for a password to an IBM site when I was finally (recently) closing in on some needed piece of information. Together with the lack of support for Fortran 90 modules, the lack of adequate (i.e., "transparent") documentation makes the IBM a formidable challenge for development (and therefore for production).

For some reason the NERSC Division website always crashes my web browser (Netscape), although other sites do not.

NERSC response: Note that the site mentioned is the website for the NERSC Division at Berkeley Lab, not the website for the NERSC HPC facility. We believe that the problem reported has been fixed.


Please tell us how you would like to keep informed of changes and issues at NERSC.


Existing email messages are fine. Backup detail is available via WEB or a phone call.

by e-mail

Info on web for long-term changes; major outages (7-30 days) per email; remainder (2 days) via MOTD.

e-mail from lists is fine

Please keep it up the excellent job you all are doing!

Via e-mail and web postings

email is best.

I find the e-mails most helpful

send E-mails

email is the best. login message is good for MOTD too.

Don't send too many emails

Too many operator emails telling us systems are going down that we don't even use

The MOTD gets lost in all the other stuff that comes to the screen. I guess what's best for me is a MOTD that jumps out at you, plus e-mail, but if you start e-mailing too much stuff then I probably won't be as likely to read it. A lot of it is my fault. I especially seem to always miss the "machine going down at 4 today" messages. Then I get pummeled with messages from 3 to 4 pm, with little if any notice prior to that. Incidentally, I hate when the machines go down at 4 pm. In fact if I were to pick anything that bugs me the most, that would be it.


Messages sent to screen of platform involved. e-mail sent to my accounts on the machine in question.

NERSC response: email is sent to your home institution, not to a NERSC machine.


Hardware Resources


Satisfaction  Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
Neutral 3.5 - 4.4

Significance of Change
significant increase
significant decrease
not significant

Frequency Histogram Plots

IBM SP - gseaborg

How satisfied are you?  Responses  Avg. (1-7)  Std. Dev.
Uptime 50 6.52 0.86
Overall 56 5.88 1.31
Ability to run interactively 41 5.51 1.49
Batch queue structure 41 5.22 1.41
Disk configuration and I/O performance 40 5.20 1.62
Batch job wait time 46 4.54 1.88

Max. Number of Processors Used: 141 (48 responses). Max. Number of Processors Code Can Effectively Use: 591 (40 responses).

Cray T3E - MCurie

How satisfied are you?  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Uptime 65 6.09 1.01 -0.17
Overall 70 6.01 1.07 -0.16
Ability to run interactively 58 5.71 1.35 0.11
Disk configuration and I/O performance 51 5.35 1.32 0.12
Batch queue structure 56 5.27 1.53 -0.20
Batch job wait time 63 4.33 1.58 -0.71

Max. Number of Processors Used: 146 (61 responses). Max. Number of Processors Code Can Effectively Use: 300 (46 responses).

Cray PVP Cluster

How satisfied are you?  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Uptime 39 6.41 1.12 0.12
Ability to run interactively 35 6.11 1.35 0.93
Overall 44 5.86 1.41 0.81
Disk configuration and I/O performance 31 5.77 1.20 0.21
Batch queue structure 34 5.03 1.66 0.00
Batch job wait time 38 4.26 1.83 0.31

Max. Number of Processors Used: 9 (29 responses). Max. Number of Processors Code Can Effectively Use: 10 (24 responses).


HPSS

How satisfied are you?  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Reliability 62 6.39 1.18 -0.07
Uptime 62 6.31 0.98 -0.02
Overall 70 6.26 0.99 0.14
Performance 64 6.20 0.98 0.30
User interface 63 6.14 1.06 0.08
Response time 62 6.03 1.10 0.35

Server Satisfaction

Satisfaction with  Responses  Avg. (1-7)  Std. Dev.  Change from '99
Newton 11 5.55 1.37 0.30
Escher 8 5.25 1.28 -0.20

Summary of Comments

Comments on NERSC's IBM SP: 28 responses

7 hard to use/software problems
7 improve turnaround time
6 provide longer queues
5 good machine
4 disk issues: more inodes, more local disk, more GPFS nodes
4 provide more interactive services
3 change batch scheduling priorities
2 inadequate documentation

Suggestions for NERSC's IBM SP Phase II System: 16 responses

4 batch configuration
4 switch/communications performance
3 software concerns
3 more processors

Comments on NERSC's Cray T3E: 19 responses

8 improve turnaround time
7 good machine
2 needs more memory

Comments on NERSC's Cray PVP Cluster: 14 responses

5 good machine / good interactive services
3 C90 was better
3 file issues: more inodes, migration
2 improve turnaround time

Comments on NERSC's HPSS Storage System: 16 responses

8 good system
4 availability/performance problems
2 interface improvements

Comments about NERSC's auxiliary servers: 5 responses


Individual Comments on NERSC's IBM SP: 28 responses

Hard to use/software problems

I'm making only light use of the SP for development. NERSC staff have been very helpful and responsive. The SP is not the easiest system to use (C++ compiler problems), but these are not the fault of NERSC.

Don't like the requirement to use $TMPDIR for module compiling. Do like the presence of NCAR graphics. Not sure how I will use mixed SMP/MPP capability when 8-way processors arrive in Phase II. Debuggers on Seaborg are pretty poor compared to the PVP or T3E.

Fortran compiler seems buggy, file storage per node very limited, limited documentation, problems with batch submission, rather slow processors compared to say DEC alpha, etc.

There is something wrong in that I cannot compile my code properly. It is related to MPI settings.

Home directories should not be GPFS because of the F90 module compiling problem.

The lack of support for Fortran 90 modules is something that frustrates me a lot. [...] Compared to February, the new compiler is slow. Recompiling from scratch -- which is frequently necessary because my memory-mapped module files are obliterated every time I am logged out, so that any change forces me to start from the beginning -- takes hours. It would be nice if the old compiler were available for those that wish to use it. The new compiler fails to compile my codes without the '-qhot' option because of "lack of resources". This error message is not helpful. The "llqs" routine is not as useful for figuring out when a job will likely run compared to similar routines on the T3E. I prefer the version of totalview on the T3E, but this may be a function of my overall frustration with the IBM. [...] Gnuplot doesn't seem to pick up the tcsh line editing commands when running under tcsh. [...] The inability to ftp into gseaborg makes editing files a chore for me, since I am accustomed to editing from an emacs window running on my desktop. There is probably a way around this, but I don't know what it is.

Improve turnaround time

Job waits of 3-5 days for 6 hours of 64 nodes are common. This is completely unacceptable, it is not possible to get useful work done in this way. Available resources should either be drastically increased or else NERSC closed down.

Batch queues seem to be rather long in the regular class, implying the need for a larger computer. Could you prepare an up-to-date plot of the average wait-to-run time for the various queues, as a function of time, that could be viewed on a web site, for example?

Initially, I was very satisfied with the IBM SP. However, around mid-summer the queues started getting very slow and batch jobs that used to go through overnight or less started taking 2-3 days. For my typical job (100 - 200 processors, 3 - 4 hours) this is intolerably slow. I also have access to a local IBM SP at my lab (ORNL) which has faster processors with 4 per node and much fewer users. Jobs that are taking 2 - 4 days to get through NERSC's IBM SP usually start immediately here and are done in a few hours. I'm hoping NERSC's IBM SP Phase II will improve this problem. [...]

I find the IBM SP a pretty slow machine.

Provide longer queues

I would like a longer queue.

A longer max wall clock time (>6 hrs) on gseaborg would be good, like on the T3E.

maximum running time for batch jobs of 6 hours is much too short for our compute intensive job

Good machine

Great Machine. Keep it up. Needs more I/O nodes for GPFS and faster processors ...

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.

Very stable, easy to use, faster than what I expect.

very happy

Max. number of processors depends on the configuration of the code (size of domain, spatial resolution). This code shows good performance enhancement up to 96 processors (max. tested so far).

Disk issues: more inodes, more local disk, more GPFS nodes

Need for a local filesystem to fully exploit NWChem capabilities.

provide more interactive services

[...] Although there are evidently typically interactive PE's available on the IBM, there aren't very many overall. I'd prefer more for development, if the climate for fortran development were friendlier.

Available PEs for interactive runs should be more than 16 (at least for short test runs!). Wait time for batch jobs with short runs (~10-20 mins) should not exceed 5 hrs.

interactive run is always at the very low priority. maybe it could be the same as debug queue.

One unified file system would help, particularly with the F90 .mod file handling. The queues have become too crowded. The 6-hour time limit, up from 4, was a welcome change. The interactive limit of one processor is too small to even compile some codes.

Change batch scheduling priorities

[...] In the meantime I think you need to consider rearranging the queues so that the longer jobs which really do take multiple days to finish don't get in the way of intermediate length jobs (100-200 processors, 2-4 hours) which should be put on a faster track with the potential to finish in a 24 hour period.

There is no obvious method to which jobs get to run when. We are running a 100 year model that takes nearly one month wall clock time to execute. With a 6 hour time limit, no q structure, and 3 day lag times from time of job submission to time of job execution, we have had to invent several strategies just to use the time we've been allotted. Further, nearly a third of the jobs that we do submit have to commit suicide because LoadLeveler tries to run them simultaneously, and they need to be run sequentially. We are obviously not the only users in this predicament.
1) Please set up some sort of q structure. Allow jobs that fill half the machine or more to run only at night.
2) If you don't do that, please allow users to use cron so that we don't have to occupy processors to submit jobs at regular intervals.
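One workaround that needs neither cron nor an idle "submitter" job is a self-chaining batch script: each job's last action submits its successor, so segments run strictly one after another. A minimal sketch only - the step bookkeeping, file names, and segment count are illustrative, and the echoed `llsubmit` line stands in for the real LoadLeveler submit command:

```shell
#!/bin/sh
# Self-chaining batch script sketch: run one model segment, record
# progress, then queue the next segment only after this one completes.
SUBMIT="echo llsubmit"      # illustrative stand-in; in production: SUBMIT="llsubmit"
MAX_STEPS=4                 # total number of sequential segments (illustrative)

STEP=$(cat step.count 2>/dev/null || echo 1)
echo "running model segment $STEP"   # stand-in for the real model run

NEXT=$((STEP + 1))
echo "$NEXT" > step.count
if [ "$NEXT" -le "$MAX_STEPS" ]; then
    $SUBMIT chain.job       # the successor enters the queue only now
fi
```

Because the successor is submitted only after the current segment finishes, the jobs can never run simultaneously - the failure mode described above.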

I rely on a defense machine allocation for SP time to do my critical runs, primarily because I have access to a queue system there that allows > 100 hr runs. I'm not sure, though, that I'd use such access at NERSC even if I had it. The inode limits imposed are stifling, and require that I monitor my jobs full-time on the NERSC machines so that I may tar up output files/directories and remove them from the scratch space as they pop out of the run. I need to sleep sometime, and when I do, my inode limit becomes exceeded, and the job crashes. At the DoD sites, this has never been a problem. They seem more set up for large users. I think NERSC caters far too much to the little users, and this is one instance of what makes me think so. Until I can do large (~100 hr) runs at NERSC, with 128-256 processors, and get into the queue system in less than a week, and be able to dump a significant amount of data before running out of resources, my REAL work will be done at the DoD sites. Also, the filesystem on the SP is hideous. For deep filesystem deletes (say 3 or 4 levels), with a few hundred or so files, it can take unbearably long times to copy or remove them. This compounds the inode problem mentioned above because of the effort involved in tarring up my stuff and putting it all on hpss. So... the queue system is too full because there are too many small users on the machine. There aren't enough inodes because there are too many users. And the filesystem is horribly slow. Other than that....

Inadequate documentation

[...]Documentation is generally hard to find and harder to understand (mainly because of excessive cross-references to documents that are hard for me to find). For example, ESSL or PESSL versions of FFT's require complicated initializations that took me quite a while to figure out, even with help from consultants. [...] The documentation for the different version of the xlf90 compilers -- mpxlf90, mpxlf95, mpxlf95_r7, xlf90, etc. -- didn't make it easy for me to figure out how to get started with a basic MPI-based parallel code. [...]


Accounting may be improved.

I am not using it

It would be a good machine if it had a much higher communication bandwidth, lower latency AND if it were able to do asynchronous communications without any overhead on the processors involved, i.e. between setting up an asynchronous MPI send/receive and its completion at an MPI wait the processor needs to be able to perform calculations as efficiently as if there were no pending communications.

more memory


Individual Suggestions for NERSC's IBM SP Phase II System: 16 comments

Batch configuration

[...] A batch structure that favors large jobs explicitly would be very useful. There are plenty of computers around for people that are doing 32-64 PE jobs. The big machines ought to be available first for the applications that can't run elsewhere. The batch structure for mcurie is very good in this respect.

Queue time limit should be longer, even if that means wait time is longer.

Maximum batch job running time should be 24 hours; it is 18 hours at the San Diego Supercomputer Center.

As mentioned above, give priority to intermediate length batch jobs. Don't design everything around satisfying the really big users.

Switch/communications performance

Please insist on getting the highest performance communication backbone that is available. I rely upon high performance communication heavily, and fear that 16 CPU nodes with existing switch hardware would be a step backward for my applications. [...]

I would strongly suggest that the switch should be updated to its final configuration BEFORE the nodes are upgraded.

More procs, faster I/O, faster communication. The usual requests.

Same as above. Concern about full use of node CPUs with MPI vs. node bandwidth and internode communication.

Software concerns

I hope that we can rely on IBM's C++ rather than KAI's, but I'm not sure this is realistic.

[...] If the Phase II system continues to fail to support fortran code development (by failing to treat memory-mapped files on the same footing as ordinary files while requiring them for compilation) then the Phase II system will really drive me crazy. [...]

NERSC response: In the Phase 2 system the GPFS memory-mapped file problem will be solved. In particular, Fortran 90 modules will work with GPFS.

Convince IBM to put some money into fixing that horrible filesystem. As much as I like my DoD accounts, they too have the terrible gpfs system that makes dealing with complex directory structures very painful.

More processors

Need as many processors as possible.

I hope there will be a phase III with even more nodes 8-)

No comments for this system, as it is already pretty much set. The next system after this must have many 1000's of processors if it is to be useful as a national resource.


Same as for sp3.

looking forward to it!

Can't wait to get to Phase II.

get a good supply of Valium for your consultants...


Individual Comments on NERSC's Cray T3E: 19 responses

Improve turnaround time

The more I use it the more I like it. Batch waits can be excessive though

I cannot run any meaningful calculations with a 4 hour queue, and the batch job wait time on the 12 hour queues is very long.

The queues are too crowded and the turnaround is atrocious

Last time I checked, the queues here seemed even slower than the IBM SP. I only occasionally use the T3E anymore. This has gotten to be one of those computers where, by the time the job is finished, if you're not careful you may have forgotten why you started it. You need to do something to get better turnaround time.

Wait time in large job batch queues is too long.

queue length - need a bigger faster T3E?

T3E is really busy these days.

Interactive jobs time out after 30 minutes; batch jobs can spend a long time in the queue. But the worst thing is the inode quota of only 3500.

Good machine

Hope you keep it as long as possible!

Stable, also easy to use. And it is configured very well. I am also impressed by its checkpoint function. Hopefully, it can also be moved to IBM-SP.

This machine has probably the best communication network of any MPP machine I have used. Replacing the Alpha cache with streams was a bad idea; a large cache would have greatly improved its performance. It is a pity that an upgrade path to an Alpha 21264 based machine was not available.

Generally -- excellent machine, excellent performance until recently. Lately -- numerous crashes with no end in sight; it actually got so bad that I tried to use the IBM again (see comments above).

File system I/O is a bit slower compared to SGI, although the computing power is a lot stronger than the SGI Origin series. Overall, it was mostly satisfactory to us.

Needs more memory

I don't use the T3E because there is not enough memory per node on the machine. It otherwise seems to be a very nice system to work on. Unfortunately, my smaller problems can be done locally on our own workstations, and the large ones need the memory of the SP systems.

more memory per processor!!!!


Why does it not have the 'tcsh' shell?

NERSC response: tcsh is available, but must be loaded explicitly (since it does not come with the UNICOS operating system). See tcsh and bash.
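For csh users, the explicit load can be automated at login. A minimal sketch only, assuming the Cray module environment provides a tcsh module - the module name is an assumption, so check `module avail` on the machine:

```shell
# Hypothetical ~/.login fragment (csh syntax): load the tcsh package that
# NERSC installs separately from UNICOS, then replace the login shell.
# The $?tcsh guard prevents recursion if tcsh re-reads this file.
if ( ! $?tcsh ) then
    module load tcsh
    exec tcsh -l
endif
```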

The maximum time limit could be increased. A total of 12 hours is not enough if you work with systems like proteins in a water box. Actually, I guess that is one of the smallest numbers of hours among the supercomputer centers I know of.

getting old. configuration is not very usable. I switched to the SP completely.

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.


Individual Comments on NERSC's Cray PVP Cluster: 14 responses

Good machine / good interactive services

Good idea to make Seymour partially interactive.

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.

Many would like to see this facility upgraded

This is a state-of-the-art Cray PVP Cluster! Unmatchable anywhere.

C90 was better

The replacement of the C90 with the J90/SV1 cluster was a poor decision. The cacheless true vector machine was a great architectural advance. Moving to 'pseudo' vector machines with a cache and all the problems that go with it was a retrograde step. [...]

Not as good as the C90 in terms of hardware and software.

No good compared to a machine (C90)

File issues: more inodes, migration

Need more inodes and permanent file space.

I'm only using this for codes which I haven't yet ported to one of the MPP machines. Interactivity seems to be okay. My main gripes are the nuisance of automatic file migration and the fact that sometimes the system seems to be unable to even get the migrated files back. Since these are usually executables I often resort to recompiling the code since this is faster than waiting for dmget to get the file back from migration.

[...] Disks are a commodity item. They are cheap, and formatting them with adequate numbers of inodes is simple. Even if you feel it necessary to limit our disk quotas, please remove inode quotas.

Improve turnaround time

turn around can be somewhat long

The wait times to get jobs run seem to be increasing. This has resulted in exhortations not to use high priority queues, but this doesn't fix the problem of multi-day waits to get jobs started.


I would need a queue that allows a job to be followed by a successor without waiting time. Instead of having 6 jobs running in parallel, I would appreciate 6 continuous sequential jobs. The batch queue on killeen provides this at the moment to my full satisfaction, but only because currently nobody else extensively uses this machine. I cannot get any useful throughput on bhaskara and franklin. On seymour sometimes...

Never used.

seldom use during the last year.

Individual Comments on NERSC's HPSS Storage System: 16 responses

Good system

much, much better than the old CFS system! Love the UNIX interface!

incredibly useful and fast. No complaints, this is one great setup.

archive and hpss are great as long as the machines to access them from are up (mcurie is often down). A data processing machine that is stable would be great.

Dependability of the HPSS system increased significantly last year and I am finally getting satisfied with the system.

PCMDI is a heavy user of hpss. We are very satisfied. see for details of the dataset

Fantastic system! Unmatchable!

Ahhhh, HPSS - best thing since sliced bread :)

I don't use them much. But it's a good place to store big files offline. And I get to store some model outputs while running the code. It's quite reliable.

Availability/performance problems

Many times, the system is not able to retrieve my files from HPSS storage when I need them most.

Obtaining directory listings of large directories is unreasonably slow.

hopelessly slow

Sometimes large file transfers were interrupted because of the time limit. It should be increased so that large files can be transferred.

Interface improvements

We have had to create a script that checks whether a file was accurately transferred to HPSS. This is something that should be done for users automatically.
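The check this commenter describes can be as simple as comparing checksums of the file before and after the round trip. A hedged sketch only: the put/get steps below are simulated with `cp`, standing in for whatever HPSS transfer commands (hsi, pftp, etc.) are actually in use, and the file names are illustrative.

```shell
#!/bin/sh
# Verify a file survived an archive round trip by comparing checksums.
src=$(mktemp)
echo "simulation output" > "$src"
store=$(mktemp -d)

cp "$src" "$store/archived"          # stand-in for: put the file into HPSS
back=$(mktemp)
cp "$store/archived" "$back"         # stand-in for: get it back out

# POSIX cksum prints "CRC byte-count"; equal strings mean an intact copy.
if [ "$(cksum < "$src")" = "$(cksum < "$back")" ]; then
    echo "transfer verified"
else
    echo "MISMATCH: re-transfer needed" >&2
fi
```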

erosion of CFS features since move from LLNL


I wish that the hsi source code would be set up in a tar file so that I could download it, compile it and run it on any type of architecture. That would be very nice...

Use it infrequently, so pretty much always forget all but the most basic commands.


Individual Comments about NERSC's auxiliary servers: 5 responses

A very reliable machine. A good use of expensive software licenses. [escher]

We have been receiving wonderful support from the Visualization group (in particular, Nancy Johnston)

The Matlab license is for 4 persons to use simultaneously. Sometimes this is a problem. Other times, we can just walk over to see how long other users will be using it. [newton]

Don't use them.

Difficult to develop programs on escher due to lack of debuggers. In this day of cheap CD writers, it would be nice to have really good documentation on the NERSC Web site on various ways to make movies. My impression from a previous post-doc who worked for me is that things remain pretty painful in terms of multiple stages of work if one is trying to get QuickTime quality movies. Also, NERSC should bring up OpenDX on its visualization server.

NERSC response: for documentation on how to make movies, see: Making MPEG Movies. We have made this document easier to find.


Software Resources


Satisfaction         Average Score     Significance of Change
Very Satisfied       6.5 - 7           significant increase
Mostly Satisfied     5.5 - 6.4         not significant
Somewhat Satisfied   4.5 - 5.4

Software Satisfaction

Topic   PVP: N1, Avg. (1-7), Std. Dev., Change from '99   T3E: N1, Avg. (1-7), Std. Dev., Change from '99   SP: N1, Avg. (1-7), Std. Dev.
User environment 32 6.25 1.11 0.17 57 6.18 1.04 0.03 46 6.07 1.22
Fortran compilers 32 6.66 0.60 0.62 60 6.40 0.72 0.20 46 5.96 1.43
C/C++ compilers 13 6.00 1.08 0.55 30 5.93 0.98 -0.04 25 5.72 1.24
Application software 18 5.83 1.29 0.29 23 5.78 1.09 -0.07 18 5.67 1.03
Programming libraries 16 5.81 1.33 -0.13 39 6.18 0.82 -0.24 30 6.00 0.87
Vendor documentation 16 5.69 0.95 0.66 29 5.59 1.24 0.10 26 5.50 1.30
Local (NERSC) web documentation 21 6.38 0.67 0.70 42 6.00 1.01 0.17 39 6.05 1.12
Performance and debugging tools 16 6.06 0.68 0.60 39 5.56 1.45 0.11 29 4.69 1.61
General tools and utilities 14 5.93 1.21 0.04 37 5.65 1.21 -0.26 25 5.72 0.94
Accounting tools 20 5.90 0.79 0.16 36 5.75 1.25 0.03 26 5.31 1.54
Software bug resolution 10 5.10 1.10 -0.52 30 5.70 1.15 -0.21 22 5.45 1.34

1 - Number of responses.

Summary of Comments

Comments about NERSC's software resources, suggested improvements, future needs: 13 responses

3 tools
3 libraries
2 compilers

Comments about ACTS Toolkit: 8 responses

2 please enhance
2 mentioned software not in ACTS

The following ACTS tools are currently installed at NERSC. Select all the ones that you currently use here.

34 none
13 Scalapack
6 Petsc
5 Superlu
3 Tau
1 Aztec

The following ACTS tools are not currently installed at NERSC. Select all that you would like to use at NERSC.

23 none
9 Global Arrays
7 Paws, Pvode
6 Pooma
5 Atlas
3 Globus, Pete, Siloon
2 Overture, Tulip
1 Hypre, Nexus, Opt


Individual Comments about NERSC's software resources, suggested improvements, future needs: 13 responses


Try to employ some leverage on IBM to get better debuggers for the IBM-SP. It would be nice to have a GUI for scp --- I am a novice user with it, but it seems to involve an ungodly amount of typing! Actually, it would be nice to have a GUI scp that could transfer multiple files, accept wildcards, etc.

We have not yet found any useful performance analysis tools for our complex C++/Fortran90 code.

Suggestion: add GNU tools such as gdb and ghostview.


The latest version of the FORTRAN compiler is causing problems. Maintaining access to earlier versions would be very helpful if possible. This would guarantee that a user could still compile their code. [SP user]

NERSC response: We maintain versions of previous compilers on the Crays. Unfortunately, the IBM SP does not support multiple versions of the compilers so we are not able to provide this service on the SP. We have informed IBM of the need to provide this functionality.

I would like to be able to use HPF on gseaborg. Though I can achieve higher performance using F90 with MPI, HPF is extremely useful in terms of my overall productivity in code development. I can quickly write an HPF code to answer "what if" questions, then decide if I want to develop an optimized F90/MPI version.


I'm not an expert, but I hear that the latest version of HDF has advantages over NetCDF. I'd like to try it out, but only if there is a strong commitment to keep it current on NERSC machines. [...]

NCAR/NCL has been a poor substitute for DISSPLA/MAPPER over the last ~2 years

I am not sure whether NERSC could ask vendors to improve specific software when we encounter such needs -- for example, a more flexible IBM LoadLeveler (for assigning nodes and processors), and parallel I/O with netCDF on the IBM SP.


[...] It would be nice if a more standard interface to AFS were available on the supercomputers.

Please install AFS on the IBM SP!!!!


Need improved documentation

This section is too convoluted.

We mainly use and develop our own software.

Very adequate!


Individual Comments about ACTS Toolkit: 8 responses

Please enhance

Please add Globus to the toolkit - it is very helpful in managing job submission and data handling on remote computing resources, such as NERSC.

The version of PETSc on the SP and (I think) T3E are at 2.0.24. There are several later versions -- 2.0.28 which I use in production on the SP and 2.0.29 which my colleagues in petsc have recently installed. Can NERSC provide support for PETSc on these later versions?

Mentioned software not in ACTS

By PAWS I assume you mean PAW (Physics Analysis Workstation) from CERN.

NERSC response: No, PAWS stands for Parallel Application WorkSpace.

I would like to use TAO at NERSC. (I believe this is also part of the ACTS toolkit although it isn't listed above. It has a confusingly similar name to TAU.)

NERSC response: TAO is not part of the ACTS toolkit. TAU stands for Tuning and Analysis Utilities.


I'll wait till the blood dries on the cutting edge users...

Useless for us.

Don't use it.

The past workshop is a good start for getting more people to use the toolkit. At least, I am starting.




Satisfaction       Average Score     Significance of Change
Very Satisfied     6.5 - 7           not significant
Mostly Satisfied   5.5 - 6.4

Usefulness         Average Score
Very Useful        2.5 - 3
Somewhat Useful    1.5 - 2.4

Training Satisfaction

Topic   No. who have used   Satisfaction with NERSC's: Responses, Avg. (1-7), Std. Dev., Change from '99   Useful in HPC training?: Responses, Avg. (1-3)
Classes 13 14 6.71 1.53 0.52 15 2.67
Online web tutorials 32 32 6.22 0.94 0.05 34 2.62
Slides from classes on web 22 23 6.13 1.52 0.18 21 2.43
Teleconference lectures 12 13 6.00 2.25 0.22 15 2.13

Comments about training. In what area should we focus our attention? 5 responses

Short, concise, easy to find, relevant info on the Web. Something I can print out and use as a handy reference.

I learned a lot from various training classes, especially when I was new to NERSC.

A short 3-4 day training workshop would be very helpful.

I have so little spare time it is hard to say what I would attend. When Phase II of the IBM-SP arrives, you may need some programming classes on effective use of the hybrid architecture.

I have not seen any classes advertised that looked interesting enough to take me away from my immediate assignments. Maybe I am not a good target for them however.


User Comments

What does NERSC do well? 58 responses

34 user support
29 stable, well managed production environment; good hardware
9 everything / nothing singled out
7 documentation
6 software, tools
6 storage environment
5 well managed migrations and upgrades
3 allocations process
3 announcements to users

What should NERSC do differently? 49 responses

18 provide more cycles, improve turnaround time
7 inodes/storage improvements
6 software enhancements
4 manage systems differently
4 provide different hardware
3 accounting/allocations improvements
3 batch improvements
3 better documentation
2 networking/bandwidth improvements

How does NERSC compare to other centers you have used? 49 responses

25 NERSC is the best / better than
9 NERSC is good / only use NERSC
7 NERSC is the same as / mixed response
6 NERSC is less good


What does NERSC do well? 58 responses

User support

I have been very satisfied with most NERSC services and competencies. Great response time and quality answers to my questions/requests. Also, I find the web page well done.

[...]Very responsive consulting staff that makes the user feel that his problem, and its solution, is important to NERSC. [...]

consulting is awesome

People to people contact is excellent. General attitude from Horst, to Kramer, to Verdier, to account support and consulting is outstanding with respect to dealing with the users and their issues.

listen to users and effect changes in service

[...] Gives users good access to consultants.

Responds to users needs promptly and effectively.

The consultants are especially helpful.

Consulting, web, availability of machines.

Once I established a good rapport with the consultants, they were helpful. At first it was difficult to get straight answers.

Customer support is always timely and accurate.

[...] 2. User services (i.e. consulting and account support) are excellent.

Consulting service is excellent!

Good response from the consultants and sysadmins.

The consultant and account services are superb.

Consulting is good but very little else.

Information to users, maintenance.

The consulting team is excellent.

Stable, well managed production environment; good hardware

Provides stable production environment with excellent support services and first rate hardware for scientific computation.

Provide state-of-the-art computation, maximum speed, processors, capacity

Keep everything working smoothly. Excellent computer personnel.

Good management and accounting of a few big machines; good effort at maintaining WWW pages, help with standard questions, etc.

Keep allowing interactive time. Consultants helpful at times. Pretty good access to hardware. Pretty good tools.

SP. Batch turnaround time. I/O space. Mass storage

Provide good hardware, respond well to users.

Overall availability of resources and waiting times are quite predictable and constant through the year.

Typically tries to provide an adequate production environment. [...]

1. Provide world-class supercomputing resources. [...]

Provide access to high-performance computers with a variety of different architectures.

Provide excellent computing resources with high reliability and ease of use. [...] My work requires interactive use and the conversion of SEYMOUR was extremely helpful and welcomed. However ... see next box...

Good provision of flops and support.

NERSC is doing very good job to give us a very good environment of computing. I am very satisfied overall.

Documentation; announcements

[...] The announcement managing and web-support is very professional.

Warn us of scheduled downtime.

I'm very impressed with the friendliness and helpfulness of the consulting staff. I also find the e-mails about down-times helpful.

Nersc provides good support services, documentation, etc.

High availability of machines. Good online documentation. Responsive support team.

Software, tools

NERSC is a very well-managed Supercomputer Center. It provides excellent Fortran compilers and run-time environment on the Crays. NERSC is a most valuable resource for my research in nuclear structure theory.

Support of software. Has knowledgeable staff to assist researchers with computer difficulties - both hardware and software aspects.

Maintenance of hardware and software is excellent. [...]

NERSC maintains the most up-to-date hardware and software, which are very user-friendly.

Storage environment

Manages large simulations and data. The oodles of scratch space on mcurie and gseaborg help me process large amounts of data in one go.

Storage, connectivity.

ease of use of mass storage, access time to stored data

Executes the jobs, stores and transfers the data

Well managed migrations and upgrades

In general you are to be congratulated on the transition from 1980's supercomputing to Y2K multiprocessing. Machines are generally up and the storage facilities seem good (from my perspective as a fairly light user).

NERSC has been the most stable supercomputer center in the country particularly with the migration from the T3E to the IBM SP

keeps machines up. Upgrades facility in a timely fashion.


yeah, NERSC does well

Most everything. A first-class operation.

Almost every aspect. Hardware, software, and consulting. I am really happy to see the ongoing efforts to keep improving the current system.

It is the best among all I have used. I give it five stars.

Makes supercomputing easy.

Provide timely computational service up to expectations.

NERSC undoubtedly is the best supercomputing facility that I have used over the years. NERSC has become available to academics all over the world, with resources which are unthinkable in any academic environment. Credit must go to a major extent to Dr. Horst Simon and his associate Directors for this achievement and success! Ms. Francesca Verdier and her staff, especially those mentioned above in the Consulting Services, have done an excellent job of helping users utilize the unmatchable resources at NERSC for solving major scientific and engineering problems. I sincerely express my thanks to all at NERSC for making it a great pleasure for me to use the facilities at NERSC from a remote site [name omitted]. I look forward to using the NERSC facilities in FY2001.

NERSC does very good job.


Allocations process

Consulting was very good. Allocation service was very helpful.

User support and response. System allocation of resources.

The web-based allocation procedure is very convenient.


I hope to solve my problems with MPI so my code can compile and run well on both SP and T3E.

access, consultants, visualization help

Training, consulting, web pages, making bleeding edge hardware available.

I think that the support is very good. The new IBM SP was a very welcome addition.


What should NERSC do differently? 49 responses

Provide more cycles, improve turnaround time

NERSC is doing a wonderful job. My great need is just for more resources (more time on the machines, more resources, greater storage speed).

Wait time in large job batch queues is long, which costs DOE programs a lot of money. Need to increase throughput.

Much more work needs to be done on providing greater resources to the community.

[...] Also NERSC really needs many more processors given the demand.

Give sole access of all machines to me.

Find a way to shorten batch queues (!) [...]

DOE should put new machines in NERSC other than other places if DOE wants new machines.

The batch queue system on the PVP cluster does not fit my needs. I get most throughput on the slowest machine.

more PVP machines

Shorten the time it takes a job to run on the PVP.

provide more pvp cycles, particularly this year

Add more capacity for the heavy work loads.

It takes too long to run a big memory job on the PVP.

Improve on its vector computing. [...]

Inodes/storage improvements

The user file limits are unrealistically low on the IBM and Cray systems. NERSC seems unfriendly to users with large data/memory requirements.

[...] Improve on its disk resources, especially its inode resources.

I do not like the "inode" business in user file quota. I think it is outdated now and should be removed.

My only complaint (this is the same from year to year) is the I-node limit.

Tailor to individual requests. [What I meant was something like the allocation of file space (and other restrictions) should consider individual needs. Please do not take this as my criticism. I am doing well within the allocated space.]

Software enhancements

Improve the global working environment for remote sites by installing AFS on the IBM SP. This way, for example, the same CVS repository can be used by several users at different sites.

[...] Some support for heterogeneous computing and more support for code steering on the T3E and SP.

Build computer systems comparable to those available at Los Alamos and LLNL. Put more effort into using software tools such as GLOBUS as a model for remote computing using NERSC resources.

I am very satisfied with NERSC. If I could ask you for one favor, it would be to make the Nedit text editor available on the Crays (open source software).

More support for Mathematica is appreciated.

Keep investing in adding quality software in chemistry (and other) applications. For example Jaguar...

Manage systems differently

NERSC should reduce the number of interactive machines. It should encourage batch submission and give more "credit" for use of more processors.

Interactivity and wait times for batch jobs at times can get very poor on your systems. Instead of aiming for maximum utilization of CPU cycles, you ought to find ways to maintain better "headroom" between the resources you have available and the user demand.

You need to make new systems available on a more rapid time scale once they are installed. NERSC seems to take a much longer time to make new systems available to users than other computer centers I've used (with no apparent improvement in functionality resulting from this slow acceptance process).

NERSC sometimes makes bad choices in how they set up their systems. For example, the way Fortran-90 modules have to be handled on the IBM SP is very time-inefficient for users who are developing codes. Apparently, from my experience with other IBM SPs, the awkward way NERSC chose to do this is completely unnecessary, since others have not chosen to use this configuration.

Because NERSC makes supercomputing easy, it is somewhat a victim of its own success. By this I mean that truly large computational tasks suffer because resources are used by smaller tasks. Climate runs often take many hundreds of hours to execute, even in highly parallel configurations. The successful climate modeling centers (none are domestic...) all are able to access dedicated resources. It is difficult for the US climate modeling community to compete with European and Japanese groups if it must further compete with other fields for needed computational resources. As this situation is controlled by forces external to NERSC, I don't see much relief soon.

Managerial types claim that NERSC is a "capability" center. From my limited experience this is not really so, however. Looking at gseaborg, e.g., there's a 200 node job that's been in the regular queue waiting for a 6 hr slot for a week and a half, but the machine's full of 1, 3, 4, 8, 16 node jobs. None of the jobs can run longer than 6 hrs, and they all presumably have a tight limit on the number of files they can generate as output.

Provide different hardware

Save money: dump the Crays (or put them into a museum); get an O2000 class box as an alternative.

Provide more middle of the road computing resources

The PVP machines are at the end of their line, it seems. NERSC should help users learn how to migrate away from these machines in the coming year.

It would be great to have alternative platforms, such as a large scale linux cluster.

Accounting/allocations improvements

Information about remaining budget should be attached to each output file.

I'm not very happy with the new GETNIM versus SETCUB command. Also, having to go to the NERSC web page to consult the remaining detailed allocation is clearly *not* progress. I do not understand why this change happened. PS - Sorry not to have more time to fill in the rest of the survey in detail.

Improve the allocation process to reflect likely results of hardware changes, such as the conversion of SEYMOUR to interactive. It costs six (6) times as much to compute interactively on SEYMOUR, with only a factor of 2 or so improvement in execution time. My 2001 allocation was based on KILLEEN usage for most of FY2000 ... hence I used only 30-40% of my 2000 allocation. As a result my 2001 allocation was reduced to 1/3 of the 2000 allocation. Now in FY2001 I cannot use SEYMOUR at all, as it would deplete my allocation in a few months. I could make very good use of SEYMOUR to expedite my work, but that is now not an option. Hence, the availability of SEYMOUR will not help me AT ALL in FY2001 ... just because of the shortcomings of the allocation process.

Batch improvements

I used NERSC only for computation, and for me the time available and the time a job stays in the queue are the most important. That was OK. The way to monitor a job could be improved.

Should consider increasing the debug queue time limit on IBM and T3E.

I would like to run still longer jobs, but this is in conflict with the point above, I suppose ... [Overall availability of resources and waiting times are quite predictable and constant through the year.]

Better documentation

I would like a more friendly interface. [What I mean is that when I encounter some problem programming in FORTRAN or shell script, I cannot quickly find help online. For example, online help for "nertug", "ja", "$If DEF, BACK", "#QSUB", and some FORTRAN functions such as "SSUM(...)" cannot be found.]

Describe access to and the usefulness of HPSS a little better.

Maybe you should consider an FAQ on questions to consultants in areas such as programming, UNIX utilities, etc., which come up repeatedly or would be useful for active users to be aware of.

Networking/bandwidth improvements

Improve connectivity from outside labs (e.g., Los Alamos Natl Lab) that also have firewalls.

Greatly improve the ease and speed of very large dataset transfers between NERSC and other labs. Security, finger pointing, and multiple points of contact are impeding research.


Don't bring the machines down for maintenance at 4 pm. Re-do section 6 on this Web form.

Use a survey with many fewer and less vague and overlapping questions.

Nothing comes to mind.

Keep up the excellent job you all are doing at NERSC even after some machines get transferred to a building in Oakland.

The consultant help with specific machines is sometimes weak. We had lots of problems porting a well tested code that ran on another IBM SP3 with a somewhat different architecture.

The recent stability problems with mcurie have gone on long enough to make me wonder whether something is wrong somewhere. I have no idea if the problem is mostly one with NERSC or elsewhere, but I am unpleasantly surprised every day or two by another crash. Yuck.

Return to the way things were at Livermore.

Time difference is somewhat of an issue - relocate to the east coast :-)


How does NERSC compare to other centers you have used? 49 responses

NERSC is the best / better than

NERSC is generally superior to all others I have used. Hence, I don't care to use others much anymore.

NERSC is probably the best center I have used. It has very good assistance services and resources.

It is superior in its consulting, account support and training to [site name omitted].

Much better support than provided at UCSB, where an Origin 2000 is available, but there is basically no support for using it. Machines are changing so rapidly that it is impossible for the researcher to keep up with the changes without the sort of help NERSC can provide. You are performing a vital service to the research community.

Much better than any other centers. [4 site names omitted]

As I said, it is the best, it is better than others I have ever used, such as computer centers in [3 site names omitted]

Much better!


I would say NERSC's IBM SP runs better than SDSC's BH SP.

Much better [site name omitted]

NERSC is the best of the centres I have used.

NIC in Juelich, Germany. NERSC allows for more flexible dealing with budget and generally budget enables more calculations.

Better than SDSC/NPACI in terms of system (IBM) reliability and throughput. Most of our effective computing is done at NERSC. My NAS account is too recent to compare NAS to NERSC in a fair manner.

Better than BNL, CERN (Switzerland), JINR (Russia), IHEP (Russia)

The hardware (file systems especially) on gseaborg seems to be much more reliable than that on the IBM SP bluehorizon at NPACI/SDSC. The gpfs nodes at NPACI are suddenly unavailable on occasion.

Much better than [site name omitted]

I have used [site name omitted] in the past (about 6 years ago). You are doing much better. Keep up the good work.

In my opinion, NERSC does better than most other centers that I used, such as [2 site names omitted].

Although I haven't really used other centers, except that I had an account at NCAR 5 years ago, I would say NERSC is doing the best.

Compared to [site name omitted], how could you not be superb in comparison. Relative to the LLNL NERSC of the early 90's, things are far better overall.

NERSC is better. [2 site names omitted]

Comparing to: [2 site names omitted] NERSC has the BEST consultants. Their web pages are easily superior.

Top of the list. [site name omitted]

The allocation procedure in NERSC is more convenient than the one in NCSC.

Best center. Easier to access than LANL or LLNL. More responsive than NCAR. Keep up the good work.

NERSC is good / only use NERSC

I only use NERSC, so I cannot make a comparison.

NERSC is pretty good compared to other centers.

Great. (SDSC, German Max Planck Center)

Very well. CCS at ORNL, Maui.

Very very good! The other center I have used: Livermore computing

It is very good. I use LLNL and NAS also, but spend a good deal of my time on NERSC machines. Keep up the good work!

Very well. Maui, SDSC.

Very well.


NERSC is the same as / mixed response

Principal other experience is with the LLNL center, which is also excellent. In distant past, used several others which offered mostly cycles but little infrastructure.

Compared to NCAR, machines at NERSC go down more regularly and jobs are killed more often. Compared to LANL, NERSC is more stable.

Apart from NERSC, I have used NCSA and Argonne National Lab machines. NERSC is comparable in service to these centers.

San Diego Supercomputer Center. NERSC is better except as indicated in one instance above [maximum batch job running times should be 24 hours; it is 18 hours at San Diego Supercomputer Center].

NERSC is competitive with other major facilities, such as ERDC (DoD).

I only have LLNL LCC to compare to (and lots of the LLNL NERSC staff who stayed at LLNL). Both are outstanding.

Roughly the same as San Diego, NASA Ames and Goddard.

NERSC is less good

DoD systems seem more oriented to the large user. I have used systems at NAVO and ERDC (Cray and IBM).

I have never developed a code on a NERSC machine, since this is quite inconvenient due to long wait times. In that respect the experience I had last year at a different large center (Forschungszentrum Juelich, Germany) was quite different and more pleasant.

I prefer modi4 (at NCSA) as it has a longer wallclock limit, and still has quite reasonable queue waits.

I also used the SDSC SP2 and Blue Horizon, and the University of Texas Cray SV1 and SP2. Blue Horizon was the best (just the best hardware).

Compared to DoD's CWES site, the limits on outfiles, and the queue systems are just too much geared at the little guy. Running on NERSC requires far too much babysitting of my runs: resubmitting, running high priority, tarring up output, etc.

I also use the eagle machine at the DOE High Performance Computing Research Center at Oak Ridge National Laboratory. The interactivity and turnaround time for batch jobs are much better there than on GSeaborg. Also, I like the fact that they have configured their system so that one doesn't have to go through the unusual contortions with Fortran-90 modules (i.e., putting them off in special disk areas which are not permanent) that NERSC requires of us. NERSC should learn how they have set up their IBM SP in this respect and do similar things.


LANL open supercomputing, Argonne, local clusters

ACL at Los Alamos National Lab and QCDSP at Columbia University.
