NERSC FY 2000 User Survey Results
NERSC extends its thanks to all the users who participated in this year's
survey. Your responses provide feedback about every aspect of NERSC's
operation, help us judge the quality of our services, give DOE information on
how well NERSC is doing, and point us to areas we can improve. Every year we
institute changes based on the survey; the FY 1999 survey resulted in the
following changes:
- We created a long-running queue (12 hours maximum) for jobs using up to
256 PEs on the Cray T3E. Last year 7 users asked for longer T3E
queues; this year only one.
- We opened a Cray SV1, seymour, for interactive use. As highlighted below,
this change was well appreciated.
- We created new email lists to keep users better informed of NERSC
announcements and changes. This change wasn't reflected in this year's survey
results.
- We enhanced the HPCF website; overall satisfaction with the website was
higher this year.
In FY 2000, 134 users responded to our survey. The respondents represent
all 5 DOE Science Offices and a variety of home institutions:
see User Information.
On a 7-point scale, with 7 corresponding to Very Satisfied and 1 to
Very Dissatisfied, the average scores ranged from a high of 6.7 for
our training classes and the PVP Fortran compilers to a low of 4.3 for PVP
and T3E batch job wait time. Other areas with very high user satisfaction are
consulting advice and SP availability (uptime).
The areas of most importance to users are the available computing hardware (the
amount of cycles), the overall running of the center and its connectivity
to the network. See the Overall Satisfaction and
Importance summary table.
This year, the largest increases in user satisfaction came from the PVP
cluster. Following the conversion of Seymour last year to an interactive
machine, user satisfaction for the ability to run interactively on the PVP
increased by almost one point. Five other PVP ratings increased by 0.6
to 0.8 points. See the hardware and
software sections. Other areas showing a
significant increase in satisfaction are HPSS performance and response time,
hardware management and configuration, the HPCF website, and
the T3E Fortran compilers. Only two scores were significantly lower this year
than last: T3E batch wait time and consulting services (the latter still
received high scores overall).
When asked what NERSC does well, 34 respondents focused on NERSC's excellent
support staff and 29 pointed to our stable and well-managed production
environment. Other areas singled out include well-done documentation, good
software and tools, a very useful storage environment, and well-managed
migrations and upgrades that "make supercomputing easy". When asked what NERSC
should do differently, the most common responses were to provide more
resources, especially more cycles and inodes. Of the 47 users who compared
NERSC to other centers, 53% said NERSC is the best or better than other
centers. Several sample responses below give the flavor of these comments; for
more details see User Comments.
- "Very responsive consulting staff that makes the user feel that his
problem, and its solution, is important to NERSC"
- "Provide excellent computing resources with high reliability and ease of
use."
- "The announcement managing and web-support is very professional."
- "Manages large simulations and data. The oodles of scratch space on mcurie
and gseaborg help me process large amounts of data in one go."
- "NERSC has been the most stable supercomputer center in the country
particularly with the migration from the T3E to the IBM SP".
- "Makes supercomputing easy."
Below are the survey results. You can also see the survey text.
- User Information
- Overall Satisfaction and Importance
- All Satisfaction Questions Ranked and FY 1999 to FY 2000 Changes
- Consulting and Account Support
- Web and Communications
- Hardware Resources
- Software Resources
- Training
- Comments about NERSC
1. User Information
Number of responses to the survey: 134
Respondents by DOE Office and User Role:
| Office | Respondents | Percent |
| ASCR | 9 | 7 |
| BER | 29 | 22 |
| BES | 28 | 21 |
| FES | 28 | 21 |
| HENP | 37 | 28 |
| guests | 3 | 2 |

| User Role | Respondents | Percent |
| Principal Investigators | 35 | 26 |
| Repo managers | 22 | 16 |
| Users | 77 | 57 |
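The Percent column in the tables above appears to be each count divided by the 134 total responses, rounded to a whole number. The following is a minimal sketch of that arithmetic, not the method NERSC actually used: the helper name `percent_of_total` and the choice of ordinary rounding are assumptions made for illustration.

```python
# Counts copied from the "Respondents by DOE Office" table above.
# 134 is the number of survey responses stated in this section.
TOTAL_RESPONSES = 134

office_counts = {
    "ASCR": 9,
    "BER": 29,
    "BES": 28,
    "FES": 28,
    "HENP": 37,
    "guests": 3,
}


def percent_of_total(count, total=TOTAL_RESPONSES):
    """Hypothetical helper: express count as a whole-number percent of total."""
    return round(100 * count / total)


office_percent = {office: percent_of_total(n) for office, n in office_counts.items()}

for office, pct in office_percent.items():
    print(f"{office}: {office_counts[office]} respondents, {pct}%")
```

Under these assumptions the computed values match the Percent column, and the office counts sum to the 134 total responses.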
Respondents by Organization:
| Organization | Respondents |
| Berkeley Lab | 19 |
| Livermore | 9 |
| Los Alamos | 8 |
| Argonne | 7 |
| UC Berkeley | 7 |
| Oak Ridge | 6 |
| General Atomics | 4 |
| New York Univ. | 4 |
| UC Los Angeles | 4 |
| NCAR | 3 |
| PNNL | 3 |
| U. Maryland | 3 |
| UC San Diego | 3 |
| Ames Lab | 2 |
| City U. of New York | 2 |
| Florida State | 2 |
| Ohio State | 2 |
| PPPL | 2 |
| U. Texas | 2 |
| William & Mary | 2 |
| other universities | 31 |
| other labs | 9 |
What NERSC resources do you use?:
| Resource | Responses | Percent | Responses to Corresponding |
| IBM SP | 64 | 48 | 56 |
| Cray T3E | 86 | 64 | 70 |
| Cray PVP | 66 | 49 | 44 |
| HPSS | 70 | 52 | 70 |
| Visualization Server | 6 | 4 | 8 |
| Math Server | 10 | 7 | 11 |
| PDSF | 5 | 4 | |
| NERSC web site | 43 | 32 | 92 |
| Consulting services | 57 | 43 | 100 |
| Account support services | 46 | 34 | 83 |
| Operations | 14 | 10 | |
| Other | 2 | 1 | |
Other resources listed: ACTS, AFS, Three machine linux network for development,
Workstation, PC support.
How long have you used NERSC?
| Time | Number |
|---|---|
| 6 months or less | 18 |
| 6 months - 3 years | 45 |
| more than 3 years | 68 |
What desktop systems do you use to connect to NERSC?
| Operating System Type | Number |
| UNIX | 175 |
| PC | 73 |
| MAC | 31 |
| VMS | 1 |

| Individual Systems | Number |
| UNIX-linux | 60 |
| UNIX-solaris | 50 |
| MAC-macos | 31 |
| PC-win98 | 29 |
| UNIX-irix | 27 |
| PC-winNT | 21 |
| UNIX-osf | 16 |
| UNIX-aix | 15 |
| PC-win95 | 12 |
| PC-win2000 | 8 |
| UNIX-hpux | 7 |
| X-windows | 1 |
| VAX | 1 |
| LinuxPPC | 1 |
| Ultrix | 1 |
| OS/2 | 1 |
| PC-win3 | 1 |
What type of connection do you often use to connect to NERSC?
| Connection Type | Number |
|---|---|
| Ethernet | 116 |
| Cable Modem | 17 |
| DSL | 9 |
| ISDN | 6 |
| Modem | 30 |
| Other | 4 |
Browser Used to Take Survey:
| Browser | Number |
|---|---|
| Netscape 4 | 110 |
| Internet Explorer 5 | 16 |
| Internet Explorer 4 | 6 |
| Netscape 3 | 2 |
2. Overall Satisfaction and Importance
Legend
| Satisfaction | Average Score |
| Very Satisfied | 6.5 - 7 |
| Mostly Satisfied | 5.5 - 6.4 |
| Somewhat Satisfied | 4.5 - 5.4 |

| Importance | Average Score |
| Very Important | 2.5 - 3 |
| Somewhat Important | 1.5 - 2.4 |
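The legend above maps an average score to a descriptive label. As a rough sketch of how that lookup could be applied to the averages in the tables that follow, the hypothetical functions below (`satisfaction_label` and `importance_label` are illustrative names, not part of the survey) round the average to one decimal place, since the bands are stated to one decimal, and return the matching label. How NERSC treated values that fall between bands, such as 6.45, is not stated, so the rounding step is an assumption.

```python
def satisfaction_label(avg):
    """Return the legend label for an average satisfaction score (1-7 scale)."""
    if not 1 <= avg <= 7:
        raise ValueError("satisfaction averages lie between 1 and 7")
    score = round(avg, 1)  # assumption: bands are compared at one decimal place
    if score >= 6.5:
        return "Very Satisfied"
    if score >= 5.5:
        return "Mostly Satisfied"
    if score >= 4.5:
        return "Somewhat Satisfied"
    return None  # below the bands listed in this legend


def importance_label(avg):
    """Return the legend label for an average importance score (1-3 scale)."""
    if not 1 <= avg <= 3:
        raise ValueError("importance averages lie between 1 and 3")
    score = round(avg, 1)  # same one-decimal assumption as above
    if score >= 2.5:
        return "Very Important"
    if score >= 1.5:
        return "Somewhat Important"
    return None  # below the bands listed in this legend


# Example with the consulting services averages reported in this section.
print(satisfaction_label(6.39), "/", importance_label(2.67))
```

Returning `None` for scores below the listed bands keeps the sketch faithful to this legend, which names only the bands shown.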
Overall Satisfaction with NERSC
Frequency Histogram Plots
| Topic | Satisfaction: No. of Responses | Satisfaction: Avg. (1-7) | Satisfaction: Std. Dev. | Satisfaction: Change from '99 | Importance: No. of Responses | Importance: Avg. (1-3) |
| Consulting services | 111 | 6.39 | 0.82 | -0.19 | 114 | 2.67 |
| Account support | 106 | 6.39 | 1.01 | 0.00 | 106 | 2.38 |
| Overall satisfaction | 128 | 6.15 | 1.03 | -0.10 | 119 | 2.85 |
| HPCF web site | 103 | 6.13 | 0.96 | 0.26 | 101 | 2.43 |
| Software maintenance and configuration | 88 | 6.08 | 1.04 | 0.19 | 83 | 2.61 |
| Mass storage facilities | 90 | 6.03 | 1.12 | -0.03 | 85 | 2.56 |
| Network connectivity | 104 | 6.01 | 1.18 | -0.16 | 98 | 2.80 |
| Hardware management and configuration | 98 | 6.00 | 1.18 | 0.29 | 92 | 2.66 |
| Available software | 107 | 5.98 | 1.00 | -0.01 | 102 | 2.59 |
| Available computing hardware | 109 | 5.90 | 1.25 | -0.06 | 104 | 2.91 |
| Allocations process | 98 | 5.79 | 1.19 | -0.08 | 91 | 2.70 |
| Software documentation | 95 | 5.62 | 1.13 | 0.16 | 90 | 2.51 |
| Web-based training | 61 | 5.23 | 1.24 | 0.04 | 64 | 1.97 |
| Training classes | 47 | 5.13 | 1.33 | 0.28 | 57 | 1.75 |
| Visualization services | 45 | 4.67 | 1.17 | 0.30 | 46 | 1.65 |
All Satisfaction Questions and FY 1999 to FY 2000 Changes
Legend
| Satisfaction | Value |
| Very Satisfied | 7 |
| Mostly Satisfied | 6 |
| Somewhat Satisfied | 5 |
| Neutral | 4 |
How Satisfied are you?
| Topic | No. of Responses | Avg. (1-7) |
| Training: classes (attendees) | 14 | 6.71 |
| Software: PVP Fortran Compilers | 32 | 6.66 |
| Consulting: Timely response | 100 | 6.63 |
| SP: Uptime | 50 | 6.52 |
| Consulting: Quality of technical advice | 99 | 6.49 |
| Consulting: Followup | 84 | 6.42 |
| PVP: Uptime | 39 | 6.41 |
| Software: T3E Fortran Compilers | 60 | 6.40 |
| Consulting overall | 111 | 6.39 |
| Account support | 106 | 6.39 |
| HPSS: Reliability | 62 | 6.39 |
| Software: PVP Local documentation | 21 | 6.38 |
| Account support: Ease of obtaining account info | 83 | 6.34 |
| HPSS: Uptime | 62 | 6.31 |
| HPSS: Overall | 70 | 6.26 |
| Software: PVP User Environment | 32 | 6.25 |
| Web: Accuracy | 81 | 6.22 |
| Training: Online Tutorials | 32 | 6.22 |
| HPSS: Performance | 64 | 6.20 |
| Software: T3E User Environment | 57 | 6.18 |
| Software: T3E Programming Libraries | 39 | 6.18 |
| Consulting: Response to special requests | 74 | 6.16 |
| Overall satisfaction with NERSC | 128 | 6.15 |
| HPSS: User interface | 63 | 6.14 |
| HPCF web site overall | 103 | 6.13 |
| Training: Online class slides | 19 | 6.13 |
| PVP: Ability to run interactively | 35 | 6.11 |
| T3E: Uptime | 65 | 6.09 |
| Software maintenance and configuration | 88 | 6.08 |
| Software: SP User Environment | 46 | 6.07 |
| Software: SP Fortran Compilers | 46 | 6.07 |
| Software: PVP Performance and Debugging Tools | 16 | 6.06 |
| Software: SP Local documentation | 39 | 6.05 |
| Mass storage overall | 90 | 6.03 |
| HPSS: Response Time | 75 | 6.03 |
| Network connectivity | 104 | 6.01 |
| T3E: Overall | 70 | 6.01 |
| Hardware management and configuration | 98 | 6.00 |
| Web: Timeliness | 76 | 6.00 |
| Web: T3E Section | 66 | 6.00 |
| Software: T3E Local documentation | 42 | 6.00 |
| Software: SP Programming Libraries | 30 | 6.00 |
| Software: PVP C/C++ Compilers | 13 | 6.00 |
| Training: Teleconference lectures | 13 | 6.00 |
| Available software | 107 | 5.98 |
| Web: Getting Started Guide | 55 | 5.96 |
| Account support: Ease of modifying account info | 68 | 5.93 |
| Software: T3E C/C++ Compilers | 30 | 5.93 |
| Software: PVP General tools and utilities | 14 | 5.93 |
| Web: NERSC-specific info | 67 | 5.91 |
| Available computing hardware | 109 | 5.90 |
| Software: PVP Accounting tools | 20 | 5.90 |
| SP: Overall | 56 | 5.88 |
| PVP: Overall | 44 | 5.86 |
| Software: PVP Application software | 18 | 5.83 |
| Software: PVP Programming Libraries | 16 | 5.81 |
| Allocations process | 98 | 5.79 |
| Web: File Storage Section | 46 | 5.78 |
| Web: Ease of navigation | 92 | 5.78 |
| Software: T3E Application software | 23 | 5.78 |
| PVP: Disk configuration and I/O performance | 31 | 5.77 |
| Web: SP Section | 57 | 5.75 |
| Software: T3E Accounting tools | 36 | 5.75 |
| Software: SP General tools and utilities | 25 | 5.72 |
| Software: SP C/C++ Compilers | 25 | 5.72 |
| T3E: Ability to run interactively | 58 | 5.71 |
| Software: T3E Bug resolution | 30 | 5.70 |
| Software: PVP Vendor Documentation | 16 | 5.69 |
| Software: SP Application software | 18 | 5.67 |
| Software: T3E General tools and utilities | 37 | 5.65 |
| Web: Programming Info | 71 | 5.63 |
| Software documentation | 95 | 5.62 |
| Web: Searching | 70 | 5.61 |
| Software: T3E Vendor Documentation | 29 | 5.59 |
| Web: PVP Section | 43 | 5.56 |
| Software: T3E Performance and Debugging Tools | 39 | 5.56 |
| Math Server: Newton | 11 | 5.55 |
| SP: Ability to run interactively | 41 | 5.51 |
| Software: SP Vendor Documentation | 26 | 5.50 |
| Software: SP Bug resolution | 22 | 5.45 |
| T3E: Disk configuration and I/O performance | 71 | 5.35 |
| Software: SP Accounting tools | 26 | 5.31 |
| T3E: Batch queue structure | 56 | 5.27 |
| Visualization Server: Escher | 8 | 5.25 |
| Web-based training | 61 | 5.23 |
| SP: Batch queue structure | 41 | 5.22 |
| SP: Disk configuration and I/O performance | 40 | 5.20 |
| Training classes (all responses) | 47 | 5.13 |
| Software: PVP Bug resolution | 10 | 5.10 |
| PVP: Batch queue structure | 34 | 5.03 |
| Software: SP Performance and Debugging Tools | 29 | 4.69 |
| Visualization services | 45 | 4.67 |
| SP: Batch wait time | 46 | 4.54 |
| T3E: Batch wait time | 63 | 4.33 |
| PVP: Batch wait time | 38 | 4.26 |
FY 1999 to FY 2000 Changes
The following are statistically significant changes for responses to questions common to the
FY 1999 and FY 2000 user surveys.
| Topic | FY 2000 Satisfaction | FY 1999 Satisfaction | Change |
| PVP Cluster: Ability to run interactively | 6.11 | 5.18 | +0.93 |
| PVP Cluster: Overall | 5.86 | 5.05 | +0.81 |
| PVP NERSC Documentation | 6.38 | 5.68 | +0.70 |
| PVP Vendor Documentation | 5.69 | 5.03 | +0.66 |
| PVP Fortran Compilers | 6.66 | 6.04 | +0.62 |
| PVP Performance and Debugging Tools | 6.06 | 5.46 | +0.60 |
| HPSS Response Time | 6.03 | 5.68 | +0.35 |
| HPSS Performance | 6.20 | 5.90 | +0.30 |
| Hardware management and configuration | 6.00 | 5.71 | +0.29 |
| HPCF Website | 6.13 | 5.87 | +0.26 |
| T3E Fortran Compilers | 6.40 | 6.20 | +0.20 |
| Consulting Services | 6.39 | 6.58 | -0.19 |
| T3E Batch Job Wait Time | 4.33 | 5.04 | -0.71 |
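The Change column above is the FY 2000 average minus the FY 1999 average. The short sketch below reproduces that subtraction for a few rows copied from the table. The function name `score_change` is hypothetical. The sketch does not assess statistical significance, because the survey text does not say which test was used and the FY 1999 sample sizes and standard deviations are not given here.

```python
# (FY 2000 average, FY 1999 average) pairs copied from the table above.
scores = {
    "PVP Cluster: Ability to run interactively": (6.11, 5.18),
    "HPSS Response Time": (6.03, 5.68),
    "Consulting Services": (6.39, 6.58),
    "T3E Batch Job Wait Time": (4.33, 5.04),
}


def score_change(fy2000, fy1999):
    """Hypothetical helper: year-over-year change, rounded to two decimals."""
    return round(fy2000 - fy1999, 2)


changes = {topic: score_change(new, old) for topic, (new, old) in scores.items()}

# List the largest increase first, as the table does.
for topic, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{topic}: {delta:+.2f}")
```

Rounding to two decimals matches the precision of the published averages and avoids floating-point noise in the differences.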
3. Consulting and Account Support
Legend
| Satisfaction | Average Score |
| Very Satisfied | 6.5 - 7 |
| Mostly Satisfied | 5.5 - 6.4 |

| Significance of Change |
| not significant |
Consulting Services / Account Support Satisfaction
Frequency Histogram Plots
| Question | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Timely response to consulting questions | 100 | 6.63 | 0.73 | -0.01 |
| Quality of technical advice from consultants | 99 | 6.49 | 0.80 | -0.03 |
| Followup to initial consulting questions | 84 | 6.42 | 0.96 | -0.01 |
| Ease of obtaining account information | 83 | 6.34 | 0.95 | 0.08 |
| Response to special requests | 74 | 6.16 | 1.19 | -0.12 |
| Ease of modifying account information | 68 | 5.93 | 1.27 | -0.22 |
Summary of Comments
| 15 | good service |
| 4 | improve follow-up |
| 4 | improve quality of response |
| 8 | good service |
| 4 | suggested enhancements |
| 3 | needs improvement |
Individual Comments and suggestions regarding NERSC Consulting Services:
23 responses
- good service
-
Your web pages on how to do things are great. [...]
I am especially pleased with NERSC consulting services. Every technical problem
I have
encountered has been remedied rather quickly and in a professional manner.
excellent jobs, even better than my Lab.
You guys have always been a pleasure to deal with.
good people, good attitude, responsive
I receive lots of very helpful information from the consultants.
The consulting people are always available and very helpful.
Thank you for your effort.
Keep up the good work!
Keep up the excellent and knowledgeable work
Consulting services are good.
very good services
keep up the good work
I have been most satisfied with the help (but this category is not in the
questionnaire) I got from NERSC consulting, as without the help and
advice of David Turner, Harsh Anand, Tom DeBoni, Majdi, Drs. Jonathan
Carter and Richard Gerber, I would not have been able to perform
"the most gargantuan" calculations I have ever performed. These
calculations would have been unthinkable on any supercomputer facility
anywhere in the world, as it required about 70-80 Gb disk, 256-512 RAM and
about 200 CPU hrs per run on a Cray J90! My sincerest thanks to all
in the Consulting, and especially Ms. Francesca Verdier for her excellent
guidance and advice throughout my usage of NERSC facilities for over 5 years.
- improve follow-up
-
Generally excellent although I had to call twice for the same question
and nobody got back to me with an answer or even saying that they were
still working on it.
[...] Response times could be better.
Would be nice to have a list of consultants' email addresses available.
Will be helpful especially for follow up question(s).
As a result of last year's survey, I was contacted by a consultant to help
port a PVP code to the T3E. She was very friendly but over the months that
passed, nothing was ever done. I contacted her every few weeks, and she said
she would soon get to it, then the SP came, and she was overwhelmed.
Eventually, I learned that she no longer had a valid email address at
NERSC/LBL, and thus I assume she left the company. I do not expect that NERSC
carry the load for me of learning to work with the T3E, but once the contact
was made, help offered, etc, it seems poor form not to carry out what was
offered. Throughout last year I delayed investigating certain aspects of
porting my code because of the continued assumption of help from her. Once
again, I do not assume that NERSC is responsible for helping me, but the way
in which the promised help never materialized contributed negatively to my
needed migration of codes to the mpp environment.
- improve quality of response
-
In general, 90% of my questions are answered quickly and accurately. My only
"complaint" is it
would be nice to have every single consultant versed in every possible
oddity and detail of
F90 and debuggers on every single platform (unfortunately, there seem to be
C/C++ users requiring support too!).
[...] Your consultants often don't have answers to
my questions and seem unable to get them.
Consultants had the tendency to blame problems on the code,
and not help with problems that lie in the NERSC hardware
and software. For example, we had trouble porting a code to
the NERSC SP3 that ran successfully on the NIST SP3.
As always, it depends on who answers the phone. [...]
- other
-
It would be nice to have after hours consulting available, especially
for those in other time zones
Please give more default disk space
I have not used the consulting services in a long time.
Individual Comments and suggestions regarding NERSC Account Support Services:
16 responses
- good service
-
Also of very high quality.
I have had the best possible help if I ran into problems , and I
am grateful to the personnel in the Account Support Services.
keep up the excellent and timely work
I am very satisfied with the NERSC Account Support Services
very good.
keep up the good work
Account support people are always there for us too. Thanks.
- suggested enhancements
-
The robustness of the allocation process could be enhanced by allowing for the
submission of
non-native documents (PDF,Word Doc,txt) files to the allocations committee.
With the current
web-only interface one faces the difficult task of discussing the
theoretical framework of the
calculations with no equations to support the text. It would seem
reasonable to expect that
PDF files be used as an alternative standard to the web-only interface.
The new web site seems to be a good idea. Why not give summary information on
default repo on login that way you immediately see your repo status.
More aggressively notify abnormally rapid decrease of resources. I used up IBMSP
resources
in very short time using wrong priority by mistake. If some warning can be
made, this may not have occurred.
I have not yet learned how to use the full functionality of
the replacement to setcub
- needs improvement
-
Getting an account set up used to take 24 hours. Lately it has been
taking much longer, and has always required us calling NERSC to get
the account information. We also have to deal with multiple people
at NERSC when asking for both T3E and SP accounts.
Try not to forget to get back to your users even if you cannot find
the answer to their initial question.
Account support services are poor.
4. Web and Communications
Legend
| Satisfaction | Average Score |
| Mostly Satisfied | 5.5 - 6.4 |

| Usefulness | Average Score |
| Very Useful | 2.5 - 3 |
| Somewhat Useful | 1.5 - 2.4 |
Frequency Histogram Plots
HPCF Web Site
| How satisfied are you? | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Accuracy of information | 81 | 6.22 | 0.92 | 0.00 |
| Timeliness of information | 76 | 6.00 | 0.89 | 0.01 |
| Getting Started Guide | 55 | 5.96 | 1.17 | -0.12 |
| T3E section | 66 | 6.00 | 0.89 | 0.01 |
| Info on using NERSC-specific resources | 67 | 5.91 | 1.01 | -0.02 |
| Ease of finding information on web site | 92 | 5.78 | 1.07 | 0.08 |
| File Storage section | 46 | 5.78 | 1.07 | -0.04 |
| SP section | 57 | 5.75 | 1.11 | |
| General programming information | 71 | 5.63 | 1.17 | -0.11 |
| Search facilities | 70 | 5.61 | 1.07 | -0.08 |
| PVP Cluster section | 43 | 5.56 | 1.14 | -0.13 |
Keeping Informed
| How useful are these? | Responses | Avg. (1-3) | Std. Dev. | Change from '99 |
| NERSC Announcements Email Lists | 88 | 2.45 | 0.66 | -0.18 |
| MOTD on computers | 78 | 2.27 | 0.73 | 0.18 |
| Announcements web archives | 79 | 2.05 | 0.81 | -0.11 |
| Phone calls from NERSC | 70 | 1.81 | 0.87 | -0.08 |
Summary of Comments
| 5 | better navigation/searching |
| 3 | content improvements |
| 3 | good service |
| 10 | satisfied |
| 2 | don't send too many emails |

| Do you feel you are adequately informed about NERSC changes? | Yes: 98 | No: 5 |
| Are you aware of major changes at least 1 month in advance? | Yes: 86 | No: 9 |
| Are you aware of software changes at least 7 days in advance? | Yes: 66 | No: 16 |
| Are you aware of planned outages 24 hours in advance? | Yes: 77 | No: 10 |
Individual Comments and suggestions concerning the HPCF web site. 13 responses
- Better Navigation/Searching
-
Sometimes it seems one has to go through 2-4 levels to find things. Also,
details are often not at NERSC but at Cray or IBM.
Don't use it all that much as it usually takes me more time to find
what I am looking for than it's worth.
Please add a table of contents/site overview if it is not there already. It
might also be helpful if people could build their own personal custom
interface to the NERSC site that includes links to commonly used web pages,
relevant messages, account information etc.
NERSC response: The Website Outline is at
http://hpcf.nersc.gov/web/outline.html and the Index of Titles at
http://hpcf.nersc.gov/web/pagetitles.html. Both are linked to from the
HPCF home page, but perhaps we should make
these links more prominent. The suggestion to provide a way for users to build
customized interfaces into the NERSC website is a good one: we will do it!
One problem I often run into is that I'll remember there was a class or
training session on some topic that I have a question about. However, to find
the particular presentation I'm interested in I have to remember the date
and/or place where the session was given. This is usually hard to recall so
it can take a lot of time to go back and find what I'm looking for. It would
be very helpful if all the training sessions, tutorials, classes, etc. could
be cross-referenced based on the topics covered in addition to the
chronological/location style organization that is currently used. Of course,
I can always do a search, but this tends to give too many references that
take time to sort through.
NERSC response: See
Index of Web-Based Lectures, by Topic. We will make this document easier
to find.
Search engines within NERSC are not effective enough.
- Content Improvements
-
As a new user to this system, I found it difficult to determine how to get a
job up and running on the system. The examples in the getting started guide
were not sufficient. The example about how to run a batch script at
http://hpcf.nersc.gov/running_jobs/cray/batch_start.html is a good start,
but
it is trivially simple. On the same page, it would be nice to have an example
of how to actually run a piece of fortran or c code -- including how to move
your files back to your home directory.
Would like to see more detail MPI I/O information for SP.
NERSC response: See Introduction to MPI
I/O.
The web site explains just bare minimum of a specific topic. As a user,
it would be very helpful if the web explains the bare necessities and
also great details of a specific topic of users' interests.
- good service
-
Your web site has answered the vast majority of my questions quickly, and
completely.
Great Job
Overall, the NERSC web sites are outstanding. Besides being useful
to me personally, they save me time because the first thing I tell
new users on my project is, "Look at the NERSC web pages. They contain
a lot of useful information."
- other
-
The IBM documentation is hard to use, and has been unavailable at times. I
don't know which link it was, but I was asked for a password to an IBM site
when I was finally (recently) closing in on some needed piece of
information. Together with the lack of support for Fortran 90 modules, the
lack of adequate (i.e., "transparent") documentation makes the IBM a
formidable challenge for development (and therefore for production).
For some reason always crashes my web browser
(Netscape) --- although other sites do not.
NERSC response: Note that
is the website for the NERSC Division at Berkeley Lab,
not the website for the NERSC HPC facility, which is
http://hpcf.nersc.gov. We believe that the problem reported has been
fixed.
Please tell us how you would like to be kept informed of changes and issues at
NERSC.
- Satisfied
-
Existing email messages are fine. Backup detail is available via WEB or a phone
call.
by e-mail
info on web
longterm changes, major outages (7-30d) per email
remainder (2d) MOTD
e-mail from lists is fine
Please keep it up the excellent job you all are doing !
Via e-mail and web postings
email is best.
I find the e-mails most helpful
send E-mails
email is the best.
login message is good for MOTD too.
- Don't send too many emails
-
too many operator emails telling us systems are going down that we don't even
use
MOTD gets lost in all the other stuff that comes to screen. I guess what's
best for me is a MOTD that jumps out at you and e-mail, but if you start
e-mailing too much stuff then I probably won't be as likely to read it. A lot
of it is my fault. I especially seem to always miss the "machine going down
at 4 today" messages. Then I get pummeled with messages from 3 to 4 pm, with
little if any notice prior to that. Incidentally, I hate when the machines go
down at 4 pm. In fact if I was to pick anything that bugs me the most, that
would be it.
- Other
-
Messages sent to screen of platform involved. e-mail sent to my accounts
on the machine in question.
NERSC response: email is sent to your home institution, not
to a NERSC machine.
5. Hardware Resources
Legend
| Satisfaction | Average Score |
| Very Satisfied | 6.5 - 7 |
| Mostly Satisfied | 5.5 - 6.4 |
| Somewhat Satisfied | 4.5 - 5.4 |
| Neutral | 3.5 - 4.4 |

| Significance of Change |
| significant increase |
| significant decrease |
| not significant |
Frequency Histogram Plots
IBM SP - gseaborg
| How satisfied are you? | Responses | Avg. (1-7) | Std. Dev. |
| Uptime | 50 | 6.52 | 0.86 |
| Overall | 56 | 5.88 | 1.31 |
| Ability to run interactively | 41 | 5.51 | 1.49 |
| Batch queue structure | 41 | 5.22 | 1.41 |
| Disk configuration and I/O performance | 40 | 5.20 | 1.62 |
| Batch job wait time | 46 | 4.54 | 1.88 |
Max. Number of Processors Used: 141 (48 responses)
Max. Number of Processors Code Can Effectively Use: 591 (40 responses)
Cray T3E - MCurie
| How satisfied are you? | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Uptime | 65 | 6.09 | 1.01 | -0.17 |
| Overall | 70 | 6.01 | 1.07 | -0.16 |
| Ability to run interactively | 58 | 5.71 | 1.35 | 0.11 |
| Disk configuration and I/O performance | 51 | 5.35 | 1.32 | 0.12 |
| Batch queue structure | 56 | 5.27 | 1.53 | -0.20 |
| Batch job wait time | 63 | 4.33 | 1.58 | -0.71 |
Max. Number of Processors Used: 146 (61 responses)
Max. Number of Processors Code Can Effectively Use: 300 (46 responses)
Cray PVP Cluster
| How satisfied are you? | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Uptime | 39 | 6.41 | 1.12 | 0.12 |
| Ability to run interactively | 35 | 6.11 | 1.35 | 0.93 |
| Overall | 44 | 5.86 | 1.41 | 0.81 |
| Disk configuration and I/O performance | 31 | 5.77 | 1.20 | 0.21 |
| Batch queue structure | 34 | 5.03 | 1.66 | 0.00 |
| Batch job wait time | 38 | 4.26 | 1.83 | 0.31 |
Max. Number of Processors Used: 9 (29 responses)
Max. Number of Processors Code Can Effectively Use: 10 (24 responses)
HPSS
| How satisfied are you? | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Reliability | 62 | 6.39 | 1.18 | -0.07 |
| Uptime | 62 | 6.31 | 0.98 | -0.02 |
| Overall | 70 | 6.26 | 0.99 | 0.14 |
| Performance | 64 | 6.20 | 0.98 | 0.30 |
| User interface | 63 | 6.14 | 1.06 | 0.08 |
| Response time | 62 | 6.03 | 1.10 | 0.35 |
Server Satisfaction
| Satisfaction with | Responses | Avg. (1-7) | Std. Dev. | Change from '99 |
| Newton | 11 | 5.55 | 1.37 | 0.30 |
| Escher | 8 | 5.25 | 1.28 | -0.20 |
Summary of Comments
| 7 | hard to use/software problems |
| 7 | improve turnaround time |
| 6 | provide longer queues |
| 5 | good machine |
| 4 | disk issues: more inodes, more local disk, more GPFS nodes |
| 4 | provide more interactive services |
| 3 | change batch scheduling priorities |
| 2 | inadequate documentation |
| 4 | batch configuration |
| 4 | switch/communications performance |
| 3 | software concerns |
| 3 | more processors |
| 8 | improve turnaround time |
| 7 | good machine |
| 2 | needs more memory |
| 5 | good machine / good interactive services |
| 3 | C90 was better |
| 3 | file issues: more inodes, migration |
| 2 | improve turnaround time |
| 8 | good system |
| 4 | availability/performance problems |
| 2 | interface improvements |
Individual Comments on NERSC's IBM SP: 28 responses
- Hard to use/software problems
-
I'm making only light use of the SP for development. NERSC staff have
been very helpful and responsive. The SP is not the easiest system to use
(C++ compiler problems), but these are not the fault of NERSC.
Don't like the requirement to use $TMPDIR for module compiling. Do like the
presence of
NCAR graphics. Not sure how I will use mixed SMP/MPP capability when 8-way
processors
arrive in Phase II. debuggers on Seaborg are pretty poor compared to PVP
or T3E.
Fortran compiler seems buggy, file storage per node very limited,
limited documentation, problems with batch submission, rather
slow processors compared to say DEC alpha, etc.
There is something wrong that I can not compile my code quite well. It is
related to MPI settings.
Home directories should not be GPFS because of F() module compiling problem.
The lack of support for Fortran 90 modules is something that frustrates
me a lot.
[...] Compared to February, the new compiler is slow. Recompiling
from scratch -- which is frequently necessary because my memory-mapped module
files are obliterated every time I am logged out, so that any change forces
me to start from the beginning -- takes hours. It would be nice if the old
compiler were available for those that wish to use it.
The new compiler
fails to compile my codes without the '-qhot' option because of "lack of
resources". This error message is not helpful.
The "llqs" routine is not as
useful for figuring out when a job will likely run compared to similar
routines on the T3E.
I prefer the version of totalview on the T3E, but this
may be a function of my overall frustration with the IBM. [...]
Gnuplot doesn't seem to pick up the tcsh line
editing commands when running under tcsh.
[...] The inability to ftp into gseaborg
makes editing files a chore for me, since I am accustomed to editing from an
emacs window running on my desktop. There is probably a way around this, but
I don't know what it is.
- Improve turnaround time
-
Job waits of 3-5 days for 6 hours of 64 nodes are common.
This is completely unacceptable, it is not possible to get useful
work done in this way. Available resources should either be
drastically increased or else NERSC closed down.
Batch queues seem to be rather long in regular class implying the need for a
larger computer.
Could you prepare an up-to-date plot of the average wait-to-run for the
various queues, as
a function of time, that could be viewed on a web site, for example.
Initially, I was very satisfied with the IBM SP. However, around mid-summer
the queues started getting very slow and batch jobs that used to go through
overnight or less started taking 2-3 days. For my typical job (100 - 200
processors, 3 - 4 hours) this is intolerably slow. I also have access to a
local IBM SP at my lab (ORNL) which has faster processors with 4 per node and
much fewer users. Jobs that are taking 2 - 4 days to get through NERSC's IBM
SP usually start immediately here and are done in a few hours. I'm hoping
NERSC's IBM SP Phase II will improve this problem. [...]
I find the IBM SP a pretty slow machine.
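The wait-time plot requested in one of the comments above needs little more than per-queue averages of start time minus submit time, grouped by submission date. A minimal sketch, assuming job records are available as (queue, submit time, start time) tuples; the record layout, the function name `average_wait_hours` and the sample values are illustrative assumptions, not NERSC accounting data:

```python
from collections import defaultdict
from datetime import datetime


def average_wait_hours(records):
    """Return {(queue, submit_date): mean wait in hours}.

    records: iterable of (queue, submit_iso, start_iso) tuples,
    with times given as ISO 8601 strings (hypothetical layout).
    """
    waits = defaultdict(list)
    for queue, submitted, started in records:
        t_submit = datetime.fromisoformat(submitted)
        t_start = datetime.fromisoformat(started)
        hours = (t_start - t_submit).total_seconds() / 3600.0
        waits[(queue, t_submit.date().isoformat())].append(hours)
    return {key: sum(vals) / len(vals) for key, vals in waits.items()}


# Made-up sample records, for illustration only.
sample = [
    ("regular", "2000-08-01T09:00", "2000-08-03T09:00"),  # waited 48 h
    ("regular", "2000-08-01T12:00", "2000-08-04T12:00"),  # waited 72 h
    ("debug", "2000-08-01T10:00", "2000-08-01T10:30"),    # waited 0.5 h
]
averages = average_wait_hours(sample)
```

The resulting dictionary can be fed to any plotting tool, with one curve per queue and the submission date on the horizontal axis.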
- Provide longer queues
-
I would like a longer queue.
A longer max wall clock time (>6 hrs) on gseaborg would be good, like
on the T3E.
maximum running time for batch jobs of 6 hours is much too short for our
compute intensive job
- Good machine
-
Great Machine. Keep it up. Needs more I/O nodes for GPFS and faster
processors ...
Interactive time is wonderful! Don't take machines down at 4 pm for
maintenance.
Very stable, easy to use, faster than what I expect.
very happy
Max. number of processors depends on the configuration of the code (size of
domain, spatial
resolution). This code shows good performance enhancement up to 96 processors
(max. tested so far).
- Disk issues: more inodes, more local disk, more GPFS nodes
-
need for local filesystem to fully exploit NWChem capabilities
- provide more interactive services
-
[...] Although there are
evidently typically interactive PE's available on the IBM, there aren't very
many overall. I'd prefer more for development, if the climate for fortran
development were friendlier.
Available PEs for interactive runs should be more than 16(at least for short
test runs!)
Wait time for Batch jobs-short runs (~10-20mins) should not exceed more
than 5hrs.
interactive run is always at the very low priority.
maybe it could be the same as debug queue.
One unified file system would help particularly with the F90 .mod file
handling. The queues
have become too crowded. The 6 hour time limit up from 4 was a welcome
change.
The interactive limit on one processor is too small to even compile some
codes.
- Change batch scheduling priorities
-
[...] In the meantime I think
you need to consider rearranging the queues so that the longer jobs which
really do take multiple days to finish don't get in the way of intermediate
length jobs (100-200 processors, 2-4 hours) which should be put on a faster
track with the potential to finish in a 24 hour period.
There is no obvious method to which jobs get to run when. We are running a
100 year model that takes nearly one month wall clock time to execute. With
a 6 hour time limit, no q structure, and 3 day lag times from time of job
submission to time of job execution, we have had to invent several
strategies just to use the time we've been allotted. Further, nearly a third
of the jobs that we do submit have to commit suicide because LoadLeveler
tries to run them simultaneously, and they need to be run sequentially. We
are obviously not the only users in this predicament. 1) Please set up
some sort of q structure. Allow jobs that fill half the machine or more to
run only at night. 2) If you don't do that, please allow users to use
cron so that we don't have to occupy processors to submit jobs at regular
intervals.
I rely on a defense machine allocation for SP time to do my critical runs,
primarily because I have access to a queue system there that allows > 100 hr
runs. I'm not sure though that even if I had such access at NERSC however
that I'd use it. The i-node limits imposed are stifling, and require that I
monitor my jobs full-time on the NERSC machines so that I may tar up output
files/directories and remove them from the scratch space as they pop out of
the run. I need to sleep sometime, and when I do, my inode limit becomes
exceeded, and the job crashes. At the DoD sites, this has never been a
problem. They seem more set up for large users. I think NERSC caters far too
much to the little users, and this is one instance of what makes me think so.
Until I can do large (~100 hr) runs at NERSC, with 128-256 processors, and
get into the queue system in less than a week, and be able to dump a
significant amount of data before running out of resources, my REAL work will
be done at the DoD sites. Also, the filesystem on the SP is hideous. For
deep filesystem deletes (say 3 or four levels), with a few hundred or so
files, it can take unbearably long times to copy or remove them. This
compounds the inode problem mentioned above because of the effort involved in
tarring up my stuff and putting it all on hpss. So...the queue system is too
full because there are too many small users on the machine. There aren't
enough inodes because there are too many users. And the filesystem is
horribly slow. Other than that....
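The arithmetic behind the "100 year model" comment above can be made explicit. A rough sketch, assuming that "nearly one month" means 30 days of wall clock time, that every segment waits the full 3 days quoted, and that each segment is submitted only after the previous one finishes; the function name and these inputs are illustrative assumptions, not measured values:

```python
import math


def chained_run_estimate(total_wall_hours, limit_hours, wait_hours_per_job):
    """Estimate segment count and elapsed days for a run split to fit a time limit.

    Assumes strictly sequential segments, each paying the same queue wait.
    """
    segments = math.ceil(total_wall_hours / limit_hours)
    elapsed_hours = segments * (wait_hours_per_job + limit_hours)
    return segments, elapsed_hours / 24.0


# 30 days of compute, 6 hour limit, 3 day wait per submission (assumed inputs).
segments, elapsed_days = chained_run_estimate(30 * 24, 6, 3 * 24)
```

Under these assumptions the run splits into 120 segments and takes about 390 days end to end, which is why the commenter resorts to submitting jobs ahead of time and asks for a queue structure.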
- Inadequate documentation
-
[...]Documentation is generally hard to find and harder to understand
(mainly because of excessive cross-references to documents that are hard for
me to find). For example, ESSL or PESSL versions of FFT's require complicated
initializations that took me quite a while to figure out, even with help from
consultants.
[...] The documentation for the
different version of the xlf90 compilers -- mpxlf90, mpxlf95, mpxlf95_r7,
xlf90, etc. -- didn't make it easy for me to figure out how to get started
with a basic MPI-based parallel code. [...]
- Other
-
Accounting may be improved.
I am not using it
It would be a good machine if it had a much higher communication
bandwidth, lower latency AND if it were able to do asynchronous
communications without any overhead on the processors involved, i.e.
between setting up an asynchronous MPI send/receive and its completion
at an MPI wait the processor needs to be able to perform calculations
as efficiently as if there were no pending communications.
more memory
Individual Suggestions for NERSC's IBM SP Phase II System: 16 comments
- Batch configuration
-
[...] A batch structure that favors large jobs explicitly would be very useful.
There are plenty of
computers around for people that are doing 32-64 PE jobs. The big machines
ought to be
available first for the applications that can't run elsewhere. The batch
structure for mcurie is very good in this respect.
Queue time limit should be longer, even if that means wait time is longer.
maximum batch job running times should be 24 hours. it is 18 hours at san diego
supercomputer center
As mentioned above, give priority to intermediate length batch jobs. Don't
design everything around satisfying the really big users.
- Switch/communications performance
-
Please insist on getting the highest performance communication backbone that is
available. I
rely upon high performance communication heavily, and fear that 16 CPU
nodes with existing
switch hardware would be a step backward for my applications. [...]
I would strongly suggest that the switch should be updated to its final
configuration BEFORE the nodes are upgraded.
More procs, faster I/O, faster communication. The usual requests.
Same as above. Concern about full use of node CPUs with MPI vs. node
bandwidth and internode communication.
- Software concerns
-
I hope that we can rely on IBM's C++ rather than KAI's, but
I'm not sure this is realistic.
[...] If the Phase II system continues to fail to support fortran code
development (by failing to treat
memory-mapped files on the same footing as ordinary files while requiring
them for compilation) then the Phase II system will really drive me crazy.
[...]
NERSC response: In Phase 2 system the GPFS
memory-mapped file
problem will be solved. In particular, Fortran 90 modules will work with
GPFS.
Convince IBM to put some money into fixing that horrible filesystem.
As much as I like my DoD accounts, they too have the terrible gpfs
system that makes dealing with complex directory structures very painful.
- More processors
-
Need as many processors as possible.
I hope there will be a phase III with even more nodes 8-)
No comments for this system, as it is already pretty much set.
The next system after this must have many 1000's of processors
if it is to be useful as a national resource.
- Other
-
Same as for sp3.
looking forward to it!
Can't wait to get to Phase II.
get a good supply of Valium for your consultants...
Individual Comments on NERSC's Cray T3E: 19 responses
- Improve turnaround time
-
The more I use it the more I like it. Batch waits can be excessive though
I can not run any meaningful calculations with a 4 hour queue and the batch job
wait time on the 12 hour queues is very long.
The queues are too crowded and the turnaround is atrocious
Last time I checked, the queues here seemed even slower than the IBM SP. I only
occasionally use the T3E anymore. This has gotten to be one of those computers
where by the
time the job is finished, if you're not careful, you may have forgotten why
you started it. You need to do something to get better turnaround time.
Wait time in large job batch queues is too long.
queue length - need a bigger faster T3E?
T3E is really busy these days.
Interactive jobs time out after 30 minutes;
batch jobs can spend a long time in the queue.
But the worst thing is the inode quota of only 3500.
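Users who run up against an inode quota such as the 3500 mentioned above can estimate their usage before a job fails. A minimal sketch that counts directory entries under a tree; the function names are illustrative, the count is an approximation (hard links are counted once per name, and symbolic links are counted but not followed), and the default quota is simply the figure quoted in the comment:

```python
import os


def count_inodes(root):
    """Approximate the number of inodes used by a directory tree."""
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def over_quota(root, quota=3500):
    """Return True if the tree under root uses more entries than the quota."""
    return count_inodes(root) > quota
```

Running `over_quota` on a scratch directory from inside a long job gives a cue to tar up and archive output before the limit is reached.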
- Good machine
-
Hope you keep it as long as possible!
Stable, also easy to use. And it is configured very well.
I am also impressed by its checkpoint function. Hopefully, it
can also be moved to IBM-SP.
This machine has probably the best communication network of any MPP
machine I have used. Replacing the Alpha cache with streams was a bad
idea; a large cache would have greatly improved its performance. It is
a pity that an upgrade path to an Alpha 21264 based machine was not
available.
Generally -- excellent machine, excellent performance until recently.
Lately -- numerous crashes with no end in sight actually got so bad that
I tried to use the IBM again (see comments above).
File system I/O is a bit slower compared to SGI although the computing
power is a lot stronger than SGI origin series. Overall, it was mostly
satisfactory to us.
- Needs more memory
-
I don't use the T3E because there is not enough memory per node on the machine.
It otherwise seems to be a very nice system to work on. Unfortunately, my
smaller problems can
be done locally on our own workstations, and the large ones need the memory
of the SP systems.
more memory per processor!!!!
- Other
-
Why does it not have the 'tcsh' shell?
NERSC response: tcsh is available, but must be loaded
explicitly (since it does not come with the UNICOS operating system). See
tcsh and
bash.
The maximum time limit could be increased. A total of 12 hour is not
enough if you work with systems like proteins in a water box. Actually,
I guess that is one of the smallest number of hours in the supercomputer
centers I know of.
getting old. configuration is not very usable. I switched to the SP completely.
Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.
Individual Comments on NERSC's Cray PVP Cluster: 14 responses
- Good machine / good interactive services
-
Good idea to make Seymour partially interactive.
Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.
Many would like to see this facility upgraded
This is a state-of-the-art Cray PVP Cluster! Unmatchable anywhere.
- C90 was better
-
The replacement of the C90 with the J90/SV1 cluster was a poor decision.
The cacheless true vector machine was a great architectural advance.
Moving to 'pseudo' vector machines with a cache and all the problems
that go with it was a retrograde step. [...]
Not as good as the C90 in terms of hardware and software.
No good compared to a machine (C90)
- File issues: more inodes, migration
-
Need more inode and permanent file space.
I'm only using this for codes which I haven't yet ported to one of the MPP
machines.
Interactivity seems to be okay. My main gripes are the nuisance of
automatic file migration and
the fact that sometimes the system seems to be unable to even get the
migrated files back.
Since these are usually executables I often resort to recompiling the code
since this is faster
than waiting for dmget to get the file back from migration.
[...]Disks are a commodity item. They are cheap, and formatting them with
adequate numbers of inodes is simple. Even if you feel it necessary to
limit our disk quotas, please remove inode quotas.
- Improve turnaround time
-
turn around can be somewhat long
The wait times to get jobs run seems to be increasing. This has resulted in
exhortations to not
use high priority queues but this doesn't fix the problem of multi-day
waits to get jobs started.
- Other
-
I would need a queue that allows to follow up a job with a successor
without waiting time. Instead of having 6 jobs running in parallel,
I would appreciate 6 continuous sequential jobs. The batch queue
on killeen provides this at the moment to my full satisfaction, but
only because currently nobody else extensively uses this machine.
I cannot get any useful throughput on bhaskara and franklin. On
seymour sometimes...
Never used.
seldom use during the last year.
Individual Comments on NERSC's HPSS Storage System: 16 responses
- Good system
-
much, much better than the old CFS system! Love the UNIX interface!
incredibly useful and fast. No complaints, this is one great setup.
archive and hpss are great as long as the machines to access them from are up
(mcurie is often down). A data processing machine that is stable would be great.
Dependability of the HPSS system increased significantly last year and I am
finally getting satisfied with the system.
PCMDI is a heavy user of hpss. We are very satisfied.
see
http://www-pcmdi.llnl.gov/modeldata/PCM_Data/pcgdahome.html
for details of the dataset
Fantastic system! Unmatchable!
Ahhhh, HPSS - best thing since sliced bread :)
I don't use them much. But it's a good place to store big files offline.
And I get to store some model outputs while running the code.
It's quite reliable.
- Availability/performance problems
-
Many times, the system is not able to retrieve my files from HPSS storage when
I need them most.
Obtaining directory listings of large directories is unreasonably slow.
hopelessly slow
Sometimes large file transfers were interrupted because of the time
limit. It should be increased so as to transfer large files.
- Interface improvements
-
We have had to create a script that checks to see if a file is accurately
transferred to HPSS.
This is something that should be done for users automatically.
erosion of CFS features since move from LLNL
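The transfer check described in the first comment of this group can be approximated by comparing checksums of the original file and a copy read back from storage. A minimal sketch, assuming the stored file has already been retrieved to a local path by whatever transfer tool is in use; no HPSS commands are invoked here, and the function names are illustrative rather than part of any NERSC tool:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(original_path, retrieved_path):
    """Return True if the retrieved copy matches the original byte for byte."""
    return file_digest(original_path) == file_digest(retrieved_path)
```

Reading the file back costs a second transfer, so a check like this is best reserved for data that cannot be regenerated.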
- Other
-
I wish that the hsi source code would be set up in a tar file so that I could
download it, compile
it and run it on any type of architecture. That would be very nice...
Use it infrequently, so pretty much always forget all but the most
basic commands.
Individual Comments about NERSC's auxiliary servers: 5 responses
-
A very reliable machine. A good use of expensive software licenses. [escher]
We have been receiving wonderful support from the Visualization group
(in particular, Nancy Johnston)
The Matlab license allows only 4 persons to use it simultaneously.
Sometimes this is a problem. Other times, we could just walk over
to see how long other users will be using. [newton]
Don't use them.
difficult to develop programs on escher due to lack of debuggers.
in this day of cheap CD writers, it would be nice to have really good
documentation on the
NERSC Web site on various ways to make movies. My impression from a
previous post-doc
who worked for me is that things remain pretty painful in terms of multiple
stages of work if one is
trying to get QuickTime quality movies.
Also, NERSC should bring up OpenDX on its visualization server.
NERSC response: for documentation on how to make movies,
see:
Making MPEG Movies. We have made this document easier to find.
6. Software Resources
Legend
| Satisfaction | Average Score |
| Very Satisfied | 6.5 - 7 |
| Mostly Satisfied | 5.5 - 6.4 |
| Somewhat Satisfied | 4.5 - 5.4 |
| Significance of Change |
| significant increase |
| not significant |
Software Satisfaction
| Topic | PVP N1 | PVP Avg. (1-7) | PVP Std. Dev. | PVP Change from '99 | T3E N1 | T3E Avg. (1-7) | T3E Std. Dev. | T3E Change from '99 | SP N1 | SP Avg. (1-7) | SP Std. Dev. |
| User environment | 32 | 6.25 | 1.11 | 0.17 | 57 | 6.18 | 1.04 | 0.03 | 46 | 6.07 | 1.22 |
| Fortran compilers | 32 | 6.66 | 0.60 | 0.62 | 60 | 6.40 | 0.72 | 0.20 | 46 | 5.96 | 1.43 |
| C/C++ compilers | 13 | 6.00 | 1.08 | 0.55 | 30 | 5.93 | 0.98 | -0.04 | 25 | 5.72 | 1.24 |
| Application software | 18 | 5.83 | 1.29 | 0.29 | 23 | 5.78 | 1.09 | -0.07 | 18 | 5.67 | 1.03 |
| Programming libraries | 16 | 5.81 | 1.33 | -0.13 | 39 | 6.18 | 0.82 | -0.24 | 30 | 6.00 | 0.87 |
| Vendor documentation | 16 | 5.69 | 0.95 | 0.66 | 29 | 5.59 | 1.24 | 0.10 | 26 | 5.50 | 1.30 |
| Local (NERSC) web documentation | 21 | 6.38 | 0.67 | 0.70 | 42 | 6.00 | 1.01 | 0.17 | 39 | 6.05 | 1.12 |
| Performance and debugging tools | 16 | 6.06 | 0.68 | 0.60 | 39 | 5.56 | 1.45 | 0.11 | 29 | 4.69 | 1.61 |
| General tools and utilities | 14 | 5.93 | 1.21 | 0.04 | 37 | 5.65 | 1.21 | -0.26 | 25 | 5.72 | 0.94 |
| Accounting tools | 20 | 5.90 | 0.79 | 0.16 | 36 | 5.75 | 1.25 | 0.03 | 26 | 5.31 | 1.54 |
| Software bug resolution | 10 | 5.10 | 1.10 | -0.52 | 30 | 5.70 | 1.15 | -0.21 | 22 | 5.45 | 1.34 |
1 - Number of responses.
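The table gives enough information (mean, standard deviation and number of responses) to gauge how far apart two ratings are relative to their spread. A minimal sketch using Welch's t statistic to compare satisfaction with performance and debugging tools on the PVP and on the SP; this is offered only as an illustration and is not necessarily the method NERSC used to judge significance:

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Performance and debugging tools: PVP (N=16, 6.06, 1 s.d. 0.68) vs SP (N=29, 4.69, 1 s.d. 1.61).
t_debug_tools = welch_t(6.06, 0.68, 16, 4.69, 1.61, 29)
```

A statistic of roughly 4 suggests the gap between the two platforms is large compared with the scatter in the responses, which is consistent with the debugger complaints in the SP comments.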
Summary of Comments
| 3 | tools |
| 3 | libraries |
| 2 | compilers |
| 2 | AFS |
| 2 | please enhance |
| 2 | mentioned software not in ACTS |
The following ACTS tools are currently installed
at NERSC. Select all the ones that you currently use here.
| 34 | none |
| 13 | Scalapack |
| 6 | Petsc |
| 5 | Superlu |
| 3 | Tau |
| 1 | Aztec |
The following ACTS tools are not currently installed
at NERSC. Select all that you would like to use at NERSC.
| 23 | none |
| 9 | Global Arrays |
| 7 | Paws, Pvode |
| 6 | Pooma |
| 5 | Atlas |
| 3 | Globus, Pete, Siloon |
| 2 | Overture, Tulip |
| 1 | Hypre, Nexus, Opt |
Individual Comments about NERSC's software resources, suggested improvements,
future needs: 13 responses
- Tools
-
Try to employ some leverage on IBM to get better debuggers for the IBM-SP.
It would be
nice to have a GUI for scp --- i am a novice user with it but it seems to
involve an ungodly
amount of typing! actually, it would be nice to have a GUI scp that could
transfer multiple files, accept wildcards, etc.
We have not yet found any useful performance analysis tools for our complex
C++/Fortran90 code.
Suggestion: add GNU tools such as gdb and ghostview.
- Compilers
-
The latest version of the FORTRAN compiler is causing problems. Maintaining
access to
earlier versions would be very helpful if possible. This would guarantee
that a user could still compile their code. [SP user]
NERSC response: We maintain versions of previous
compilers on the Crays. Unfortunately, the IBM SP does not
support multiple versions of the compilers so we are not able to provide this
service on the SP. We have informed IBM of the need to provide this
functionality.
I would like to be able to use HPF on gseaborg. Though I can achieve
higher performance using F90 with MPI, HPF is extremely useful in terms
of my overall productivity in code development. I can quickly write
an HPF code to answer "what if" questions, then decide if I want to
develop an optimized F90/MPI version.
- Libraries
-
I'm not an expert, but I hear that the latest version of HDF has advantages
over NetCDF. I'd
like to try it out, but only if there is a strong commitment to keep it
current on NERSC machines. [...]
NCAR/NCL has been a poor substitute for DISSPLA/MAPPER over the last ~2 years
I am not sure whether NERSC could ask vendors to improve specific software
if we encounter the needs, such as more flexible (to assign nodes and
processors) IBM loadleveler, and parallel io with netcdf on IBM SP.
- AFS
-
[...] It would be nice if a more standard interface to AFS were available on
the supercomputers.
Please install AFS on the IBM SP!!!!
- Other
-
Need improved documentation
This section is too convoluted.
We mainly use and develop our own software.
Very adequate!
Individual Comments about ACTS Toolkit: 8 responses
- Please enhance
-
Please add Globus to the toolkit - it is very helpful in managing job
submission and data
handling on remote computing resources, such as NERSC.
The version of PETSc on the SP and (I think) T3E are at 2.0.24. There are
several later
versions -- 2.0.28 which I use in production on the SP and 2.0.29 which my
colleagues in
petsc have recently installed. Can NERSC provide support for PETSc on these
later versions?
- Mentioned software not in ACTS
-
By PAWS I assume you mean PAW (Physics Analysis Workstation) from CERN.
NERSC response: No, PAWS stands for
Parallel Application
WorkSpace.
I would like to use TAO at NERSC. (I believe this is also part of the ACTS
toolkit although it
isn't listed above. It has a confusingly similar name to TAU.)
NERSC response: TAO is not part of the ACTS toolkit.
TAU stands for Tuning and Analysis
Utilities.
- Other
-
I'll wait till the blood dries on the cutting edge users...
Useless for us.
Don't use it.
The passed workshop is a good start for more people to use the toolkit.
At least, I am starting.
7. Training
Legend
| Satisfaction | Average Score |
| Very Satisfied | 6.5 - 7 |
| Mostly Satisfied | 5.5 - 6.4 |
| Significance of Change |
| not significant |
| Usefulness | Average Score |
| Very Useful | 2.5 - 3 |
| Somewhat Useful | 1.5 - 2.4 |
Training Satisfaction
| Topic | No. who have used | Satisfaction: Responses | Satisfaction: Avg. (1-7) | Satisfaction: Std. Dev. | Satisfaction: Change from '99 | Useful in HPC training?: Responses | Useful in HPC training?: Avg. (1-3) |
| Classes | 13 | 14 | 6.71 | 1.53 | 0.52 | 15 | 2.67 |
| Online web tutorials | 32 | 32 | 6.22 | 0.94 | 0.05 | 34 | 2.62 |
| Slides from classes on web | 22 | 23 | 6.13 | 1.52 | 0.18 | 21 | 2.43 |
| Teleconference lectures | 12 | 13 | 6.00 | 2.25 | 0.22 | 15 | 2.13 |
Comments about training. In what area should we focus our attention?
5 responses
Short, concise, easy to find, relevant info on the Web. Something I
can print out and use as a handy reference.
I learned a lot from various training classes, especially when I was new to
NERSC.
A short 3-4 day training workshop would be very helpful.
I have so little spare time it is hard to say what I would attend. When Phase
II of the IBM-SP
arrives, you may need some programming classes on effective use of the
hybrid architecture.
I have not seen any classes advertised that looked interesting enough to take
me away from
my immediate assignments. Maybe I am not a good target for them however.
8. User Comments
| 34 | user support |
| 29 | stable, well managed production environment; good hardware |
| 9 | everything / nothing singled out |
| 7 | documentation |
| 6 | software, tools |
| 6 | storage environment |
| 5 | well managed migrations and upgrades |
| 3 | allocations process |
| 3 | announcements to users |

| 18 | provide more cycles, improve turnaround time |
| 7 | inodes/storage improvements |
| 6 | software enhancements |
| 4 | manage systems differently |
| 4 | provide different hardware |
| 3 | accounting/allocations improvements |
| 3 | batch improvements |
| 3 | better documentation |
| 2 | networking/bandwidth improvements |

| 25 | NERSC is the best / better than |
| 9 | NERSC is good / only use NERSC |
| 7 | NERSC is the same as / mixed response |
| 6 | NERSC is less good |
What does NERSC do well? 58 responses
- User support
-
I have been very satisfied with most NERSC services and competencies. Great
response time and quality answers to my questions/requests.
Also, I find the web page well done.
[...]Very responsive
consulting staff that makes the user feel that his problem, and its
solution, is important to NERSC. [...]
consulting is awesome
People to people contact is excellent.
General attitude from Horst, to Kramer, to Verdier, to account
support and consulting is outstanding with respect to
dealing with the users and their issues.
listen to users and effect changes in service
[...] Gives users good access to consultants.
Responds to users needs promptly and effectively.
The consultants are especially helpful.
Consulting, web, availability of machines.
Once I established a good rapport with the consultants, they were helpful.
At first it was difficult to get straight answers.
Customer support is always timely and accurate.
[...] 2. User services (i.e. consulting and account support) are excellent.
Consulting service is excellent!
Good response from the consultants and sysadmins.
The consultant and account services are superb.
Consulting is good but very little else.
Information to users, maintenance.
Consulting team is very excellent.
- Stable, well managed production environment; good hardware
-
Provides stable production environment with excellent support services and
first rate hardware for scientific computation.
Provide state-of-the-art computation, maximum speed, processors, capacity
Keep everything working smoothly. Excellent computer personnel.
Good management and accounting of a few big machines;
good effort at maintaining WWW pages, help with standard questions, etc.
Keep allowing interactive time. Consultants helpful at times. Pretty
good access to hardware. Pretty good tools.
SP. Batch turnaround time. I/O space. Mass storage
Provide good hardware, respond well to users.
Overall availability of resources and waiting times are quite predictable and
constant through the year.
Typically tries to provide an adequate production environment. [...]
1. Provide world-class supercomputing resources. [...]
Provide access to high-performance computers with a variety of different
architectures.
Provide excellent computing resources with high reliability and ease of use.
[...] My work requires interactive use and the conversion of SEYMOUR was
extremely helpful and welcomed. However ... see next box...
Good provision of flops and support.
NERSC is doing very good job to give us a very good environment of computing.
I am very satisfied overall.
- Documentation; announcements
-
[...] The announcement managing and web-support is very professional.
Warn us of scheduled downtime.
I'm very impressed with the friendliness and helpfulness of the consulting
staff. I also find the e-mails about down-times helpful.
Nersc provides good support services, documentation, etc.
High availability of machines. Good online documentation. Responsive support
team.
- Software, tools
-
NERSC is a very well-managed Supercomputer Center. It provides excellent
Fortran compilers
and run-time environment on the Crays. NERSC is a most valuable resource
for my research in nuclear structure theory.
Support of software.
Have knowledgeable staff to assist researchers with computer difficulties - both
hardware and software aspects.
Maintenance of hardware and software is excellent. [...]
NERSC maintains the most up-to-date hardware and software, which are
very user-friendly.
- Storage environment
-
Manages large simulations and data. The oodles of scratch space on mcurie and
gseaborg help me process large amounts of data in one go.
Storage, connectivity.
ease of use of mass storage, access time to stored data
Executes the jobs, stores and transfers the data
- Well managed migrations and upgrades
-
In general you are to be congratulated on the transition from 1980's
supercomputing to Y2K multiprocessing. Machines are generally up and the
storage facilities seem good (from my perspective as a fairly light user).
NERSC has been the most stable supercomputer center in the country
particularly with the migration from the T3E to the IBM SP
keeps machines up. Upgrades facility in a timely fashion.
- everything
-
yeah, NERSC does well
Most everything. A first-class operation.
Almost every aspect. Hardware, software, and consulting. I am really happy to see
efforts going on to keep on improving the current system.
It is the best among all I have used. I give it five stars.
Makes supercomputing easy.
Provide timely computational service up to expectations.
NERSC undoubtedly is the best supercomputing facility that I have used
over the years. NERSC has become available to academics all over the world
with resources which are unthinkable in any academic environment.
Credit must go to a major extent to Dr. Horst Simon and his associate
Directors for this achievement and success! Ms. Francesca Verdier and
her staff , especially those mentioned above in the Consulting Services
have done an excellent job of helping users how to utilize the unmatchable
resources at
NERSC for solving major scientific and Engineering problems. I sincerely
express my thanks to
all at NERSC for making it a great pleasure for me to use the facilities at
NERSC from a remote
site [name omitted]. I look forward to use the NERSC facilities in
the FY2001.
NERSC does very good job.
yes
- Allocations process
-
Consulting was very good. Allocation service was very helpful.
User support and response. System allocation of resources.
The web-based allocation procedure is very convenient.
- Other
-
I hope to solve my problems with MPI so my code can compile and run well on
both SP and T3E.
access, consultants, visualization help
Training, consulting, web pages, making bleeding edge hardware available.
I think that the support is very good.
The new IBM SP was a very welcome addition.
What should NERSC do differently? 49 responses
- Provide more cycles, improve turnaround time
-
NERSC is doing a wonderful job. My great need is just for more resources (more
time on the machines and more nusers/resources/greater storage speed.
Wait time in large job batch queues is long, which costs
DOE programs a lot of money. Need to increase throughput.
Much more work needs to be done on providing greater resources to the community.
[...] Also NERSC really needs many more processors given the demand.
Give sole access of all machines to me.
Find a way to shorten batch queues (!) [...]
DOE should put new machines in NERSC rather than other places if DOE wants new
machines.
The batch queue system on the PVP cluster does not fit my needs. I get most
throughput on the slowest machine.
more PVP machines
Shorten the time it takes a job to run on the PVP.
provide more pvp cycles, particularly this year
Add more capacity for the heavy work loads.
It takes too long time to run a big memory job on PVP.
Improve on its vector computing. [...]
- Inodes/storage improvements
-
The user file limits are unrealistically low on the IBM and Cray systems. NERSC
seems unfriendly to users with large data/memory requirements.
[...] Improve on its disk resources, especially its inode resources.
I do not like the "inode" business in user file quota. I think it is outdated
now and should be removed.
My only complaint (this is the same from year to year) is the I-node limit.
Tailor to individual request. [What I meant was something like the allocation
of file space (and other restrictions) should consider individual needs.
Please do not take this as my criticism. I am doing well within the
allocated space.]
- Software enhancements
-
Improve the global working environment for remote sites by installing
AFS on the IBM SP. This way, for example, the same CVS repository can be
used by several users at different sites.
[...] Some support for heterogeneous computing and more support for code
steering on the T3E and SP.
Build computer systems comparable to those available at Los Alamos and LLNL. Put
more effort into using software tools such as GLOBUS as a model for remote
computing using NERSC resources.
I am very satisfied with NERSC. If I could ask you for one favor it would be
to make the Nedit text editor available on the Crays (open source software).
More support for Mathematica is appreciated.
Keep investing in adding quality software in chemistry (and other) applications.
For example Jaguar...
- Manage systems differently
-
NERSC should reduce the number of interactive machines. It should encourage
batch submission and give more "credit" for use of more processors.
Interactivity and wait times for batch jobs at times can get very poor on
your systems. Instead of aiming for maximum utilization of CPU cycles, you
ought to find ways to maintain better "headroom" between the resources you
have available and the user demand.
You need to make new systems available on a more rapid time scale once they
are installed. NERSC seems to take a much longer time to make new systems
available to users than other computer centers I've used (with no apparent
improvement in functionality resulting from this slow acceptance process).
NERSC sometimes makes bad choices in how they set up their systems. For
example, the way Fortran-90 modules have to be handled on the IBM SP is very
time inefficient for users who are developing codes. Apparently, from my
experience with other IBM SP's, the awkward way NERSC chose to do this is
completely unnecessary since others have not chosen to use this
configuration.
Because NERSC makes supercomputing easy, it is somewhat a victim of its own
success. By this I mean that truly large computational tasks suffer because
resources are used by smaller tasks. Climate runs often take many hundreds
of hours to execute, even in highly parallel configurations. The successful
climate modeling centers (none are domestic...) all are able to
access dedicated resources. It is difficult for the US climate modeling
community to compete with European and Japanese groups if it must further
compete with other fields for needed
computational resources. As this situation is controlled by forces external
to NERSC, I don't see much relief soon.
Managerial types claim that NERSC is a "capability" center. From my limited
experience this is not really so, however. Looking at gseaborg, e.g., there's a
200 node job that's been in the regular queue waiting for a 6 hr slot for a
week and a half, but the machine's full of 1,3,4,8,16 node jobs. None of the
jobs can run longer than 6 hrs, and they all presumably have a tight limit of
the number of files they can generate as output.
- Provide different hardware
-
save money: dump the Crays (or put them into a museum), get
an O2000 class box as an alternative.
Provide more middle of the road computing resources
The PVP machines are at the end of their line, it seems. NERSC should help
users learn how to migrate away from these machines in the coming year.
It would be great to have alternative platforms, such as a large scale linux
cluster.
- Accounting/allocations improvements
-
Information about remaining budget should be attached to each output file.
I'm not very happy with the new GETNIM versus SETCUB command. Also, having to
go to the NERSC web page to consult the remaining detailed allocation is clearly
*not* progress. I do not understand why this change happened.
PS- Sorry not to have more time to fill in detail the rest of the survey.
Improve the allocation process to reflect likely results of hardware changes,
such as the conversion of SEYMOUR to interactive. It costs six (6) times as
much to compute interactively on SEYMOUR with only a factor of 2 or so
improvement in execution time. My 2001 allocation was based on KILLEEN usage
for most of FY2000 ... hence I did not use but 30-40% of my 2000 allocation.
As a result my 2001 allocation was reduced to 1/3 of the 2000 allocation. Now
in FY2001 I cannot use SEYMOUR at all, as it will deplete my allocation in a
few months. I could make very good use of SEYMOUR to expedite my work, but
that is now not an option. Hence, the availability of SEYMOUR will not help
me AT ALL in FY2001 ... just because of the shortcomings of the allocation
process.
- Batch improvements
-
I used NERSC only for computation and for me the time
available and the time for the job to stay in a queue are
the most important. And it was OK. The way to monitor a job can be improved.
Should consider increasing the debug queue time limit on IBM and T3E.
I would like to run still longer jobs, but this is in conflict with the point
above, I suppose ... [Overall availability of resources and waiting times are
quite predictable and constant through the year.]
- Better documentation
-
I would like a more friendly interface. [What I mean is that when I encounter
some problem in programming in FORTRAN
or shell script, I cannot find help quickly online. For example, online
help for "nertug", "ja", "$If DEF, BACK", "#QSUB", some FORTRAN functions such
as "SSUM(...)" etc. cannot be found.]
describe access and usefulness to HPSS a little better
maybe you should consider an FAQ on questions to consultants in areas such as
programming, UNIX utilities, etc which come up repeatedly or would be
useful for active users to be aware of.
- Networking/bandwidth improvements
-
Improve connectivity from outside labs (e.g. Los Alamos Natl Lab) that also have
firewalls.
Greatly improve the ease and speed of very large dataset transfers between
NERSC and other labs. Security, finger pointing, and multi point of contact are
impeding research.
- Other
-
Don't bring the machines down for maintenance at 4 pm. Re-do section
6 on this Web form.
Use a survey with many fewer and less vague and overlapping questions.
Nothing comes to mind.
Keep up the excellent job you all are doing at NERSC even after some
machines get transferred to a building in Oakland.
The consultant help with specific machines is sometimes weak.
We had lots of problems porting a well tested code that ran
on another IBM SP3 with a somewhat different architecture.
The recent stability problems with mcurie have gone on long enough to make me
wonder that something is wrong somewhere. I have no idea if the problem is
mostly one with NERSC or elsewhere, but I am unpleasantly surprised every
day or two by another crash. Yuck.
Return to the way things were at Livermore.
Time difference is somewhat of an issue - relocate to the east coast :-)
How does NERSC compare to other centers you have used? 49 responses
- NERSC is the best / better than
-
NERSC is generally superior to all others I have used. Hence, I don't care to
use others much anymore.
NERSC is probably the best center I have been using. It has a very good
assistance service and resources.
It is superior in its consulting, account support and training to [site name
omitted].
Much better support than provided at UCSB, where an Origin 2000 is available,
but there is basically no support for using it. Machines are changing so
rapidly that it is impossible for the researcher to keep up with the changes
without the sort of help NERSC can provide. You are performing a vital service
to the research community.
Much better than any other centers. [4 site names omitted]
As I said, it is the best, it is better than others I have ever used,
such as computer centers in [3 site names omitted]
Much better!
Unmatchable!
I would say NERSC's IBM SP runs better than SDSC's BH SP.
Much better [site name omitted]
NERSC is the best of the centres I have used.
NIC in Juelich, Germany.
NERSC allows for more flexible dealing with budget and generally budget
enables more calculations.
Better than SDSC/NPACI in terms of system (IBM) reliability and
throughput. Most of our effective computing is done at NERSC.
My NAS account is too recent to compare NAS to NERSC in a fair manner.
Better than BNL, CERN (Switzerland), JINR (Russia), IHEP (Russia)
The hardware (file systems especially) on gseaborg seems to be much more
reliable than that on the IBM SP bluehorizon at NPACI/SDSC.
The gpfs nodes at NPACI are suddenly unavailable on occasion.
Much better than [site name omitted]
I have used [site name omitted] in the past (about 6 years ago). You are doing
much better. Keep up the good work.
In my opinion, NERSC does better than most other centers that I
used, such as [2 site names omitted].
Although I haven't really used other centers, except that I had an account at
NCAR 5 years ago, I should say NERSC is doing the best.
Compared to [site name omitted], how could you not be superb in comparison.
Relative to the LLNL NERSC of the early 90's, things are far better overall.
NERSC is better. [2 site names omitted]
Comparing to: [2 site names omitted] NERSC has the BEST consultants. Their web
pages are easily superior.
Top of the list. [site name omitted]
The allocation procedure in NERSC is more convenient than the one in NCSC.
Best center. Easier to access than LANL or LLNL. More responsive than NCAR.
Keep up the good work.
- NERSC is good / only use NERSC
-
I only use NERSC, so I cannot make a comparison.
NERSC is pretty [good] compared to other centers.
Great. (SDSC, German Max Planck Center)
Very well. CCS at ORNL, Maui.
Very very good!
The other center I have used: Livermore computing
It is very good. I use LLNL and NAS also, but spend a good deal of my
time on NERSC machines. Keep up the good work!
Very well. Maui, SDSC.
Very well.
Hi
- NERSC is the same as / mixed response
-
Principal other experience is with the LLNL center, which is also excellent. In
distant past, used several others which offered mostly cycles but little
infrastructure.
Compared to NCAR, machines at NERSC go down more regularly and jobs are killed
more often. Compared to LANL, NERSC is more stable.
Apart from NERSC, I have used NCSA and Argonne National Lab machines. NERSC is
comparable in service to these centers.
san diego supercomputer center.
nersc is better except as indicated in one instance above
[maximum batch job running times should be 24 hours.
it is 18 hours at san diego supercomputer center]
Nersc is competitive with other major facilities, such as ERDC (DoD)
I only have LLNL LCC to compare to (and lots of the LLNL NERSC
staff who stayed at LLNL). Both are outstanding.
Roughly the same as San Diego, NASA Ames and Goddard.
- NERSC is less good
-
DoD systems seem more oriented to the large user. I have used systems at NAVO
and ERDC (Cray and IBM).
I have never developed a code on a NERSC machine, since this
is quite inconvenient due to long wait times. In that respect
the experience I had last year at a different large center
(Forschungszentrum Juelich, Germany) was quite different and more pleasant.
I prefer modi4 (at NCSA) as it has a longer wallclock limit, and still has
quite reasonable queue waits.
I used also SDSC SP2 and Blue Horizon and
University of Texas Cray SV1 and SP2.
Blue Horizon was the best (just the best hardware).
Compared to DoD's CWES site, the limits on outfiles, and the queue systems are
just too much geared at the little guy. Running on NERSC requires far too much
babysitting of my runs: resubmitting, running high priority, tarring up output,
etc.
I also use the eagle machine at the DOE High Performance Computing Research
Center at Oak Ridge National Laboratory. The interactivity and turnaround
time for batch jobs is much better here than on GSeaborg. Also, I like the
fact that they have configured their system so that one doesn't have to go
through the unusual contortions with Fortran-90 modules (i.e., putting them
off in special disk areas which are not permanent) that NERSC requires of us.
NERSC should learn how they have set up their IBM-SP in this respect and do
similar things.
- Other
-
LANL open supercomputing, Argonne, local clusters
ACL at Los Alamos National Lab and QCDSP at Columbia University.