
2000 User Survey Results

Hardware Resources

Legend

Satisfaction              Average Score      Significance of Change
Very Satisfied            6.5 - 7            significant increase
Mostly Satisfied          5.5 - 6.4          significant decrease
Somewhat Satisfied        4.5 - 5.4          not significant
Neutral                   3.5 - 4.4
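
The survey page does not say how the significance of the year-to-year changes was judged. As a reader's aid only, one conventional way to gauge it from the published means, standard deviations, and response counts is a two-sample t-statistic; this is an assumption, not necessarily the test used for this survey:

    % Assumed (not documented) check for the "Change from '99" columns, written in LaTeX.
    % \bar{x}, s, n are the mean score, standard deviation, and number of responses in each year.
    t = \frac{\bar{x}_{2000} - \bar{x}_{1999}}
             {\sqrt{ s_{2000}^{2}/n_{2000} + s_{1999}^{2}/n_{1999} }}

Under this rule of thumb, a |t| of roughly 2 or more would be read as significant; shifts such as the +0.93 for interactive use of the Cray PVP cluster or the -0.71 for T3E batch job wait time are the kind of changes such a test weighs against the spread and number of responses.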

Frequency Histogram Plots

IBM SP - gseaborg

How satisfied are you?                    Responses   Avg. (1-7)   Std. Dev.
Uptime                                        50          6.52        0.86
Overall                                       56          5.88        1.31
Ability to run interactively                  41          5.51        1.49
Batch queue structure                         41          5.22        1.41
Disk configuration and I/O performance        40          5.20        1.62
Batch job wait time                           46          4.54        1.88

Max. Number of Processors Used: 141 (48 responses)
Max. Number of Processors Code Can Effectively Use: 591 (40 responses)

Cray T3E - MCurie

How satisfied are you?                    Responses   Avg. (1-7)   Std. Dev.   Change from '99
Uptime                                        65          6.09        1.01         -0.17
Overall                                       70          6.01        1.07         -0.16
Ability to run interactively                  58          5.71        1.35          0.11
Disk configuration and I/O performance        51          5.35        1.32          0.12
Batch queue structure                         56          5.27        1.53         -0.20
Batch job wait time                           63          4.33        1.58         -0.71

Max. Number of Processors Used: 146 (61 responses)
Max. Number of Processors Code Can Effectively Use: 300 (46 responses)

Cray PVP Cluster

How satisfied are you?                    Responses   Avg. (1-7)   Std. Dev.   Change from '99
Uptime                                        39          6.41        1.12          0.12
Ability to run interactively                  35          6.11        1.35          0.93
Overall                                       44          5.86        1.41          0.81
Disk configuration and I/O performance        31          5.77        1.20          0.21
Batch queue structure                         34          5.03        1.66          0.00
Batch job wait time                           38          4.26        1.83          0.31

Max. Number of Processors Used: 9 (29 responses)
Max. Number of Processors Code Can Effectively Use: 10 (24 responses)

HPSS

How satisfied are you?    Responses   Avg. (1-7)   Std. Dev.   Change from '99
Reliability                   62          6.39        1.18         -0.07
Uptime                        62          6.31        0.98         -0.02
Overall                       70          6.26        0.99          0.14
Performance                   64          6.20        0.98          0.30
User interface                63          6.14        1.06          0.08
Response time                 62          6.03        1.10          0.35

Server Satisfaction

Satisfaction with    Responses   Avg. (1-7)   Std. Dev.   Change from '99
Newton                   11          5.55        1.37          0.30
Escher                    8          5.25        1.28         -0.20

Summary of Comments

Comments on NERSC's IBM SP: 28 responses

7 hard to use/software problems
7 improve turnaround time
6 provide longer queues
5 good machine
4 disk issues: more inodes, more local disk, more GPFS nodes
4 provide more interactive services
3 change batch scheduling priorities
2 inadequate documentation

Suggestions for NERSC's IBM SP Phase II System: 16 responses

4 batch configuration
4 switch/communications performance
3 software concerns
3 more processors

Comments on NERSC's Cray T3E: 19 responses

8 improve turnaround time
7 good machine
2 needs more memory

Comments on NERSC's Cray PVP Cluster: 14 responses

5 good machine / good interactive services
3 C90 was better
3 file issues: more inodes, migration
2 improve turnaround time

Comments on NERSC's HPSS Storage System: 16 responses

8 good system
4 availability/performance problems
2 interface improvements

Comments about NERSC's auxiliary servers: 5 responses

 


Individual Comments on NERSC's IBM SP: 28 responses

Hard to use/software problems

I'm making only light use of the SP for development. NERSC staff have been very helpful and responsive. The SP is not the easiest system to use (C++ compiler problems), but these are not the fault of NERSC.

Don't like the requirement to use $TMPDIR for module compiling. Do like the presence of NCAR graphics. Not sure how I will use mixed SMP/MPP capability when 8-way processors arrive in Phase II. The debuggers on Seaborg are pretty poor compared to the PVP or T3E.

Fortran compiler seems buggy, file storage per node very limited, limited documentation, problems with batch submission, rather slow processors compared to say DEC alpha, etc.

There is something wrong in that I cannot compile my code properly. It is related to the MPI settings.

Home directories should not be GPFS because of the F90 module compiling problem.

The lack of support for Fortran 90 modules is something that frustrates me a lot. [...] Compared to February, the new compiler is slow. Recompiling from scratch -- which is frequently necessary because my memory-mapped module files are obliterated every time I am logged out, so that any change forces me to start from the beginning -- takes hours. It would be nice if the old compiler were available for those that wish to use it. The new compiler fails to compile my codes without the '-qhot' option because of "lack of resources". This error message is not helpful. The "llqs" routine is not as useful for figuring out when a job will likely run compared to similar routines on the T3E. I prefer the version of totalview on the T3E, but this may be a function of my overall frustration with the IBM. [...] Gnuplot doesn't seem to pick up the tcsh line editing commands when running under tcsh. [...] The inability to ftp into gseaborg makes editing files a chore for me, since I am accustomed to editing from an emacs window running on my desktop. There is probably a way around this, but I don't know what it is.

Improve turnaround time

Job waits of 3-5 days for 6 hours of 64 nodes are common. This is completely unacceptable; it is not possible to get useful work done in this way. Available resources should either be drastically increased or else NERSC closed down.

Batch queues seem to be rather long in the regular class, implying the need for a larger computer. Could you prepare an up-to-date plot of the average wait-to-run time for the various queues, as a function of time, that could be viewed on a web site, for example?

Initially, I was very satisfied with the IBM SP. However, around mid-summer the queues started getting very slow and batch jobs that used to go through overnight or less started taking 2-3 days. For my typical job (100 - 200 processors, 3 - 4 hours) this is intolerably slow. I also have access to a local IBM SP at my lab (ORNL) which has faster processors with 4 per node and much fewer users. Jobs that are taking 2 - 4 days to get through NERSC's IBM SP usually start immediately here and are done in a few hours. I'm hoping NERSC's IBM SP Phase II will improve this problem. [...]

I find the IBM SP a pretty slow machine.

Provide longer queues

I would like a longer queue.

A longer max wall clock time (>6 hrs) on gseaborg would be good, like on the T3E.

The maximum running time for batch jobs of 6 hours is much too short for our compute-intensive job.

Good machine

Great Machine. Keep it up. Needs more I/O nodes for GPFS and faster processors ...

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.

Very stable, easy to use, faster than what I expect.

very happy

Max. number of processors depends on the configuration of the code (size of domain, spatial resolution). This code shows good performance enhancement up to 96 processors (the maximum tested so far).

Disk issues: more inodes, more local disk, more GPFS nodes

Need for a local filesystem to fully exploit NWChem capabilities.

Provide more interactive services

[...] Although there are evidently typically interactive PE's available on the IBM, there aren't very many overall. I'd prefer more for development, if the climate for fortran development were friendlier.

Available PEs for interactive runs should be more than 16 (at least for short test runs!). Wait time for short batch runs (~10-20 min) should not exceed 5 hours.

Interactive runs are always at very low priority. Maybe they could be given the same priority as the debug queue.

One unified file system would help, particularly with the F90 .mod file handling. The queues have become too crowded. The 6 hour time limit, up from 4, was a welcome change. The interactive limit on one processor is too small to even compile some codes.

Change batch scheduling priorities

[...] In the meantime I think you need to consider rearranging the queues so that the longer jobs which really do take multiple days to finish don't get in the way of intermediate length jobs (100-200 processors, 2-4 hours) which should be put on a faster track with the potential to finish in a 24 hour period.

There is no obvious method to which jobs get to run when. We are running a 100 year model that takes nearly one month wall clock time to execute. With a 6 hour time limit, no q structure, and 3 day lag times from time of job submission to time of job execution, we have had to invent several strategies just to use the time we've been allotted. Further, nearly a third of the jobs that we do submit have to commit suicide because LoadLeveler tries to run them simultaneously, and they need to be run sequentially. We are obviously not the only users in this predicament.
1) Please set up some sort of q structure. Allow jobs that fill half the machine or more to run only at night.
2) If you don't do that, please allow users to use cron so that we don't have to occupy processors to submit jobs at regular intervals.

I rely on a defense machine allocation for SP time to do my critical runs, primarily because I have access to a queue system there that allows > 100 hr runs. I'm not sure, though, that even if I had such access at NERSC I'd use it. The i-node limits imposed are stifling, and require that I monitor my jobs full-time on the NERSC machines so that I may tar up output files/directories and remove them from the scratch space as they pop out of the run. I need to sleep sometime, and when I do, my inode limit becomes exceeded, and the job crashes. At the DoD sites, this has never been a problem. They seem more set up for large users. I think NERSC caters far too much to the little users, and this is one instance of what makes me think so. Until I can do large (~100 hr) runs at NERSC, with 128-256 processors, and get into the queue system in less than a week, and be able to dump a significant amount of data before running out of resources, my REAL work will be done at the DoD sites. Also, the filesystem on the SP is hideous. For deep filesystem deletes (say 3 or 4 levels), with a few hundred or so files, it can take unbearably long times to copy or remove them. This compounds the inode problem mentioned above because of the effort involved in tarring up my stuff and putting it all on hpss. So...the queue system is too full because there are too many small users on the machine. There aren't enough inodes because there are too many users. And the filesystem is horribly slow. Other than that....

Inadequate documentation

[...]Documentation is generally hard to find and harder to understand (mainly because of excessive cross-references to documents that are hard for me to find). For example, ESSL or PESSL versions of FFT's require complicated initializations that took me quite a while to figure out, even with help from consultants. [...] The documentation for the different version of the xlf90 compilers -- mpxlf90, mpxlf95, mpxlf95_r7, xlf90, etc. -- didn't make it easy for me to figure out how to get started with a basic MPI-based parallel code. [...]

Other

Accounting may be improved.

I am not using it

It would be a good machine if it had a much higher communication bandwidth, lower latency AND if it were able to do asynchronous communications without any overhead on the processors involved, i.e., between setting up an asynchronous MPI send/receive and its completion at an MPI wait, the processor needs to be able to perform calculations as efficiently as if there were no pending communications.
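
The comment above asks for asynchronous MPI communication that overlaps with computation. For readers unfamiliar with the pattern, the sketch below (in C, with made-up buffer and function names) shows the standard non-blocking idiom: post MPI_Irecv/MPI_Isend, compute on data that does not depend on the pending messages, and call MPI_Waitall only when the results are needed. Whether the transfer actually progresses during the compute loop, rather than being deferred to the wait, depends on the switch and the MPI implementation, which is exactly the commenter's concern.

    #include <mpi.h>

    /* Illustrative sketch only: exchange a halo with one neighbor while
     * computing on the interior of a local array.  Names and sizes are
     * hypothetical, not taken from any particular NERSC code. */
    void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                              int neighbor, double *interior, int m)
    {
        MPI_Request reqs[2];
        MPI_Status  stats[2];
        int i;

        /* Post the non-blocking receive and send first. */
        MPI_Irecv(recvbuf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Work that does not touch recvbuf can proceed now; ideally the
         * hardware makes progress on the messages during this loop. */
        for (i = 0; i < m; i++)
            interior[i] = 0.5 * (interior[i] + interior[i > 0 ? i - 1 : 0]);

        /* Block only when the halo data is actually required. */
        MPI_Waitall(2, reqs, stats);
    }

Posting the receive before the send is the usual way to avoid unexpected-message buffering; the commenter's point is that this idiom only pays off if the interconnect can move the data without stealing cycles from the compute loop.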

more memory

 


Individual Suggestions for NERSC's IBM SP Phase II System: 16 comments

Batch configuration

[...] A batch structure that favors large jobs explicitly would be very useful. There are plenty of computers around for people that are doing 32-64 PE jobs. The big machines ought to be available first for the applications that can't run elsewhere. The batch structure for mcurie is very good in this respect.

Queue time limit should be longer, even if that means wait time is longer.

Maximum batch job running times should be 24 hours. It is 18 hours at the San Diego Supercomputer Center.

As mentioned above, give priority to intermediate length batch jobs. Don't design everything around satisfying the really big users.

Switch/communications performance

Please insist on getting the highest performance communication backbone that is available. I rely upon high performance communication heavily, and fear that 16 CPU nodes with the existing switch hardware would be a step backward for my applications. [...]

I would strongly suggest that the switch should be updated to its final configuration BEFORE the nodes are upgraded.

More procs, faster I/O, faster communication. The usual requests.

Same as above. Concern about full use of node CPUs with MPI vs. node bandwidth and internode communication.

Software concerns

I hope that we can rely on IBM's C++ rather than KAI's, but I'm not sure this is realistic.

[...] If the Phase II system continues to fail to support fortran code development (by failing to treat memory-mapped files on the same footing as ordinary files while requiring them for compilation) then the Phase II system will really drive me crazy. [...]

NERSC response: In the Phase 2 system the GPFS memory-mapped file problem will be solved. In particular, Fortran 90 modules will work with GPFS.

Convince IBM to put some money into fixing that horrible filesystem. As much as I like my DoD accounts, they too have the terrible gpfs system that makes dealing with complex directory structures very painful.

More processors

Need as many processors as possible.

I hope there will be a phase III with even more nodes 8-)

No comments for this system, as it is already pretty much set. The next system after this must have many 1000's of processors if it is to be useful as a national resource.

Other

Same as for sp3.

looking forward to it!

Can't wait to get to Phase II.

get a good supply of Valium for your consultants...

 


Individual Comments on NERSC's Cray T3E: 19 responses

Improve turnaround time

The more I use it the more I like it. Batch waits can be excessive, though.

I cannot run any meaningful calculations with a 4 hour queue, and the batch job wait time on the 12 hour queues is very long.

The queues are too crowded and the turnaround is atrocious

Last time I checked, the queues here seemed even slower than the IBM SP. I only occasionally use the T3E anymore. This has gotten to be one of those computers where by the time the job is finished, if you're not careful, you may have forgotten why you started it. You need to do something to get better turnaround time.

Wait time in large job batch queues is too long.

queue length - need a bigger faster T3E?

T3E is really busy these days.

Interactive jobs time out after 30 minutes; batch jobs can spend a long time in the queue. But the worst thing is the inode quota of only 3500.

Good machine

Hope you keep it as long as possible!

Stable, also easy to use. And it is configured very well. I am also impressed by its checkpoint function. Hopefully, it can also be moved to IBM-SP.

This machine has probably the best communication network of any MPP machine I have used. Replacing the Alpha cache with streams was a bad idea; a large cache would have greatly improved its performance. It is a pity that an upgrade path to an Alpha 21264 based machine was not available.

Generally -- excellent machine, excellent performance until recently. Lately -- numerous crashes with no end in sight; it actually got so bad that I tried to use the IBM again (see comments above).

File system I/O is a bit slower compared to SGI, although the computing power is a lot stronger than the SGI Origin series. Overall, it was mostly satisfactory to us.

Needs more memory

I don't use the T3E because there is not enough memory per node on the machine. It otherwise seems to be a very nice system to work on. Unfortunately, my smaller problems can be done locally on our own workstations, and the large ones need the memory of the SP systems.

more memory per processor!!!!

Other

Why does it not have the 'tcsh' shell?

NERSC response: tcsh is available, but must be loaded explicitly (since it does not come with the UNICOS operating system). See tcsh and bash.

The maximum time limit could be increased. A total of 12 hours is not enough if you work with systems like proteins in a water box. Actually, I guess that is one of the smallest limits among the supercomputer centers I know of.

Getting old. The configuration is not very usable. I switched to the SP completely.

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.

 


Individual Comments on NERSC's Cray PVP Cluster: 14 responses

Good machine / good interactive services

Good idea to make Seymour partially interactive.

Interactive time is wonderful! Don't take machines down at 4 pm for maintenance.

Many would like to see this facility upgraded

This is state-of the art Cray PVP Cluster! Unmatchable anywhere.

C90 was better

The replacement of the C90 with the J90/SV1 cluster was a poor decision. The cacheless true vector machine was a great architectural advance. Moving to 'pseudo' vector machines with a cache and all the problems that go with it was a retrograde step. [...]

Not as good as the C90 in terms of hardware and software.

No good compared to a machine like the C90.

File issues: more inodes, migration

Need more inodes and permanent file space.

I'm only using this for codes which I haven't yet ported to one of the MPP machines. Interactivity seems to be okay. My main gripes are the nuisance of automatic file migration and the fact that sometimes the system seems to be unable to even get the migrated files back. Since these are usually executables I often resort to recompiling the code since this is faster than waiting for dmget to get the file back from migration.

[...] Disks are a commodity item. They are cheap, and formatting them with adequate numbers of inodes is simple. Even if you feel it necessary to limit our disk quotas, please remove inode quotas.

Improve turnaround time

Turnaround can be somewhat long.

The wait times to get jobs run seem to be increasing. This has resulted in exhortations not to use high priority queues, but this doesn't fix the problem of multi-day waits to get jobs started.

Other

I would need a queue that allows me to follow up a job with a successor without waiting time. Instead of having 6 jobs running in parallel, I would appreciate 6 continuous sequential jobs. The batch queue on killeen provides this at the moment to my full satisfaction, but only because currently nobody else extensively uses this machine. I cannot get any useful throughput on bhaskara and franklin. On seymour, sometimes...

Never used.

Seldom used it during the last year.


Individual Comments on NERSC's HPSS Storage System: 16 responses

Good system

much, much better than the old CFS system! Love the UNIX interface!

incredibly useful and fast. No complaints, this is one great setup.

archive and hpss are great as long as the machines to access them from are up (mcurie is often down). A data processing machine that is stable would be great.

Dependability of the HPSS system increased significantly last year and I am finally getting satisfied with the system.

PCMDI is a heavy user of hpss. We are very satisfied. see http://www-pcmdi.llnl.gov/modeldata/PCM_Data/pcgdahome.html for details of the dataset

Fantastic system! Unmatchable!

Ahhhh, HPSS - best thing since sliced bread :)

I don't use them much. But it's a good place to store big files offline. And I get to store some model outputs while running the code. It's quite reliable.

Availability/performance problems

Many times, the system is not able to retrieve my files from HPSS storage when I need them most.

Obtaining directory listings of large directories is unreasonably slow.

hopelessly slow

Sometimes large file transfers were interrupted because of the time limit. It should be increased so as to allow transfer of large files.

Interface improvements

We have had to create a script that checks to see if a file has been accurately transferred to HPSS. This is something that should be done for users automatically.
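
The comment above describes a home-grown check that a file really arrived in HPSS intact. The sketch below shows only the comparison step of such a check, written in C and assuming the archived copy has already been staged back to local disk (with hsi or pftp, for example); the file names are hypothetical, and for very large files comparing sizes or checksums would be cheaper than a full byte-for-byte comparison.

    #include <stdio.h>

    /* Illustrative sketch: compare the original file against the copy staged
     * back from HPSS, byte for byte.  Returns 1 if they match, 0 otherwise. */
    int files_identical(const char *original, const char *staged_copy)
    {
        FILE *a = fopen(original, "rb");
        FILE *b = fopen(staged_copy, "rb");
        int ca, cb, same = 1;

        if (a == NULL || b == NULL) {      /* treat a missing file as a failure */
            if (a) fclose(a);
            if (b) fclose(b);
            return 0;
        }
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) { same = 0; break; }
        } while (ca != EOF && cb != EOF);

        fclose(a);
        fclose(b);
        return same;
    }

    int main(void)
    {
        /* Hypothetical paths: an original output file and the copy pulled
         * back from HPSS for verification. */
        if (files_identical("run42.output", "/scratch/verify/run42.output"))
            printf("transfer verified\n");
        else
            printf("MISMATCH - retransfer the file\n");
        return 0;
    }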

Erosion of CFS features since the move from LLNL.

Other

I wish that the hsi source code would be set up in a tar file so that I could download it, compile it and run it on any type of architecture. That would be very nice...

Use it infrequently, so pretty much always forget all but the most basic commands.

 


Individual Comments about NERSC's auxiliary servers: 5 responses

A very reliable machine. A good use of expensive software licenses. [escher]

We have been receiving wonderful support from the Visualization group (in particular, Nancy Johnston)

The Matlab license is for 4 persons to use simultaneously. Sometimes this is a problem. Other times, we could just walk over to see how long other users will be using it. [newton]

Don't use them.

Difficult to develop programs on escher due to the lack of debuggers. In this day of cheap CD writers, it would be nice to have really good documentation on the NERSC Web site on various ways to make movies. My impression from a previous post-doc who worked for me is that things remain pretty painful in terms of multiple stages of work if one is trying to get QuickTime quality movies. Also, NERSC should bring up OpenDX on its visualization server.

NERSC response: For documentation on how to make movies, see Making MPEG Movies. We have made this document easier to find.