Hardware Resources
Legend
Satisfaction | Value |
---|---|
Very Satisfied | 7 |
Mostly Satisfied | 6 |
Somewhat Satisfied | 5 |
Neutral | 4 |
Somewhat Dissatisfied | 3 |
Mostly Dissatisfied | 2 |
Very Dissatisfied | 1 |
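In the tables that follow, each "Avg." is the mean of the 1-7 legend scores over the users who rated that topic, and "'98 Avg./Change" gives the 1998 average followed by the year-to-year difference (for example, T3E uptime: 6.26 - 5.58 = +0.68). The sketch below illustrates that calculation; the response breakdown in it is hypothetical, not taken from the survey data.

```python
# Map each satisfaction label to its score on the 7-point legend above.
SCORES = {
    "Very Satisfied": 7, "Mostly Satisfied": 6, "Somewhat Satisfied": 5,
    "Neutral": 4, "Somewhat Dissatisfied": 3, "Mostly Dissatisfied": 2,
    "Very Dissatisfied": 1,
}

def average_satisfaction(counts):
    """Weighted mean of the legend scores over all responses to one topic."""
    total_responses = sum(counts.values())
    total_score = sum(SCORES[label] * n for label, n in counts.items())
    return total_score / total_responses

# Hypothetical breakdown for a single topic (illustration only, not survey data).
example = {"Very Satisfied": 40, "Mostly Satisfied": 35, "Somewhat Satisfied": 10,
           "Neutral": 5, "Somewhat Dissatisfied": 3}
print(round(average_satisfaction(example), 2))  # 6.12 for this made-up breakdown
```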
Cray T3E - MCurie
Topic | No. of Responses | Avg. Satisfaction | '98 Avg./Change
---|---|---|---
Uptime | 93 | 6.26 | 5.58/+0.68
Overall | 64 | 6.17 | 5.20/+0.97
Ability to run interactively | 85 | 5.60 |
Batch queue structure | 81 | 5.47 | 4.51/+0.96
Disk configuration and I/O performance | 71 | 5.23 |
Batch job wait time | 80 | 5.04 | 4.43/+0.61

Question | No. of Responses | Avg.
---|---|---
Uptime estimate (%) | 56 | 89.6
Batch wait time estimate (hours) | 49 | 14.5
Max. number of PEs used | 77 | 142.2
Max. number of PEs code can effectively use | 60 | 379.4
Cray PVP Cluster
Topic | No. of Responses | Avg. Satisfaction | '98 Avg./Change
---|---|---|---
Uptime | 73 | 6.29 | 5.69/+0.60
Disk configuration and I/O performance | 54 | 5.56 |
Ability to run interactively | 68 | 5.18 |
Overall | 58 | 5.05 | 4.92/+0.13
Batch queue structure | 60 | 5.03 | 4.85/+0.18
Batch job wait time | 62 | 3.95 | 4.79/-0.84

Question | No. of Responses | Avg.
---|---|---
Uptime estimate (%) | 48 | 88.6
Batch wait time estimate (hours) | 43 | 43.7
HPSS
Topic | No. of Responses | Avg. Satisfaction | '98 Avg./Change
---|---|---|---
Reliability | 81 | 6.46 | 5.51/+0.95
Uptime | 81 | 6.33 | 5.39/+0.94
User Interface | 72 | 6.06 | 4.88/+1.18
Overall | 69 | 6.12 | 5.09/+1.03
Performance | 73 | 5.90 | 5.46/+0.44
Response Time | 75 | 5.68 | 5.29/+0.39

Question | No. of Responses | Avg.
---|---|---
Uptime estimate (%) | 50 | 91.7
Reliability estimate (%) | 45 | 94.2
Performance estimate (MB/sec) | 12 | 11.8
Servers
Topic | No. of Responses | Avg. Satisfaction
---|---|---
Visualization Server - Escher | 11 | 5.45
Math Server - Newton | 12 | 5.25
Comments on NERSC's Cray T3E - 39 responses
- 7: A good machine / get more T3E processors
  - Stable, reliable.
  - Good scalability.
  - Provides lots of production time.
- 7: Comments on queue structure
  - 7 asked for longer time limits.
  - 1 asked for fairer turnaround across queues.
- 6: Comments on throughput, performance
  - 4 said bad; 2 said good.
- 6: Comments on interactive computing
  - 4 asked for more; 2 said good.
- 5: Comments on disks, inodes, file systems
  - 4 said inodes too restricted.
  - 1 said /usr/tmp too full.
- 2: Comments on memory
  - 1 said enough; 1 said need more.
- 5: Other comments
  - Need more optimization info.
  - Provide better queue stats (let us know which queue is best to use now).
  - PVM errors, I/O bottlenecks.
- 7: Don't use it (yet) / no specific comments
Did the C90 retirement go as smoothly as possible? - 42 responses
- 31: Yes
- 2: Yes, but...
  - Would have liked access to C90 files right away.
  - Too many changes at the same time.
- 5: No
  - Too much congestion on the J90s afterwards.
  - We lost cf77 and DISSPLA.
  - Now there is less interactive access.
  - Should not have removed C90 until full SV1 cluster was in place.
  - "As smoothly as possible" should have meant keeping the C90!
- 4: Didn't use the C90 / hadn't used it recently
Does the current NERSC PVP cluster meet your vector computing needs? - 52 responses
- 24: No
- 3: Probably no / am waiting to see
- 3: Yes, but... / yes and no
  - Slow throughput / slow processors: 20 responses.
  - Need interactive SV1s / interactive too slow: 8 responses.
  - Shouldn't the interactive environment be the same as the batch environment?
  - Doesn't like the charging rate.
  - Problems with multitasking (OpenMP) and C++.
  - Disk system poorly configured.
  - Poor network access.
  - cqstatl response slow.
  - Wants the prompt to support arrow-key command recall.
  - Doubts he can run BASIS.
- 21: Yes
- 1: No answer
Comments on NERSC's Cray PVP Cluster - 26 responses
- 15: Slow throughput, slow processors, need more PVP cycles
- 5: Need interactive SV1s, interactive too slow
- 4: Comments on software, configuration
  - Doesn't like the automatic logout on killeen.
  - Doesn't like the f90 compiler.
  - Multitasking (OpenMP) instabilities.
  - Disk system poorly configured.
  - Needs more info on text editors.
- 4: They're good machines, satisfied
  - Stable, good software development/testing environment.
  - Excellent accuracy.
  - Running short jobs works well.
- 3: Comments on queues
  - Needs a queue with 200 MW memory, 50 GB disk, 100-150 hours.
  - Make scheduling fairer.
  - Too many jobs are allowed for 1 user.
  - Too many job failures.
  - Frustrated when exceeding CPU and disk limits.
- 1: Haven't used them
Comments on NERSC's HPSS Storage System - 38 responses
- 12: It's a good system, satisfied
  - Can easily handle large (8+GB) files.
  - Happy with reliability.
- 11: Would like a better user interface
  - More like Unix, more like NFS (3 responses).
  - Provide the ability to manipulate entire directories (3 responses).
  - Provide better failure recovery (2 responses).
  - Provide better search capabilities (2 responses).
- 7: Improve the transfer rate
- 4: Command response too slow
- 3: Stability problems
- 2: Password problems
- 4: Don't know what it is, don't use
Comments about NERSC's auxiliary servers - 10 responses
- 2: Like them
- 1: Newton needs an upgrade
- 1: Provide a data broadcasting web server
- 6: Don't use them, no answer
Comments on NERSC's Cray T3E - 39 responses
- A good machine / get more T3E processors: 7 responses
the best supercomputer I know of
Nice machine !
It is definitely a very stable and reliable machine!!!
I like the machine, and it is helpful to my work.
Ideally, it would be good to have more T3E's so that more projects could be awarded greater than 10**5 processor hrs./year.
Good scalability. [...]
The machine is extremely useful in the amount of production time that is available on it, however it is out of date as a massively parallel supercomputer.
- Comments on queue structure: 7 responses
I'd like to have a long 512-processor queue
Batch queues that run longer than 4 hours would be nice. 12 hours would make many of my jobs easier to run and manage. Some things just can't be done in 4 hour segments.
Some jobs in the production queue start fairly quickly. The pe128 especially seems to be under utilized. But for long jobs, the wait can be long. In one case I launched a job on gc128 and was disappointed by the slow turn around. The jobs started the evening after I submitted, then ran a couple of hours for the next two nights. The results were not ready for 3 days. I would have been better off, I think, if I submitted multiple jobs in the production queues.
I've never tried more than 256 processors, but I could. I have some problems that I would like to solve that are too large for the current queue configurations. I am confident that my code will scale past 256. Time limits per job seem to be too restrictive for long jobs, but that require few processors. For example, the T3E system allows a time limit of 4:10 h for jobs with less than 64 processors.
Climate modeling requires very long runs. In general, the queue structure is inappropriate for these multi-month calculations.
As a "naive" user, I do not know what the possibilities or alternatives might be with regard to batch turn-around time. I would like to be able to run batch jobs longer than 4 hours with fewer than 64 processor
It would be good to be able to run a job for longer time, say 24 hours, maybe with smaller number of processors (4-64).
- Comments on throughput, performance: 6 responses
Recently the queue wait time has been a real problem.
The batch job wait time is too slow
Too more users on MCURIE
[...]Batch jobs are sometimes slow getting into and through the queues. This tends to lead to heavy interactive use to get adequate response time. My jobs tend to grow in time and thus require increasing numbers of processors. This is difficult to manage with slow queues.
good turnaround; few down times
Very high throughput compensates for lack of speed. [...]
- Comments on interactive computing: 6 responses
Interactive and network performance are poor. [...]
I asked my postdoc. His main complaint was with the inability to run small jobs interactively when two big jobs were doing so. Maybe some sort of queue where a new job can take at most half of the available processors.
I know it's very hard to set up this architecture to run with multiple jobs like a PVP --- but some interactive use of many processors would help debugging.
If possible, keep a small number of processors available for debugging after midnight.
I find the ability to run parallel jobs interactively is particularly useful as I am developing a parallel code and this makes debugging a lot easier compared with some other systems that I use.
Do a lot of my work on your T3E because I can run small test interactively, as well as have access to totalview. This is probably THE most important thing for me. Do not like it when mcurie is not available after 4 pm on some weekdays, but can live with it. Don't run batch jobs very often, other people in the group do that.
- Comments on disks, inodes, file systems: 5 responses
[...] The /usr/tmp/ disks are lately full enough that one has to take special measures with large datasets to avoid having a PE hang in an I/O cycle. Richard Gerber helped me with this problem, and knows what I am talking about.
The NERSC implementation of AFS is not user friendly. I have given up on it, and find this annoying. I never figured out how (from a NERSC computer) to access AFS directories that belonged to a different user -- sharing code development directories with other users is one of the main reasons I use(d) AFS.[...] The limitation on the number of files allowed in the home area is very restrictive as compared to the limitation on memory used.
Storage restrictions, in particular inode restrictions, seem somewhat more restrictive than is necessary.
inode limits, especially in temp seem weird. also, what exactly do the previously two questions mean??
In general, the inode limitations on temporary and home disk space are very restrictive. Having limits on the number of inodes used seems excessively limiting. Good configuration of the drive space shouldn't require these type limitations. [...]
- Comments on memory: 2 responses
I like that it has 256 memory .. the nersc t3e has been the only machine I have access to with this kind of memory .. I need that for part of my work.
[...] The machine need more memory per PE, the current 256MB severely limit what I can do on it. I'd need 0.5-1GB/PE to get larger calculations done.
- Other comments: 5 responses
[...] Would like access to more performance tuning on single node performance (e.g. a quick reference on best set of compile flags for good performance). The Cray documentation is either too verbose or do not give any information at all.
mcurie has been up and down a lot lately, but mostly it is pretty reliable. It is much better than when new schedulers were being tested. [...]
The queue stats are not very useful. A useful stat would be something that better informs which queue is best to run on "now," based on history, like a year-long plot of daily Mflops/wait time, perhaps on a per-queue basis or otherwise so we know if it's better to wait or try another queue.
PVM calls cause my code to stop with errors very often. Resubmits without changes are successful.
[...] Couldn't use many more than 16 PEs due to IO bottlenecks (so far, NERSC is working on this).[...]
- Don't use it (yet) / no specific comments: 7 responses
N/A
Ever so slowly I creep up on porting my major code from SMP architectures to MPP...
I don't use it.
Haven't used it yet
I have not used it yet, but will start in FY00.
(I haven't fully tested my code, but I plan to try it on a larger number of PEs than I have so far.)
(This is based on the experience of my group, not personal experience.)
Did the C90 retirement go as smoothly as possible? - 42 responses
- Yes: 31 responses
I did not have any problems and found the website and consultants very helpful in storing and/or moving files around to accommodate the retirement.
Yes I was well prepared for it thanks to your encouragement to get on to the J90s
The transition from the C90 was handled fine.
Yes, not bad.
it was reasonably smooth for me
Yes - I had moved essentially everything off the C90
It was fine for me.
No problems
No problems for me.
yes [22 responses]
- Yes, but...: 2 responses
Yes, it did. It would have been nice to access files right away after the transition rather than waiting until after Jan. 20.
Essentially, yes. There were too many changes going on all at the same time back then, so it was overwhelming to us here.
- No: 5 responses
There was a lot on congestion on the J90 for a number of months following this. It didn't really seem like enough capacity was added to handle all the users coming from the J90.
C90 retirement meant that we lost the very fast f77 compiler and DISSPLA. This has forced me to use the very slow F90 compiler on the J90s and much code conversion to retrofit NCAR graphics. Also, the J90s are slower than the C90 and could stand to have more interactive access.
Wished the c90 was not shut down until full PVP cluster was in place. J90 interim had a very adverse effect on our productivity
The C90 was a great machine. "As smoothly as possible" should have meant keeping it on the floor.
No.
- Didn't use the C90 / hadn't used it recently: 4 responses
I did not use C90.
I haven't used the C90 for a long time. I found the tern around time on the J90's to be superior to the C90.
I never used it.
It was perfectly smooth for me. I had nothing on the C90.
Does the current NERSC PVP cluster meet your vector computing needs? - 52 responses
- No: 24 responses
no. I have to wait long in batch queues
NO! The batch queuing environment results in wait times (for me) that are HUGE compared to what I was used to on the C90!
The PVP cluster has been very difficult to use. In FY99 there were many times that jobs I submitted took many days to start.
Not really. Need higher performance.
No-no-no!!! if the waiting time to run batch jobs is any indication. I hope the new queue system is a big improvement because the present situation is nearly intolerable.
The jobs seem to be waiting so long in the queue that I am at present discouraged to run jobs. But I do need this facility for my computing.
The system seems to be overloaded given the long wait time on the batch queue. More batch machines might help (of course, that requires money). I have heard of a proposal to turn one of the batch machines into an interactive machine. Since there aren't enough batch machines as it is, I am opposed to any reduction in the batch resources.
No, it is way too slow. I work on many systems all at the same time, and the CRAYs are by far the slowest to get out my results. And I mean by one or more orders of magnitude. However, I am not exploiting the vector capabilities so perhaps this is not fair.
Very slow-- lots of wall clock time.
batch queues too slow
need more capacity; turnaround slow in recent months, but I recognize the emphasis must be given to MPP, in regard to resource acquisition.
Based on the very long wait time in the batch queues (sometimes as much as a week or more) there seems to be greatly too little PVP resources. I also notive very large numbers of jobs submitted by one or a few users. I hope the batch system accounts for this and runs only a few of those jobs at a time, otherwise, it is unfair and possibly wasteful, to encourage submission of many jobs.
turn-around for large memory jobs is too slow
Batch is too slow.
No. they are very slow.
I do not use it much because my jobs stay in queue for a very long time (It might be related to the end of the year).
Noty exactly, needs to wait long time to obtain the result, sometimes the compiler (CC) fails to overcome any overflow proplems. Also, need the prompt to be customized such that the user can retrieve the previous commands using the arrow keys.
NO. MORE processors needed. SV1 processors are NOT as fast as they are supposed to be. Account charging makes it even worse. I have no clue why I am charged 3 times more on SV1 than I was on J90SEs, my code runs only 2 times faster! SV1 charging factor should be reduced to 1 or 1.2 . What is the benefit of generously charging to users for NERSC? Total allocation awards for FY2000 on SV1s is less than the total computing power on the cluster. Repositories going over the allocation should be automatically borrow time from the DOE reserve, as the process takes time that users don't want to lose. It is a good strategy to urge users use their allocations early in the year and take off some of it if they do not.
No. So far, I have not managed to get better than about 180 Mflops out of a single SV1 CPU. This compares with 500 Mflops for the C90 and 100 Mflops for the J90. I really need a vector machine with the C90's capabilities.
Multitasking on the J90/SV1 has problems of frequent crashes and sometimes wrong results. Tracking down these problems is almost impossible since there is no way of doing safe multiprocessing i.e.multiprocessing which gives reproducible results down to the last bit!
The disk system is poorly configured, with insufficient inodes. It needs to have its file systems rebuilt with of order 1 inode/5Kbytes which is a typical value for workstations/PC's. The old 15000 inode quota was totally inadequate: the new 5000 inode quota is totally ridiculous. Adding a few more disks to the cluster would help. After all, disks are dirt cheap. Not a true replacement for the C90. Switching one or two of the SV1's to interactive use would help. They'll be a small part of the total resource, soon enough.
I do interactive software development and testing now and then on killeen. Sometimes interactive response time is slow in the afternoons. I think the SV1's are currently "batch only." How about giving interactive access to one of them?
No. There should be one more box (SV1) available for interactive use.
NO...NOT AT ALL. THIS IS VERY IMPORTANT TO ME AND MY GROUP. We need to have at least 1 of the three machines devoted to pure interactive use. I recommand 2 of 3 with interactive in the day time and 3 of 3 batch during 11-5 PDT. The J90 is such a dog. Unless the PVP cluster gets significant interactive use it is likely we will not find it very useful and will increasingly turn to other computing resources.
The interactive J90 is very slow. Batch access of late has been terrible.
- Probably no / am waiting to see: 3 responses
The configuration is OK, but the interactive machine is often too busy. A small persistent annoyance is the long wait to get job status info out of cqstatl. My main problem is my network connection - I have had consulting help a couple of times on this, and it has always ended up with you saying it's the fault of the CUNY network provider and them saying the delays are on the NERSC side.
Since this is a fairly new cluster - since the SV1 is a new machine - will hold my fire.
Have not used it. Can I run BASIS codes there? I doubt it.
- Yes: 21 responses
It is ok.
My vector computing needs are very small.
NA. I only run short interactive jobs.
No complaints
NO problems so far, but I haven't made any significant demands lately.
yes [16 responses]
- Yes, but... / yes and no: 3 responses
Yes, but with the complaint given below that the interactive response of the J90se is often too slow.
yes, but note that the compilation is done on a J90 while the software is run on an SV-1. Doesn't this slow the code since it is not compiled on the SV-1?
Yes and no. While it is good to have the faster SV1s available, I don't use the machines as much as I could simply because interactive response on the J90 is very poor. Simple UNIX operations such as 'cd' and 'ls' can takes many seconds to happen. Also, I often use my codes in an interactive manner, mixing significant computation with interactive steering. While it would be possible to work this way using batch (i.e. submitting little batch jobs between each bit of steering), the current state of the batch queues (overloaded) makes that completely impractical. (Note that I do use batch when interactive steering is not needed.)
I strongly urge you to make the interactive machine a SV1. This would ease up the overload on the K machine and make it much easier for us to do our work. It is inevitable that anyone who makes use of the batch machines will also make use of the interactive machine, for testing and debugging purposes. Putting that on the slowest machine available impedes our work. Also, making the [...]
- No answer: 1 response
N/A
Comments on NERSC's Cray PVP Cluster - 26 responses
- Slow throughput, slow processors, need more PVP cycles: 15 responses
The batch response is often slow, presumably due to high demand. The 36 hour estimate above [for batch job wait time] averages 24 over the summer and 48 lately.
Over the last 5 years have been solely using the scalar/vector machines at NERSC. In the last 2-3 years, shifted to the PVP cluster. In the early stages stability of machines were very questionable - memory and disk failures + communication between machines (loss of output). Bascially could not run large jobs (memory / disk / time) on this cluster. Reduced, at least for the past several years, to run simple short to medium runs on the PVP machines. The point of NERSC was supply users to state-of-the-art computing resources and thus push the limits of these machines. The PVP cluster has failed to live up to the promise both in terms of software (OS) and hardware failures. Maybe the SV1 will overcome these problems - see answer to the above [since the SV1 is a new machine - will hold my fire].
[...] looooong wait in batch queues
[...] The reconfiguring of the batch queues so that the "nice" value decides when the job starts, rather than its priority during execution was the stupidest idea to come out of NERSC since the decision to replace the C90 with the J90/SV1's. The ideal is a system where the nice value acts merely in the way it is supposed to under the Unix standard, and the user is given the capability to change the nice value up and down over the whole range allowed for the batch queues (say 5-19). This would be similar to the old control command, but without separate slow, medium and fast queues. As it stands, I sometimes have to wait almost a week to get a job to start. I am tempted to submit multiple copies of each job with different nice values.
It's usable but most people I speak with feel it should be avoided unless absolutely necessary. Mostly because the performance is so poor. [...]
So far, the batch wait times hve been longer under the charging system started Oct. 1 than they were last year under the old system.
See above. [MORE processors needed. SV1 processors are NOT as fast as they are supposed to be. Account charging makes it even worse...]
Currently, the wait time before jobs begin is completely unacceptable. I have to wait 2-3 days for a job to begin. The date today is Sept. 28, 1999. Hopefully, after Oct. 1 when the fiscal year begins anew, the wait time will be less.
These computers are not very fast. I learned parallel computing techniques to get away from using them. I now use them only for diagnostic post-processors that require NCAR graphics, which is not supported on MCURIE.
I have found the time waiting in the queue extremely long!!!!
[...] but wait long time to get the my share of cpu time.
There was a big backlog of jobs during the summer. I haven't run any jobs recently so am not aware of how things stand at present.
My only complaint is that in the last few months the turnaround time to perform my simulations has gone way up. When I first started using NERSC computers (circa 1995), it took about a day to cycle a moderate simulation through the 5MW queue. Now, I have had jobs sitting on the queue for several days before they even begin.
Get more SV1 machines if batch remains over-subscribed.
I think that my answer to the previous question says most of it [low megaflops, ...] Don't try to argue that I am an isolated case since most of your users seem satisfied. Note that a number of your big users have moved off the PVP cluster entirely.
- Need interactive SV1s, interactive too slow: 5 responses
slow interactive response; [...]
could use an interactive PVP machine
I think you should seriously consider making one of the SV1's the interactive machine rather than the J90se. Many users were expecting that this would be the case before the SV1's arrived. They were disappointed then when there was no improvement in interactive use. There still seem to be many times when the J90se has very slow interactive response. Since this uses up valuable human time staring at a monitor waiting for something to happen it is worth trying to improve.
[...] only use killeen, wish it was faster; how about making one of the faster machines available for interactive use?
Maybe more interactive sessions on machines other than killeen.
- Comments on software, configuration: 4 responses
[...] I also don't like the non-standard aspects of the way they are set up. Most annoying is the automatic logout on killeen - that's just ridiculous and everyone knows how to circumvent it anyway.
Make the F90 compiler as fast as F77 was on the C90. inept f90 compiler; lacks graphics software that was on c90
I think that my answer to the previous question says most of it [..., multitasking instabilities, disk system poorly configured, insufficient inodes].
[...] Need more information about text editors.
- They're good machines, satisfied: 4 responses
My needs are simple. I only run short interactive post-processing on killeen
Very stable and good software development/testing environment. [...]
it's great
Excellent accuracy, but wait long time [...]
- Comments on queues: 3 responses
Possibility of running batch jobs with RAM of 200 MW, Disk 50 Gb , and a max of CPU about 100 -150 hrs at J cluster should be made available in the near future.
The batch queue system functions but has features that really seem candidates for improvement. The queuing system seems unpredictable and somewhat unreliable. Jobs seem to be scheduled in random order. Of two similar jobs submitted simultaneously, one could run within 12 hours, skipping over several hundred other jobs in the queue, the other could wait many days. So, it does little good to submit jobs in priority order in order to do a systematic series of jobs where the next step depends on previous results. Some users seem to be able to submit 100 jobs and have them all run very quickly. Occasionally, jobs are "failed" because of a system problem and it is necessary to resubmit and wait again several days for the job to start after already having waited several days before the system problem was encountered. When a system problem is causing jobs to fail, often jobs continue to be submitted, all of which fail causing the problem to multiply. Finally, it
As a beginner, I experienced frustration with exceeded CPU- and disk space limits (due to a core file of one job). Is there a possibility to automatically (or manually) keep alive or resubmit jobs that crash due to such external problems? The introduction of premium jobs helps greatly to detect errors in the batch script.
- Haven't used them: 2 responses
I have not used the Cray PVP cluster
N/A
Comments on NERSC's HPSS Storage System - 38 responses
- It's a good system, satisfied: 12 responses
Very good job on maintaining such a monster storage system.
HPSS is the best mass storage system I have used for the last 20 years.
I have a lot of data stored. When I need it, it is there, so I am happy.
mostly works well [...]
Very nice and efficient system, a great way to store data. Can handle large files (8+GB) easily, which is extremely useful.
Very good
Without HPSS, I could not use NERSC computing facilities. It is a sine quo non for my work.
I use the storage to HPSS thru my batch jobs since I use the stored files for restarting my programs. Perfomance is not a big thing for me but reliability of HPSS being up to store the data is. I am very satisfied with the reliability.
very useful, for backing up files from my work computer system
It's good but [...]
It works for me.
useful for our RNC group to storage the mass data.
- Would like a better user interface: 11 responses
It would be nice to have better interface software to HPSS. I'm thinking of something that would let one see the whole (or large portions of) one's directory tree structure in a graphical display. Also, some type of user friendly search capability would be a big help.
It would be nice if the system were less like using ftp like, and more like using an NFS fileserver.
[...] (Please fix the HSI wildcard char recog.! Seems sensitive to term emulation, backspace, mouse-paste etc.)
I would be nice to have masget etc. commands we used to have for the mass storage before. These commands might exist, and I am just not aware of them.
[...] ftp interface a bit primitive
Again, because I'm lazy, I'l like the interface to look as much exactly like Unix as possible. I'd like to be able to use foreach loop and wildcards in the same way that I do in tcsh...it's pretty close, but still frustrating at times.
The FTP interface is a bit clumsy for many problems, and it appears that the FTP restart command is not supported, so there is no way to restart a failed transfer, or just get a portion of a file.
More UNIX like interface is an improvement.
It's good but needs some super-simple wrappers to allow one to store away (and retrieve) a whole directory in one fell swoop. I can write these for myself, but really they should be provided. For example, a command called "stash" that just tar's the files in the current directory (not below!) and stuffs that into a like-named directory in HPSS would be a great help. I wrote such a tool for CFS but dare not try using it now without careful attention to the changes needed.
[...] HPSS needs to be configured to send an error flag to the shell when the transfer fails so that a batch job could be made to crash when a transfer failed. Note, cfs was able to do this.
It should be easier to search the whole directory structure for files by name, date, etc.
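The "stash" wrapper requested above (tar the files in the current directory and store the result in a like-named HPSS directory) is concrete enough to sketch. The version below is a minimal illustration only, assuming HPSS is reached through the FTP-style interface these comments describe; the host name, credentials, and environment variables are placeholders, not NERSC's actual configuration.

```python
"""stash - hypothetical sketch of the one-step archiving wrapper requested above."""
import ftplib
import os
import tarfile

HPSS_HOST = "hpss.example.gov"  # placeholder, not the real HPSS endpoint


def stash(user, password):
    cwd = os.path.basename(os.getcwd())
    archive = cwd + ".tar"

    # Tar only the plain files directly in this directory (not subdirectories).
    with tarfile.open(archive, "w") as tar:
        for name in sorted(os.listdir(".")):
            if os.path.isfile(name) and name != archive:
                tar.add(name)

    # Store the archive in a like-named directory on the storage system.
    with ftplib.FTP(HPSS_HOST, user, password) as ftp:
        try:
            ftp.mkd(cwd)
        except ftplib.error_perm:
            pass  # directory probably exists already
        ftp.cwd(cwd)
        with open(archive, "rb") as fh:
            ftp.storbinary("STOR " + archive, fh)


if __name__ == "__main__":
    # Placeholder credential handling; a real tool would use the site's login mechanism.
    stash(os.environ["HPSS_USER"], os.environ["HPSS_PASS"])
```

A matching "unstash" would retrieve and untar the archive; checking for failed or partial transfers, as other comments here request, is left out of this sketch.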
- Improve the transfer rate: 7 responses
It's not as fast as NCAR's MSS. [...]
Improve transfer rate
I believe my upper bound on the transfer rate is due to the network, but I which nevertheless that it could be higher. At Brookhaven I have (or used to have) 10 MB/sec.
FTP is slow
slow at transferring lots of large files; needs bigger pipe
At times, the HPSS can be slow.
Reliability and performance appear to be highly variable. At times both are good for long stretches. At other times, they appear to degrade and remain so for a considerable time.
- Command response too slow: 4 responses
It might be useful if the ls command did not take so long when deep in a directory structure. This is often the decision maker as to what files are needed.
HPSS is too slow in responding to directory commands in directories with more than a few thousand files. [...]
System response seems slow, but I have no expertise by which to judge it
I'm not sure how hpss works, but there seems to be a lengthy delay after entering some command like "dir" or "get file" before it begins execution. Probably this is unavoidable due to the size of the storage system.
- Stability problems: 3 responses
PFTP seems to die transferring large files more often that I would expect/hope.
The system does not seem to be very stable.
Reliability and performance appear to be highly variable. At times both are good for long stretches. At other times, they appear to degrade and remain so for a considerable time.
- Password problems: 2 responses
Right now it takes a long, long time to authenticate my password.
sometimes it requires the password more than once
- Don't know what it is, don't use: 4 responses
Don't use it as much as I should as I was a cfs user and have not taken much time to learn how to use the new system.
I am told that I have an HPSS storage system account. I haven't the faintest idea what that means.
I have not used the NERSC's HPSS storage system
I'm not sure if this is an HPSS issue, per se, but I don't understand why i-nodes are so limited on t3e machine. In our work, we have lots of small files, so this has been a problem.
Comments about NERSC's auxiliary servers - 10 responses
- Like them: 2 responses
They are powerful, useful, and well maintained.
Good response times.
- Newton needs an upgrade: 1 response
Newton needs to be upgraded with more powerful processor(s)
- Provide a data broadcasting web server: 1 response
You might consider also providing a password protected web-server where users could broadcast data from their runs on the PVP and MPP machines directly to themselves and to their collaborators. It would be particularly useful if such a server could provide some software (e.g., using JAVA) which would gather one's data off of PVP or MPP platforms and then make graphical web-based displays of it. I recently saw a demonstration of such a collaborative graphical simulation environment developed at a Japanese supercomputer center (at JAERI) and it looked like a very useful capability.
- Don't use them, no answer: 6 responses
don't use these servers
Not used
No Answer
Have not had the time or pressing need to use escher this past year.
N/A
I have not used them.