
2005 User Survey Results

Comments about NERSC

What does NERSC do well?


47 Provides access to powerful computing resources, parallel processing, enables science
32 Excellent support services, good staff
30 Reliable hardware, well managed center
11 Everything, general satisfaction
11 HPSS, data and visualization services
6 Software resources and management
4 Documentation, NERSC web sites

What should NERSC do differently?


24 Improve queue turnaround times
22 Change job scheduling / resource allocation policies
17 Provide more/new hardware; more computing resources
8 Data Management / HPSS Issues
5 Software improvements
5 Account issues
5 Staffing issues
4 Allocations / charging issues
2 Network improvements
2 Web Improvements

How does NERSC compare to other centers you have used?


26 NERSC is the best / overall NERSC is better / positive response
12 NERSC is the same as / mixed response
7 NERSC is less good / negative response
6 No comparison made


What does NERSC do well?   82 responses

Note: individual responses often touch on several response categories, but in general each appears only once (in the category that best represents the response). A few have been split across response categories (this is indicated by ...).

  Provides access to powerful computing resources, parallel processing, enables science:   47 responses

powerful is the reason to use NERSC

I compute at NERSC because I cannot afford to purchase and maintain clusters with fast interconnect

NERSC provides powerful computing resources and excellent services. It enables people to carry out challenging, difficult computational work.

NERSC provides superior hardware computing capabilities, and professional user support and consulting. These two areas are where I see NERSC's core strengths; here NERSC offers resources that are not easily matched by any local cluster or computer farm setup.

I like the new hardware (jacquard and bassi).

... I compute at NERSC because I have to analyze large amounts of data stored on PDSF and I need the parallel computing facilities to enable my analysis to run fast. I would not be able to do my research without a cluster like PDSF.

I am not a super-power user, but I find NERSC essential for getting <1 day turnaround for runs that would take 1 day to 1 month or more on a fast PC.

NERSC provides me with badly needed resources that cannot be obtained elsewhere. It is very important to my scientific research program.

INCITE project

NERSC (in particular Seaborg) is extremely valuable for our research in theoretical nuclear structure and reaction physics. While we are able to write and debug our Fortran-95 codes on our local workstations (LINUX and MAC OSX), we need a supercomputer to do production runs.

The machines are powerful, well administered, reasonably responsive, and accessible to groups which do not have the resources to build their own large clusters.

Online allocation process is convenient. Since NCSC in our area has closed, NERSC is the only facility that I can rely on for large numerical calculations. I am mostly satisfied with the performance of NERSC. Thanks.

it allows me to do things that take too long or require too much memory on a Linux box

Speed, data accessibility

Excellent resources and support.

I mostly use NERSC to run my Titanium benchmarks on lots of nodes.

It is the primary source of computing assigned by DOE to carry out the computational work which is part of my contractual obligations with DOE.

NERSC computer resources are very important for my research. NERSC gives me access to high-performance computers and the mass storage that I need to store intermediate and final results. I am very satisfied with the flexible way NERSC handles allocations of CPU time and mass storage.

I compute at NERSC because of the availability of massively parallel resources. It is one of only a few facilities which offers the ability to compute across 100's or 1000's of processors. I am most pleased with the stability of the NERSC machines; most of the time one can run jobs on very large numbers of processors and the machine will usually be stable during that time.

NERSC provides generally reliable hardware and software, and very good user support from consultants. It provides a significant portion of my overall computing resources.

The increases in computer performance (CPUs & memory) are important for us to do large-scale computing at NERSC.

The resources provided by NERSC are essential to my DOE-funded research. NERSC does a very good job of making massively parallel computing resources available and usable.

Access to large parallel systems is very useful.

Very useful for large "batch" projects

I compute at NERSC because it provides enough processors with the required memory and speed for my applications.

NERSC is doing a good job and is very important to me. There is a large amount of computing resource and I need to run hundreds of jobs to analyze the data.

NERSC has the largest and fastest computers I have access to.

I compute with Jacquard and analyze with Davinci. Both have been really helpful, fast and reliable. I have some comments on the batch system, which I think needs improvement. My jobs were sometimes killed because of an NFS file handle error, so that I would have to wait in the queue to restart (continue) that run. I think that in the event of a run killed due to a system error, any *dependent* job related to that killed job should be given the highest priority to start. Also, there have been problems with jobs getting stuck in the queue for a week or more, overtaken by jobs submitted more recently. I think the batch system should be improved to avoid such scenarios.

  Excellent support services, good staff:   32 responses

There is a distinct PROFESSIONALISM in the way NERSC conducts business. It should be a model for other centers

Provide a source of parallel computing that I do not have. Great resource & a great consulting staff. Without them I could not do parallel processing or learn its advantages.

Consulting is great. Management is great. DOE strategy is not so great.

NERSC is important because it combines state-of-the-art HW/SW with excellent, first-class consulting/collaboration.

Iwona is great at PDSF support. PDSF node availability is quite good.

very useful service
really fast to help with different problems

... I am also impressed with the fast turnaround time for solving issues. ...

I'm very happy with the support and (at least in general) the timeliness of the replies to requests. The staff makes NERSC (= pdsf in my case) a nice place to work.

support and consulting are very good

Processors available and good consulting support.

NERSC provides good help line support. I don't do much except get our codes to compile on the seaborg compiler. Others in my group are the ones who actually run the codes.

Consulting is excellent. Up until recently the computing power has been second to none.

It is a great place for high-end computing. The services at NERSC are wonderful.

Good user service. We are cycle hungry. Very pleased with move to clusters. We have been getting a lot of uncharged time from Cheetah and Phoenix at ORNL-CCS. Cheetah is gone, and we don't know how well the new Phoenix allocations will work. Got a lot done on Jacquard this year but it is now overloaded.

NERSC tries to be a user facility and tailor its systems for the user, although it is not as good at this as it used to be. It does not discourage actually talking to its consultants, unlike some other places.

I am extremely satisfied with NERSC! In particular I would like to thank Francesca Verdier; she has always responded very promptly to all my concerns/questions/requests and has always shown a willingness to help that is not usual!

Services of the personnel are excellent. Antiquated and inadequate hardware is the main issue from what I see. Bassi helps but more is needed.

So far NERSC is doing very well on flexibility. NERSC meets users' requests in a timely manner. NERSC is important because it is very close to my university. I am supposed to be able to do complex data visualization and analysis interactively, although it has turned out not quite as I expected. The consulting services at NERSC are also very good compared to other centers.

  Reliable hardware, well managed center:   30 responses

NERSC is and remains the best-run central computing facility on the planet. I have been working at NERSC/MFECC for 25 years and continue to use it for highly productive work.

Management of large-scale multi-processing.

I think its strength is the ability to provide support for a wide variety of codes, from those that scale only to 64/128 processors to those that scale all the way to 4096. This flexibility across different scientific applications is crucial for its success.

NERSC maintains a high performance computing center with minimal downtime which is very reliable. I am very impressed with how short the amount of downtime is for such complicated systems. ...

NERSC systems are generally stable and well maintained. This is great for development. But, NERSC becomes a victim of its own success because many people want to use it, resulting in long batch queues.

I use seaborg primarily because of how much memory it has on each node. And I've had very few problems with it - a definite plus.

Stable computing environment and fair queuing system.

NERSC has good machines, the overall setup is quite good.

I run at NERSC's Seaborg because the performance is the most consistent and reliable of any of the facilities at which I compute. I never have trouble running on even 1000+ processors - that's the most important thing for me. The few times I've needed questions answered or account changes, the consultants gave me a very quick and satisfying response, and I really appreciate that.

Recently I do lots of simulations at NERSC. NERSC is really a world-class facility, in my opinion. The administration of the system is really great.

Capacity and capability computing; flexibility and responsiveness locally.

Machines are up and running most of the time, and the waiting times for job execution are quite acceptable

Generally, things run well at NERSC. There are many computing nodes and the system is very robust and reliable. There are many scientific programs that need to be run at NERSC. That's why we choose and use it.

Excellent machine offering reliable and excellent service. But we need a bigger one and more allocation.

  HPSS, data and visualization services:   11 responses

We generate very large amounts of data and having a reliable partner who can store at large scale, plan and manage the storage capacity etc is very valuable.

Provides good computing support for large computer jobs, including an easy-to-use and remotely accessible archiving system.

free; convenient; HPSS has now become indispensable for me

Mostly everything works well when there are no network problems or other issues that either prevent me from logging on or make my sessions crawl. NERSC is very important to me because I do all of my data analysis on PDSF and the data for our experiment (KamLAND) is all processed at NERSC.

(We only use PDSF and HPSS:) I am very satisfied with the overall facility. The HPSS system is the main repository for all our experimental data and we are quickly becoming its largest user. The system is efficient when there are not many connections to it and once you have an HPSS connection, the transfer of data onto HPSS is very fast.
PDSF is our main analysis cluster and was essential for doing our analysis for all our papers. The many available processors and the fact that the facility is shared among several experimental groups that have similar needs, make it very useful for us. The flexibility of the machine and staff allow things to get switched around quickly if necessary.
I am very happy that NERSC decided to use the GPFS file system. We converted our disks to this new file system a month or so ago and see dramatic improvements in the number of concurrent clients and throughput over what was used previously.

NERSC is best for interactive running and visualization. The time on NERSC is important to the completion of my projects; however, more time is spent waiting in the Seaborg queue than on other machines.

  Many things, general satisfaction:   11 responses

NERSC is doing a great job and continuously improving. I am very satisfied with your job. Thanks!!

NERSC is doing terrific work. It has a very helpful consulting system, provides all kinds of software and libraries we need, and the computing systems are well customized. I can get help, find documentation, solve my problems, and get my work done very quickly.

NERSC is best at delivering the combination of leading-edge (or near leading-edge) hardware and system reliability in a production setting. High-performance scientific computing requires both computing muscle as well as systems reliability. NERSC has always been able to manage the balance between the two to give us (the users) an impressive product. As a result, my colleagues and I have been able to run simulations this year that we could not have done at any other (open) facility. Understandably, we are very pleased and hope that NERSC will have the funding to both upgrade its systems and maintain its consulting excellence.

NERSC runs a very tight ship. Although the queues are often long, downtime is rarely a significant issue and the facilities are well maintained to keep up with the demand. This reliability and the extent of both hardware and software resources allow me to run simulations in approximately 1 week (including queue time) that would take on the order of a month to run on my local machines. Additionally, the availability of a dedicated computing cluster allows me to handle all necessary data processing while the "next run" is in progress, thereby maximizing my efficiency.

NERSC staff are excellent. Machine uptime and access are excellent. Queue structure is very good.

Ease of supercomputer use
Computing power of machines is very good

Support (consulting) is excellent
NERSC helped me and my entire community with a special project (see qcd.nersc.gov)

I am mainly very satisfied with the PDSF facility. This satisfaction and its proximity are the main reasons for use.

NERSC's strengths are a combination of its dedicated and knowledgeable consulting and support staff; and its very large systems, which make it possible to run very large parallel jobs on the same machine where they are developed and tested serially or on a few processors. The presence of good analysis and debugging tools, such as Totalview, is also critical.

I compute at NERSC because the systems are always up and running, the software is available and up to date, and the consultants always have the answers to my questions.

In large-scale computing, *many* things are *very important* (any single one going wrong can ruin your day, and possibly your CPU bank account), as I have found. NERSC does them well!

Stable supercomputing environment with almost always up-to-date libraries and good technical support.

  Documentation, NERSC web site:   3 responses

The help documents online. I need NERSC to handle parallel computation for time consuming problems.

Abundant and very helpful website information about scientific computing.

NERSC has great computers and except for the last 5-6 months, great support. Things tend to run most of the time and one can get interactive runs done quickly. Web site is comprehensive and quite extensive.

  Used to be Good:   2 responses

I have not used the facilities significantly in the last few years and cannot answer detailed questions in a meaningful way. In prior years, I found NERSC to have excellent state-of-the-art computers and a dedicated support staff. The main problem was the rate at which things changed. When not using the facility intensively it became more and more difficult to keep up with changes. Right now I would like an excellent, stable facility rather than a state-of-the-art computing facility.

NERSC used to be a superb place for supercomputing. I've done very substantial simulations there in previous years. The problem is that Seaborg (the main resource at NERSC) is very much outdated. This problem is exacerbated by overloading the system with too many jobs. We could not even use our allocation last year because it was difficult to run jobs.


What should NERSC do differently? (How can NERSC improve?)   65 responses

  Improve queue turnaround times:   24 responses

The most important improvement would be to reduce the amount of time that jobs wait in the queue; however, I understand that this can only be done by reducing the resource allocations.

* to improve turn around time for jobs waiting in the batch ...

A queued job sometimes takes too long to start. But I think that, given the amount of users, probably there would be no efficient queue management anyway.

The turnaround time should be much better, although it has improved since last year (Especially since the inclusion of Jacquard). ...

run more jobs and faster

I wish the queue time for large short jobs on Jacquard got through faster. Also, now that Jacquard seems to be stable, please minimize its down time. Thanks.

Queue on seaborg is a bit slow due to heavy use. I recommend another machine.

Batch scheduling needs to be upgraded so that wait time in the queue is reduced.

Limit the waiting time in the queues, especially on seaborg (it is pretty good on jacquard).

Decrease waiting time for jobs.

The batch queue waits can be very long. It is difficult to get jobs through with time requests for more than 8 hours of computing.

I am mostly dissatisfied with the batch wait time. Sometimes I had to wait for more than a week for a job to get into the run mode. The situation has gotten to the point where the turnaround time is sometimes slower than on our local cluster for those jobs that we can afford to run here. Any way to reduce the turnaround time would be useful.

Less waiting time on queues

... * The queue wait times are ludicrous on seaborg. I don't want to use that machine anymore, although it's a great machine.

Reduce batch queue hold time.

My major complaint of the past year was the time spent waiting in batch queues on Seaborg. The introduction of Jacquard has helped this somewhat, though queue times are occasionally still long (one week for 1000 processors). I certainly appreciate being given "boosts", when circumstances demand it. However, relying on this a lot slows things down for users who are in a more ordinary mode of operation. Short of additional systems capacity, I'm not certain of the best solution...

NERSC response: NERSC has made several changes based on the over-allocation that occurred in 2005 and the resulting long turnaround times:

  • Time for new systems will never be pre-allocated, but will be allocated shortly before the system enters full production.
  • Time has not been over-allocated in 2006.
  • NERSC is investigating with DOE options to under-allocate resources in order to improve turnaround times while still meeting the DOE mission.
  • NERSC has suggested to DOE that the allocation process align allocations more closely with the mission and metrics DOE has for NERSC.
However, many things that affect turnaround time are outside of NERSC's control, such as large numbers of projects that have similar project deadlines.

 

  Change job scheduling / resource allocation policies:   22 responses

Don't over-allocate

Overallocation is a mistake.
Long waits in queues have been a disaster for getting science done in the last few years. INCITE had a negative effect on Fusion getting its science work done.

Under allocate to reduce queue waits. Look for better queue systems that don't involve priorities. Keep large ps queues like 64ps and 32ps on Jacquard always open. ...

NERSC should never over-allocate its resources. It should seek to decrease the wait times in the queues. It should work with DOE to enhance resources (Seaborg is kind of old now).

It's much better to have idle processors than idle scientists/physicists. What matters for getting science done is turnaround time.
Don't measure the GFlops the machine is getting, measure the GFlops the scientist is getting. For example, my job might do an average of 800 GFlops continuously over 8 hours. But that doesn't matter if I have to wait two weeks (336 hours) in the queue, in which case I really only get
(8/(8+336)) * 800 GFlops = 18.6 GFlops.
So the turnaround time is 43 times longer than what it could be if it weren't for the wait in the queue!!! So the supercomputer is effectively turned into a much slower machine. Although that already looks bad, it's actually even worse: scientific research is often an iterative process.
For example, one has to wait several weeks (for the queue) and a day (for the run) to obtain a result. That result then needs to be studied, so that an appropriate next run can be decided. Then another several week wait before those results are available. When turnaround time is this long, some problems simply aren't solvable (even though they could be if turnaround time was about the same as compute time). So big science research is being harmed, not by Seaborg's hardware, but by the way you are allocating Seaborg's hardware.
Don't over-allocate! It seems like you think that "no idle processors" means "we're getting good efficiency". But it *really* means: "we have inadequate computing resources for the number of jobs we're accepting".
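
The arithmetic in the comment above can be reproduced with a short sketch. The figures (800 GFlops sustained, an 8-hour run, a 336-hour queue wait) are the commenter's illustrative numbers, not measured NERSC data; the snippet below is only an assumed working of that example.

    # Effective throughput as seen by the scientist, counting queue wait as dead time.
    # The inputs are the commenter's illustrative figures, not measured NERSC data.
    sustained_gflops = 800.0     # rate while the job is actually running
    run_hours = 8.0              # compute time
    queue_wait_hours = 336.0     # two weeks in the queue

    effective_gflops = sustained_gflops * run_hours / (run_hours + queue_wait_hours)
    slowdown = (run_hours + queue_wait_hours) / run_hours

    print(f"effective rate: {effective_gflops:.1f} GFlops")   # about 18.6
    print(f"turnaround is {slowdown:.0f}x the compute time")  # 43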

NERSC response: NERSC agrees that over allocation is not the right thing to do. Time has not been over allocated in 2006 and we did not pre-allocate time for the new Bassi Power5 system. NERSC is investigating with DOE options to under allocate resources in order to improve turnaround times while still meeting the DOE mission.

Capacity computing needs

Although this problem has been addressed somewhat in the last year or so, the batch queue system on Seaborg can make it difficult to run jobs for a long time on a modest number of nodes rather than in bursts of a few hours on larger numbers.

My work would be advanced more effectively if NERSC were oriented toward what is often termed 'capacity computing'. I most emphatically have no need for high-end systems that can apply thousands of CPUs to one process and break speed records. Very few projects will ever be awarded a large fraction of such a system, so money is being wasted to provide a capability that is not really used much of the time. A user with a large allocation of, say, 500000 hours can use it up in a year by running on 64 CPUs all the time (fewer CPUs on the newer machines). What is the point of emphasizing massively parallel capabilities if users can't afford to routinely run on huge numbers of processors? We should aim for high MFLOPs/dollar, not high total speed on one job.
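
A quick check of the allocation arithmetic in the comment above: the 500,000-hour figure and the 64-CPU job size are the commenter's illustration, not an actual award; the sketch below simply confirms that such an allocation lasts roughly a year at that job size.

    # Roughly how long a 500,000-hour allocation lasts when running on 64 CPUs
    # around the clock; the figures are the commenter's illustration, not a real award.
    allocation_cpu_hours = 500_000
    cpus = 64
    hours_per_year = 24 * 365   # 8760

    years_to_exhaust = allocation_cpu_hours / (cpus * hours_per_year)
    print(f"{years_to_exhaust:.2f} years")   # about 0.89, i.e. a bit under one year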

Realize that, in general, the user knows best and uses the facilities in a way that maximizes the scientific output from his/her allocation. Don't put up barriers which try to force a particular style of usage and which discourage other types of usage. For example, recognize that 100 1-node jobs are as good as 1 100-node job, if that enables the user to get his/her work done in a timely fashion.

NERSC response: NERSC hopes that the introduction of the 640-CPU Opteron cluster, Jacquard, and the 888-CPU Power5 system, Bassi, has helped to meet capacity computing needs.

More interactive/debug resources

... Interactive computing on Seaborg remains an issue that needs continued attention. Although it has greatly improved in the past year, I would appreciate yet more reliable availability.

As a user, I would like to see debug jobs available most of the time. I think that making debug jobs fast and convenient is the biggest saving, because it saves the user's time (not computer time).

At this moment, I think the time limit (half an hour) for the interactive queue seems too short. It is not possible to debug a large scientific program within only half an hour. I suggest this time limit be extended to 2 hours, or that another queue, such as an express queue, be added for this purpose. Thank you.

NERSC response: Increasing the time limit on the interactive/debug queues would increase their turnaround time. We think that 30 minutes is a good compromise. For longer debug runs, please consider using the premium class.

It would be useful if there was higher priority given to interactive multiprocessor jobs --- it can be very slow getting 4-8 processor runs executing. I find the poe stuff a pain in the ass in terms of having to specify the wait time between retries and the number of retries.

I think the most important issue is the batch queuing policies on Seaborg. The decision that two production jobs can block even the debug queue slows down program development, since production and development interfere.

Different queue policies

... The time limits on the queues should be larger, although I know it would affect turnaround. I would suggest skewing the processor allocation towards jobs which are a bit less parallelizable. Not by much, but in that way jobs wouldn't sit forever and then run 24 hours on 16 processors.

... * to have a mechanism of restoring jobs that crashed due to unexpected outage ...

I accept that when debugging, small 1/2-processor jobs are essential, but I would like to see a restriction to stop multiple such jobs (i.e., > 10) from being submitted and swamping the batch queue.

Queues queues queues.
INCITE needs to be reformulated so that it does not disable every other user of the Center for months at a time.

NERSC might be able to improve by specializing in large-scale computations in the following ways:
1) account charges for large-scale computations
2) job scheduling for large-scale computations
3) compute nodes for large-scale computations
1) & 2) have already been tried and are now in place at Seaborg. I think the people who are doing large-scale computations at Seaborg have really benefited from these measures. I hope NERSC continues to make these adjustments so that people who run large-scale jobs can take full advantage of NERSC computer resources.
As for 3), some compute nodes could be reserved for large-scale computations. For example, 64-128 nodes might be enough for this purpose. Of course, if there is no demand for large-scale computations, NERSC should make these nodes available to be shared by computations using smaller numbers of nodes.
Also, by combining 2) & 3), NERSC might set a higher class charge priority for large-scale computations on some compute nodes.

  Provide more/new hardware; more computing resources:   17 responses

Expand the computational ability:
Reduce the waiting time;
Increase the maximum time for a running code.

Expand capabilities for biologists; add more computing facilities that don't emphasize the largest/fastest interconnect, to reduce queue times for people who want to run lots of very loosely coupled jobs. More aggressively adapt to changes in the computing environment.

Our group would like to use a CRAY supercomputer again (I have used NERSC's CRAY SV1 several years ago).

Aside from increasing the amount of computational resources, I think NERSC is serving my needs.

We've been less active taking advantage of the computing facilities mainly due to the perceived complexity of getting the kinds of software/systems needed to do computational biology installed on the cluster. This is an historical perception, based on previous attempts to get software like blast running. I would like to try this again, as it makes more sense to use centralized clusters.

NERSC needs to expand the IBM-SP5 to 10000 processors to replace the IBM-SP3
Continue to test new machines, including the Cray products

NERSC needs to push to get more compute resources so that scientists can get adequate hours on the machine

6000 or more P5-processors to replace Seaborg's P3-processors!

The overloaded queues on Seaborg and other systems is an on-going problem. DOE should give more money so that NERSC can buy new hardware.

... Dump Seaborg and move to clusters; change from a supercomputer center to a cluster center. The total speed of a single computer (cycles/sec) is completely irrelevant. How many cycles/year NERSC can deliver is what you should shoot for.

It would be great to increase the size of jacquard.

Obtain major increase in hardware available to the user.

NERSC response: NERSC has recently deployed two new computational platforms: the 640-CPU Opteron cluster, Jacquard, and the 888-CPU Power5 system, Bassi. In addition, the NERSC-5 procurement is underway, with the goal of providing a substantial increase in sustained performance over the existing NERSC systems. The increase is expected to be at least 3 to 4 times the computational power of Seaborg. NERSC-5 is expected to start arriving at NERSC during FY 07.

  Data Management / HPSS Issues:   8 responses

Disk space, especially scratch on Jacquard but also more generally, is often a restriction.

I hope NERSC can have a stronger data visualization platform and make all data accessible to all platforms without physically moving data around.

NERSC response: NERSC has deployed the NERSC Global Filesystem, a large shared filesystem that can be accessed from all of the compute platforms.

Disk storage (at PDSF) unfortunately remains a headache: disk servers are not very robust. This may change with GPFS. Also, cheaper local storage on compute nodes is at the moment of limited use, given relatively frequent downtime of individual nodes and scheduling difficulties (or the impossibility of accessing data from other nodes using e.g. rootd).
HPSS is also not as robust as could be hoped for.

... Improve HPSS interface software (htar and hsi).

... - I would like to see more tapes on HPSS to help getting our data off of tapes. We now occasionally have to wait more than 12 hrs to stage data. I realize that there are never enough tape drives, but this would help us.

... File system quotas seem to favor small numbers of large files, rather than the reverse, which is occasionally a difficulty when one is trying to concurrently maintain and develop several versions of a code divided into hundreds of source files.

... Remove the distinction between permanent and scratch disk space. Many of us need large amounts of disk space on a long term basis.

  Software improvements:   5 responses

... * to have faster compilers (say, from Portland Group) ...

The lack of flexibility in supplying alternate compilers on Jacquard has rendered that machine useless to me and my group, even though in principle, it could be a highly useful resource.

Can work towards improving the scaling of Gaussian 03 with number of processors

The software support on Seaborg and Jacquard leaves room for improvement; also, more software could be supported.

  Account issues:   5 responses

allow group accounts

... Allow production accounts for collaborations to use. ...

... Another request, which is for my collaboration, we could use a dedicated account for running our data production jobs. It would greatly simplify and improve our data production process and I believe our liaison has proposed a viable solution that is compatible with the DOE user account mandate.

- I would like to see the return of production accounts. We continue to have trouble running large-scale production by multiple individuals. We run into obvious problems like file permissions. These can be solved by users being diligent when creating directories, but that doesn't solve the fact that some programs do not honor the set umask when creating files (they create files using a 0644 mode instead of the more permissive 0666), so permissions all have to be fixed by hand at a later stage.
Other more problematic issues are: job control for long-running jobs (the user may not be available and someone else should be able to take control of the production) and ensuring a common environment to make sure that all processing is done in the same way (it turns out that many of the people running our production are also developers and may accidentally run the production using their developer environment instead of the production environment).
Another issue that we continue to run up against is the maximum number of simultaneous connections that HPSS allows. On a distributed computing system like PDSF, one needs the ability to open many simultaneous connections in order to extract data from HPSS to the local machine. The HPSS group set a limit (I think 15) on the number of connections that any user can have, but is willing to increase that limit on an individual basis. We would like our production account to have this higher limit. It would ensure that normal users would not be able to abuse the HPSS system, yet allow our production to function properly.
Production account issues were discussed extensively in April/May and we had a viable solution that would fulfill the DOE mandate of needing to track users (the solution involved having production accounts that are not accessible from outside of NERSC; the user would always have to log into a system at NERSC as himself and only then ssh/su to the production account, allowing full tracking of who did what at what time). I am disappointed that this system has not yet been implemented; it would solve quite a few problems/annoyances we have at the moment when running large-scale production.
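
The umask behavior the commenter describes can be illustrated with a short sketch. The file names and the umask value below are hypothetical, chosen only to show the difference between a program that honors the umask and one that hard-codes a 0644 mode; it is not NERSC or PDSF production code.

    import os, stat

    os.umask(0o002)   # hypothetical umask for a shared production area (group-writable)

    # A program that honors the umask: request 0o666 and let the umask trim it.
    fd = os.open("shared_output.dat", os.O_CREAT | os.O_WRONLY, 0o666)
    os.close(fd)
    print(oct(stat.S_IMODE(os.stat("shared_output.dat").st_mode)))    # 0o664: group can write

    # A program that hard-codes 0o644 ignores the umask, so the group cannot write;
    # the manual repair step the commenter describes is a chmod afterwards.
    fd = os.open("hardcoded_output.dat", os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    print(oct(stat.S_IMODE(os.stat("hardcoded_output.dat").st_mode)))  # 0o644: group read-only
    os.chmod("hardcoded_output.dat", 0o664)   # fix permissions by hand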

Password management is awkward.
A single pw should apply to all nersc hosts.
A pw change should propagate almost instantly to all nersc hosts.
Forced pw changes are a security vulnerability and waste valuable time thinking of new ones.
NERSC web services should not depend on vulnerable cookie & javascript technology.

  Staffing issues:   5 responses

* Consulting needs to actually answer one's technical questions. They seem competent but highly understaffed. What's the point of a fancy computer if you can't run things!?!? ...

Increase staff for maintenance/support. There should be somebody skillful available all the time (even when the one person in charge is on vacation/in the weekend). [PDSF user]

My biggest request is please hire more PDSF specialists. The staff appear to me to be stretched very thin, and there are significant and noticeable downtimes for these machines that affect my work schedule almost weekly. I think the staff at PDSF are doing a fantastic job but there aren't enough of them, and I know that there has been some big reorganization there lately. I think it would help everyone if more resources could be devoted to hiring or training new personnel specifically for PDSF. ...

... - More staff at PDSF would help tremendously. The staff maintaining the cluster is very good and helpful, but appears to be stretched to the maximum at the moment.
- There appears to be somewhat of a separation between NERSC and PDSF. I would like to see PDSF treated as a full member of NERSC, just like HPSS and the super computers are. The combination of PDSF and HPSS is very powerful and it appears that not everyone at NERSC realizes this. This may of course just be a perception issue on my part, but I do not have the same perception with HPSS for instance. It could also be due to the fact that users of PDSF are in a much more narrow field than the general users of SEABORG and JACQUARD. The PDSF users are overwhelmingly in nuclear-, particle- or astro-physics, but these fields all have enormous computing requirements and this is something that NERSC does well - use it to your advantage!

Extend remote data mining *vis group*

  Allocations / charging issues:   4 responses

The most important thing for NERSC is a fair and equitable resource allocation. It should be based on what the researcher has accomplished in the prior year or years, not on some outlandish promises that are never fulfilled and change all the time, however sexy or politically correct they may sound.

Not catering so much to special interests such as INCITE, and paying more attention to much-needed ER production computing needs.

it seems like my allocations keep going down year after year, despite the fact that I always request the same amount of CRU's, namely the minimal allowed in the NIM application form. I think this is happening because I don't do parallel computations, so this is probably not considered "sexy" by the allocations committee. I challenge anybody in the committee, however, to prove to me that my NERSC project is not at the forefront of DOE's current HEP mission. I believe I deserve more generous allocations.

The only thing that I am not sure about is the charging system. I would probably not use the machine charge factor. I would have the same charge for all systems and if some resource becomes computationally inexpensive this system will have a larger queue that will eventually encourage people to use a different system, which is slower but has less of a queue. In any case, I would like to know why you choose one system instead of the other!

  Network improvements:   2 responses

Better network bandwidth to LBL.

increase network speed

  Web improvements:   2 responses

Better web pages. By this I mean make it easier to selectively scroll through (by clicking) the information.

Better web documentation on using the systems would be useful.

  Other suggestions:   2 responses

... * to have an orientation workshop on basics of using NERSC after allocation announcements
* to have a shorter survey form :)

I sometimes feel that changes which are to be made could be announced with a little more lead time for users to make the changes that are required. [PDSF user]

  Don't change, no suggestions:   4 responses

I think you are doing very well.

no comments

--blank--

Looking forward to using the resources.


How does NERSC compare to other centers you have used?   51 responses

  NERSC is the best / overall NERSC is better / positive response:   26 responses

Excellent! I (we) have used the ORNL facility and NSF-supported centers. NERSC is the best in my opinion.

Better than ORNL for the standard user.

There is a distinct PROFESSIONALISM in the way NERSC conducts business. It should be a model for other centers

very good

One of the best known to me (comparing to RZG Garching and SDSC).

NERSC is very good--the best, or in the top 2, in my experience. Others I have used:
NCSA
PCS
SDSC
TACC
Various TeraGrid sites (above and others)
Several European facilities (including CERN, the Swiss Center for Supercomputing, NIKHEF in the Netherlands).

NERSC is the most useful computing center I have used.

Much better than HRLN (Hanover, Germany). On a different plane than our local computational center OSCER.

Centers used:
Stanford BioX cluster
Stanford Linear Accelerator Center (SLAC) Computing Farm
Local group workstations
NERSC is more professional and far more resourceful compared to the above centers. The main drawback is obviously the need to apply for computer time and the limit on available computer time. Generally, we fully support an expansion of NERSC's activities and an increase in their total computational resources.

I think NERSC compares very favorably to other computer centers (e.g. CCS at ORNL). There seems to be greater manpower available at NERSC to resolve consulting issues and any machine problems.

I can compare NERSC to the RHIC Computing Facility (RCF) and the MIT-LNS computing facility. I would say that NERSC compares very favorably to RCF, with better equipment, better people, better setup. NERSC compares similarly to the computing facility at MIT, but the MIT center was much smaller in scale.

NERSC is the best.

At present, I am very satisfied with NERSC computer resources, comparing to the following centers:
RCNP SX-5, Japan
Titech GRID, Japan

It works! (e.g., Columbia at Ames still can't perform large data reads!)

NERSC > ORNL as far as available computing resources are concerned

NERSC training is good for remote users; the remote class on Jacquard let me participate without traveling.

I haven't had previous experience with large off-site storage systems, but my overall impression is that the storage system, in particular the management and accounting, is very well thought through and has most of the features I expected.

NERSC compares favorably to other centers I have used, such as the newer computing facilities at ORNL. Hardware resources seem to be greater at NERSC, resulting in much shorter queues; and more stable, with considerably less downtime. NERSC also excels in user support, where other facilities can seem comparatively short-staffed.

NERSC is superior to other places when it comes to consulting and web services.

Very happy with NERSC - as compared to NAVO and ERDC. The documentation at NERSC is considerably better than that from NAVO and ERDC.

I am comparing with ORNL, LLNL, ESC, and HPLS.
NERSC has more stable systems and better consulting service. Significantly more software is available on the NERSC machines, and it is always up to date.

NERSC is the number one center to me: LANL, CSCS, CERN

NERSC has generally been more responsive to the user than other centres, although recent demands that the user use large numbers of processors for his/her job have moved it away from that model. NERSC has tended to be more reasonable with disk allocations, although I would prefer that all disk allocations were considered permanent.

I would rate NERSC substantially higher than TeraGrid in terms of information available (web documentation, on-line allocation process, etc.) and ease of use. In terms of resources I would say that in my experience they are comparable.

NERSC is still head and shoulders above all other pretenders. In making this ranking, I am judging in terms of both systems and personnel. NERSC has the most usable hardware capacity, system stability, and HPSS stability. Complementing that is a staff that is knowledgeable, professional, and available.
One often undermentioned aspect of NERSC, which I especially appreciate, is the approachability of management at NERSC (both mid-level and senior). NERSC management seems much better tied in with both the needs of the scientific community and the "realities of the world" than I experience at other centers.
I have used countless computing centers in my career (including numerous NSF and DOD centers, industry-specific centers, as well as other DOE centers) and I make my comparisons based on this experience. However, for my current project, much of my comparison is based on experiences at ORNL/CCS, at which I have also done a fair bit of large-scale computing. It saddens me to say this, but the ORNL organization seems to have neither the vision nor the technical savvy to accomplish the mission to which they aspire.

I can only compare Jacquard with our local Beowulf cluster. Jacquard is both faster and more reliable than what I use locally, so it has been a key component in my work these last few months.

  NERSC is the same as / mixed response:   12 responses

NERSC is similar to PSC and SDSC.

On the whole a good center. As good as the now defunct Pittsburgh center, which was truly outstanding in my opinion. Miles ahead of SDSC which I could never get to use without hassle.

The NERSC staff are very knowledgeable and professional. NERSC suffers somewhat from its own success, I find many of the resources to be oversubscribed.

RCF @ BNL: NERSC is a much friendlier and more 'open' place. I got the impression that RCF was messier, but it looks like they have improved quite a bit. Due to disk access problems at NERSC (LBL RNC disks) I moved part of my production back to RCF. Disk access seems to be more stable there.

It's equally good compared with other facilities.

I also use the Ohio Supercomputing Center (OSC) and the National Center for Supercomputing Applications (NCSA). Each has its advantages and disadvantages.
Comparing to Seaborg:
OSC: Slightly higher performance; shorter queue waits; cannot run on 1000+ procs.; Consulting not quite as good.
NCSA: Much higher performance (NCSA's Cobalt machine runs about 3 or 4 times faster than Seaborg in my experience.); Similar queue wait times; cannot run on 1000+ procs; Not as good about keeping users informed on updates/changes.
In my fantasy world, I would have NERSC upgrade Seaborg to get performance comparable to NCSA's Cobalt and reduce queue wait times a bit. But, I'm still quite happy even if that never happens. NERSC sets the standard for reliability and user relations, and allows for users to easily run on 1000+ processors.

NERSC (PDSF) is very easy to contact, keeps users informed of what is going on, and is accessible for questions and remarks about operations. This is more so than at other facilities (RCF, CERN computer center). Performance is comparable between these centers, although centralized disk storage is more robust at RCF and CERN. Also, CERN seems to have a superior tape system (although I have no recent experience with it).

I have some experience with the NIC in Juelich. Here, the archive file system is transparent for the user. One can access migrated files as if they were stored on a regular disk of the computer. This is convenient. In terms of computational resources, documentation and support NERSC certainly compares very well.

User service at NERSC is better than ORNL-NCCS, but we have had more "uncharged" cycles from ORNL-NCCS. Don't know how new NCCS program will work.

The NERSC hardware and software are more stable than those at NCCS. However, the turnaround time at NCCS is much faster once the job gets started.

NCSA, ARL (DoD): mostly less waiting time in the queue at DoD, but they are weaker at system configuration and keeping the compilers and libraries updated.

SDSC has faster processors and usually better wait times than Seaborg.
LLNL computers have faster processors, but the wait times and queue structure are erratic. Jacquard should address the problem with the wait time for smaller jobs, making it more advantageous to use NERSC.

  NERSC is less good / negative response:   7 responses

Compared to Fermilab and SLAC, NERSC is terrible with regard to collaborative computing; NERSC seems entirely oriented toward single user computing which is unrealistic for most of my needs. SLAC's use of AFS is very effective; I miss that at NERSC.

Computing resources at NERSC are still small compared to other centers like NCSA and SDSC.

The system has not been as stable recently as RCF at BNL.

The computing resources at SLAC. It's not what they do, it's what they don't do:
They don't over-allocate their machines, so we can get jobs running very soon.
A computing result from seaborg is only really useful if it can be obtained in about the same amount of time as it takes seaborg to produce it.
It's much better to have idle processors than idle people.

It is falling behind SDSC in hardware and batch turnaround.
Interactive computing is much better at SDSC.

A simpler and clearer allocation process, such as at NCSA, would be useful.

LLNL has much more in the way of resources.

  No comparison made:   6 responses

Seaborg is my first exposure to massively parallelized computing resources--I have nothing to compare it with.

I have some experience with the CCS at Oak Ridge, but not so much that I could really compare the centers.

I have not used others to compare it to.

I use no other centers.

no comments

Recently I have started using LLNL's Thunder and their staff has provided excellent and friendly assistance.
