
2004 User Survey Results

Comments about NERSC

What does NERSC do well?

[Read all 118 responses]

 

68 Provides access to powerful computing resources, parallel processing, enables science
53 Excellent support services, good staff
47 Reliable hardware, well managed center
30 Easy to use, well managed software, good user environment
26 Everything, general satisfaction
20 HPSS, disk space, data services
7 Documentation, NERSC web site
6 Allocations process, ERCAP, INCITE
2 Visualization
2 Training

What should NERSC do differently?

[Read all 94 responses]

 

45 Improve Seaborg turnaround time
37 Change Seaborg job scheduling policies - 25 users requested more support for midrange jobs
25 Provide more/new hardware; more computing resources
15 Improve the allocations process / ERCAP / INCITE
8 Other Seaborg improvements
7 PDSF improvements
4 Software improvements
3 More/better training
3 Network improvements
8 Other
3 Don't change, no suggestions

How does NERSC compare to other centers you have used?

[Read all 77 responses]

 

39 NERSC is the best / overall NERSC is better / positive response
20 NERSC is about the same as other centers / mixed response
7 NERSC is not as good as other centers / negative response
11 No comparison made
 

What does NERSC do well?   118 responses

Note: individual responses often include several response categories, but in general appear only once (in the category that best represents the response). A few have been split across response categories (this is indicated by ...).

  Provides access to powerful computing resources, parallel processing, enables science:   68 responses

NERSC is of the utmost importance for our research in theoretical nuclear structure physics funded by DOE. While we can use our local workstations (LINUX, Mac-OSX) to develop and test our Fortran-95 codes, it is not possible to run production jobs on our local machines. Without access to the NERSC supercomputers we would not be able to do our research! Thank you for providing supercomputer resources to us.

NERSC supplies a lot of FLOPs reliably, and provides very competent consultants. It is a good place to use parallel codes that scale well on the available machines.

NERSC does a truly outstanding job of supporting both a small community of "power" users as well as a large community of mid-range users. Both are important, and, as a result of NERSC's success in supporting both communities, the Center facilitates an extraordinary amount of scientific productivity.

NERSC provides the state-of-the-art computing environment for my scientific work. We are simulating ultrafast photo-induced electron transfer processes in the condensed phase and on nano-particles, which cannot be accomplished on a single workstation. NERSC really makes it possible for us to perform accurate quantum simulations for such processes.

NERSC helps me by providing the computing power for RHIC related simulations. I appreciate very much the good resources and very timely response from the support team. [PDSF user]

NERSC is important to me because it provides a way to perform computer intensive tasks in a reasonable turn-around time. NERSC has a good staff and reliable computer environment.

NERSC provides a very powerful and stable computing environment. We have been pleased with the responsiveness of the staff on both technical and accounting matters. NERSC is important to us because our DOE work is dominantly computational and we are dependent on NERSC for the compute power to carry out our work.

I believe NERSC offers: (1)- excellent computer power; (2)- excellent queuing system; (3)- accounts that are not restricted to US citizens. I am myself not a US citizen, and I collaborate on a few projects with "sensitive countries" foreign nationals. Without (1)-(3) these projects on basic research would be impossible to carry out.

It's important to me to have access to high performance machines, so performance information can be published and the work appears very current.

Facilities not available locally.

Generally satisfied with NERSC management. Appreciate that MICS puts too many constraints on NERSC. Applaud move to add clusters to improve the capacity computing limit. NERSC is at least half my computing resource. Have found ORNL.CCS to be more responsive to my needs of late.

Many processors; ability to access large number of processors in a timely manner.

pdsf

I compute at NERSC because Seaborg is one of the few computers on which I can run the desired calculations. Generally NERSC offers good consulting and a good set of software capabilities.

Historically NERSC has done a good job of providing and maintaining supercomputing resources that meet the needs of the fusion computation community. NERSC is a critical resource to me - without it substantial amounts of my computational fusion research would not be possible.

NERSC (in the form of PDSF) has a good allocation of resources for the STAR experiment. I often use it in preference to the RHIC Computing Facility (RCF) at Brookhaven because there is more adequate disk space etc. [PDSF user]

NERSC has an overall good queueing system for debug jobs. We compute at NERSC primarily for the access to a larger number of processors than is available on our local parallel computing system, permitting the running of larger simulations than would otherwise be possible.

I compute at NERSC because of the large arrays of processors that I am able to use simultaneously.

The ability to run on a large number of processors is most important to me.

The processing ability at NERSC, parallel processing, is very important to my project and is not available at OK State. That part works well and is providing me with what I need.

I use PDSF because without the parallel processing I could not analyze my data. I am very pleased with my ability to do work on the cluster.

The PDSF system at NERSC is powerful and convenient to use. Without this facility, my job would be much harder. [PDSF user]

Most of my parallel computing with NERSC has been for homework assignments and projects for the CS 267 (parallel computing) course at UCB. I also run serial benchmarks on NERSC machines, for performance comparisons between architectures for various sparse matrix operations. In the future, I plan to use NERSC computers for development and testing of parallel numerical algorithms.

NERSC provides a stable computer environment for large number of processors. It does that well. I realize there are many diverse demands for its resources, but my overall comment would be: during core working hours, high priority should be given to developer queues or else the inertia of the production runs (especially encouraged through rebates on > 1024 processor jobs ) swamps the machine. Without development there is a dwindling return for all our efforts.

I perform large-scale calculations that require a huge amount of floating-point operations and a lot of storage, both of which are provided by NERSC.

NERSC is very important for my science projects. The class of problems I solve with NERSC can be done only using supercomputers.

When the queue waiting time was shorter earlier this year (see below), NERSC was an invaluable resource for my research on cosmological simulations. Some research groups in Universities can pool together resources to start a Beowulf type of cluster with 10s of processors, but the support staff, subsequent hardware upgrades, and space/cooling have often proven difficult. NERSC provides centralized hardware as well as an excellent support system for individual PIs to focus on the scientific and computing issues.

The great attraction that supercomputer facilities like NERSC have is their powerful computing resources that make computing faster. It seems that NERSC keeps improving on that by adding more nodes etc.

The ATLAS experiment makes use of PDSF and it is an important component of our overall distributed grid based access to computing and storage resources. [PDSF user]

NERSC does well in providing and maintaining a resourceful, reliable, and high-end supercomputing environment. That is why I compute at NERSC and why NERSC is important to me.

We have accounts at NERSC because our applications partners have accounts at NERSC. We are not heavy cycle users on our own.

 

  Excellent support services, good staff:   53 responses

Consulting service is extremely good. The consultants are very knowledgeable and make NERSC a superior computing center. The system administrators are excellent as well since the computers are always up and running. If there is a problem, it is solved very quickly. Of course, the computing power of the NERSC computers is also a major reason why I like to run there.

The staff is always available and very responsive, and the computing facilities are some of the best available anywhere.

I have always been extremely happy with NERSC consulting and support. This allows me to more efficiently use NERSC's hardware and it makes NERSC a tremendous asset even during times when the hardware is showing its age.

I am just beginning work that puts me on NERSC for the first time after several years' hiatus, so I limited my replies in this survey. You can see from those replies what I am pleased with [high scores for consulting support]. I am computing on Seaborg for the parallel processing capability. I have not yet needed more than the 30 minute queue. I expect to be trying more services in the future.

I am mostly satisfied with the turnover of my jobs at NERSC. But of late, the jobs wait for a long time in the queue. The consulting service and the web site of NERSC are really wonderful.

The consulting staff is doing a great job helping us to get our work done quickly. Seaborg is very robust, not like other systems which have so many crashes.

good consultant services; I compute at NERSC because I have no other place to compute

I really like the ability to use (and develop) my large parallel applications. The NERSC staff has always been very helpful and competent.

NERSC is very supportive of its users. This year is special because we are involved in INCITE, but nevertheless, being involved in the project made us discover all the resources that NERSC has available for its users. I compute at NERSC because my boss gets time at NERSC :) But whenever (if ever and hopefully) I am able to obtain allocations for my own projects, I will choose NERSC for sure. [INCITE user]

Other than this issue [bad turnaround], I have been happy over the years with NERSC's responsiveness to users. Accounts are created quickly; phone assistance with passwords or other issues is always friendly & helpful.

NERSC provides support and true CS people very well. [PDSF user]

PDSF support is wonderful! Both interactive and batch PDSF nodes are crucial to my work. [PDSF user]

Excellent user support (web+consulting staff)

NERSC consultants are great!

See collective previous open ended comments. [Things DON'T fall through the cracks. Good followup, and pro-activity. As usual, NERSC is the most proactive site in producing training. And most of it is applicable to other sites, too. Maybe this is why they pale in comparison.]

Responsiveness of consultants and opportunity for requesting large awards. [INCITE user]

Staff and support.

I was very satisfied with the very kind help from David Skinner and Francesca Verdier. Their help was very important to me. I was also very satisfied with NERSC consulting - people there did a great job. Thanks very much for the help from NERSC.

I am very satisfied with the consultant services and visualization. I have found very useful the storage services. [INCITE user]

The support provided by NERSC has been exceptional this year. Kudos to David Skinner and Christina.

NERSC offers outstanding service. The hardware uptime and the reliability of the production environment are unmatched.

PDSF provides outstanding user support. The staff are extremely responsive to user requests and questions. They both care about users and do a good job of helping us. [PDSF user]

The consulting help is outstanding

  Reliable hardware, well managed center:   47 responses

seaborg is still the best managed and run machine that I know. Uptime and availability are stellar. The hardware starts to show its age, though, and an upgrade to faster processors would be welcome. The fact that seaborg just works is extremely important, it is so reliable and useable. Over the last years, NERSC has provided a large fraction of my total computer time usage and has made large scale model calculation possible.

Excellent facility and support team.

NERSC provides a highly professional, highly reliable computing experience with access to significant amounts of computing power. The NERSC staff are well informed, helpful and pleasant to deal with.

NERSC provides a particularly stable and reliable service, and is relatively free of quirks. This is a big help when developing code.

Seaborg is a rock-solid, dependable, fast platform. It has far fewer outages and system problems than other supercomputer platforms I've used. The consultants are generally more knowledgeable and more available than on other systems. Seaborg just works, in ways many other supercomputers don't.

The IBM SP is a fast, reliable computer and the support is very good.

NERSC machines are almost always available
Consulting staff is great
Large number of nodes allows large memory applications

Seaborg has been very useful for me as it is stable compared with LINUX clusters and its uptime has been very good except for the past month.

I appreciate the large-scale, robust, reliable, stable computing resources provided by NERSC.

Excellent computing resources and consulting service.

Our DOE projects make it possible to use facilities at NERSC. The super-stability is what I am most pleased with, so it is very important to us.

It is good for large-scale problems. The availability of the computer is very good but the turnaround time can be somewhat slow. Also seaborg seems to be an aging machine compared to those at other institutions.

You do a good job keeping the production machines running, and a problem which I had once with my account password was resolved very quickly.

Maintain a reliable computing platform with sufficient capacity for the user base. My group often uses seaborg just for its solid Fortran compilers, complete with (mostly) working bounds checking, and debugging environment.

The management and maintenance of the computers are done well. Many of my research problems rely completely on NERSC computers. These research problems require large memory.

 

  Easy to use, well managed software, good user environment:   30 responses

We have DOE-supported grants and find that applying for time and using time at NERSC is better and easier than at other facilities.

Most pleased with the efficient ease of use and sensible software configuration and queue structure.

Seaborg has (until the last few months) been the most useful high performance system that my group has used. Its combination of power and ease of use has been great.

Capacity & capability computing. Common computing environment for large international collaborations.

NERSC is invaluable for running the large-scale simulations that are needed for modeling intense ion beams. For me, the best aspect of the system is that most of the complexity remains invisible. The only irritation is that jobs sometimes wait a week or more in the queue.

Machines are stable and good for longer simulations.
Many software resources are available especially for profiling/debugging.
All information is accessible on the web.
It is the most user friendly machine.

I use NERSC for my work on the EUSO project, which at this stage involves software development within a ROOT framework. NERSC provides a nicely setup cluster (PDSF) with all the necessary compilers etc... that have enabled me to download the CERN ROOT package and all the software from my EUSO collaborators and compile it without any problems. The usual selection of editors is also available so I have an environment that I am comfortable with. [PDSF user]

Seaborg is relatively reliable, and shows little down time. High availability and high performance, they are the most critical advantages.

The software compilers are well maintained. Mass storage is very well done. The web site is well organized and looks good. The consultants are very good.

I compute at NERSC because
(1) Seaborg is still a good machine for parallel computations. (This is changing; see my comment on its getting old, above.)
(2) Interactive and debugging runs are far easier to do here than at SDSC/NPACI
(3) Software resources and user environment are far superior to NPACI.
(4) Consultants are great.
At this stage, if I were NERSC, I would start moving some resources from consulting to procurement and eventually testing of a new machine.

I am most satisfied with NERSC's software environment and the availability of the hardware suitable for large-scale computation. NERSC is important to me because some of our computer simulations have to be finished using a larger computing resource, such as SEABORG.

I find the allocation and queuing procedures at NERSC to be eminently logical. Support for standard libraries is reliable and thorough.

We compute at NERSC because of the large-scale computer facilities and resources easily accessible and easy to use.

  HPSS, disk space, data services:   20 responses

data storage resources

NERSC handles large data sets easily. We use NERSC because of its large processor farms and storage. [PDSF user]

Well, I use NERSC/PDSF because our (KamLAND's) data is stored there. :) NERSC does a good job storing the huge amounts of data we generate in an easily accessible manner. [PDSF user]

PDSF and HPSS (which is all I use) [PDSF user]

Has a large amount of computing power and a huge secure data storage capacity. Especially this latter point is of utmost importance to my current computing needs.

NERSC have tended to respond to users' concerns and tried to create an environment to satisfy the users' needs. Prior to the last couple of years NERSC appeared to give creating a good production environment for ALL their users top priority. NERSC used to be a centre devoted to scientific computing not to computer science. Of course, NERSC is currently the only general purpose DOE supercomputer centre. NERSC's HPSS is reliable enough, and we feel certain enough about continued access that we use it as the central repository for our multi-system multi-site projects.

Fast I/O, access to > 100 processors.

I do 99% of my computations at NERSC, mostly because of large storage available to keep the runs output and the software support at NERSC. I really like NERSC web documentations and queue pages, and NIM.

Big data is easier to deal with at NERSC than at the other sites I deal with.

  Everything, general satisfaction:   26 responses

NERSC has excellent people working there. I'm VERY happy with everyone I've come across. People have been knowledgeable and professional. I compute at NERSC because it's really big. Seriously, the number of processors allows us to do research on problems that we simply cannot do anywhere else. In that regard I consider NERSC a national treasure. One really silly request, how about a NERSC T-Shirt! I'd buy one.

Overall, the services and hardware reliability are excellent. I think that NERSC sets the standard in this regard.

NERSC is doing great. The uptime is fantastic and the system configuration is superb. I compute at NERSC because of its great reliability. Whenever there is a lot to compute in a certain time I use NERSC. Also I never experienced any problems while compiling code or running jobs. So I'm greatly satisfied.

I am a very satisfied customer - no complaints here.

I strongly appreciate (1) the well-maintained and well-organized hardware and software, particularly on seaborg; (2) the size of the available resources (e.g., seaborg and HPSS), and (3) the presence of dedicated, knowledgeable consultants. These three factors make it possible for me to do science.

The quality of the hardware available at NERSC is extremely good, the possibility to scale to a very large number of processors is also a big point in favor, as is the large availability of software, especially of parallel libraries like PARPACK and SUPERLU_DIST, which are very important for me. The quality of the service (like the help desk and the account management) is extremely good. The NIM interface is simply great; this year for the first time I've contributed to the ERCAP proposal for my group and I found it superb in this respect.

I want to praise NERSC staff. Seaborg is by far the best maintained machine I have run on. The support is great, and I can always count on it. I wish other supercomputing centers followed your business model. [INCITE user]

I am very happy with the overall level of service, it is excellent.

Overall everything is fine (except for some hiccups at pdsf). I especially like the very timely response of the NERSC staff to problems/questions. It almost feels like being a part of a big family. Keep on with this good work for us! [PDSF user]

NERSC is very reliable, very well managed. I don't have to worry about whether a job will run there, only about when. There is a good amount of space in home and scratch, though I keep needing more scratch space, which nersc grants me from time to time. The mass storage is good, the rate of transfer from and to mass storage, and the ease of transfer are other plus points. I do most of my work at nersc, and I am very reluctant to work on other machines. I guess that's the highest praise I can give you guys!

1. high performance, reliable computing resource for large applications;
2. the large data set storage: long (HPSS) and short (scratch disc) term;
3. 24/7 accessibility;
4. handy debugging, profiling and visualization software.

I compute at NERSC because:
1) it's easy to transfer files to/from NERSC (unlike RCF at BNL)
2) almost no problems with disk space
3) my jobs move through the queue reasonably fast
4) very friendly, helpful staff [PDSF user]

excellent !

NERSC has been, and continues to be, the best-run HPC center that I have used (and I have used numerous sites over my career). NERSC's management strategy results in a very stable usable facility that optimizes the amount of science that can be accomplished. I appreciate that they do not attempt to be at the "bleeding edge" with respect to production hardware and software. Although there is a place for such systems (even within my own research program), productive science requires stable productive systems. NERSC is excellent at providing this need.

NERSC offers very professional services in three areas: Hardware, Software and Consulting! I am most pleased BECAUSE all of the three components are provided by NERSC which is absolutely necessary at the forefront of HPC-Science!

My research needs intensive computing; that's why NERSC is important to me. I have an overall good impression of NERSC. One can use as many as 2,000 processors or even more at one time and finish jobs very quickly. The changes of allocations are reasonable. And people are working hard to improve the performance of SEABORG.

Runs a quality computing service. PDSF along with the research staff developing more advanced grid & data management services are a valuable resource for us. [PDSF user]

The service from NERSC is great, which provides convenient and reliable computing resources for our work.

I am very pleased with HPSS and PDSF - I think that both facilities are extremely valuable and useful. HPSS covers our data storage needs and PDSF provides a great facility for doing our data analysis. The fact that both facilities are housed at NERSC and can be used together makes them more than the sum of the parts. [PDSF user]

Overall, I am very satisfied with the support and computing environment at NERSC. While the limited scalability of our code really hurts us on IBM P3 platforms, the support, HPSS and large numbers of processors allows us to get a great deal of research done at NERSC. NERSC has been very responsive to our requests for rapid batch queue access to complete our runs on time and consulting support to increase the numbers of processors that we can apply to the code.

NERSC is a world-class, state of the art supercomputing facility unmatched anywhere in the world, and is doing a superb job of meeting the challenges in solving computational problems in diverse areas of science. I would like to congratulate NERSC wholeheartedly for running such a facility with the utmost efficiency and professional competence. Congratulations and thanks to all at NERSC, especially Horst Simon, Francesca Verdier and David Turner.

NERSC is very well suited to performing computational science through the combination of first-rate hardware, software, mass storage, support and training, and should be a model for other sites to follow.

I am most pleased with the reliability of NERSC, the hardware and software work as described on their website. Important information can easily be found. Problems are dealt with quickly. [PDSF user]

Seaborg machine is great. Even though it is pretty dense and it is hard to get a spot, this is the only machine where I can run that long on that many dedicated nodes. So that's fine with me. Change nothing!

I am a new user and so far my impression is that NERSC seems to do everything very well except possibly queueing of jobs requiring less than 32 nodes. I am impressed by the available hardware, software, support etc.

Good overall service; good codes available.

  Documentation, NERSC web site:   7 responses

do well: documentation, support, overall maintenance and management.

  Allocations process, ERCAP, INCITE:   6 responses

Large allocation allowing us to address a particularly timely and important science problem; INCITE is a truly great idea! [INCITE user]

 

What should NERSC do differently? (How can NERSC improve?)   94 responses

  Improve Seaborg turnaround time:   45 responses

Change the batch queues so that ordinary jobs execute in days, not weeks.

What I am somewhat dissatisfied with is the batch queue time, which is very long at times, ...

The queue wait times have been extremely long (about 2 weeks recently), and this has almost completely stalled my research. ...

I think seaborg is pretty saturated. Sometimes it takes a long time to get batch jobs started. This has forced me to find other sources of computing power.

(1) Batch queue wait times-- these have become horrible in the past half-year. ...

As mentioned earlier, the time spent sitting in queues is creeping up. I would appreciate seeing this problem addressed. ...

1. To change a bit the queue policy so that one job won't wait too long; ...

Change the queue structure to eliminate two week waits. I'm not sure this is possible but I'm writing it anyway.

Work on shortening queues on seaborg.

The queue becomes too crowded recently. We understand that INCITE runs are important, but their priority should not be too different from other regular jobs. We also hope that the new machine would solve part of the problem.

For my needs I would prefer less priority on 512+ processor job requests so more users could use seaborg simultaneously, reducing the long queue times. It's difficult to do cutting edge research when one has to wait for a week or more for each job to run.

there have been too long queuing times that essentially counteract the attraction of supercomputing resources. ...

The waiting time for the batch jobs is way too long. Sometimes, the submitted jobs have been idling for more than 10 days, which just defeats the purpose of the supercomputing. Other than that, I am very satisfied. Thank you.

It would be great if queueing could be improved to wait times of a week or preferably less.

Faster queues.

The turnaround time for batch jobs could be shortened.

Some resources should be devoted to offering cycles with minimal wait time. Wait times in seaborg queue are far too long.

 

  Change Seaborg job scheduling policies:   37 responses

The current focus only on jobs which can exhibit high degrees of parallelism is, in my opinion obviously, misguided. Some problems of great scientific interest do not naturally scale to thousands of processors.

NERSC should return to its original mission of providing the production environment which allowed the scientists to maximize their research. That is NERSC should give satisfying the user priority over satisfying the DOE and the OMB.

Pay attention to what its users need - resist as best as possible calls from "above" from those who know little about actual scientific research. Provide resources that fit the profile of the jobs your users actually want to run (which I would guess peaks somewhere around 64-128 procs if artificial pressure is not applied). Do not reward users for wasting computational resources by running using very large numbers of procs, even when their codes scale significantly less than perfectly (and yes, this is essentially waste, because any real research involves multiple code runs, so 2 512 proc runs will generally be better than 1 1024 proc run unless the code scales perfectly to 1024 - but your policies encourage users to run higher than they should in order to be able to run at all, wasting already oversubscribed CPU hours).
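
The scaling arithmetic behind the preceding comment can be made concrete with a small sketch. The parallel-efficiency figures and job sizes below are illustrative assumptions, not measured Seaborg numbers.

    # Sketch: CPU-hours charged to finish one run at imperfect parallel scaling.
    # The efficiency figures and the amount of work are assumed values.

    def cpu_hours_per_run(processors, serial_equivalent_hours, parallel_efficiency):
        # Wall-clock time shrinks with processor count only as far as efficiency allows;
        # the allocation is charged for processors * wall-clock time.
        wall_clock = serial_equivalent_hours / (processors * parallel_efficiency)
        return processors * wall_clock

    work = 10000.0  # hypothetical serial-equivalent hours for one science run

    at_512 = cpu_hours_per_run(512, work, parallel_efficiency=0.90)    # ~11111 hours
    at_1024 = cpu_hours_per_run(1024, work, parallel_efficiency=0.70)  # ~14286 hours

    print(round(at_512), round(at_1024))
    # For a fixed allocation, runs kept at 512 processors complete roughly 30%
    # more science than the same runs pushed up to 1024 processors.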

... Also, the job priority system discriminates against smaller jobs (less than 32 nodes) - i.e. MAJORITY of users!

the allocation process was very distorted this year by the requirement that most of the computing be done with Ncpu>1000. This is terrible for almost all users. NERSC should do everything it can to get sensible operating requirements - even if management would like to have most users run huge jobs - if they are going to truly serve the needs of the users. In the current situation the users are under severe pressure to serve the needs of the computer center.

For the last 24 years NERSC has been the place where "I could get things done". With the initiation of the INCITE program that changed. The machine was effectively taken over by the 3 INCITE groups and work at NERSC stopped. After the upgrade my large calculations no longer run at NERSC and I had to move those computations to a p690 in Hannover, Germany.

The current queue structure on Seaborg encourages large jobs, but is less efficient for two reasons: (1) there are more idle processors waiting for a big job to start, and (2) it encourages people to submit jobs on more processors than their code can efficiently use.

Better management of the resources. I know that DOE wants to see very large jobs running on Seaborg and also support "big splash" projects, but unfortunately, it has prevented the vast majority of users from running jobs in a timely fashion in the past few months. This is not NERSC's fault. However, I think that the recipients of the INCITE awards should spread their computer usage over the whole year instead of doing all the runs during the last months of the program and thus preventing everybody else from running.

Given the amount of computer time that I am allocated, I cannot make use of the large number of processors on Seaborg. Unless everyone is allocated enough time to make use of hundreds of processors, NERSC should give more consideration to providing resources for smaller codes.

I have experienced a significant increase in the queue waiting time (often days) in reg_1 on seaborg recently, which seems to be correlated with the discount policy on large-processor jobs and the increased number of jobs in the reg_32 queue. Some of my colleagues at UC Berkeley also voiced similar frustrations, and a few have started to look for other computing resources. As much as we would like to use reg_32, the central issue is some scientific problems simply do not seem to scale well beyond 64 or 128 processors. The large number of users waiting in the seaborg reg_1 indicates that I am not in the minority. The wait time was much more reasonable earlier this year, so I would like to see NERSC modify the current queue priority and reduce the wait time.

... it would be good to state more precisely the queueing policies for jobs of the same category based on the number of processors and wall clock time. ...

NERSC should preserve a fraction of its resources for medium size parallel computing. State-of-the-art scientific research oftentimes needs up to 100 CPUs per job. Encouraging people to submit larger and larger jobs (thousands of CPUs/job) puts this distinct class of projects (in need of medium range parallel computing) in a very difficult position, as their jobs wait in the queue for a very extensive period of time, waiting for either these super large jobs to be picked up or to be done.

I already described the problems with the queue structure, so I won't repeat them. That's by far my biggest complaint. [The queue structure is so ludicrously biased toward large jobs that it is sometimes impossible to use one's time with a code that is optimum at 128-256 processors. That limit is set by the physics of the problem I'm solving, and no amount of algorithmic tinkering or optimization is going to change it much. NERSC gave my research group time in response to our ERCAP request, but to actually use the time we won, we wind up having to pay extra to use the express queue. Otherwise we would spend a week or more waiting in the regular queue each time we need to restart a job, and we'd never actually be able to use the time we were granted. I understand that NERSC wants to encourage large jobs, but the current queue structure guarantees that anyone who can't scale to 1000 processors is going to have to use the premium queue to get anything done.]

... The other thing that NERSC needs to do is allow a mix of small and large jobs to get onto Seaborg with relatively little waiting time in the queues, as used to be the case until the last few months.

Improve queue for medium sized jobs (128 proc+)/large sized jobs -- it seems that a few large jobs run and many small jobs (~64 proc or less) fit in the holes, but these jobs are so small that they should really be run elsewhere on small cluster type machines or interactively, so that the seaborg nodes are mainly reserved for larger jobs.

do not just emphasize high-performance computing -- it would be nice for me to feel like my work is welcome by NERSC even if it does not involve massively parallel programs

My only problem is the heavy emphasis on extreme levels of parallelism. Our problem sizes are often just not large enough to suit that. But I understand that is one of the reasons for a facility like this, so you need to emphasize it.

Sort out the queues & adopt a consistent priority policy to support the kind of computing that can only be done at a facility like NERSC. This summer has been a nightmare.

... 2) Reduce/remove the disparity between charge factors for the reg_32 and reg_1 queues. I am very dissatisfied with the one-half charge for jobs in the reg_32 queue. My jobs are waiting for a very long time. I wonder if this policy indeed results in optimal use of resources anyway - the reg_32 and reg_128 queues already had a higher priority than the reg_1 and reg_1l queues, so if users could run their jobs as efficiently in the bigger queues, they would already have been doing it. (Maybe the average wait time was longer for the reg_32 queue, but I see no reason for the wait time to improve with the implementation of the new policy, so that is not the factor encouraging users.) As far as I understand, usually if it takes 2t wall-clock time with n processors, then with 2n processors the wall-clock time is greater than t, unless there is a cache problem. So the most efficient use of resources is to use the least number of processors on which you can fit your problem. I suppose with the uniform charge factor, that is why there were fewer reg_32 and reg_128 jobs. With a uniform charge factor, I would guess people would only use the bigger queues if they need to, so that their jobs would finish in the allowed wall-clock time. In my opinion, this is how it should be.
Now with the new policy, for 16<=n<32, using 2n processors is a better option even if the code doesn't gain a second in wall-clock time as opposed to using n processors! The reg_32 queue has a higher priority, too. A user might even use half the tasks per node with 2n processors; the usual charge penalty is simply compensated for by the new policy. That does not make sense. Moreover, I suppose the loadleveler has a much harder task with all these big jobs, and I am curious whether the nodes are waiting longer in between jobs.
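
The charging arithmetic this commenter describes can be illustrated with a short sketch; the charge factors and job sizes below are assumptions chosen for illustration, not the actual 2004 queue charge rates.

    # Sketch: hours charged under a uniform charge factor vs. a hypothetical
    # half-rate large-job queue, for a code that gains nothing from doubling
    # its processor count. All numbers are illustrative assumptions.

    def charged_hours(processors, wall_clock_hours, charge_factor):
        # Allocation charged: processors * wall-clock time * queue charge factor.
        return processors * wall_clock_hours * charge_factor

    # The same job: 10 wall-clock hours on n processors, and still 10 wall-clock
    # hours on 2n processors because it does not scale.
    small_queue = charged_hours(256, 10.0, charge_factor=1.0)  # 2560.0 hours
    large_queue = charged_hours(512, 10.0, charge_factor=0.5)  # 2560.0 hours

    print(small_queue, large_queue)
    # The half-rate policy makes the 2n-processor submission cost the same even
    # though the extra processors did no useful work, which is the incentive
    # problem the commenter objects to.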

NERSC should move to supply capacity computing (cycles per year) rather than capability (cycles/sec). Should move from supercomputer center to a "cluster farm center" to supply cycles per year at greater cost benefit. Supercomputers of the parallel type are very costly in a wide time share environment. Their use should be limited to large jobs and underloaded to preserve good turn-around: when the [run time] / [turn around time] is less than 0.5, the supercomputer is running at half nominal speed.
The folly of time shared supercomputers is this: if properly loaded one is likely to wait twice as long to get half the machine as compared to a quarter and the turn-around time is likely to be faster using a quarter of the machine. If the optimal job for Seaborg is 512ps (OMB rule 50% jobs over 512ps), it is being used as 12 512ps clusters....but the unused connectivity is costly.
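
The run-time versus turnaround argument above reduces to a simple ratio; a small sketch with assumed job lengths and queue waits makes the point.

    # Sketch: effective fraction of nominal speed once queue wait is included.
    # The job length and wait times are assumed values for illustration.

    def effective_speed_fraction(run_hours, queue_wait_hours):
        # Turnaround = queue wait + run time; useful work happens only during run time.
        return run_hours / (run_hours + queue_wait_hours)

    print(effective_speed_fraction(12.0, 12.0))                 # 0.5 -> half nominal speed
    print(round(effective_speed_fraction(12.0, 7 * 24.0), 3))   # 0.067 -> after a week's wait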

... Change queue management in summer, when it seems lots of summer interns put extra charge on the system. Stop queuing privileges for INCITE programs (I agree with giving priority to IAEA and IPCC).

Job queuing on Seaborg: jobs requiring 128 nodes or more should be allowed to have priority over those requiring 32 nodes or less.

different queues for big jobs, medium jobs and small jobs (# cpus). This may cut down wait times by making sure people's jobs are competing with like jobs.

... Better queues.

... Also, I hope that one user can only have two jobs running in queue so that no user has to wait for a few days to get one job run.

... One of the causes [of long queue waits] is that some users run multiple huge jobs using more than 128 nodes with greater priorities. I mostly run jobs using less than 10 nodes, but queuing times can reach as long as 3 weeks. I think that NERSC can set a new policy to minimize queuing time for small jobs.

The batch Queue scheduling should be improved.

I would like to see NERSC offer more for medium-scale jobs (ones using < 128 processors). ...

The code I use (Gaussian) is not well tuned on Seaborg. Queues are geared to massively parallel codes. Gaussian can use only 12 procs efficiently. Turnaround time is bad; max wall time is too small.

... 3) Judging by a biased look at the jobs in the seaborg queue at any given time, NERSC caters too much to folks that can do their work on desktop machines or small clusters, particularly given the low price of amazingly powerful small-scale systems. Those needing the larger facility wait in a very long line with these projects...it takes days or weeks to run a job on seaborg unless you happen to have a very specific kind of job---namely the sort that scales well to a large number of processors so that you can get into the "special" big queue. But I repeat that the science done by those codes is not necessarily better than us doing work down at the lowly 512 processor level, and at the 512 level we sit in line with the desktop folks running scaling studies for ERCAP instead of doing real science. ...

The batch queues are configured as to make supercomputing facilities almost unusable, except for the privileged few. It takes 2-3 weeks to run a regular (nowadays even priority) job. It is impossible to debug a code, or do physics research, in this environment. You are on the verge of becoming irrelevant with regard to scientific computing.

I am most interested in rapid turnover of jobs requiring 1-20 SEABORG nodes (16-320 processors), with jobs mostly requiring 5-24 hours of wall clock time. I sometimes require significantly more time than this, but if the turn around time is fairly fast, I can break it down in sections. Until about 6 months ago, the turn-around was fine - occasional delays, but mostly could submit jobs and have them begin within 24 hours. Lately this has not been the case.

... A standby queue would be helpful to enable some activity to continue on such [exhausted] accounts. ...

We hope the max wallclock could be increased.

My only additional expectation is that high-performance computations will be allowed to run for a longer time - 24 hours for the present. ...

... max wall time is too small.

 

  Provide more/new hardware; more computing resources:   25 responses

NERSC has done an outstanding job of serving the community. In order for this to continue, NERSC needs continued support from the DOE for its staff and the services they provide, and NERSC needs support for a new high end system to replace seaborg.

... I would especially like to see NERSC maintain hardware with cutting-edge single-cpu performance.

Perhaps NERSC should upgrade its Seaborg processors.

The SP3 processors on Seaborg are very slow, compared to currently available processors. This limits how much cosmologists like my group can do. A more powerful system would allow us to solve problems that cannot be tackled by U.S. users (although our European and Japanese competitors already have much more computing time on their dedicated systems, and they are attacking these important problems). ...

Get some vector computers again or hybrid parallel/vector machines

I welcome your announcement that you plan to provide an additional supercomputer with fewer nodes. I believe this is very important because not all computational problems can effectively use hundreds of processors. It would be nice to have access to CRAY supercomputers again at NERSC.

NERSC needs to move on from the IBM-SP3

The usual: More processors and greater throughput.

Faster hardware, better turnaround times ... the usual requests of users.

Get a larger computer :)

The computer code I use becomes more complex from day to day to use the best physics you can. However this increases the computing time. The great management and support at NERSC combined with new hardware would be an irresistible package.

Even more processors.

Keep adding nodes.

See collective previous open ended comments. [Need the new computer system sooner. Need access to cluster and vector technology in order to keep stagnant models up to date.]

NERSC needs a big upgrade to its hardware to keep up-to-date with current computing technology.

I wish they'd kept the PVP computers longer. It was very frustrating to port all codes to Seaborg from the Crays, and find out that a single processor at Seaborg is much slower than it used to be at the Crays. I've been working at NERSC for ~6 years, and by now I've had to port my codes to a new computer at least 5 times (!). I guess going for a new machine is good, but maybe you should keep the older ones longer? ...

Larger computer ...

NERSC response: In early calendar year 2005 NERSC will deploy a new Linux cluster with 640 dual 2.2 GHz Opteron CPUs available for computations. The target workload for the cluster is jobs that do not scale well on Seaborg.

  Improve the allocations process / ERCAP / INCITE:   15 responses

Create a pdf file from allocation request for our records.

NERSC response: This has already been implemented. At the bottom of the ERCAP request lists page is a button Show selected requests in PDF format. The 2006 ERCAP request form will use this PDF format for the "Show Complete Request" tab.

... Similarly, I've described my problems with the ERCAP proposal process. I feel it gives short-shrift to science, and focuses on code optimization to the exclusion of scientific returns. [The ERCAP allocation process is not very good. At other supercomputer centers, an allocation request is written like a scientific proposal. It includes certain required topics, but the proposers are free to write a proposal that makes sense in the context of their code and their problem. NERSC's proposal form is too much of a one-size-fits-all solution. ...]

NERSC response: DOE does not conduct a science review of the ERCAP request forms. This is because they have already conducted a science review for the DOE grant request. DOE does ask NERSC to conduct a computational review of the requests. See Types of Awards and Review Process.

I would like to see a little bit more forgiveness for projects like INCITE, and anticipation that most of the allocation will likely be used near the end of the allocation period. This follows from the fact that scientific application software (code) is constantly being developed, and the science result is a function not only of the raw idea and computing resources, but includes the "latest and greatest" element of scientific software development. For that reason, I foresee many INCITE projects being slow at the beginning and consuming most of their resources before the end of November. Extending the data storage allocation might be very important to allow for careful data analysis and maximizing the scientific impact of the project. In essence, INCITE projects are long-term and long-lasting even if the allocation is nominally for one year. [INCITE PI]

I believe a mistake was made in the allocation of resources to the INCITE teams. It was my understanding from the proposals that these groups should have been able to run immediately on Seaborg. Since the majority of them didn't, and they were not docked accordingly at the end of each quarter, we are now in the position of watching them try to burn almost all of their time in the last quarter of the year. This now gives them override on more than a third of the remaining time causing huge backups for everyone else. If they had run smoothly from the start of the award, or were docked when they didn't use the time, we wouldn't be in this situation.
I do believe in the INCITE process and think this is an excellent program, I just have a problem with the implementation.

NERSC response: It is important that the INCITE projects began using their time shortly after their award. As is stated in the call for proposals, Applicant codes must be demonstrably ready to run in a massively parallel manner on Seaborg. In 2005 NERSC will work closely with the INCITE awardees to help them start computing as early as possible.

... Also, the allocations process was not clearly explained, and we consequently lost some of our computer time because we did not use enough of it by a certain date (and the reason we could not use it all was because of the long queue wait times). In addition, our allocated hours were/are constantly being changed (added and subtracted) without any notification given to us.

NERSC response: We apologize for the confusion and hope that the 2005 award letters more clearly state this policy: Repositories (both MPP and HPSS) that haven't used significant amounts of time (or SRUs) each quarter are adjusted by transferring a part of the unused balance to the corresponding DOE Office reserve. See: Allocation Management.

The allocation of one of my accounts (mp169) was exhausted in June. A standby queue would be helpful to enable some activity to continue on such accounts. Alternatively, a better redistribution system would enable some additional allocation to be available when an account is exhausted with six months remaining in the operating year. Wise allocation management is of course a user responsibility, however, in some years circumstances result in shortages whereas in other years surpluses occur.

Given the amount of computer time that I am allocated, I cannot make use of the large number of processors on Seaborg. ...

When account allocations expire, some transition period should follow to allow users to process data. Large projects generate massive amounts of data, and a typical user has no storage resources at home to transfer this data to.

NERSC response: NERSC policy is that when an account expires the user has one month on Seaborg (or on the PDSF) in limited status (cannot submit batch jobs). During that time they can do data cleanup and they have full access to HPSS. For the next 5 months they have no access to Seaborg (or other computational systems) but they do have read/only access to HPSS. After that 6 months their access to HPSS is terminated but their files remain (indefinitely for now, but in the future there may be a policy on how long files will be retained).

I think the INCITE program was ill conceived. Betting that the performance of a tiny subset of the scientific community will payoff enormously better than the community as a whole seems to me like trying to time the stock market. It may work once, but the opportunity costs are enormous.

The NERSC allocation for FY 2004 seemed to be a mess; this may not be NERSC's fault.

Quicker allocation process for short period for emergency purpose such as data recovery.

... Also, award all time in one simple, direct process, using application forms that do not require an enormous amount of time to fill out. Treat all users equally. Avoid special targeted initiatives at all costs - these both waste user time by requiring them to fill out multiple applications, and justifiably anger users because they make the machines very difficult to use for "regular" users.

Improve the allocation process to allow a better planning of computational resources. Many projects were put on hold since not enough resources were allocated at the start of the fiscal year.

... and larger allocation of the computation time.

1) In the ERCAP process, the focus on GFlops and scalability is just plain stupid. These numbers are not a measure of a code's ability to do science. I can put a huge useless do-loop in my code and really get the flop count up high. In fact since it won't have communication, I can look pretty good on scalability too. That loop's gonna make my real work come out the end a lot slower, but who cares because I'll get a big allocation with my flop count and scalability. These statistics are only good because they provide a simple set of numbers that management types can hold onto. Worse though, near the ERCAP submit time, how many jobs are stacked in the queues of seaborg just to get these scaling numbers for the ERCAP submission (trust me, you can see these jobs, clear as day). This compounds the problem of people rushing in at the end of the year to use their time. I think NERSC should stop focusing on flop counts and raw scalability and find another way to measure scientific relevance.
2) The ERCAP review process is silly. The reviewers are not qualified to comment on the scientific appropriateness of the numerical methods in a code, since every code is application-specific. The projects are funded by DOE, and therefore automatically qualify for time at NERSC. Exactly how are the reviews used in the allocation process? It seems that the review process is a way to make NERSC appear to be doing due diligence in spreading out the hours in a scientifically justifiable way, but too little information is given to under-qualified reviewers, and it is unclear how their reviews are even used, if at all.
... 4) Lose the INCITE thing. Again, why are applications that scale well held in an exalted status? Certainly I can learn as much from 20 512-processor runs of linear-algebra-bound code as I could from a few 4096-processor runs of a well-scaling, but algorithmically inefficient explicit algorithm that requires orders of magnitude more time-steps or grid points. But the INCITE jobs are sucking away time along with the scaling studies and desktop-size runs. Also, looking at last year's projects, it is unclear to me that the INCITE jobs have accomplished anything spectacular.

  Other Seaborg improvements:   8 responses

... A problem that I bring up every year is the quality of interactive service. Although this has improved since the last survey, the lack of ability to do small debugging runs interactively (at least with any reliability) is a problem. Would it not be possible to set aside a few nodes that could run with IP protocol (rather than US), in order to create a pool of processors where users could simultaneously run in parallel?

Better interactive access on seaborg for debugging and development jobs. Perhaps a few more dedicated nodes for these sorts of tasks.

Improve interactivity. Why do small jobs have to be so difficult and problematical to run at times?

... (3) Seaborg needs more large-memory nodes. 1GB per processor isn't enough for many of our larger jobs. ...

Improve Seaborg's processors. ...

... 2. To fix the I/O problems so that we always have all the nodes available to us; ...

Access to the home directory source code even when the Seaborg is down.

The new OS has really been a bummer. I'm going to have to spend a lot of time trying to figure out why our performance is so degraded. This is unfortunate.

  PDSF improvements:   7 responses

PDSF could have a few more interactive machines. Sometimes they're fairly heavily loaded.

The PDSF interactive nodes are rather sluggish; I am no longer able to link software in a reasonable amount of time on these nodes.

More responsive interactive nodes on PDSF!! I can't stress this enough.
Maybe a bigger user disk quota (500MB can sometimes be frustrating during analysis jobs or when testing new software codes).

Improve I/O problems.

NERSC response: The PDSF support team has made it possible to run interactively on the batch nodes (there is a FAQ that documents these procedures). They also recently purchased replacement login nodes that are being tested now and should go into production in December 2004. They are top-of-the-line Opterons with twice as much memory as the old nodes.

more disk space (always good), increase the number of PDSF login nodes.

I am quite worried about the performance of the disk vaults at PDSF. It seems that the combination of very large disks (>2.5 TB per node) and NFS does not scale very well to many jobs. I know that this problem is actively being addressed at PDSF, but it is my single complaint about the current situation.

NERSC response: The PDSF support team has added about 20 Terabytes additional disk space. As to disk performance, there is unfortunately no technology at this point that the PDSF could afford to deploy.

Consider more whether using 'experimental' systems on trial is always appropriate - i.e. determine a better balance between increased likelihood of failure and free or better resource

  Software improvements:   4 responses

... Also the xlf fortran compiler, which sometimes makes it somewhat difficult to port codes from other platforms (I'm porting mostly from tru64 and intel compilers). Also the debugging tool is not so easy to use, and somewhat more extended documentation on it would be welcome.

NERSC could improve their C++/HPC services to the users.

... 3. To upgrade ESSL if possible.

Please fix AVS5.6 on seaborg.

  More/better training:   3 responses

I should be informed by email at least one month ahead of time if there is a course or tutorial class, no matter whether it's at NERSC or LLNL, so I can plan to go there to use the grid.

Offer more video lectures for remote users.

Better education as to what resources are available, what they are for and how to use them.

NERSC response: In 2004 NERSC organized 20 user training lectures in 7 separate events. All were presented via the Access Grid and were captured as streaming videos (using Real Media streaming) so that users can replay them at any time. These lectures have been added to the tutorials page for "one stop" access to training materials. See NERSC Tutorials, How-To's, and Lectures.

  Network improvements:   3 responses

... (4) Faster network connectivity to the outside world. I realize that this may well be out of your hands, but it is a minor impediment to our daily usage.

As I mentioned earlier, some improvements in the network access would be nice, but I do not know if the problems I am seeing are anything to do with NERSC/PDSF or if they originate in the network local to my office. [PDSF user]

More storage space and faster access.

  Other suggestions:   8 responses

1) Reduce/remove the hpss charge for TRANSFERRING files to and from mass storage. The charge for making transfers to and from mass storage does not make sense to me. It sounds too harsh; most of our hpss allocation is exhausted by the transfers rather than actual storage. We have huge files, huge amounts of data. We cannot keep it in scratch. Sometimes we need to analyze another aspect of the data, and then need to retrieve it. Unfortunately, we are charged every time we access it. ...

... (2) More resources dedicated to consulting: I think the consultants are great and I'm extremely appreciative of their time and thought; but difficult problems seem to take too long to resolve, or are resolved only incompletely-- and it's really only those difficult problems for which we seek NERSC's expert advice in the first place. Perhaps more consulting resources would improve follow-up, allow consultants to spend more time on user problems, etc. ...

NERSC should have project liaisons for each major project to help the very competent technical staff understand the priorities and importance of the scientific work. [PDSF user]

Get more involved in Open Science Grid

To be more user friendly. The complexity of having to run a computer center for a very diverse public makes it difficult to concentrate on creating easy user interfaces that would realize the full potential of the computer center.

One password for all access. ...

To make security measures less boring.

1) It seems that there is still room for improvement in the area of data analysis and visualization.
2) I'd like to run some jobs that would use a few TB of disk space. There could be a disk array for these types of jobs, where files would be removed after 1 month or so to avoid filling up the disk.

 

  Don't change, no suggestions:   3 responses

No suggestions at this time.

Keep on doing the superb job !

No major complaints. (I've only been using NERSC irregularly for the past two years, so I haven't had time to develop very specific complaints.)

 

How does NERSC compare to other centers you have used?   77 responses

  NERSC is the best / overall NERSC is better / positive response:   39 responses

My experience with supercomputing centers (no longer existing) has been spotty at best - user support was often lacking and certainly slow. Thus, I have really appreciated using NERSC (and especially Seaborg and its predecessor) over the years.

NERSC stands head and shoulders above all other centers I have used. (I am comparing it to ORNL, NCSA, PSC, SDSC, NASA (Ames and Goddard), and various defense and commercial sites.)

For me NERSC is the center which sets the standard in the combination of services and hardware (well even if the SP3 is somewhat outdated). Centers I compare are: CSCS, ETH and LANL.

Eagle supercomputer at Oak Ridge National Laboratory
NERSC's more varied queueing system and classes provide greater flexibility than does the ORNL system. For example, some large scale debugging is feasible at NERSC but not at ORNL.

NERSC is the user's heaven compared to RCF at BNL

NERSC is the best computing facility I have ever used in terms of hardware reliability, software availability, consulting help, .... Anyway, I choose to run on Seaborg and wait for other computing facilities to mature.

In terms of availability, usability and overall quality, NERSC is unbeatable. I'm using SDSC (very little these days; NERSC is so much better that even SDSC's faster hardware cannot keep up with it) and HLRN (Hoechstleistungsrechenzentrum Nord in Hannover and Berlin); the HLRN has many more configuration, availability and usability issues than NERSC (that's a Pwr4 with Fed. switch). Overall, NERSC is the best computer center I've ever used.

NCAR/NCSA
NERSC is MUCH easier to apply for time to. Also, we can run for a much longer time; NCAR has a 6-hour limit.

I would rank NERSC among the best of all the major compute centers.

NERSC is much better than a computer center at NASA/GSFC. The latter is less powerful and is down very often.

Much better than BNL RCF, which I have stopped using.

Outstanding. My baseline for comparison is Rechenzentrum Garching, Germany, and San Diego Supercomputer Center.

I sometimes use the Oak Ridge computers Cheetah and Eagle. They are also great, but the machines are much smaller than Seaborg. So for big jobs, NERSC is the place to go!

Compares very well

I've done a lot of computing at NCAR on their IBM SP systems. I feel the NERSC staff has been more responsive to requests for help. Good job.

NERSC is far superior. I worked on computers at ORNL, ARSC, and Earth Simulator Center in the past few years.

Our group has used NERSC supercomputers for about 10 years. Prior to that, we used supercomputers at NCSA (Urbana, Illinois) funded by NSF. In our experience, NERSC is better than NCSA!

Oh, NERSC is by far the best.
Compared to sdx (local UK machine), the JLab cluster, and PSC.

Superb service! Other centers (SDSC, PSC, big LANL and LLNL machines) do not come even close to NERSC's level.

I have used the supercomputing facilities at ORNL and LANL.
NERSC rates highly compared to these; I would however like to see more interactive access if possible.

I also compute at the HLRN (HoechstLeistungsRechenzentrum Nord). I don't want to flatter NERSC, but HLRN is no match for NERSC. The downtime of SEABORG, for instance, was 1 day (scheduled), whereas the HLRN had a downtime of about 2 months in 2004. So the HLRN does nothing that you should start doing.

Probably better than RCF (fewer problems).

Better, more smoothly run than BNL/RCF.
Much more accessible than PNNL EMSL.

Compared to BNL the PDSF system is more user friendly and more reliable.

Very well. The support at PDSF is superlative. I am comparing to CERN (a few years back) and MIT Physics and Stanford

I have also attempted to use a cluster at LSU, but found it unusable due to extremely poor administration, software, batch queuing, etc. PDSF is like heaven in comparison. I have to admit that I have not used the LSU cluster in the last 6 months, however.

I think that NERSC stands head and shoulders above some other centers I have used. Besides NERSC, I have used the RHIC Computing Facility most extensively. RCF has improved quite a bit in recent years, but I still think that the NERSC personnel are far superior. Our technical problems on both HPSS and PDSF were always addressed rapidly, and it was made clear what the technical limits were and why.

I've also used the Citris IA-64 and Millennium x86 clusters at UCB. NERSC administers its computing services much more professionally and with much greater attention to the needs of its users. As a result, it is much easier to get maximal effectiveness out of NERSC equipment, even though some of that equipment is a couple years behind the technological edge.

I also compute at NCAR and ORNL. Given the very different support requirements and operating constraints of these different sites, NERSC is very competitive and is an essential part of our research effort.

NERSC is probably the most user friendly site I use. As I said, the people are knowledgeable, friendly, helpful and proactive. The hardware is reliable and productive. I appreciate that Jim Craw and the consultants proactively and voluntarily keep us informed about situations.
ORNL, NCAR

NERSC is much better managed than NSF centers at UCSD and PSC.

NERSC has tended to provide more of our computing capacity with a more user friendly environment than other centres. They have been effective at bringing new systems into production faster than other centers. The only centre that has been more user friendly than NERSC is the LCRC here at Argonne. However, they serve a much smaller group of users. They also do not provide the archival capabilities of NERSC. We are also comparing NERSC to NPACI, which appears to be less well organized than NERSC. In addition the NSF tends to move us between sites and machines from year to year, which we find annoying. Finally we are comparing NERSC to NCSA. The ongoing problems they have had with their Tungsten Linux cluster makes us doubt their competency in running such a centre. Note that none of these other centers allow remote access to their archival systems.

Of all the centers I have used, NERSC is by far the best. The list of centers I have experience at, and which I am comparing to, includes ORNL CCS, NCSA, SDSC, NASA Ames, NASA Goddard, PSC, and LLNL.

I've used the RHIC computing facility at Brookhaven. NERSC (particularly PDSF) is far, far better in every respect: reliability, uptime, and particularly user support. NERSC is also far more effective at keeping intruders out without burdening its users.

Best I have ever used. In the past I have also used centers at LLNL and LANL.

I am not currently using other centers. I used to use LLNL; NERSC is significantly better to use, mainly because of the user support.

I use LLNL computers and I believe that:
1) the queuing system is much better at NERSC.
2) the possibility to debug large jobs does not exist at LLNL. We have found bugs in our code that only show up for large numbers of CPUs (>128). Being able to debug these jobs was crucial for success in a couple of projects.

I've tried running at NAS (at NASA Ames) recently, and your shop is run much better, from reliability to access to documentation.

NERSC computing resource is much more stable, compared to the Grid Parallel Computing system at Tokyo Institute of Technology, Japan.

  NERSC is the same as / mixed response:   20 responses

3 years ago NERSC was by far the best center I had access to. I have the feeling that other supercomputing centers have upgraded their computers faster than NERSC. For instance, I had expected that Seaborg would have been upgraded to an SP4 already, since when seaborg was put together it was among the very first SP3 to become available for scientific computing in the US.

NERSC has a higher level of reliability. It has less "spinning disk" which puts it at a disadvantage. Compared to NCSA, UCSD.

I have used the San Diego supercomputer center and the open computing facility at LLNL. In comparison, NERSC's systems are more stable, reliable, and better run. However, the queue structure is much worse. At LLNL, the queuing system is not so biased to large jobs that smaller ones can't get through. At San Diego, they are experimenting with different queueing algorithms to try to make it possible for smaller and larger jobs to co-exist. The consultants there have tinkered with the queues manually to ensure that it is possible for people to use the time they have been allocated. NERSC should start doing something similar -- if not manual tinkering, then at least experimenting with a queue structure that makes it possible to be a small-node user without having to wait a week in order to get 24 hours of computing.

Compared to NCAR, I think that NERSC has a more reliable system. However, NCAR has significantly faster IBM Power4 processors.

See answer to first question above. [Seaborg has (until the last few months) been the most useful high performance system that my group has used. Its combination of power and ease of use has been great.] Other centers that we have used recently include NCSA and Pittsburgh in the U.S., and several supercomputers in Germany.

Other than the INCITE fiasco it is the best I have run at.

Other than NERSC I use only the jazz cluster at ANL. jazz is worse than NERSC in almost every respect. I guess the only thing that is better on jazz is that its processors (Pentium IV Xeon) are much faster than the Power 3.

I have used LLNL's MCR, ASCI White and ASCI Blue computers. What I like about those systems is that the allocations are not time-based but fair-use based. Maybe a fair-use policy could be implemented at some level at NERSC. Other than that, both centers, LLNL's and LBL's, are excellent.

Well NERSC is still better than SDSC, but post-upgrade the machine is in a bad state and is useful only for a small class of my calculations. I have collaborators clamoring for results I can't deliver because I can't run the calculations.

I have only used the Ohio Supercomputing Center's clusters. They offer smaller amounts of resources but have generally been easier to start jobs on in a shorter period of time.

I have used the Ohio Supercomputing Center. Their batch time is less, but they cannot provide as many processors.

Consulting is as good as NCAR, not quite as good as ORNL.
Seaborg is way overloaded. I rarely encounter queues more than a day long at either NCAR or ORNL.
Seaborg is up and available more than computers at either NCAR or ORNL, ditto for NERSC HPSS.

There are a number of differences between NPACI (Datastar) and NERSC. Interactive queues and number of processors are better on Seaborg. However, the batch wait time tends to be a lot longer. The processor speeds are comparable.

It compares well to PPPL, IPP-Garching, and the San Diego Supercomputing Center. Small jobs run very well at these other centers, but not so well on SEABORG. I get "not enough resources available" for these small jobs at NERSC, but not at these other centers.

The old days of preferential treatment for large MPP runs on SEABORG was wonderful (for us!). There are somewhat long turnaround times for small processor runs.

NERSC does better than San Diego in the quality of people who assist the user but San Diego gets high marks for the speed of the machine (Power 4!) and the turnaround time. NERSC does better for large problems (S.D. tops out around 1000 procs).

- CINES, Montpellier, France
- CEA, Grenoble, France
About the same quality: they are good and you are good. Your web site is much better than that of CINES, which gives you an advantage for access information, etc. Also, I think you can do better; they are really worse. Regarding CEA, I do not even think there is a web site to get information. I can also compare to CERFACS (France), but let's say that the operation of the machines there is a bit less professional.

Overall, NERSC does better than the ORNL center that I am also using. But the ORNL center is more friendly to serial jobs and small parallel jobs. Also, the ORNL consulting staff sometimes provide better answers to my questions.

Again, SDSC/NPACI has a new machine, Datastar, that in my experience is 2-3 times faster on my problems than Seaborg. Unfortunately, their user environment makes it difficult to do much more than run large batch jobs there. Perhaps I'm not in the loop, but the fact that I've heard nothing of a replacement for Seaborg makes me unsure how useful NERSC will continue to be.

The support is top notch, the file systems and overall reliability are second to none. The allocation process is silly. The queue structure unfairly favors jobs using a large number of processors. The INCITE system is counterproductive. There are way too many small users.

  NERSC is less good / negative response:   7 responses

Everyone is migrating to smaller clusters that have better turn around. Even if the peak performance is worse, it is easier to do physics in such an environment.

I have also worked at NCSC (North Carolina Supercomputing Center---now defunct), ORNL (eagle), LCRC (Argonne National Lab), NCSA (tungsten), and CACR (mulder---now defunct). The queue wait time at every other supercomputer center has been reasonable. This was not true for NERSC. To me the fairest queue system is one that applies uniformly to everyone (and does not assign different priorities so people can manipulate the system to their own advantage---all of us would like to have the highest priority). The fairest system I worked with was one that had a wall clock limit of 12 hours and a limit of 2 queued jobs per person at any time. And this applied to everyone. This prevented a few people from shutting everyone else out of the computer.

I have been getting better service from CCS.ORNL. I think it might be that they have a bigger range of computing sizes. Note that before CHEETAH got the new federated switch, the node-to-node communication was so poor that most people just used 1 node=32ps... CHEETAH was like a "cluster farm"... After the new switch allowed multi-node use it became a supercomputer... became overloaded, and its usefulness was degraded by long wait times.
ANY QUESTIONS: email: waltz@fusion.gat.com

Compared to other centers I have used, at NERSC the maximum wallclock time is not enough for our calculations. We use 16 processors, and one job needs about 2-3 days.

Long wait times make it difficult to get timely results and lowers my overall productivity. NERSC is therefore not as good as other facilities where the wait times are shorter.

The San Diego Supercomputer Center.
Their turnaround times for batch jobs are shorter.

NCSA, PSC. Queues dedicated to codes like Gaussian, which are poorly parallelized.

  No comparison made:   11 responses

The only other center I use is Livermore Computing, which is not comparable in scale.

The only other centers I have used have local users and are much smaller than NERSC; there is really no comparison.

NERSC is the only computing center that I have been using.

Outside of a parallel PC at Oklahoma State University, I have nothing to compare to. The resources at OSU were not adequate to do the research I am doing at NERSC.

Homegrown beowulf computers, and PSC.

The other centers I have used are miniature compared to NERSC and really no comparison would be meaningful.

LLNL, NCAR, ANL

I had used the Illinois Supercomputing Center before but am not a current user.

Upgrade the grid gatekeeper frontends to PDSF. It is fairly easy to saturate them.

I haven't used other centers.

Center for Scientific Computing, Goethe University, Frankfurt/Main, Germany