Comments about NERSC
What does NERSC do well?
In their comments:
- 103 users mentioned computational resources or HPC resources for science, of whom:
  - 25 cited large number of cores, high-end, scaling or parallel resources
  - 14 cited Franklin
  - 7 cited the PDSF
  - 7 cited architectural variety
  - 3 cited Bassi
  - 3 cited Jacquard
- 59 mentioned good consulting, staff support and communications;
- 24 mentioned good software support or an easy to use environment;
- 17 queue management or job turnaround;
- 15 data services (HPSS, large disk space, purge policy, NGF, data analysis);
- 14 stability and reliability;
- 9 good networking, access and security;
- 7 good documentation and web services;
- 6 allocations and account management.
Their responses have been grouped for display as follows:
- Provides access to multiple HPC resources / is overall a good center
- Hardware and services are good
- Does well at scaling and job management / provides diverse machines
- Provides reliable computing services
- Enables science
- Provides good machines and cycles
- Good support services and staff
- Good software / easy to use environment
- Good data storage environment
- Other comments
What should NERSC do differently?
| Comments | Suggestion |
|---|---|
| 18 | Change job scheduling / resource allocation policies |
| 16 | Provide more computing resources / new hardware / different acquisition strategy |
| 14 | Allocations and charging issues |
| 13 | Improve queue turnaround times |
| 13 | More or better services |
| 8 | Data storage and disk space issues |
| 8 | No suggestions / satisfied |
How does NERSC compare to other centers you have used?
| Comments | Comparison |
|---|---|
| 61 | NERSC is the best / overall NERSC is better / positive response |
| 25 | NERSC is the same as / mixed response |
| 11 | NERSC is less good / negative response |
| 7 | No comparison made |
What does NERSC do well? 150 responses
- Provides access to multiple HPC resources / is overall a good center
NERSC provides excellent computing services that my DOE-BES project needs.
I use NERSC because it is reliable, the cluster and software is well maintained and the consulting services are incredible. This all contributes to me getting my work done effectively.
NERSC is generally excellent, and has both leadership computing power and good ease of use, increasing productivity. This is most of all because the staff are very responsive to user needs and are effective in making leadership class machines work well for user applications. Additionally, the queue structure is clear and usable, the networking is very good, and the storage resources are adequate for large jobs. The analytics and visualization programs and associated software support are very important.
I use NERSC because of the following reasons:
1). Our group's codes compile and run well on these systems.
2). The queues are really reasonable, and I like the opportunity to put jobs at 'low' priority.
3). I like the fact that scratch is large enough to run jobs and it isn't cleaned so that I don't have to mess with backing up all my work continually. One of the Teragrid machines (Lonestar) is really annoying to work on because of this (the fact that their backup system is unreliable doesn't make it any better).
Excellent computational resources. State-of-the art supercomputers with well-maintained software.
NERSC is an important part of the scientific computing resources for the US. Consequently, our hpc performance tools need to be applicable on NERSC platforms.
NERSC in general does things well.
I have been very pleased with the resources and staff at NERSC. I use the computers at NERSC because I am at a new university with few computer facilities and very little support staff. I've found that it's been very easy to use the resources at NERSC and that the queues for most job types are very short. Also, I've found that the staff quickly addressed any questions or software needs I had. Also, I've found the online queue viewer very useful.
I am very satisfied with NERSC in every respect.
The overall support for scientific computing is excellent. This is very important as NERSC machines are ones of those in the country that can be used for solving my scientific problems.
NERSC does a good job with the supercomputers, and I use Bassi because it is fast, works well with the NCAR atmospheric models, and has shorter queue times than the NCAR computers.
NERSC provides continuously active and very effective computational research environment. Thank you.
NERSC is a wonderful supercomputing center. I really like it.
NERSC is doing good job overall
NERSC is a model for the high performance computing centers
NERSC is doing an excellent job; and overall I am very satisfied with all resources it offers.
NERSC has in the past been the best run community computation facility on the planet. Basically it still is, but the INCITE awards kill my ability to productively use NERSC. I can't compute at NERSC until the INCITE awards run out of time.
Top notch computer center, but current situation with Franklin is hurting it badly and needs to be addressed quickly.
all covered in the survey
NERSC manages the resources for High performance computing very well.
NERSC online tutorials are often out-dated.
NERSC is important as the only major open HPC facility in the Country for scientific computing, large scale data analysis and for HPC algorithm development, experimentation and testing.
NERSC has succeeded to keep an open, and yet secure, computing environment at PDSF with access to large storage resources. This, together with the cost effectiveness of the solution, are the reasons that PDSF is of key importance to me.
- NERSC's hardware and services are good
NERSC has excellent, large-scale computing resources that we need for our application. The hardware and the "liveware" are both fantastic.
Capacity and capability computing.
Well-balanced systems (compute/communicate/IO).
Good range of systems.
Responsiveness to user community needs.
1. Availability of great computational resources. Franklin is a fantastic machine, if a bit fragile
2. The classes and tutorials at NERSC are very useful for newcomers to supercomputing.
3. The website is great, and blows away any of the other supercomputing websites that I have used (or tried to use).
NERSC seems to be doing everything well. I compute there because of the resources available, the friendliness of the staff and consultants and their willingness to help when a problem arises. NERSC is a national resource for computational science and a model of how a user facility should be run.
The NERSC facility is fantastic. I'm very pleased with the hardware available, the people, the help, and the queues.
It's very stable. It's faster, good compilers. Nice support structure - consulting, very secure.
NERSC provides sufficient hand holding to cope with our less experienced users while still providing sufficient computational resources to deal with our more demanding calculations.
NERSC provides an excellent resource for DOE researchers. They provide access to cutting edge supercomputers, virtually any software that a researcher desires and has strong user support services.
NERSC does very well at providing production cycles very efficiently. NERSC mainframes are well supported and easy to use from the field. The support services are superior. NERSC has the right mix of mainframes for my computing needs.
Most importantly, NERSC has top-of-the-line computing facilities that, in general, work. Additionally, NERSC has good staff who make an effort to work with the users in a professional manner.
NERSC provides state-of-the-art parallel platforms. The production runs would not be possible without using NERSC resources. The user support is excellent.
I can perform computational work there I can do in very few other places. The user support is usually very good. Software and compilers are adequate for my needs
NERSC provides advanced computing with excellent interaction with users. I have been using NERSC since its inception, starting with John Killeen. I would be lost without it.
The fortran packages are the only problem I have had. My codes were originally built on CRAY, and the transition has not always been ideal.
It's a well managed system and it has very good support.
Overall, my experience with NERSC has been great. The consultants are helpful and respond quickly to questions. The processing speed on Jacquard is adequate to my needs.
The clusters are really good. Well documented usage and info. easily found on their website.
Have received requested time with acceptable levels of prior justification.
franklin is a very nice machine and is consistent in running my jobs.
consulting is very responsive.
NERSC is important as the only reliable computing resource we have. The consulting team is very good. NERSC environment is pleasant.
Systems are well maintained, user support is good and knowledgeable.
Excellent helpline, excellent access to machines, excellent hardware (Franklin has been a bit too much down)
Except for Franklin, all the machines work all the time. And when they don't, the consultants are always there for support.
I'm a relatively new supercomputer user, using one program (GYRO) developed by other individuals. I've only used Franklin in any serious quantity, and have been generally happy with its performance and availability. The live user support for passwords and technical help has been excellent, effective and very pleasant to interact with.
The diversity of systems available to the user as well as the information available online (batch queue policies, software library information, etc.) are the main aspects of NERSC which we are most pleased with. Furthermore, actions such as the refund of time due to extended Franklin downtime show how seriously NERSC handles its users.
The resources and technical consulting services are very good. NERSC provides a lot of the computing time required for our large production runs.
I use primarily Franklin, and when up, am very satisfied. When needed, the support has been very helpful and timely.
Despite some difficulties with keeping Franklin on-line consistently, computational resources at NERSC (both hardware and software) are superior and easy to use.
Simulations for STAR are done at NERSC, and all analyses are performed at PDSF. I am very pleased with the response times of trouble tickets.
HPSS storage makes NERSC (PDSF, specifically) a good bang-for-the-buck. PDSF's user support flexibility to adapt to users' special needs is good.
Support staff is very prompt and extremely helpful. The computational resources are excellent and well-maintained.
NERSC provides resources we do not have at our home institution. It offers excellent consulting and services.
- NERSC does well at scaling and job management / provides diverse machines
Flexibility--I can get my jobs done. Few other resources can accommodate the requirements of the jobs I run at NERSC.
What NERSC is best at is the combination of large-scale computing facilities with more flexible queuing policies than in other comparable facilities. Also the existence of "small-scale supercomputers" (Jacquard) is very useful to make tests.
NERSC is excellent. Franklin is a great resource - lots of cores. The waiting of queues for large core runs is very nice. (Obviously there is a wait time for 16384 core run for 36 hours :) )
1. NERSC offers a variety of computing resources (high memory, many nodes, shared memory visualization nodes etc.)- which is extremely useful
2. Tools required for Cosmic Microwave Background (CMB) research are properly installed and configured on some of the machines
Easy to access, different choice to queue. I need to use NERSC to fulfill our DOE project. 90% of my job is done in NERSC.
NERSC has a diverse collection of machines and gives out time readily to medium-scale users in a very fair manner.
The availability of large processor counts with relatively short wait times for scaling studies is the primary reason I use the NERSC systems. The friendliness and availability of the support staff in diagnosing problems make the systems a pleasure to use.
NERSC is the best place for me to do very large jobs due to scalability and processor count on Franklin. NERSC machines are easier to log into than some others.
NERSC is unique in the number and competence of staff it provides to keep its systems running smoothly and to solve problems for its users. Additionally, by providing systems with large numbers of processors without strongly incentivizing jobs that use a majority of these processors, it makes possible routine debugging and production runs, with good throughput, of jobs of intermediate sizes (hundreds to a few thousand processors) that are too large to be run effectively on local clusters but too small to qualify for INCITE-like programs that demand scaling to tens of thousands of processors or more.
Provides large clusters necessary for large computing projects.
Large number of processors for parallel jobs are very useful.
A place like NERSC is the only way one can run SMP jobs.
I mainly use Franklin for solid state electronic structure calculations. The scaling is very good in comparison to other systems and the software is very stable. The large number of nodes enables simulations of system sizes which were not previously possible.
The reason I compute at NERSC is the feasibility of the large-scale computation. As regards this point, I am very satisfied with the current situation of the computing facilities at NERSC.
The availability of cpu time without long queue waits on Franklin is the main factor for us. The fact that the Franklin environment meets all our needs allows us to take advantage of that.
great queuing, lots of nodes
The availability of large scale parallel computing with enough resources to have the job done.
it provides large scale computing resources, with a fair batch queue structure.
Parallel computation is essential for doing the type of calculation we are doing. NERSC supplies exactly that.
we work on performance tools and application studies. NERSC provides large-scale systems on which our tools must run and cycles for code studies.
Testing the parallel codes and can test large number of nodes in one go. This help us to know how much scale up we do in our simulation codes. Also NERSC help us to make sure that our code has cross-platform compatibility.
- NERSC provides reliable computing services
NERSC provides high performance computing with good access and reliability.
NERSC is an absolute life saver. NERSC provides a reliable, well maintained, frequently upgraded platform so that I can do my work instead of spending my scant research time on system management and maintenance.
Very good service received so far. I am pleased with the strong stability, high efficiency and easy-to-use of NERSC. These are very important for users who need not to spend a lot of time in those jobs.
I use NERSC because it offers high-end computing resources that perform reliably and are supported well. Our research group requires large computing resources and NERSC provides a large fraction of what we need without any problems.
Provides a resource for storage and computing facility while being reliable and easy to use.
- NERSC enables science
NERSC provides computing resources to accomplish my BES science goals.
NERSC is an essential resource for successful completion of my DOE (HEP and NP) projects.
Big computer to run molecular dynamics simulations from first principles on large systems (100-300) atoms for simulation time of the order of some ps.
NERSC provides for all my high performance computational needs. It is crucial to my own research activity in nuclear theory.
I am very much satisfied with the fast machines which are necessary to carry out huge catalyst related calculations, especially transition-state search.
I compute at NERSC because our resources are insufficient for the amount of computational chemistry that I want to / need to do.
NERSC offers me the opportunity to try projects that I could not do elsewhere. It has allowed me to remain competitive as an atmospheric scientist and pursue grant proposals that I wouldn't were it not for the NERSC compute capabilities.
I cannot get access to a state-of-the-art, world-class facility like NERSC anywhere in the world, and this supercomputer facility is a sine qua non for my research in the chemistry of superheavy elements. I perform one of the most gargantuan (if not the most gargantuan) relativistic and non-relativistic all-electron Hartree-Fock and Dirac-Fock all-virtual spinor space coupled-cluster calculations for molecules of superheavy elements, which need almost unlimited CPU hours and huge disk space. This is the raison d'être for my use of the NERSC facility.
Serves best the broader user community of DOE researchers - the low-end to moderate-end user.
NERSC is very important to our group. We are able to run CCSM only on a small number of clusters.
- Provides good machines and cycles
NERSC provides access to fast, powerful computers for calculations.
I'm pleased with computational capability of Franklin.
keep up the good job and buy more machines. bassi can be extended for example;
Franklin is a great machine.
I have been very happy using Franklin. About the only thing that would make me more happy is having more capacity. NERSC is so important because it has allowed us to get a lot of calculations done in a reasonable period of time.
jacquard is an excellent cluster and is very well supported
I use NERSC as one of several places to do high-performance computing. The hardware is very good--the performance of our codes on NERSC systems is excellent.
I am very satisfied with the NERSC facilities, in particular for the jobs requiring significant memory for computations.
Computing power of the PDSF cluster is very helpful to the amount of calculations required by our research
The new machines are fast and generally stable. The interactive queues are relatively fast.
The computing source is very good.
The calculation is very fast, which makes my work more efficient.
NERSC delivers me computing power like no other computing center I have access to. Without this resource the work our group is doing could not be done in this extent.
NERSC is an important resource for my project, providing a large chunk of our computational needs. However, the XT3/XT4 does not seem ready to serve in a production environment. The MPI especially requires too much tinkering and tuning to be part of a simple usable system.
pdsf is all I use. Very satisfied, except old hardware keep failing.
Franklin is a great machine! We could get incredible results thanks to its power.
Normally I can get results quickly.
Computers are superb.
Allocation times are generous (although we always need more)
NERSC is very important for me because my research is purely computational, therefore having access to excellent supercomputers is definitely a plus.
- Good support services and staff
NERSC does customer service very well. I am always pleased whenever I deal with NERSC. I would also say that NERSC's infrastructure for users is very helpful.
NERSC is doing extremely well on account management. I did not appreciate enough until I started to use other computing centers.
I think the consultants are very good. I like that I don't need a cryptocard to login.
NERSC is very user oriented, has capabilities that fit my needs. The allocations process is not a large burden, and well worth the effort. NERSC is a model for a computational facility.
The allocation process is easier at NERSC for mid-sized projects (several hundred thousand CPU hours).
FABULOUS tech support that's available 24/7 and always very professional, and well-maintained software and configuration. This is more important to me than even having the latest machines. I liked Seaborg because it was always well-maintained, even though it was rather slow.
Nersc is an example for all other national labs. They are so friendly and so helpful. They should get a raise.
The reason that I like Nersc is that the environment is friendly. Nobody will push you around and I do not need spend too much time on unnecessary things. It is very productive. It is important to me since I can work on my research that I could never be able to do. And the research has a huge impact. I am very productive, yes very.
Keep on doing your super jobs, and do not forget allocating more cpu time for me!
NERSC is top notch. Consultants are all great. Franklin has had a rough start, but is improving slowly.
Accounts, keeping your users informed, support is very responsive when available. Even getting phone calls out of the blue regarding patterns of my HPSS usage -- I was very impressed with that. Keeping franklin running sounds like quite a job to do, so I am pretty understanding when Lustre eats one of my files.
The process for requesting a start-up account was very straight-forward, and I received a prompt reply.
It was easy to get started on Franklin, thanks primarily to the excellent website.
NERSC provides me quick and kind helps to solve the issues
The batch system is very nice (faster and stable)
I am very happy with NERSC. Tech support is great; the people are very helpful.
The support group do good jobs, I can have instant help when needed.
The support is excellent. All inquiries are addressed in a timely manner.
NERSC is very responsive to our needs. The consultants are knowledgeable and helpful. The systems are generally up and stable.
Generally, I'm pleased with NERSC; I can do good work here and the level of support is excellent (so that I'm able to concentrate mostly on physics issues rather than why-won't-this-computer-compile-or-run-my-code-type issues).
I think the response of the support team is great. Thanks!
Excellent consulting service.
NERSC has a very knowledgeable staff that is user-oriented and very friendly. I enjoy interacting with everybody at NERSC.
Your user "how to" web sites for the different machines is very good. Really, the best I have seen. However, NERSC is just another computer resource for me. I run at computer systems at ORNL where I am located as well.
user consultation is superb. They are very helpful.
NERSC has been important to me as a source of information in new technologies with talks and a careful support.
- Good software / easy to use environment
NERSC provides large scale computers with high reliability, and provides expert software assistance.
NERSC offers a lot of very important software and have a great support team. Without this, my research would become significantly more challenging.
capability to compile and error check fortran codes better than other computers.
It allows us to do simulations too big and time consuming to do on our home computers.
NERSC presents updated software. I use it because my needs of parallel computing, and I find in NERSC all the adequate software, as well as, a good consultant help in order to do it (mainly, when implementing new codes or using new libraries).
I compute at NERSC because the software environment is good and because I often need to run problems 100X bigger than what a desktop can do.
If we ever get reasonable DOE funding, maybe we would run jobs 1000 to 10^4 times larger
The availability of large computing resources (since Franklin's inception) have made NERSC an invaluable resource for my research. I am glad that NERSC has moved away from IBM platforms towards linux clusters. This makes software development much easier and requires less training for students. Without NERSC I could not complete my research
Bassi is very fast. NERSC provide the latest version of VASP.
I mainly use PDSF for data analysis in the STAR collaboration. The starroot software package is well maintained and current, avoiding duplication of work at my home institution (UC Davis).
- Good data storage environment
NERSC is important to me because of PDSF, HPSS and the resources STAR uses for simulation and data analysis.
In addition to providing much needed simulation resources, PDSF (together with HPSS) allow access to the reduced STAR data, and make NERSC/PDSF a viable alternative to BNL/RACF. Since many of our colleagues have a difficult time accessing BNL/RACF, such an alternative is very important for collaboration.
Easy access to a range of different hardware systems, including a common directory (/project) shared between the platforms (and between different users of a repository) so that data are always accessible even if one or another machine is down.
Reliable, not much down time.
I use PDSF for simulations and analysis. It is essential because of its speed and disk resources.
NERSC (PDSF) is one of only two centres supporting STAR computing where collaborators can get an account, i.e. Tier 0 or 1. RCF at BNL is very full of jobs and disk space is very limited due to being full, so I find PDSF better for some things. In particular the access to HPSS through the grid is a unique feature and very important for distributing STAR data produced at Tier 2 centres (no accounts for collaborators in general).
I use NERSC when my local machine is busy; also because NERSC has HPSS storage that is very convenient since I produce large datasets; also I use NERSC to check my codes on a different hardware that helps to catch difficult bugs.
Two words: disk space.
Good computing. Good storage. We always need more.
NERSC does a very good job on supercomputing.
I am most satisfied with the purge policy, so that I only purge my data whenever it is needed. This may sound simple, but it is worth a million if you are analyzing a big project like climate research. People never know which data is needed and it takes lots of time to do the backup jobs to HPSS, as HPSS doesn't have an "rsync" feature.
- Other comments
I just wish I can have more allocation so more challenging problem can be attacked.
This was a good survey. It was not too long.
I compute at NERSC because my boss, Grant Branstator, told me to. Everything works fine at NERSC except franklin and HPN-SSH.
The hardware is good, but the OS was down for too many times. I compute at NERSC because this is the only place I can compute.
What should NERSC do differently? (How can NERSC improve?) 108 responses
- Franklin Issues: 24 comments
It would be great if NERSC could magically improve the stability of Franklin... Unfortunately, hardware failures increase with the size and complexity of the system.
Other than Franklin's frequent down time, I have no complaints, and even then I am very satisfied.
I have no suggestions. I'm sure now that franklin has been more stable lately that things will be a little more smooth.
Keep Franklin more stable.
Uptime should be more, downtime should be less. [Franklin user]
Improve Franklin uptime.
Reducing the outages. [Franklin user]
... Franklin is good when it is up.
needs overall improvement:
Get Franklin working !
Stop buying from Cray. Between the Catamount boondoggle and all the issues I've had with Franklin, I'm left wondering: is Cray run by 2 smart oxen or 1000 chickens?
Consider getting rid of the Cray XT4 (franklin) and getting something that works without endless fiddling on your part. ...
I think NERSC is doing fine right now. Their responses to problems are timely. Franklin has some initial issues, but these should settle down over time.
I do not believe the XT3/XT4 problems are due to any NERSC issues. The only thing NERSC might do is raise the priority of user problems higher and more quickly with Cray.
Making the response time on Franklin faster would be nice.
Need to improve network and load management on the login nodes for Franklin. At times it is very difficult to get any work done since the response time is so slow.
NERSC's greatest liability at the moment is the poor and unreliable I/O performance of Franklin, its flagship machine. I don't know how this can be fixed, but I would recommend that future purchasing decisions take more account of user experiences on similar machines. In the case of Franklin, the similar Oak Ridge NCCS/LCF machine, Jaguar, had been plagued by similar issues for some time.
fix lustre on franklin.
Hammer on the Sun people to fix the Lustre filesystem. It seems to me that issues with Lustre are the main reason that Franklin has issues. Otherwise it is a fantastic machine. NERSC has been a really great place to compute.
improve stability and usability:
... 2) Make franklin more reliable and user friendly
Franklin could be more stable and with less headaches. ...
improve stability and performance:
Continue to improve Franklin uptime and command line responsiveness.
I wish that Franklin was a little more stable, but I'm sure this will improve as the bugs get worked out. I wish the queue was faster on Franklin.
improve job management:
Needs a better resource management system. Limited run time is extremely frustrating. As I am running simulations that take weeks, having to resubmit jobs every day leaves a lot of lost compute time (waiting in the queue). Additionally, I have had jobs killed because they ran overtime, but this was completely due to some slowdown at the processor level, as I've had these jobs complete successfully previously (i.e. 200,000 steps can be completed in 24 hours, but for some reason, at 24 hours, I only get through 150,000 steps). [Franklin user]
- Change job scheduling / resource allocation policies: 18 comments
more support for mid range jobs:
... Don't push users to use large number of processors when they are not needed. Science first (NERSC should not be an experimental site for computer science). Computational capacity is more important than capability.
Do not force out "small" users --- there are many of us who are small only because DOE Office of Science has not given adequate funding to our programs. We hope this will change in the post-Dubya era...
Although this might be a minor point, I would like NERSC to set the class charge factors finely by the execution queues for optimizing the user's usage of the computing facilities. [In particular this user would like the addition of a regular_medium class on Franklin, with boosting and discounting over regular_small.]
I wish there were more of a market for SMP users, because it is a very convenient form of parallelism for some problems that are difficult to make totally distributed. However, getting through a queue that favors high total processor counts is hard. This doesn't mean that it is easy to find necessary resources (including large total number of hours) elsewhere, which seems to be the implicit assumption in setting queue preferences.
longer wall times:
1) Add a (super?) low priority, long duration (4 day) batch queue on franklin. ...
Providing for long serial queues (~12 hours) and enabling these for applications such as IDL would further improve the usefulness of Franklin in particular. We appreciate your efforts to do this and look forward to finding a solution with you soon.
Increase the time limit for premium class jobs. ... [Bassi user]
less support for INCITE and premium jobs:
Change the way INCITE awards work in the batch queue. [Bassi user]
Less emphasis on INCITE, special users. More emphasis on providing high throughput for production applications. [Jacquard / Bassi user]
Keep premium users from taking over bassi
more debug support:
Do not make it too costly to do development or testing. (where a user would only grab a node or two)
More support for debugging at scale. ...
training on queue use:
an improved queue system on bassi with proper instruction and approximate delay times to run the submitted jobs would be useful
The large number of different clusters and different queue options is confusing. Which one should I use?
more robust queuing system:
As computing clusters grow, it would be very interesting/helpful for NERSC to invest in robust queuing systems such as Google's MapReduce model. It seems that all of NERSC's clusters are based upon the premise that failures are abnormal and can be dealt with as a special case. As clusters and job sizes grow, single point failures can really mess up a massively parallel job (Franklin) or a large number of parallel jobs (bad nodes draining queues on PDSF). Companies like Google have succeeded with their computing clusters by starting with the premise that hardware failures will happen regularly and building queuing systems that can automatically heal, rather than relying upon the users to notice that jobs are failing, stop them, alert the help system, wait for a fix, and then resubmit jobs.
It should detect the jobs that fail because of hardware failure and reimburse the user of the time spent for running the job. I had so many jobs stopping because of insufficient wall time. I think that's a pity. However, I don't know what to suggest. [Jacquard user]
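The "automatically heal" idea in the comments above can be illustrated with a minimal sketch. It assumes the scheduler can tell hardware failures apart from user errors; `NodeFailure`, `run_with_requeue`, and the `flaky` demo task are hypothetical names for illustration only, not part of any NERSC or PBS tooling.

```python
from collections import deque


class NodeFailure(Exception):
    """Hypothetical signal that a task died because of hardware, not user code."""


def run_with_requeue(tasks, run_task, max_attempts=3):
    """Run each task; put it back on the queue if it fails with NodeFailure."""
    queue = deque((task, 1) for task in tasks)
    done, failed = {}, []
    while queue:
        task, attempt = queue.popleft()
        try:
            done[task] = run_task(task)
        except NodeFailure:
            if attempt < max_attempts:
                # Requeue automatically instead of waiting for the user to notice.
                queue.append((task, attempt + 1))
            else:
                failed.append(task)
    return done, failed


# Demo (assumed scenario): task "b" lands on a bad node on its first attempt only.
_already_failed = set()


def flaky(task):
    if task == "b" and task not in _already_failed:
        _already_failed.add(task)
        raise NodeFailure(task)
    return task.upper()


done, failed = run_with_requeue(["a", "b", "c"], flaky)
print(done, failed)
```

In this sketch only hardware-type failures are retried; a task that keeps failing is reported after `max_attempts` rather than retried forever.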
support for "massively serial" job streams:
I don't understand the rationale for limiting the total number of simultaneous jobs on jacquard to only 4. For my case, I required a lot of CPU time, but my jobs were implicitly parallel; I needed to run hundreds of independent single-processor jobs at once, not one hundred-processor job. The easiest thing would have been to simply submit 100 jobs and let the PBS queue figure out the most efficient way to distribute them. Instead, I had to first write wrapper scripts to launch my jobs through MPI, and then I had to manually divide them into submissions, arbitrarily selecting the number of CPUs based on estimates of the cluster loading. Basically, I had to wait for some large number of CPUs to be free at once before my jobs could start, even though I only needed one CPU at a time. It seems that allowing people to submit a large number of single-processor jobs simplifies life for the users *and* allows PBS to manage the overall cluster load more efficiently.
I understand the requirement that each job always be assigned to its own node (so that basically 2 processors is the minimum)... that simplifies everything and avoids RAM allocation problems. But it seems it would be better for everyone to limit the total number of requested processors, not the total number of jobs, so that people who don't need parallelization can still use the cluster efficiently.
In any case, this is not a major point, just more of an annoyance for which I don't understand the justification.
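The manual step described in this comment, dividing many independent single-processor tasks into a few submissions because of the 4-job limit, can be sketched roughly as follows. The helper name `pack_tasks` is hypothetical, and the sketch only computes the batches; launching each batch (for example through an MPI wrapper, as the commenter did) is left out.

```python
def pack_tasks(task_ids, max_jobs):
    """Split independent serial tasks into at most max_jobs batches of near-equal size."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    n = len(task_ids)
    if n == 0:
        return []
    n_batches = min(max_jobs, n)
    base, extra = divmod(n, n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional task each.
        size = base + (1 if i < extra else 0)
        batches.append(task_ids[start:start + size])
        start += size
    return batches


# 100 single-processor tasks under a limit of 4 simultaneous jobs.
batches = pack_tasks(list(range(100)), max_jobs=4)
for batch in batches:
    print(len(batch), "tasks in one submission")
```

Each batch then needs as many CPUs free at once as it has tasks, which is the waiting problem the comment describes; a limit on total requested processors would avoid the packing step altogether.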
less support for small jobs:
I would suggest doing more to discourage single node and small jobs
- Provide more computing resources / new hardware / different acquisition strategy: 16 comments
provide more memory:
NERSC's Seaborg was a great success because of its reliability and its large amount of per-node memory. As a result, the majority of scientific codes ran well on it. The future computer (NERSC6) should have a configuration with a large amount of per-node memory (similar to Bassi or larger, but with more CPUs than Bassi has).
... more memory per processor core.
My applications require a lot of per-core memory. I hope that at least one of NERSC's future machines will address this need in terms of computer architecture.
... Another very useful improvement would be to have more memory per core.
It is possible to provide a subset of Franklin machines with greater than 2 GB of RAM. Maybe 16 with 8 GB (per proc) and 64 with 4 GB (per proc). Certain problems are not efficiently addressed by 1.875 GB of RAM.
provide more cycles:
More machines to accommodate the large number of users.
As usual, bigger computers, more cycles, ...
Have bigger machine, and allocate large time.
You could always have more ... power ...
more careful machine transitions / longer runs during early user access:
Don't upgrade the machine and software too frequently. Be more conservative when purchasing new machines (stability should be an important factor for a new machine, not just speed). ...
... Longer transition period between old machine shutdown and stable period of new machines.
The Franklin supercomputer was purchased for a specific type of computing problem. As we know, not all computing strategies are appropriate for all problems. While Franklin may be a beneficial machine to some, there was an issue with Seaborg going away and a suitable replacement using a similar computing strategy not being in place. I'm both worried and excited about the upgrades to Franklin. Some researchers may feel that those upgrades are taking money away from a potential replacement for Seaborg - something with 8 CPUs (or more) per node. This may be the particular perception of those who view Franklin as a failure.
Two suggestions here then:
1. manage expectations of new hardware, and
2. when taking one kind of machine off-line, replace it with another of the same kind.
This is hard to answer. Most of our runs for the past 3 years have been at DoD sites. DoD has instituted a special month or two at the beginning of the life of a new acquisition of dedicated use of the whole machine to selected users [one writes proposals to be considered]. This works for DoD since they have over 6 different sites and typically 1-2 new supercomputers are bought each FY.
For our work this has been of tremendous importance since our codes parallelize to all available cores without saturation [with help from Jonathan Carter :)].
While NERSC has early users -- this is primarily for short runs. For example, in Fall 07 we were chosen to run on the newly acquired 9000 core SGI Altix at AFRL (Wright-Patt AFB). We were given all 9000 cores on shifts of 24 hours: we had the whole machine for 24 hours, and the other groups had the machine for 24 hours. This lasted for 6 weeks. Yes - the machine had its hiccups, but we are still analyzing the TBs of data!!
Obviously NERSC can only do this if they procure machines more often than on a 3-5 year cycle.... -- so this is more of a comment and a wish than something practical for NERSC.
provide different architectures:
NERSC cannot do much differently. Computing procurements are driven by a very broad science base, which means certain areas (in my case chemistry) end up with an architecture that is not as well tuned for their applications.
Acquire a very large basic Linux cluster in order to offer more computing capacity to those users that don't particularly need to run very large single jobs.
I think that NERSC should evaluate whether its users' computational needs might be better served by different computational configurations, particularly by a large collection of clusters with 256 - 1024 CPUs that would not require expensive interconnects but would be large enough to support the majority of the needs of users. On a per-CPU basis I know of several 'departmental clusters' that are ~half the cost of Franklin, for instance, and we could therefore have twice the total number of CPUs available for production work. Now that we have Franklin it can handle the relatively rare jobs that require many CPUs, and the next expansion of NERSC could more cost-effectively serve the majority of the work that does not need a huge number of CPUs.
- Allocations and Charging issues: 14 comments
improve allocation management:
See comments about managing allocations. I know of at least one other person that had a similar experience to ours so I know we are not the only ones.
The main problem we had working with NERSC was the ease with which it was possible to blow through our entire allocation before we realized that we had even done so. This occurred for a number of reasons.
2) The allocation management seems to be non-existent. There is no indication that you are running down to the end of your allocation or that you are asking for more resources than your allocation can provide. On our MPP2 machine at PNNL you can't get on the queue if you request a job size that overruns your existing allocation but there seems to be no such restriction at NERSC. MPP2 will also provide a warning that you are asking for more than 20% of an existing allocation, which provides another opportunity to check that you may be doing something stupid.
I'm only aware of the INCITE process for requesting significant time, and I would like INCITE to review applications twice yearly. The time between having a ready code, applying, and then waiting for the results can be quite long if you're a postdoc.
... It is also annoying to lose time on a quarterly basis if it is not utilized according to a predetermined standard (this does not happen at the other centers I use, e.g. SDSC, Artic)
The system of having to use up a certain percentage of your account by certain dates can be very irritating. It doesn't necessarily match my schedule of when I need to do computation, and rushing jobs through to meet the quotas can be a waste of computation time in the long run. Perhaps you can make the system more flexible, especially for small accounts where it can't make much of a difference in your overall scheduling scheme.
The allocation process should be stabilized - it seems that its timing changes every two years.
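The submission-time checks attributed to PNNL's MPP2 in the comments above (refuse a job that would overrun the allocation, warn when a job asks for more than 20% of it) can be sketched as follows. The function name and return values are hypothetical, and the sketch assumes the 20% threshold is measured against the remaining allocation, which the comment does not specify.

```python
def check_job_request(requested_hours, remaining_hours, warn_fraction=0.20):
    """Classify a job request against the remaining allocation.

    Returns "reject" if the job would overrun the allocation, "warn" if it
    asks for more than warn_fraction of it, and "ok" otherwise.
    """
    if requested_hours > remaining_hours:
        return "reject"
    if requested_hours > warn_fraction * remaining_hours:
        return "warn"
    return "ok"


# Example with an assumed 100,000 hours remaining.
for request in (150_000, 30_000, 10_000):
    print(request, check_job_request(request, 100_000))
```

A check of this kind runs once per submission, so it would catch the "blow through the allocation" case before any time is charged.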
issues with Machine Charge Factors:
Treat a CPU hour as the base unit for allocations and drop the machine factors. These factors are confusing and led several different PIs at my institution (me included) to request only 1/6 of the time we actually needed because of the 6 times adjustment for the Franklin hours.
Lose the "dog hours" accounting system, and base the numbers on real cpu-hours. This scaling to an old dead machine is silly and confusing.
The machine charge factors drive me crazy, but I understand the heavy demand for the NERSC resources.
... 1) The accounting system is not very intuitive and is geared towards mistakes unless you actually read up on it carefully (I don't think many people actually do this). It seems to me that using a node for 1 hour under regular circumstances on the flagship machine should be counted as 1 node-hour and everything else should be adjusted downward from that (except possibly using a high priority queue). The fact that you need to consider an extra 6.5 multiplier when using Franklin is asking for trouble. We won't forget to do it in the future, but we found out about this the hard way. ...
... Less confusing charging policy for SU's.
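The arithmetic behind these complaints can be made concrete with a small sketch. It assumes the charge is simply raw CPU hours multiplied by a machine factor and uses the 6.5 Franklin multiplier quoted in one comment above; the function names are hypothetical, and the real charging formula may include other terms (such as queue priority).

```python
FRANKLIN_FACTOR = 6.5  # multiplier quoted in a comment above; illustrative only


def charged_units(cpus, wall_hours, factor):
    """Allocation units charged for a job, assuming charge = raw CPU hours * factor."""
    return cpus * wall_hours * factor


def raw_hours_available(allocation_units, factor):
    """Raw CPU hours that an allocation actually buys on a machine with this factor."""
    return allocation_units / factor


# A 100-CPU, 10-hour job uses 1,000 raw CPU hours but is charged 6,500 units.
print(charged_units(100, 10, FRANKLIN_FACTOR))

# An assumed allocation of 650,000 units buys only 100,000 raw CPU hours.
print(raw_hours_available(650_000, FRANKLIN_FACTOR))
```

Requesting an allocation in raw CPU hours without applying the factor therefore undershoots by the factor itself, which is the mistake several comments report.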
need a larger allocation:
Maybe, allocate a lot more time for me. ...
The biggest issue is with the allocation of hours. With the advent of massively parallel systems like Franklin, 1 million cpu hours for a modest sized group get used up very quickly. As the project calls are only yearly, this makes things difficult. Many other systems I have been involved with in Europe take open calls for project time throughout the year or at least quarterly.
One issue I've run up against is that my research group can use more hours than we are allocated (and we are allocated a lot of hours). I would run more if I didn't fear eating up others' hours. I don't know what the best way to deal with this issue is -- more frequent renewal possibilities?
We are mainly limited by the amount of cpu time we have available at NERSC. A deeper discount for jobs over, say, 10000 cores would be helpful to us and may encourage code/algorithm improvements by the community of researchers.
I would like more time on Franklin (or equivalent machines) by orders of magnitude. Nuclear physics did not get enough time to allocate.
- Improve queue turnaround times (especially on Bassi): 13 comments
The queue times on Bassi are long. ...
Bassi has become so busy as to be almost useless to me. ...
Sometimes the whole machine is taken over by a few large and long run time jobs. The queue wait becomes very long for small quick jobs that only require a few nodes and less than one hour of run time. In this case, it may be good to designate a few nodes for small and short jobs. [Bassi user]
The queuing systems on most of the NERSC systems mean an inordinate wait for many of my jobs. Especially on Bassi, the limits on the number of queued jobs limit the work that can be done on these systems.
Either NERSC computers are over subscribed, or the queues could be restructured to be more effective. Wait times can often go longer than 4 days, which is too long IMHO. [Bassi user]
Good as it is. Sometimes the wait for batch jobs in the queue is long. [Bassi user]
I wait such a long time in the queue. [Bassi user]
... Decrease waiting time. [Bassi user]
the waiting queue is very long. [Bassi user]
The queuing system on Bassi needs serious work. The turnaround time is just too long.
waiting times are long. [Bassi / Jacquard user]
... We still need to wait in the queue for quite some time. [Franklin / Jacquard user]
... My main concern is not what NERSC can improve but whether it can keep the queues short on Franklin -- less a matter of improvement than of staying in a very good place.
- More or Better Services: 13 comments
more PDSF support / recognition:
PDSF needs better NERSC recognition and support. It is a facility with large scientific output. It appears understaffed.
... I guess because of lack of personnel or time some of the things on the webpages are out of date. Not sure how one would address that. It would mean someone had to have an encyclopaedic knowledge to know what was wrong. [PDSF user]
This is less affecting me now than when I was part of Nearby Supernova Factory, but a huge problem we had was weekend support for PDSF.
Accept PDSF as an important part of NERSC.
more or better training:
Enhance the tutorials that are accessible online (html & pdf versions), with better up to date examples. Some of these are mostly rehashed copies of the vendor's manuals.
Offer more training for remote users. Make it more accessible.
more consulting/visualization help:
More help with porting code to new platforms and more help with preparing data for visualization. ...
Dedicated support would be nice but I believe that it would be difficult to implement
improved web services:
a wiki on the use of the facilities
NIM could improve in its look and feel
... This survey could be shorter - it is too complicated and long.
better performance and resiliency tools:
NERSC should tell more about their strategic plans. Hopefully in three years we will be operating differently than we are now (command line submission, manual data management etc.) Is NERSC going to actively help with this, or simply be a resource provider? Is NERSC going to help campaign to get better performance and resiliency tools (fault tolerance) actually put into production vs being left as academic demos?
improved reporting of outages:
The reporting of outages is not always timely. ...
- Data Storage and Disk Space Issues: 8 comments
If it were possible, the file system for Franklin /scratch should be accessible from the visualization/analytics server Davinci, and /projects should be accessible from Franklin compute nodes.
A significant improvement would be to give file permissions on the various platforms the same defaults. On Bassi, files are created with a "group" that is equal to the "user". This does not make sense to me. Groups should be for groups. We have to waste a lot of cpu time running chgrp on many files.
More disk space to users. The whole point of having a LARGE cluster is to do LARGE simulations. That means LARGE amounts of data. We should get more storage space (individually).
The scratch disk could be larger, or the limit on personal use of it could be larger. It would be good to have a rule that makes people delete their scratch data once a job is completed (or shortly after), so that everybody can share a bigger space.
Not sure. You could always have more storage and more power, I guess.
I have in the past had one job that *could* have used a very large amount of disk. The administrators increased my quota upon request to a reasonable limit, but unfortunately that was less than required. Calculating direct slowed things down, but I could get the calculation done. If there was a fraction of disk set aside for testing requirements (or, a flexible quota system that permitted large but infrequent and transient boosts to users' disk allowance), that might allow users to make better judgments about job requirements, and whether requesting a permanent quota increase will be enough. If a job actually requires a terabyte of scratch, disk quota increases are probably not going to be the solution.
If it is possible, a fast mass storage resource shared between the different systems would help my work. /project is very useful, but the I/O bandwidth leaves room for improvement.
I find the archive storage system a bit difficult to use, but this is not terribly important to us (we can cope with it easily). ...
The only noticeable problem is the frequency of disk problems, but I think that this has improved. ... [PDSF user]
- No suggestions / Satisfied: 8 responses
I think NERSC does a great job.
everything is fine.
NERSC seems to be improving all the time.
I am basically satisfied with NERSC.
I think it already did very well.
Keep the same good work please.
I'm very pleased with the NERSC resources I've used, so I have no suggestions at this time.
I think you are doing really good now.
- Software issues: 7 comments
Adopting the same OS and compilers in Jacquard and Franklin would be nice as it would allow us to test our software in Jacquard and transfer to Franklin without any further modification. At this time, we are not always sure that a program running on Jacquard will also run on Franklin. ...
I would like to see some additional software libraries installed on *head* nodes. GTK in particular. [Jacquard user]
Make Gaussian available on Franklin. I could use NWChem, and probably should learn how to, but it provides results that are slightly different than Gaussian does, and I need to clean up the results using Gaussian after running an optimization on NWChem.
... Also add a new [Gaussian] license to Franklin. Do you allow users to make some recommendations on what you should add?
Maintenance of software library versions and the modules system needs to improve.
Our major electronic structure workhorse is MOLPRO. The inability to run molpro in parallel across multiple nodes significantly detracts from the attractiveness of running jobs at NERSC.
Keeping up to date with the STAR libraries at RCF.
- Network Issues: 3 comments
Better network access outside of ESNet would greatly improve our efficiency. [PDSF STAR project manager]
... We could use faster data transfer. This can be accomplished via HPN-SSH, I think. But you would need to support it on your end. Most of the speed-up that is achieved by using these patches to standard SSH happens on the sending end (that's you, when I am moving data from NERSC to NCAR). I have had an open ticket on this issue for a long time.
As I have stated earlier, the only "difficulty" is in the time it takes for me to download atmospheric model data output files. The network connection between NERSC and my "home" computer is consistently slow. I don't know where the bottleneck is occurring. [University of North Carolina at Asheville user]
How does NERSC compare to other centers you have used? 104 responses
- NERSC is the best / overall NERSC is better / positive response: 61 responses
NERSC is the best supercomputer user facility I have worked with. It provides the best user services and has an enormous software repository.
The user support, ticketing, and follow-up is particularly competent and professional relative to other DOE centers.
There is no comparison really as other facilities I have used are not comparable at all to NERSC. NERSC stands by itself alone in its class.
One key thing that I've been enjoying at NERSC is that the queues are not overloaded like some places I compute. Either that, or the queues are designed to facilitate the style of problems that I perform. So, I see progress regularly and am not frustrated by the wait times. The other thing that I really like is that the /scratch space is large enough that my files are not regularly being scrubbed. I really dislike some of the DOD centers because it takes a long time to bring my restart files across to the scratch space, then I wait in the queue forever to get the CPUs. Once I have the CPUs my job looks for the files and they've been scrubbed. Then I'm starting over again. This has never happened at NERSC.
DOD (AFRL, ERDC, NAVO, ARL)
NERSC is better than average for the computer centers I have worked with, and there is no one facility I feel is better than NERSC. In particular, the security strategy at NERSC is more effective at keeping the computing resources both secure and available to users with limited interruptions. This is unique and very effective.
PDSF and HPSS are a powerful combination, and the commitment and responsiveness of their administrative teams should be commended.
NERSC systems seem to have much faster turnaround time than supercomputing facilities at Oak Ridge National Laboratory (Oak Ridge, Tennessee)
NERSC is No. 1. I used ORNL once. It was unpleasant.
The application/activation process for accounts is much easier than at ORNL. The lack of requiring electronically generated passwords (e.g. SecureID) is a great plus for NERSC. Other facilities I use that require these are ORNL, NCAR, and some EMSL resources.
NERSC is much better than NCCS in user service, and machine availability.
The best of all
Compared to the cluster at pnnl, NERSC is much better. Shorter queues and easier access.
Better support and better capabilities. (Environmental Molecular Science Laboratory -PNNL).
I've only really used the local Stanford clusters, and NERSC is head and shoulders above Stanford in every respect: account services, compiler environment, queues, tutorials on-line, etc.
I compare NERSC to Teragrid systems and also nano millennium (on campus I believe). I think NERSC is the best out of these and I can't think of anything that the others do that I would like to see NERSC do. Well, I guess I don't like the fact that I can't run 'low' priority jobs over 12 hours long on Franklin, but that's about it.
I have also used NCAR, and I have found NERSC to be better in every way. Good job!
Better than the Goddard Space Flight Center at NASA. The people at GSFC are also very responsive to our requests, but seem to lack the technical ability to actually get the job done for us. We no longer compute there.
Best so far....
In addition to NERSC, I use the following facilities:
1) Ohio Supercomputing Center (OSC)
This is a smaller, local center. It provides us with a small fraction of computing resources compared to NERSC - both in terms of allocation size and machine size available. However, we are allowed to renew our allocation frequently if needed.
2) National Center for Supercomputing Applications (NCSA)
NCSA also does not offer us as many resources (allocation/machine size) as NERSC.
3) National Center for Computational Sciences (NCCS)
NCCS also offers a large Cray-XT4 system, however our group is relatively new to this facility and our allocation is fairly small.
NERSC is the facility at which we do the majority of our computation. We have had a long history with NERSC and a closer relationship, including being involved in pre-beta/beta stage testing on Franklin. NERSC provides us with a sizable allocation with which we can afford to scale our simulations up to thousands of processors and complete heroic-sized research projects. NERSC is very good at communicating with its users - notifying of system changes and offering special allocation programs. The support staff is also very friendly and responsive.
NERSC is number 1.
NERSC offers pretty much hassle-free interaction and a long-term stable environment for the user. I am comparing it in this regard to NCSA. NERSC is much better.
Best so far (only other experience is SDSC's DataStar, which is significantly older than Bassi).
Compared to SDSC at UCSD, NERSC is much better in terms of the resources it offers and the availability of staff for consulting issues. Compared to LC at LLNL, NERSC is a much more open facility (clearly) and has more resources available to the common user.
NERSC is the prime example of how a user facility should be run.
It's the best service I've ever used.
Better than most of the other centers. I would rank it on the top of others.
much better than our university supercomputing center.
Easy to contact. Helpful and dedicated staff.
The only other computer center I use routinely is the RHIC computing facility, and I much prefer using PDSF at NERSC. Because I can count on finding disk space, and throughput on batch jobs is generally higher.
I have only used NERSC and NCAR. I find NERSC has shorter queues, equivalent support, and a less cumbersome security system (NCAR uses cryptocards which I hate).
Good consultant and excellent software, and user attention.
The NERSC is the best one.
It's the best that I ever used.
I had used ORNL computers. I like NERSC's flexibility with job size (number of processors) and NERSC technical support.
NERSC is very, very good at what it does.
As I said earlier, NERSC really shines when compared to NCCS. NERSC has spoiled me with the quality of the services.
NERSC compares very favorably with other centers such as Oak Ridge NCCS, RPI CCNI, NCSA and OSC, with comparable resources to NCCS and CCNI and more available resources than NCSA and OSC. Software and documentation are always up-to-date and your staff works hard on bugfixes etc.
NERSC has more reliable hardware and software support than NCCS.
LLNL clusters for low-clearance interns had MISERABLE support and software configuration. NERSC is doing it right by comparison.
Although the initial Franklin experience was a bit rough, my experiences at TACC using Ranger were far, far worse.
The main alternative computing center with which I have experience is NCCS, which I have not used in six months. In my opinion, NERSC is superior in all respects, particularly in that it seems to be driven by the needs of its scientific users rather than by a desire to do cutting-edge computer science regardless of the quality of the science that enables.
Better. Much better actually (than the DoD centers)
NERSC is doing extremely well, as usual, especially since you are dealing with about 10 times more projects and users than the other centers. Hiring more people would be a good thing if DOE would allow it...
NERSC compares excellently with all the other centers we have dealt with. These DoD sites are NAVO, AFRL, ERDC.
I have recently been running stuff at NCAR and at NCCS, and I like NERSC better than these other two centers. Mostly I like the people I talk to at NERSC, and I think the time spent in the queue is shorter.
NERSC is better than others, especially better than the JPL NASA computing center I also used to use.
Other centers I have used are NCCS at ORNL and MCS at Argonne.
1. NERSC website is far superior to either of the others.
2. The BG/L at MCS is not useful for me because of the incredibly small amount of memory per core. Franklin is a much better machine with respect to my needs.
3. Jaguar is equivalent to Franklin, but the queues are heavily biased to very large jobs. This is good, because getting large jobs to run on Franklin takes a while (though I haven't done it for at least 6 months, so perhaps the queue throughput has changed by now). But getting small jobs to run on Jaguar is more difficult, so Franklin is better there. With access to both, I find them to be good complements to each other. But for those who have access to only one or the other, this could be problematic.
In my former life, I have used DoD HPC centers and NERSC provides service that is on par with and, in some aspects, exceeds what I experienced with the DoD HPC centers.
I also use some computers that are part of the Teragrid Consortium. I also like the Teragrid facilities, but I've generally found the queue waits to be shorter at NERSC and that it's easier to get timely answers to questions at NERSC, in part because it's sometimes not clear where to address questions on the Teragrid.
Compares very well (if not much, much better) with the RHIC Computing Facility.
Compared with CINECA (in Bologna), NERSC is faster at running queued jobs. Better information about resources and software.
Compared to the SX-8 at RCNP, Osaka University, Japan, and the TSUBAME (Linux cluster for parallel computation) at Tokyo Inst. Tech., Japan, I am much more satisfied with the NERSC facilities for running the code, because of the prompt software upgrades and bug fixes.
very well run and managed. Usually good turn around. good consultants
Much better than an LSU cluster I attempted to use long ago, and much better than smaller institution-specific grids with a much less sophisticated (i.e. manual) queuing system.
Very easy to use compared to RCF.
NERSC is the best center that I use overall. I also do computing at NCCS, LLNL, LANL, and SDSC.
For STAR it seems to have a higher effective availability than RCF when one considers how tight resources are at the latter.
- NERSC is the same as / mixed response / no comparison: 25 responses
The primary other facility we use is the NASA Advanced Supercomputing (NAS) center. Both NERSC and NAS are providing excellent services; nothing occurs to me when I am asked to compare them. Both are excellent. Thanks for the great service!
I've used NERSC, LLNL, and ANL. NERSC is tied with LLNL as the most user friendly and responsive center I've dealt with.
[I am comparing with past experience at Oak Ridge, LANL, and Air Force centers.]
NERSC is probably the best, though services and overall performance at LANL and at Argonne/APS local cluster are also very good. I am very satisfied with all supercomputer centers I have worked with so far.
Our other HPC experience is mainly at ATLAS at LLNL (https://computing.llnl.gov/?set=resources&page=OCF_resources#atlas). NERSC's queue structure is generally much easier to use for us, resulting in higher productivity. ATLAS does allow long serial queues for analysis (eg IDL) which is very important as noted above.
I also work at NCAR and ORNL. NERSC has been very responsive to our needs, and has a commendable uptime record. However queue times at both NCAR and ORNL tend to be much shorter.
Comparison with ORNL Jaguar
Pro: More user-friendly queue policies, in particular for small-scale jobs
Con: Slower execution time
Compared to the computational centre in Juelich, NERSC is far superior in terms of machines available and information available online. At Juelich, however, there is no storage limit on archived data (only on the number of files) and the disk space available by default is much larger (~3 TBytes).
In general, NERSC is more stable, and therefore a better place for development than other centers I have used. But this success means that there is more competition with other users for resources (long queues).
NERSC is very well run although I can't complain too much about the services I receive at ORNL either. However, your user "how to" web site is much better than anything I have seen at ORNL. Also, your help desk is more responsive than the one we have here. I just get better turnaround on my jobs at ORNL and therefore I get most of my work done at ORNL. NERSC is a secondary source of computer resources for me.
Franklin seems to be up more reliably than Jaguar at ORNL. If/when Franklin gets upgraded to quad core, I hope that that can be done with significantly less down-time than the quad core upgrade of Jaguar.
I found several "hands-on" workshops organized by/at the ALCF (BlueGene/P) at ANL very useful; in particular the fact that they have "reserved queues" during the workshops so that you get immediate turn-around, even for big jobs. Of course, I realize that the ALCF serves a much smaller community, and reserving Franklin impacts a lot more users than reserving BlueGene/P at ALCF. In addition, Argonne is half a day's drive for me, so it is easier to go there for a day or 2 than going to NERSC; which is probably why I have never (or I should say, not yet) attended a workshop at NERSC.
Texas A&M Supercomputing center.
NERSC is on par with the LLNL center.
The professionalism and amount of support available is better than at TAMU, as is the stability of the systems.
NERSC seems on par with the other centers I use, which include ORNL and NCAR.
It is as good or better. Recently, I've used the ATLAS cluster at LLNL.
Compared to Jaguar at Oak Ridge - They have passcode access; it is a hassle to type the passcode, but at NERSC I end up needing to get my password unblocked frequently (e.g. after trying to log into Franklin while it is down). Their queues have been bad during their quad core upgrade.
Compared to Red Storm at Sandia - Your accessibility is infinitely better for a variety of reasons.
It's much better than http://www.arl.hpc.mil , though Franklin is down a lot more. When up, Franklin is a better computing environment.
NERSC compares very favorably to other centers.
National Center for Atmospheric Research
NOAA Earth System Research Laboratory
NOAA National Centers for Environmental Prediction
The only thing that is done at these centers is a tighter coupling of the file systems, and more standard use of "user", "group", "world" permissions on files. NERSC's consulting and staff are on par with or exceed those of these other facilities.
Comparing to RCF at BNL:
NERSC/PDSF has an accessible support staff
NERSC/PDSF disk systems are not very stable compared to experience at RCF (and CERN)
Other centers I've used:
National Center for Supercomputing Applications (NCSA)
The Ohio Supercomputing Center (OSC)
NERSC has bigger facilities than these others I've used and has allocated my research group more time than the other two. OSC, probably because it's smaller, has a frequent (monthly) allocation application review.
Good, but others have shorter waiting queues.
I have computed at SDSC, NCSA, ORNL, ANL (ALCF), TACC, FNAL and PSC and some older centers. NERSC compares quite well, at least as far as my experience on Franklin is concerned. I did not make much use of the IBM machines. I have found that PSC was willing to assign a single person as the first line of support to our project. This worked out well there as he knew who to contact when we had any unusual problems and he kept up well with our code and issues.
I'm also running GYRO on Jaguar at NCCS, where I have honestly done most of my work. I think this is mainly due to the relative computing allocations I have available to me at each location, but I've also found that the MPP charge on Franklin is such that if I try to do the number of runs I'd like, I burn through my allocation time at NERSC quickly. The work I'm doing requires many moderately sized runs rather than a few "hero" runs, so I'd actually prefer to run more on Franklin and request smaller processor batches than I do on Jaguar to improve the efficiency of my runs.
I also used JAGUAR. Franklin performs the same.
I'm also working on SDSC. Both supercomputing centers are satisfactory.
- NERSC is less good / negative response: 11 responses
I have better experiences with both PNNL and Argonne but some of the reasons have nothing to do with NERSC being less competent. That Argonne is operating BlueGene machines and has a group developing system software for this platform on-site is a huge advantage. PNNL is operating a single Linux-based machine which, in addition to being rather mature, was built specifically to run my code.
LLNL was excellent and perhaps provided the model for NERSC.
The German facility at Juelich was a place where I worked for many years. They have kept up with the advances in computers a bit better than NERSC. They have now installed the latest Blue Gene IBM machine.
Allocation management at PNNL seems to be a lot more straightforward than at NERSC.
I was surprised to see that Franklin is about 4 times slower than EMSL's MPP2 for computational chemistry runs.
The NIH "helix" cluster team offers the "swarm" command. It is a wrapper for "embarrassingly parallel" PBS jobs, so that the user does not have to know anything about processor configurations. All these options (cput, ppn, ncpus etc.) are automatically generated "behind the scenes" such that the cluster is used in the most efficient way (no empty cores on a node, for example). This makes parallel processing unbelievably easy; one simply specifies the commands that have to be executed. I was so enthusiastic about the swarm command that I contacted the NIH team about where they obtained this program. I was told that they developed it in-house, but it is freely available. Maybe it could be interesting for NERSC to contact the NIH "helix" cluster team about this extremely helpful program.
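(Editorial illustration: the idea this respondent describes, packing independent commands so that no core on a node sits idle, can be sketched roughly as below. This is not the NIH swarm program or its interface. The function names `pack_commands` and `make_pbs_script`, the 4-core node size, and the `./analyze` commands are all hypothetical, and the script assumes a Torque-style PBS that accepts `-l nodes=1:ppn=N`.)

```python
from typing import List


def pack_commands(commands: List[str], cores_per_node: int) -> List[List[str]]:
    """Group commands into batches of at most cores_per_node, one batch per node."""
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be at least 1")
    return [commands[i:i + cores_per_node]
            for i in range(0, len(commands), cores_per_node)]


def make_pbs_script(batch: List[str], cores_per_node: int) -> str:
    """Build a single-node PBS script that runs every command in the batch concurrently."""
    lines = [
        "#!/bin/bash",
        # Assumed Torque-style resource request: one node, all of its cores.
        f"#PBS -l nodes=1:ppn={cores_per_node}",
        "cd $PBS_O_WORKDIR",
    ]
    # Start each command in the background so the cores are used in parallel.
    lines += [f"{cmd} &" for cmd in batch]
    # Keep the job alive until every background command has finished.
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical workload: ten independent serial commands on 4-core nodes.
    cmds = [f"./analyze input{i}.dat" for i in range(10)]
    for n, batch in enumerate(pack_commands(cmds, cores_per_node=4)):
        print(f"--- job {n} ---")
        print(make_pbs_script(batch, cores_per_node=4))
```

(With ten commands and 4-core nodes, this yields three job scripts; only the last node is partly filled. A real tool would also have to handle per-command memory, wall time limits and job submission, none of which is attempted here.)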
Coming from Virginia Tech with System X, I was able to submit a job and have it run to completion, whether it took 2 hours or 2 weeks.
I have also used DOD HPCMP resources. They tend to update their hardware more often than NERSC.
TeraGrid has a better web portal.
NERSC seems to have stricter rules, more bureaucracy, and less flexibility.
BNL-RCF/ACF commits staff to aspects of distributed data storage, such as dcache and xrootd. If PDSF is to remain a functional facility, in particular xrootd will need active NERSC support.
I have used the Minnesota Supercomputing Institute. In my opinion, their support and help staff was better and friendlier.
- No comparison made: 7 responses
I have not used any other major center like NERSC.
The other centers we use are more geared to our largest-scale production runs. We are part of an INCITE award at ARGONNE and ORNL. We are also part of an LLNL Grand Challenge award.
Other than NERSC, I have experience with only the LCRC at ANL, though the extent of my use thus far doesn't really give me much to really compare the two.
I have almost only used NERSC for my computation needs.