
2003 User Survey Results

Response Summary

Many thanks to the 326 users who responded to this year's User Survey -- this represents the highest response level yet in the six years we have conducted the survey. The respondents represent all five DOE Science Offices and a variety of home institutions: see Respondent Demographics.

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

You can see the FY 2003 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.

Satisfaction Score   Meaning
7                    Very Satisfied
6                    Mostly Satisfied
5                    Somewhat Satisfied
4                    Neutral
3                    Somewhat Dissatisfied
2                    Mostly Dissatisfied
1                    Very Dissatisfied

Importance Score   Meaning
3                  Very Important
2                  Somewhat Important
1                  Not Important

Usefulness Score   Meaning
3                  Very Useful
2                  Somewhat Useful
1                  Not at All Useful

The average satisfaction scores from this year's survey ranged from a high of 6.61 (very satisfied) to a low of 4.67 (somewhat satisfied). See All Satisfaction Questions. Areas with the highest user satisfaction were:

Topic                           Avg Score   No. of Responses
HPSS reliability                6.61        126
Consulting - timely response    6.55        207
Consulting - technical advice   6.54        200
HPSS uptime                     6.54        126
Local Area Network              6.54        114

Areas with the lowest user satisfaction were:

Topic                           Avg Score   No. of Responses
Access Grid classes             4.67        27
Escher visualization software   4.75        8
Visualization services          4.81        97
NERSC training classes          4.88        24
Training                        5.04        94

The largest increases in satisfaction over last year's survey were for the IBM SP (Seaborg), HPSS uptime, network connectivity, and available hardware:

Topic                                       Avg Score   Increase from 2002   No. of Responses
SP Applications                             6.00        0.30                 94
SP Libraries                                6.27        0.18                 131
SP Disk Configuration and I/O Performance   6.15        0.18                 156
HPSS Uptime                                 6.54        0.17                 126
Network Connectivity                        6.23        0.16                 241
Available Hardware                          6.13        0.16                 255

The areas rated significantly lower this year were:

Topic                               Avg Score   Decrease from 2002   No. of Responses
PDSF Fortran Compilers              6.03        -0.42                29
PDSF Ability to Run Interactively   5.77        -0.41                64
PDSF Applications                   5.87        -0.34                39
SP Queue Structure                  5.69        -0.23                177
SP Uptime                           6.42        -0.14                191

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the survey. NERSC took a number of actions in response to suggestions from the 2002 user survey.

SP resource scheduling:

Could longer run time limits be implemented across the board?

NERSC response: In March 2003 limits were extended from 8 to 48 hours for jobs running on 32 or more nodes, and from 8 to 12 hours for jobs running on 31 or fewer nodes. The "regular long" class, which provides a 24-hour limit for jobs run on 31 or fewer nodes, was preserved, but with restrictions on the number of jobs that can run simultaneously.
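
For reference, a request for the extended limit might look like the LoadLeveler script sketched below. This is a minimal illustration using standard LoadLeveler keywords; the class name "regular", the job name, and the resource values are assumptions, so consult the Seaborg batch documentation for the actual class names and limits.

    #!/usr/bin/csh
    # Hypothetical LoadLeveler script for a 32-node job requesting the
    # extended 48-hour wall clock limit described above.
    #@ job_name         = big_run
    #@ output           = big_run.$(jobid).out
    #@ error            = big_run.$(jobid).err
    #@ job_type         = parallel
    #@ node             = 32
    #@ tasks_per_node   = 16
    #@ wall_clock_limit = 48:00:00
    #@ class            = regular
    #@ queue

    # executable built with the IBM MPI wrappers (mpxlf/mpcc); under
    # job_type = parallel the POE runtime uses the LoadLeveler allocation
    ./my_mpi_code

Such a script would be submitted with llsubmit and monitored with llq.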

Could more services be devoted to interactive jobs?

NERSC response: In March 2003 interactive jobs were given an additional system priority boost (placing them ahead of debug jobs).

Could there be a serial queue?

NERSC response: Two new classes to facilitate pre- and post-processing of data and data transfers to HPSS were introduced in November 2003. Jobs run in these classes are charged for one processor's wall clock time.
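
A job in one of these data-processing classes might look like the sketch below; the class name "serial" and the HPSS transfer command are illustrative assumptions.

    #!/usr/bin/csh
    # Hypothetical serial post-processing job; the actual class names are
    # documented on the NERSC batch pages.
    #@ job_name         = postproc
    #@ output           = postproc.$(jobid).out
    #@ error            = postproc.$(jobid).err
    #@ job_type         = serial
    #@ wall_clock_limit = 04:00:00
    #@ class            = serial
    #@ queue

    # archive results to HPSS; per the policy above, the job is charged
    # for one processor's wall clock time
    hsi "cd run_output; put results.tar"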

Could more resources be devoted to the "small node-long runtime" class (more nodes, a longer run time, better throughput)?

NERSC response: Resources were not increased for "regular long" types of jobs; rather, the priority has been to increase resources for jobs running on more than 32 nodes. This is in line with the DOE Office of Science's goal that 1/4 of all batch resources be applied to jobs that use at least 1/8 of the available processors. For FY 2004 this goal has been increased to target 1/2 of the batch resources. Perhaps because of this resource prioritization, satisfaction with the SP queue structure dropped by 0.2 points.

SP software enhancements:

Could the Unix environment be more user-friendly (e.g. more editors and shells in the default path)?

NERSC response: The most recent versions of vim, nano, nedit, gvim, pico, and xemacs are now in all users' paths by default, as are the compression utilities zip and bunzip2. Two new utilities help make the batch environment easier to use: llhist shows recently completed jobs and ll_check_script gives warnings/advice on crafting batch scripts. This year's rating for SP applications went up by 0.3 points.
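
The exact options of these two utilities are not spelled out here, but typical use might look like the following (the argument forms are assumptions):

    seaborg% llhist                      # list recently completed LoadLeveler jobs
    seaborg% ll_check_script myjob.ll    # check a batch script for common mistakes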

Could there be more data analysis software, including matlab?

NERSC response: Matlab and Mathematica are available on the math server, newton. Matlab is not available on the IBM SP because large Matlab jobs can severely affect other users on the interactive nodes. The IDL (Interactive Data Language) package is available on Seaborg for interactive data analysis and visualization.
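
For illustration, access to these tools might look like the session below; the fully qualified host name and the assumption that IDL is provided through the modules environment (and its module name) should be checked against the NERSC software pages.

    % ssh newton.nersc.gov            # math server hosting Matlab and Mathematica
    newton% matlab -nodisplay         # command-line Matlab session, no X display

    seaborg% module load idl          # assumes an "idl" module on Seaborg
    seaborg% idl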

Computing resources:

NERSC needs more computational power overall.
Could a vector resource be provided?
Could mid-range computing or cluster resources be provided?

NERSC response: All the above are excellent suggestions and we certainly understand the desire for more computational resources. The FY 2004 Seaborg allocation requests were for 2.4 times the amount available to allocate. The reality is that there is no budget for additional hardware acquisitions. Last year we were able to double the number of nodes on Seaborg and this year's rating for available computing hardware increased by 0.2 points.

Documentation:

Provide better searching, navigation, organization of the information.

NERSC response: The NERSC user web site (http://hpcf.nersc.gov) has been restructured with new navigation links that should make finding information faster and easier. Related information has been consolidated, and printer-friendly links have been added that consolidate multi-page documents into a single page. The final phase of the update will be to encode descriptions for each page to increase the effectiveness of the search engine.

Enhance SP documentation.

NERSC response: We have made an effort to keep documentation up to date on a wide range of SP topics: the IBM compilers, the LoadLeveler batch system, IBM SP specific APIs, and links to IBM redbooks. In addition, the presentation of SP information has been streamlined, which we hope makes information easier to find. In August 2003 we received positive comments from ScicomP 8 attendees on how we present IBM documentation.

Training

Provide more training on performance analysis, optimization and debugging.

NERSC response: Since last year's survey NERSC has emphasized these topics in our training classes, for example: CPU performance analysis on Seaborg, Scaling I/O and Communication, Debugging Parallel Programs with Totalview. See http://www.nersc.gov/nusers/services/training/.

Provide more information in the New Users Guide.

NERSC response: More information on initial account setup was added to the New User Guide, which was also reformatted for ease of use. See http://hpcf.nersc.gov/help/new_user/.

This year's survey included several new questions:

  1. How useful were the DOE and NERSC scaling initiatives? [Read the Scaling Initiatives Response Page]

    In FY 2003 NERSC implemented initiatives aimed at promoting highly scalable applications as part of the DOE emphasis on large scale computing. For the first time, DOE had in FY 2003 an explicit goal that "25% of the usage will be accounted for by computations that require at least 1/8 of the total [compute] resource." (Note: for FY 2004 this goal is for 50% of the usage, rather than 25%.)

    The 24 respondents who had participated in the Large Scale Jobs Reimbursement Program and the 32 respondents who had worked on scaling their codes with the NERSC consultants rated these initiatives as "very useful" on average. poe+, used to measure code performance characteristics, had been used by 104 respondents and was also rated "very useful" on average. The 115 respondents who rated Seaborg's new batch class structure, designed to give preference to high concurrency jobs, gave it an average rating of "somewhat useful".

    20 users wrote comments in support of the scaling initiatives, for example:

    Please push this project as much as you can. This type of consulting is very important if one goes to the limit of a system in terms of #processors and sustained performance.
    11 users stated why they thought these initiatives are misguided. The general theme behind these comments was that it is science output that is important, not scaling per se. Some representative comments:
    I believe that they are totally misguided. The emphasis should be on maximizing the SCIENTIFIC output from NERSC. If the best way to do this is for the user to run 100 1-node jobs at a time rather than 1 100-node job, every effort should be made to accommodate him/her. ... In the final analysis, it should be up to the users to decide how they use their allocations. Most, if not all of us, will choose a usage pattern which maximizes our scientific output. Remember that most of us are in computational science, not in computer science. We are interested in advancing our own fields of research, not in obtaining Gordon Bell awards.
    Don't freeze out the small-to-moderate user --- the science/CPU hour is often higher for the moderate user.
    There is always a tension between massive users and those who want to run smaller jobs. While many researchers use a single node (16 processors), I think it would not be cost effective for DOE to pay them to run on their own machines.
  2. Why do you compute at NERSC? (What are the reasons NERSC is important to you?) [Read All 229 Responses]

    Many of the answers were along the lines of "to run my codes in order to get my science done". Users pointed out that they need powerful compute resources that they can't get elsewhere. Many users specifically mentioned large numbers of processors or parallel computing as a reason to compute at NERSC. Turnaround time (getting results fast) is very important. Data analysis, especially in the context of PDSF computing, is also a common theme. One user even pointed out that the time is "free".

  3. Has security gotten in the way of your work at NERSC?

    Ninety percent of the respondents (217 users) answered no to this question.

  4. If security has gotten in the way of your work at NERSC, how? [Read All 25 Responses]

    25 users answered this question:

    • 10 pointed to difficulties accessing NERSC (the change to ssh version 2, FTP retirement, difficulties with tunneling and ports)
    • 6 reported password or login attempt problems
    • 3 encountered difficulties accessing HPSS
    • 3 had grid/distributed computing concerns
    • 3 said "it's inconvenient"

     

  5. How do you compare NERSC openness and access to your home site and others? [Read All 146 Responses]
    • 49% stated that NERSC has similar or greater openness than other sites they access
    • 28% said that NERSC's openness or security measures are good (without making a comparison)
    • 9% said that NERSC is less open or too secure

 

Users were invited to provide overall comments about NERSC:

  • 119 users answered the question What does NERSC do well?   69 respondents pointed specifically to NERSC's good hardware management practices, which provide users with excellent access to HPC resources; 62 mentioned User Support and NERSC's responsive staff; 17 highlighted documentation; and 13 pointed to job scheduling and batch throughput. Some representative comments are:

    Powerful and well maintained machines, great mass storage facility, and helpful and responsive staff. What more could you want?
    As Apple would put it .... "it just works". I get my work done and done fast. Seaborg is up and working nearly all the time. Network, storage, it's all there when I need it. That is what matters most and NERSC delivers.
    NERSC simply is the best run centralized computer center on the planet. I have interacted with many central computer centers and none are as responsive, have people with the technical knowledge available to answer questions and have the system/software as well configured as does NERSC.
  • 75 users responded to What should NERSC do differently?   The area of greatest concern is job scheduling: 14 users expressed concerns with favoring large jobs at the expense of smaller ones, and six wanted more resources devoted to interactive computing and debugging. The next most common concern is the need for more hardware: more compute power overall, different architectures, mid-range computing support, and vector architectures. Eight users pointed out the need for better documentation and six wanted more training. Some of the comments from this section are:

    NERSC's new emphasis favoring large (1024+ processor) jobs runs contrary to its good record of catering to the scientific community. It needs to remember the community it is serving --- the customer is always right. The queue configuration should be returned to a state where it no longer favours jobs using large numbers of processors.
    I'm not in favor of giving highest priority to the extremely large jobs on all nodes of seaborg. I think that NERSC must accommodate capacity computing for energy research that cannot be performed anywhere else, in addition to providing capability computing for the largest simulations.
    NERSC should move more aggressively to upgrade its high end computing facilities. It might do well to offer a wider variety of architectures. For example, the large Pentium 4 clusters about to become operational at NCSA provide a highly cost effective resources for some problems, but not for others. If NERSC had a greater variety of machines, it might be able to better serve all its users. However, the most important improvement would be to simply increase the total computing power available to users.
    It would be great if NERSC could again acquire a supercomputer with excellent vector-processing capability, like the CRAY systems which existed for many years. The success of the Japanese "Earth Simulator" will hopefully cause a re-examination of hardware purchase decisions. Strong vector processors make scientific programming easier and more productive.
    Measure success on science output and not on size of budgets or quantity of hardware.
    The overhead on account managers still seems a bit much for what we're getting. I still find the ERCAP process onerous (i.e., more information requested than should be necessary). Also, most of the codes we are using are changing much more from year to year in a scientific sense than a computational sense, it becomes repetitious to have to keep evaluating them computationally each year. You need to keep in mind that most of us are being funded to do science rather than computational research.
  • 65 users answered the question How does NERSC compare to other centers you have used?   63% of the respondents stated that NERSC was a good center (no comparison made) or was better than other centers they had used. Reasons given for preferring NERSC include good hardware, networking and software management, good user support, and better job throughput. 11% of the respondents said that NERSC was not as good as another center they had used. The most common source of dissatisfaction was job scheduling.

 

Here are the survey results:

  1. Respondent Demographics
  2. Overall Satisfaction and Importance; Why do you use NERSC?; Security and Flexible Work Option
  3. All Satisfaction Questions and Changes from Previous Years
  4. DOE and NERSC Scaling Initiatives
  5. Web, NIM, and Communications
  6. Hardware
  7. Software
  8. Training
  9. User Services
  10. Comments about NERSC

Respondent Demographics

Number of respondents to the survey: 326

  • Respondents by DOE Office and User Role
  • Respondents by Organization
  • Which NERSC resources do you use?
  • How long have you used NERSC?
  • What desktop systems do you use to connect to NERSC?
  • Web Browser Used to Take Survey
  • Operating System Used to Take Survey

 

Respondents by DOE Office and User Role:

Office   Respondents   Percent
ASCR     20            6
BER      32            10
BES      89            27
FES      35            11
HENP     146           45
guests   4             1

User Role                 Number   Percent
Principal Investigators   43       13
PI Proxies                37       11
Project Managers          17       5
Users                     229      70

 

Respondents by Organization:

Organization Type   Number   Percent
Universities 182 55.8
DOE Labs 120 36.8
Other Govt Labs 12 3.7
Industry 6 1.8
Private Labs 6 1.8
Organization   Number   Percent
Berkeley Lab 59 18.1
UC Berkeley 17 5.2
Livermore 11 3.4
Oak Ridge 9 2.8
Argonne 8 2.5
Brookhaven 8 2.5
Stanford 8 2.5
U. South Carolina 8 2.5
UC Davis 7 2.1
U. Wisconsin - Madison 7 2.1
Yale 7 2.1
NREL 6 1.8
SLAC 6 1.8
U. Colorado 6 1.8
U. Washington 6 1.8
PNNL 5 1.5
UCLA 5 1.5
Ames Lab 4 1.2
Inst National de Physique 4 1.2
Ohio State 4 1.2
Purdue 4 1.2
U. Texas - Austin 4 1.2
Organization   Number
Auburn U. 3
Cal Tech 3
Georgia IT 3
Harvard 3
Max Planck Inst. 3
New York U. 3
UC San Diego 3
UC Santa Cruz 3
U. Oklahoma 3
Vanderbilt 3
City College NY 2
Inst NdFN Italy 2
Iowa State 2
Johns Hopkins 2
Joint Inst Nuc Research 2
Mississippi State 2
New Mexico State 2
NCAR 2
N. Carolina State 2
Northeastern U. 2
U. Chicago 2
U. Maryland 2
U. S. California 2
U. Utah 2
UC Irvine 2
UC Santa Barbara 2
Warsaw Tech 2
Other University 38
Other Gov. Labs 8
Industry 6
Other DOE Labs 4
Other Private Labs 3

 

Which NERSC resources do you use?

Note that users did not always check all the resources they use -- compare the table below with How Satisfied are you? (sorted by Number of Responses).

Resource   Responses
SP 226
HPSS 145
NIM 87
Consulting 80
PDSF 79
HPCF Website 73
Account Support 55
Computer Operations and Network Support 26
CVS 16
grid 13
Newton 12
Escher 11
Visualization Services 9
Alvarez 9
The lattice gauge connection 1
SCS 1
not yet 1

 

How long have you used NERSC?

Time   Number   Percent
less than 6 months 61 19
6 months - 3 years 161 51
more than 3 years 95 30

 

What desktop systems do you use to connect to NERSC?

System   Responses
UNIX Total 338
PC Total 195
Mac Total 60
UNIX-linux 229
PC-win2000 84
PC-winXP 79
UNIX-solaris 54
MAC-OSX 41
PC-win98 27
UNIX-irix 24
MAC-macos 19
UNIX-aix 15
UNIX-tru64 10
UNIX-hpux 4
PC-win95 3
Cygwin 2
UNIX-other 2
PC-other 2
OSF1 1
iMac 1
FreeBSD 1
alpha 1

 

Web Browser Used to Take Survey:

Browser   Number   Percent
Mozilla 91 26.1
MS Internet Explorer 6 90 25.9
Netscape 4 59 17.0
Netscape 7 46 13.2
Safari 22 6.3
MS Internet Explorer 5 15 4.3
Galeon 9 2.6
Konqueror 7 2.0
Netscape 6 6 1.7
w3m (Text browser) 1 0.3
Opera 6.0 1 0.3
Opera 7 1 0.3

 

OS Used to Take Survey:

OS   Number   Percent
UNIX Total 164 47.1
Windows Total 145 41.7
Macintosh Total 39 11.2
Linux 136 39.1
Windows XP Pro 70 20.1
Windows NT 64 18.4
Mac PowerPC 39 11.2
SunOS 19 5.5
Windows 98 10 2.9
DEC OSF 5 1.4
SGI IRIX 4 1.1
Windows 2000    

 

Overall Satisfaction and Importance

Satisfaction         Average Score
Mostly Satisfied     5.5 - 6.4
Somewhat Satisfied   4.5 - 5.4

Importance           Average Score
Very Important       2.5 - 3
Somewhat Important   1.5 - 2.4

Significance of Change
significant increase / significant decrease / not significant

Overall Satisfaction with NERSC:

Topic   No. of Responses   Average   Std. Dev.   Change from 2002   Change from 2001
Account Support Services 245 6.39 1.05 0.01 -0.04
Overall satisfaction with NERSC 298 6.37 0.88 0.05 0.12
Consulting Services 233 6.34 1.01 0.04 0.04
Network Connectivity 241 6.23 0.94 0.16 0.20
Available Computing Hardware 255 6.13 1.06 0.16 0.02
Mass Storage Facilities 207 6.12 1.10 0.08 0.07
HW management and configuration 210 6.07 1.07 -0.03 0.25
Available software 242 6.05 1.09 0.07 0.24
SW maintenance and configuration 213 6.04 1.20 -0.13 0.12
HPCF Website 213 6.00 1.10 -0.09 -0.18
Allocation Process 196 5.69 1.26 -0.15 -0.31
Training 94 5.04 1.26 0.05 0.12
Visualization Services 97 4.81 1.17 -0.02 0.30

Importance to Users:

Topic   No. of Responses   Average   Std. Dev.   Change from 2002   Change from 2001
Overall satisfaction with NERSC 267 2.84 0.39 0.00 0.02
Available Computing Hardware 236 2.84 0.41 -0.05 0.03
Network Connectivity 223 2.75 0.47 -0.00 -0.07
Allocation Process 183 2.68 0.56 0.11 0.01
HW management and configuration 187 2.66 0.56 -0.05 0.04
Consulting Services 227 2.59 0.61 0.01 -0.05
Account Support Services 230 2.54 0.60 0.07 0.10
Mass Storage Facilities 190 2.53 0.62 0.02 0.05
SW maintenance and configuration 194 2.52 0.62 -0.02 -0.08
Available software 222 2.50 0.64 -0.03 -0.06
HPCF Website 197 2.39 0.64 -0.13 -0.10
Training 116 1.80 0.79 0.00 0.28
Visualization Services 125 1.60 0.75 -0.02 -0.11

Why do you compute at NERSC?

[Read all 229 responses ]

Many of the answers were along the lines of "to run my codes in order to get my science done". Users pointed out that they need powerful compute resources that they can't get elsewhere. Many users specifically mentioned large numbers of processors or parallel computing as a reason to compute at NERSC. Turnaround time (getting results fast) is very important. Data analysis, especially in the context of PDSF computing, is also a common theme. One user even pointed out that the time is "free".

139 Need lots of compute power to do science (focus on compute resources)
32 Focus on data analysis / PDSF
31 NERSC is a well run center / provides good services
20 Need lots of storage and compute power
10 Need large memory (and lots of storage and compute power)
8 Need to test / install / maintain software
3 Other

Security

Question   No. of Responses   Average   Std. Dev.   Change from 2002   Change from 2001
How satisfied are you with NERSC security? 261 6.37 1.09 NA NA

Has security gotten in the way of your work at NERSC?

No. of Responses
No 217 (90%)
Yes 23 (10%)

If yes, how?

[Read all 26 responses]

10 Access problems (ssh, secure ftp)
6 Password problems, Failed login attempts
3 HPSS access difficult
3 Grid, Distributed computing, Collaborative computing
3 It's inconvenient
1 Security is OK

How do you compare NERSC openness and access to your home site and others?

[Read all 146 responses]

49 (34%) Similar openness
41 (28%) NERSC's openness / security measures are good
23 (16%) NERSC is more open / more flexible
13 (9%) NERSC is less open / more restrictive; too secure / too complicated
2 (1%) Concerns about NERSC security
2 (1%) Home site is secure
1 (0.7%) NERSC is less secure
4 (3%) Other

Flexible Work Option at NERSC

Beginning in May, NERSC started participating in Berkeley Lab's Flexible Work Option (FWO) Pilot. Under the FWO, some staff work 9-hour days and take one day off every two weeks. NERSC always has qualified staff on duty in all areas.

Have you noticed any changes specifically due to the FWO participation?

No. of Responses
No 220
Yes 0

If so, describe your experience. 2 responses

 

Why do you compute at NERSC? 229 responses

Note: the responses have been broadly grouped into response categories and also color-coded by topic. Individual responses have sometimes been split across several response categories (this is indicated by ...). The response categories are:

139 Need lots of compute power to do science (focus on compute resources)
32 Focus on data analysis / PDSF
31 NERSC is a well run center / provides good services
20 Need lots of storage and compute power
10 Need large memory (and lots of storage and compute power)
8 Need to test / install / maintain software
3 Other

The color codes are:

  • Need powerful computing resources / good hardware management, good uptime
  • Focus on the number of processors / scaling
  • Focus on parallel computing
  • Good network connectivity
  • Focus on getting science done
  • Focus on data analysis
  • User support, good staff
  • Job scheduling / turnaround time
  • Can't get the research done on local systems / couldn't do the research without NERSC
  • Documentation
  • Software / user environment
  • The time is "free"

 

Need lots of compute power to do science (focus on compute resources): 139 responses

we need to test our program to see how well it scales up (up to 6000 cpu - if possible) since we need these kind of compute power to simulate the whole ecoli metabolism reactions in real life

The chief attraction is the sheer size of seaborg. Being able to run on 1024 or more processors allows our group to look at cutting-edge problems.

One of the very few sites where I can use several 1000 processors.

Large scale computation using 1000 PEs for days

NERSC is very important for my research because of the larger number of processors available for computations. Requesting 512 or more CPUs is very possible at NERSC.

I need access to large numbers of processors (> 512) to run my scientific computations

To have access to a large (larger than 200's) of PE system. This provides the compute power required for the calculations.

This is our only computer resource, and our research does need large scale parallel computer (> 64 processors)

Need to do parallel processing on a minimum of 16 processors

For production runs of Quantum Monte Carlo simulations on the SP-cluster, since it has 16 CPUs on each node.

Lots of processors!

Availability of large number of nodes on the SP.
Homogeneity of the cluster.

Because I can use a large number of nodes.

The total number of processors it has provides a necessary condition for our large scale computation.

I perform large scale mhd and pic simulations of magnetically confined plasmas for fusion applications and need a massively parallel computer to carry them out.

I need massively parallel computers to run particle simulations.

The main reason is the high speed of computing using parallel computers. It allows us to achieve the numerical accuracy needed for the beam-beam simulation at very high beam intensities.

Large parallel code and multinode computing for cosmological simulations. Mainly run several simulations in parallel to produce a large number of simulations fast.

Because there is a very fast parallel computer: seaborg.

The ability to perform massively parallel simulations

We need the cycles and can run more effectively on a massively parallel architecture.

... I use a sophisticated (and complicated) MPI+SMP fortran (and C, C++) program that computes the radiative properties of Supernova, Nova, White Dwarfs, and whole slew of other astrophysical objects. This code works very well on the IBM and parallelizes very well. I also have had a long (happy) history with IBM hardware. ...

available high-performance parallel computing

Staff on my project use NERSC to perform DFT calculations of defect properties in ceramics and to carry out Molecular Dynamics simulations of high-energy ion-solid interactions. NERSC provides the parallel platforms that are otherwise unavailable to my project.

I need the parallel computer facilities to run a chain of "consecutive codes" in an input-output structure. I also need to calculate a high number (up to 20,000) realizations of our modeled system to obtain good enough statistics.

My problems need a stable, large-scale parallel computer. NERSC is the best the system of which I am aware to which I am likely to be granted access.

Advanced electronic structure calculations are extremely demanding to be run on single CPUs. Seaborg offers an ideal platform for parallel computing.

Need for massively parallel computations

we can do large scale parallel computation work on nersc. For current particle acceleration experiment, simulation can help to design the experiment, and predict the experiment results. With the help of super computers and parallel algorithm, we can simulate the plasma acceleration experiment more accurate and faster.

NERSC is our only large scale parallel computer

Parallel computing using SP for high resolution numerical simulations of carbon dioxide disposal in geologic formations

Because it is the only place where I can perform my simulations in parallel processing

I need access to highly parallel computing resources.

Parallel computations, ...

Need lot of computing time for transport calculations. Huge time saving (trivial parallelization) using the computer farm (batch system).

Need massively parallel machines ...

My work focuses on object-oriented high performance computing. NERSC provides the hardware to test and run my code on a massively parallel computer.

Parallel computers

For fast and massively parallel calculations

I compute at NERSC as I need the parallel processing power of pdsf to run my physics simulations and analysis. It would take far too long to run on a single machine.

I have been using Alvarez for parallel processing of electron micrograph images

Groundwater simulations. Important because of the large computing resource.

I'm running time-demanding quantum chemistry calculations. These calculations would not be possible on a single UNIX machine.

In years 1997-98 I used Cray T3E to develop lattice quantum theory of scattering processes and performed extensive computations of scattering cross sections. These results, considered today as benchmark, wouldn't be possible without the NERSC support. Since 2001 I have been using IBM SP. My current research concerns parallel discrete-event simulations (PDES) for stochastic modeling of information-driven systems. This research opens a new interdisciplinary area between non-equilibrium statistical physics and computer science. It contributes to basic research in non-equilibrium surface growth and complex systems as well as being a pilot study that applies methods of surface science to performance studies of algorithms for PDES. The access to the NERSC computing facilities allowed me to perform large-scale simulations that wouldn't be possible otherwise. Currently, this study is in the publishing stage. I very much appreciate the opportunity of having access to the NERSC facilities. I hope to use IBM SP and visualization services in my future computational projects.

I compute at NERSC because adds more computing time for my calculations

IBM SP is the only machine available to me that can handle my simulations.

Need computing power

Because they give my boss computer time :)

large scale astrophysical simulations that cannot be run on a serial machine or a cluster with relatively few CPU's

Free and fast

Size of the system - enabling large calculations

I need two orders of magnitude more FPO's than is available to me from other computers (and reasonable turn-around time).

As an experimentalist in Surface Physics collaborating with a theorist, large supercell calculations are an important part of my work. Without the computer resources required to perform these calculations, many advances in the field of surface science (including some of our work) would not be possible.

NERSC is an essential resource for our large-scale atmospheric modeling under our DOE grant. NERSC provides the computing power needed for our work.

Heavy computer time simulations

I do Monte Carlo simulations related to the Relativistic Heavy Ion Collider. These are large scale simulations that require many machines and a lot of computer time to accumulate statistics.

The problem size and runtime are big that massively multiple machines such as the NERSC SP2 are needed for computations.

To allow large simulations of complex regimes to be analyzed quickly

Computing essential to research

One of two places nationally that has the power to do my problems (to which I have access).

Fastest computer I have ever used!

For large computation that my computer cannot handle

Need capability beyond single CPU desktops --- the ability to do >100-hour runs in less than a day can be very important.

NERSC is the major computing resource available to us. It provides much-needed computer power that would otherwise be unavailable.

Seaborg allows significant acceleration of our large scale simulations of atomic structures.

It is faster and more stable than other resources I can use.

I compute at NERSC because it saves me valuable time. It also allows me to run test simulations to see if there is anything wrong with my input files before running a complete simulation.

Large scale SPDE simulations.

I get simulations done much faster than in any other computing facilities, saving a lot of time.

NERSC has the most powerful computer hardware. This makes it possible to carry out the most demanding scientific computations.

The NERSC IBM 'seaborg' is the fastest and most efficient (turnaround time) machine available for my research. Without it, my scientific output would drop by a factor of 4.

The facilities and computer time are not available elsewhere

These SPs have the most cpus per node and show the best scalability among available machines. ...

Availability of large scale computing power that is not really available elsewhere!

it is faster than others

fast, large-scale calculations of excited state electronic structure - one-two orders of magnitude faster than desktop

Large resources make speedy work. ...

It is pretty much the only facility where I can do computations around. Besides, the computing speed is second to none. However, waiting time on queue seems to grow significantly, considering last year and this year.

I can perform larger simulations, with faster turnaround times, than elsewhere. ...

NERSC provides much needed processing time

the other computer resources that I usually use are too small for a lot of tasks

Perform large-scale fluid flow and head flow simulations which can not be done on other computers.

large amount of jobs submission.

Do the large scale problem

NERSC is the best supercomputer for civilian research.

This is the most powerful computer system I have access to.

supercomputing power

a lot of resources!

NERSC provides extremely fast, powerful computing services. ...

Computing at NERSC forms the core of my basic research. The simulations we run on the IBM-SP are indispensable to our entire programme.

Big codes need big machines, this is one of the few available to me.

It allowed for large computations leading to the design of the new detect (EMCal for ALICE experiment at LHC)

Fast computing, and possibility of running many jobs at once.

It's where our group is running large simulations to which I'm bound. Availability of SMP nodes is important.

I use NERSC IBM computer for large scale calculations, not amenable on the computer at NREL

I have a research group composed of 5 Ph. D. students, two post-doctoral associates, and several undergraduate students. Most of them have accounts at NERSC. It is very important for my research (purely computational) to have access to NERSC computers.

I run Quantum Monte Carlo jobs for electronic structure. NERSC is very important for me because it gives me access to an important amount of computer time, which is basic for my research

I am studying proteins with dft and molecular mechanics, I need to use many computers in order to get results.

NERSC runs the largest computing system for scientific research in the United States.

For me it is a very important tool to do my research. I use NERSC to make runnings of molecular dynamic simulations.

NERSC allows me to conduct massive calculations that would be impossible with my in-house facilities.

It provides the high end computing facilities that are essential to my research

NERSC has fast supercomputers ...

Running climate model for climate simulation needs supercomputer

Access to large SP computer.

Use seaborg to conduct computations for one of SciDAC projects.

I am working on my thesis and doing virtually all my calculations through NERSC.

The primary allocations for our project are seaborg

NERSC offers me unmatched computing resources. I simply cannot do my research without NERSC.

I am computing at NERSC because the scientific problem I am working on (computational nuclear structure theory related to DOE experimental programs) would take too much CPU time on my local LINUX workstation (about 3 weeks compared to 20 hours at the IBM-SP at NERSC). Therefore, being able to compute at NERSC is essential for my research productivity.

Need more CPU than I can get locally; run into inconvenient CPU time limits often.

access to fast computers

The only facility adequate for large Monte Carlo transport calculations

NERSC is the only large-scale computing facility that is generally available for fusion energy computation. As a computational physicist, these resources are critical for my group's work.

I do big simulations. DOE gives me a big allocation. Where else could I go? I am a computational physicist, so NERSC is vitally important to me. DOE and NERSC take good care of me.

we do climate modeling. In order to do the experiments to evaluate physical parameterizations on climate time scales and have them in a reasonable and useful time requires a computer of the scale of the SP. A local workstation would just take way too long to be useful and would not allow the number or different simulations that are required not the horizontal and vertical resolution required.

Without NERSC, I can't complete numerical calculations within reasonable times. In some cases, the calculations become impossible.

Multiprocessor simulation for SASE FEL Development.

Larger scale computations than I can do at home.

most powerful computing resources available to me

we have to run very large jobs beyond the local computing capability

We require for climate model studies top of the line HPC.

NERSC has fast computation facilities which is essential to my research.

Because NERSC can supply the computation power needed for my research.

I need run my code on SuperComputer

Very fast machine ...

Save some time to get the job done.

the powerful resource NERSC can provide is the most important reason, We can't do our research in our university machines.

I am working on Monte Carlo simulations, with some important modifications to try to do some new and valuable things.

Because I am doing my primary research using NERSC seaborg.

We perform molecular dynamics simulations of large protein molecules and protein complexes.

Running Quantumchemical Simulations for my PhD research

Ab initio and MD calculations.

I compute at NERSC because it allows me to do physics that would be impossible otherwise. NERSC resources are directly related to my competitiveness within my community.

I am doing on a SciDac Project to simulate the chemical reaction of spray combustion. As the reaction is very stiff, we have to use very fine grid and very short step step to perform the simulations. NERSC is very important to the successful completion of the project.

for my Ph.D. research

I use it for research in support of the Terascale Supernova Initiative, a DOE-funded project administered through the SciDAC program.

NERSC resources are essential for my research

Mathematica and Matlab calculations, some big protein structure calculations. I wish SAS was still on the new newton. also the control system toolbox for matlab isn't available. I find the system difficult to learn and use; for most big calculations it is easier to use my local linux cluster and just wait awhile for the calculations to finish. [math server user]

  Focus on data analysis / PDSF: 32 responses

Some STAR data only available at PDSF

KamLAND data analysis

work for the STAR experiment at BNL, doing analysis

STAR data Analysis

One of only two sites where our data is stored

It's important for the data analysis of my thesis project.

PDSF is the main analysis site for our experiment. HPSS is used as our main data archive system.

Component separation in astrophysical data; CMB power spectrum estimation from CMB data; Cleaning CMB data from systematic effects

We transfer data from Brookhaven National Lab to NERSC for star computing.

I use the PDSF system to work on STAR data. This is much easier than trying to work at BNL over the network.

data reduction and analysis for STAR experiment

This is for a practical reason. All the embedding data files I use are at pdsf

ATLAS grid

Data retrieval

As a member of STAR, I need to use pdsf for my analysis work

analyzing huge data sets of observations of the cosmic microwave sky

STAR computing

physics analysis

I use the CMIP data that is exclusively archived at NERSC HPSS

I am a member of the STAR collaboration and PDSF represents a major fraction of our computing resources.

All of our experiment's data is on HPSS/PDSF.

I am a member of the STAR Collaboration. The embedding jobs are run at NERSC

RHIC data analysis.

Data analysis and simulations for ultra-relativistic nuclear collisions, STAR & RHIC

availability of PDSF for massive data and simulation productions

PDSF

To do US ATLAS Grid Computing. We (U. Oklahoma) are one of the 11 testbed sites, just like LBL is one.

Part of the data I use are located on nersc disk (and only there). I never had problems to do what I needed to do.

I am a member of STAR experiment at RHIC

Software development and analysis for the KamLAND experiment NERSC is our main computing resource and we couldn't do without it.

Research for STAR

STAR

  NERSC is a well run center / provides good services: 31 responses

I am involved primarily in capability computing. The NERSC facility provides an excellent balance of high performance networking to raw compute power which I am unable to find in most commodity cluster environments.

Simply without access to NERSC I could not complete most of my research projects. My group makes enormous use of NERSC and over the last 22 years we have come to rely on NERSC for access to parallel supercomputers, access to novel and useful algorithm packages (QD and SuperLU) and access to advice on tricky technical questions.

It is one of the best in the whole world; much better than NCSA.

We have performed some grand challenge climate model computations (High resolution climate modeling and Coupled carbon cycle and climate modeling) recently. These are unclassified work. NERSC has been one of the primary resources for these computations. When I need to perform computationally demanding climate model simulations, NERSC has provided the best solution for me.

NERSC provides a parallel platform with reasonable power per processor and a queuing structure that makes it possible for users who need an intermediate number of processors (64-128) to compute.

NERSC is by far the best computing service I have ever used. I've been at NPACI, NCSA (and way back in time to the Cornell Theory center). While these other places are fine, NERSC has always been on top of every thing. Consulting Services at NERSC contains the most thoughtful and responsive group of people around. I've used the consulting services to deal with software issues on several occasions and 100% satisfied (give those guys a raise!). ... And, most importantly, my work gets done at NERSC and done fast!

NERSC is a well maintained and powerful service that is essential for my research analysis. With our data volume, the HPSS system is a necessary tool. Processing our data with the PDSF batch system is reliable and simple.

... Also the long queue times available on Seaborg are really nice for long running big jobs, which you don't want to checkpoint and restart too often. Recently I also started using Escher for some post processing on large amounts of data produced on Seaborg. I was really pleased with the assistance of the consulting service in figuring out how to best transfer the data and for allowing me to use most of the disk space on Escher for an extended time.

... maintained STAR experiment software

availability of math libraries, powerful software, good consultants

Queue wait times seem reasonable (they are generally quite short) for all job sizes. The quick turnaround on jobs makes everything much easier, from development, through debugging, and onto production.

excellent queue response. all kinds of production job sizes in my experience start very quickly. queue time limits are very nice, in particular the 48 hour limit.

NERSC is one of the most user friendly and powerful centers in the country. Having computed or moved data through most of the other centers, I say this from experience. The support staff, particularly in mass storage, have been extremely helpful in setting up a data archive which has had a tremendous impact on my field (lattice gauge theory).

... Many resources are well documented, especially on the web, and very handy to use.

Less portability problems than on many other super-computers I use, predictable queueing policies

... Good support. Environment is maintained well and has very little downtime.

... SP machine is stable, and well-managed. Effort of getting time is well worth it.

... I have found NERSC resources very easy to learn how to use to get what I need to done and am very happy with it in general.

... Eric Hjort and Iwona Sakrejda are _extremely_ helpful and prompt. Super user support compared to Brookhaven/rcf.

... It is nice to get real-time answers.

NERSC computing tends to be stable and reliable.

... availability of math software (Matlab).

Consistent turn-around time. Availability of short, fast debug cycles. Consulting

NERSC has ... great support.

NERSC offers a flexible, significant, large scale computational facility. In short, we can do serious, cutting-edge research in a timely fashion at NERSC.

NERSC has excellent computing facilities, it is easy to use, and has been reliable.

I worked for a summer using Seaborg to run computational chemistry simulations. I enjoyed using the computational resources of NERSC. The account support people made it easy for me to get set-up and working quickly.

...with the necessary support.

Most of my simulations require a mainframe. NERSC is an excellent central computing facility, with first-rate support.

... The documentation on the website is excellent, and the support staff has been great at filling in any gaps and at clarifications.

... with STAR library loaded.

  Need lots of storage and compute power: 20 responses

The performance and storage resources are necessary for the research project.

Analyzing large datasets and performing large simulations not feasible on a desktop.

PDSF offers huge parallel cpu power, and there is easily accessible mass storage space (>100 GB) for users. ...

Massive computing for analyzing CDF data.

high performance computing and mass storage

Because of the computational resources and the available disc space.

I need to process lots of data fast.

I need the computing power made available by seaborg to do my research. The large amount of disk space also makes it possible for me to do more complicated analyses.

NERSC is our major source of high performance computing. HPSS at NERSC is the major backup system for our projects which are sometimes spread over many sites.

Lots of CPU; Disk space ...

Lots of compute power on PDSF, lots of storage on HPSS.

I'm in charge of data processing for the KamLAND experiment and NERSC is a vital resource in this endeavor. We're big users of the batch queue and HPSS, two systems without which our job would be impossible.

CPU power and large scratch disks to cache data samples.

I use nersc to process data from HPSS and run any jobs that would take longer than 10 minutes on my computer here. ...

Availability of mass data storage and large number of computer nodes for running many jobs at once.

Our data sits there (STAR).
Closest batch job facility.

I perform all-electron fully relativistic Dirac-Fock calculations for the electronic structure of systems of superheavy elements with hundreds of electrons ( an example is Seaborgium Hexabromide molecular species with 316 electrons!). The formalism we developed in 1975 can handle such problems but the resulting gargantuan calculations require huge amounts of CPU even on the supercomputers. In addition disk storage of ~ 20-50 Gbytes may be needed for each calculation ( which is freed after the completion of the calculation,of course). Since experimental chemical ( and physical) information about the man-made superheavy elements is scarce due to their very short life ( ~ a few seconds at best) and production of ~ half a dozen atoms. Therefore atomic and molecular systems of these superheavy elements are ideal systems to be studied theoretically , and we can make predictions about their chemistry and physics. We have been studying the chemistry ( and physics) of the superheavy elements Rutherfordium ( atomic number Z=104) to the primordial superheavy elements E126 ekaplutonium ( Z=126) and their numerous compounds. The results have been mentioned in the various ERCAP applications and published in open literature and featured in Chemical and Engineering News as News of the Week ( see Dec 16,2002, issue of Chemical and Engineering News). Such calculations can only be performed at the state-of the art superb world-class supercomputing facility like NERSC and nowhere else! This is my motivation for using NERSC facility, which is sine quo non for my theoretical research in the prediction of the Chemistry of man-made superheavy transactinide elements with Z=104-126.

NERSC is the ideal facility for the computing I have been doing. I usually run jobs on 1-4 nodes of the SP, generating large output files. The connectivity between the SP and the HPSS is splendid and allows me to use these large files several times, which saves on computing time. My jobs have usually been going through the queue without much waiting. ...

The large storage resources of HPSS are critical to storing our terabytes of astronomical images and PDSF is similarly vital to processing all of that data. The resources to provide either on our own would be significant obstacles to our scientific research.

Fast computing facility with massive storage.

  Need large memory (and lots of storage and compute power): 10 responses

3d simulations need memory

We need a large memory and fast machine to do first-principles calculations and collaborative works with Dr. L. W. Wang at NERSC.

I use NERSC (Seaborg) for very large scale simulations, that is difficult to fit into memory on most other machines I have access to. ...

Thc computer is fast, and has large memory. The archive is very good. Our job needs all above properties.

Enough memory available for high level calculations

Our large scale simulation of accelerator structures requires huge amount of memory and CPUs. NERSC IBM SP2 provides the parallel runtime environment and suitable amount of memory and CPUs for our simulations.

Large simulations requiring more than 500 GB of RAM. NERSC is one of the only sites with this capability. Smaller simulations can be done elsewhere.

Because I am running jobs that are rather large (>10Gbytes) and computationally intensive, which makes it impossible to run them on a single machine.

It has a lot of nodes with a huge amount of memory. I could not solve my problems in a common PC cluster because of lack of memory.

Big jobs that require large memory and more CPU time.
To get fast turn around time.

  Need to install / test / maintain software: 8 responses

I have only used NERSC to do some routine maintenance on NetPIPE (http://www.scl.ameslab.gov/netpipe/).

I am a staff and I do work on projects that require NERSC computing resources - HPSS/PDSF. I work however, mostly on Linux and Solaris workstations.

I work on domain decomposition methods for numerical PDE's. Parallel scalability is one of the main concerns in this area and it is very important for me to run experiments in many processors.

For testing the performance and scalability of programs intended to use for benchmarking on the IBM SP up to large number of processes.
to develop performance prediction methods
to develop system characteristics and numbers that describe the usage of a parallel system ( Peak Application Performance PAP, Reached Application Performance)

I use the NERSC computers to test the performance and scalability of numerical software.

To setup and validate atlas servers at CERN with offsite clients

I am developing software ( in Titanium language ) for serious PDE simulations.

CCSM software testing.

  Other: 3 responses

I do not do computing at NERSC, only use the mass storage facility

I use the NERSC in the frame of the CMIP project... I did not compute on the NERSC system, but only FTP to get the data. I find the procedure very complicated and not well documented on the information I was looking for. A good thing that I was able to contact the people there by phone and that they were very helpful.

I used to use Killeen, but I don't really use anything anymore.

 

If security has gotten in the way of your work at NERSC, how?

Access problems (ssh, secure ftp): 10 responses

I haven't had any problems so far. But I don't understand the need of the exclusive use of ssh protocol 2. I expect to have some problems with connections from both erc.msstate.edu and from my PC at home. I will deal with it in October, when the transition v1-to-v2 occurs.

In the beginning we had to set up port forwarding in order to use CVS to check out our code on seaborg making it somewhat cumbersome. However this has changed and is not longer necessary, so this complaint has disappeared.

with the retirement of FTP, I had to look for and install new software to access my NERSC login, and to transfer files from my Mac to/from NERSC. This conversion process was quite painful for a while..

Because of a lot of security measures, tunneling through html and perl doesn't seem to be allowed, which one of my simulations programs needs.

complaints when I log in directly from a foreign dial-up connection when traveling. now I log in to a university computer first and then connect from there.

Sometimes I had troubles displaying graphics on my Mac even using SSH and MacX. I tested the same SSH+MacX combination else where from my computer and had no problems.

Encrypted communication.

It take me a long time to connect there and to get the data I needed... Sometime, it does not work, and I have no idea why... Sometime it does... and I do not know why...

Very frequently, I receive a message saying the host key has changed and there might be a "man-in-the-middle" attack. I ignore the message, but it worries me every time I see it.

Too many complicated procedures to set up connections to mainframes, to mass store, for file transfer, etc....

Password problems / failed login attempts: 6 responses

Unwanted login's (not successful) where detected.

expiring passwords

I think that in general the security at NERSC is fairly good, while not being overly draconian. I have just one serious complaint: The new (DOE?) directive to disallow shared accounts is giving us a serious headache in terms of how to do our data analysis. Typically, our data is reconstructed during a 1-2 month period, once every ~6months. We would like to share the responsibility of doing the reconstruction among a small group of people, rather than one individual. The granularity of group permissions is simply not adequate to achieve this task. A shared account would make it much easier for us. It is also not clear to us what the reason is for disallowing shared accounts. If it is accountability, then there are other ways of implementing something that would maintain a trail of who logged into the account at a given time.

I did not log in for 2 months and then I cannot change my password what was required. Problems!

My login to seaborg is disabled about 50% of the time, forced me to login and work from sadmin.nersc.gov much of the time.

Minor Suggestion:
I'd like the number of times I'm allowed to mistype my password to increase from 3 to around 5-8. My passwords are random, and I'm a bad typist. I'm always afraid of being locked out due to mistyping or misremembering the password, although it's only happened once.

HPSS access difficult: 3 responses

The current access to HPSS from outside through ftp has severe limitations. One difficulty is when one wants to access HPSS from multiple systems sharing a common home file system, but independent data disks. The second deficiency is that ftp has no mechanism for copying directory trees from one system to another.

the initial setup for mass storage access was painful. It is however well documented and once setup works easily. I made the mistake of not setting it up prior to an important job. When I had the data I thought that with my account login I could just use mass storage. I started to sweat quite a bit when I realized that this didn't work. I was afraid the system might delete my data before I had a chance to transfer it to mass storage. It worked out OK in the end.

The HPSS password "keys" weren't well documented on the web and caused me a few calls to the consultants. The lack of documentation may have been part of the security, though, so although it took me a bit of time, it was probably necessary.

Grid / distributed computing / collaborative computing issues: 3 responses

Hardly any distributed computing work can be done on networked workstations that try to access NERSC.

There are needs for group and role-based work that are not handled very well with just the unix uid/gid capabilities. There are some new developments in grid software for helping this type of work that will be helpful. I hope that NERSC can be more proactive in this regard.

Strict security and firewalls and the likes tend to get in the way of Grid computing, since used ports aren't always predictable, and we have to find ways around it. Luckily, that's usually possible, just takes some time.

It's inconvenient: 3 responses

A bit of inconvenience, that's all.

I am located in Munich, Germany. All security matters which require a phone call during the Pacific time zone working hours are difficult due to the 9 hour time difference.

Security enhancements usually require a little time and effort to understand and adjust to the new regime (typically a few hours to a day).

Other (security measures OK): 1 response

Security appears adequate without being a major obstacle.

 

How do you compare NERSC openness and access to your home site and others? 146 responses

Similar openness: 49 responses

Very similar.

Comparable

Comparing with the erc.msstate.edu, I'd say, they are both very secure. I haven't experienced any problems yet.

Comparable.

comparable

Similar

About the same

About the same.

The security is good. I don't really have anything else to compare it to besides the University's site. Comparatively, it's just as good, if not better.

I have seen no difference in the security measures; they seem equivalent to the other systems I work with.

No comments really. I access NERSC computers the same way I access all other computers with SSH.

It is comparable and in some ways more open than my home site. Other sites vary.

very similar

similar to LBNL

about the same

computer security and access at nersc seems on par with LBL's network ...

Comparable.

Roughly the same level.

About the same.

more or less equivalent, slightly better maybe

Security seems about the same as at most similar institutions.

Similar

It is comparable

Similar

I do not feel much difference in terms of access.

similar or perhaps more secure

similar

About the same.........

Comparable

Nearly equal

Computer security at Berkeley is very tight. NERSC equals or exceeds their high standards.

Similar.

About equally hard compared to my other large-scale computing at my university.

The same, I'm on-site.

No noticeable difference to me.

Similar to my home institution (LBL). ...

Comparable

similar

About the same.

Fermilab uses Kerberos V. After the snags of the first few months of its implementation, it has provided a successful environment of operation. Access to the NERSC facilities allows for comparable freedom, but the problems with Alvarez security were annoying (and we haven't had such a large scale problem at FNAL, although it might be luck).

Similar

Comparable

comparable to our campus computing facility.

Comparable

home site has comparable level of security

Very similar.

NERSC computer security is similar to my home site, both are quite reasonable.

About the same, as far as I can tell. I just do what the experts tell me to do.

Similar. NERSC security doesn't seem particularly intrusive, nor does it seem lax.

NERSC's openness / security measures are good: 41 responses

Very balanced: openness vs. security

Use of SSH provides good security while not making things too difficult, due to the wide availability of clients.

Seems secure, but I am not an expert.

The network is very fast and the access via ssh very convenient

it's fine

I use secure ssh, and had no trouble.

Very good

Well, I use ssh everywhere (if I could start my car with ssh I would!) I don't know what the rest of the people on campus are using. I have never had any problems getting info to and from NERSC. I think the level of security at NERSC is pretty good. ...

Ok.

pretty good

ok

excellent.

The openness of NERSC is excellent.

excellent, I haven't noticed any problems since the access has been restricted to ssh only

I think it is good.

I've never had any problems in regards to security when using NERSC machines.

SSH seems to be the best system.

NERSC has easily usable yet secure facilities.

Great

I think NERSC does a fine job of balancing security with ease of use and openness.

I am very satisfied with openness and access.

Not aware of any problem.

I think it is quite secure (I wish I could say "very secure", but I can't since I have used NERSC facilities for less than 3 months) ... but so far I think it's quite secure.

In my estimation, NERSC security is pretty tight, but necessary and not a barrier to my work.

good

NERSC computer security seems very reliable and efficient.

I think NERSC has taken a "common-sense" approach, which does not hamper productivity

Excellent

Very good.

I am not really concerned about the security problem. But I think NERSC is good because I never lose my data, and I am asked to change my password often.

I think that it is one of the best in terms of support, security, visualization, and computational time.

NERSC has struck a good balance between security and usability in my opinion.

We are fully ssh so it is seamless security between my computer and NERSC. It seems like this should be the standard everywhere, so NERSC compares favorably.

NERSC is open without being sloppy. I think the security is effective without being intrusive.

It is not necessary to compare any computer facility with NERSC! If we had accessibility at a facility similar to NERSC, it would not be sensible to use NERSC! The very fact that thousands of scientists throughout the world have been using and still continue to use NERSC speaks for itself about the user-friendly atmosphere at NERSC, without which nothing much can be achieved in scientific research by thousands of users of NERSC.

very good

Good

OK

well maintained and very good

Basically, security is not an issue for me for now, so the machines' security and openness are enough for me.

I am for.

NERSC is more open / more flexible: 23 responses

It is much more flexible than Livermore's policy.

From discussions with my colleagues, NERSC has a security openness unique among DOE labs. Frankly, that makes NERSC computers "user friendly" and allows us to get our work done without undue extra workarounds. I think that openness is a crucial part of making NERSC a success, while I realize that security is important also.

I think NERSC computer security openness and access is better than what we have here.

Much more easily accessed than others I have experienced and have not found this to compromise security

NERSC computer security openness/access is probably a bit better than the security environment here at Yale Physics Department.

... with both being far superior to BNL's network, which is fraught with obstructions (closed ports, restricted gateways, etc), in ease of use, performance, and security.

Better than Rhic Computing Facility (Brookhaven)

Better than RCF. It's nice to be able to sftp to/from the current host, and not have to go through a gateway machine or dedicated sftp machine.

NERSC is doing an excellent job when compared to other DOE unclassified computing facilities.

Better

It's much simpler to access NERSC than comparable machines. The ssh access and other security features are totally transparent and do not cause any trouble for me or reduce convenience. I am very happy with the setup.

NERSC security is much less bothersome than security at other sites I'm familiar with.

For the responsible user, NERSC is a model of transparency from the security point of view. The other place I work intensively is the RHIC Computing Facility at Brookhaven, where security is obtrusive and in-your-face. It is apparent that NERSC is as secure, but with much less overhead for the user. Good work!

Access is much better, as well as openness. The access to NERSC is straightforward and simple compared to other large computing sites I have used.

NERSC beats the RHIC computing facility hands down. RCF is fairly draconian in its approach to security and makes it complicated to connect and use the resources. I much prefer working at pdsf (NERSC).

I have had no problems with NERSC security. I greatly prefer computing at NERSC to computing on any of the platforms at LLNL because security at NERSC is much less troublesome.

The NERSC site is more open than some sites that I use. I can directly connect to seaborg using ssh and sftp, whereas at some sites the firewall limits the direction that such connections can be made.

NERSC is more open than my home site. We have experienced numerous security problems in the Astronomy department, which has led to severe security measures that negatively impact the use of our machines.

... It is pleasantly more open than other sites I have dealt with.

NERSC's security process is less intrusive than at most places I compute, but appears to be just as effective.

favorably

I work at LLNL. Our security is more rigorous and requires more effort by the users and the supporting staff.

It is better than my home site.

NERSC is more secure: 15 responses

more secure than other facilities I have used, but easily accessible

NERSC is far more secure.

NERSC is on the top of the security level list of the places I use.

NERSC is much more secure

NERSC is much more secure

It seems to be more secure than my home site.

NERSC is more secure.

NERSC seems to be the most secure computing site that I use.

NERSC computers are more secure.

The NERSC computer security is better than my home site.

It is very secure compared to my computer

Although NERSC is more secure than my home site (U of Maryland), the extra protection doesn't adversely affect the openness or access. I'm quite satisfied with the NERSC security and, in fact, wish my home site would be as conscientious.

It is more safe.

Tighter, but not restrictive.

It appears much more secure than our university site.

NERSC is less open / more restrictive / too secure / too complicated: 13 responses

it is kind of too secure

Comparing against the Rhic Computing Facility, they allow the use of shared accounts. It makes organizing the analysis of physics experiments much easier (people take reconstruction 'shift').

It's a little more awkward, but this is mostly because it is a remote site and not the configuration per se.

The necessity for different passwords/access to the system and mass storage is not obvious to me as the user and I have never seen that at another site.

OK, but there seemed to be complicated multiple passwords when I was asked to update my user profile. It required a phone call and was still unsuccessful, so I ignored the request.

NERSC is more restrictive than average, not allowing ssh 1.0 (I think it is). This often means that I have to scp FROM seaborg TO local machines, and not opposite. No big deal, though.

A bit too complicated access.

Much more complicated...

I have experienced some problems depending on the node used during the access. For instance, 128.55.128.37 was not accessible while 128.55.128.32 was. I use a Windows application called Cygwin.

NERSC security seems a little more conservative, but it really is no problem at all. My work does not require high security, but the current system has worked very well for me.

U. Oklahoma has a less restrictive firewall (or rather, ACLs), and we have full control over which ports are opened if we have the need for it. Of course, it's not a National Lab, so we can get away with slightly less security.

Slightly tighter (than CERN) which allowed telnet and ftp until quite recently (and still has an insecure ftp service but you will probably never guess which machine it is on). ...

Too cumbersome to access.

Concerns about NERSC security: 2 responses

... However, a few times I have requested my password over the phone. At that level, I thought the guy on the other end didn't ask me enough questions. At least on one occasion (more than a year ago) I felt like anybody could have called and (with a little "social engineering") obtained my password. But maybe these guys have caller ID?

... As I've forgotten my password (from now on I will risk writing it down) it will be interesting to see how long it takes to get a new one. At CERN/opal/atlas I would do this for a known colleague in my next coffee break. At SLAC/Babar I think it took a week and I had to make phone calls at midnight Birmingham UK time.

Home site is secure: 2 responses

Well, my desktop runs linux, and I don't have any security problem, as I update regularly all available patches, and I feel more comfortable with my desktop in that any graphics and programs can possibly run without any security hassle.

Our home site is pretty safe so far, primarily because of the restricted access policies we have been implementing very consistently. Therefore, we do not see our system as a potential loophole for NERSC.

NERSC is less secure: 1 response

Homesite computers are not reachable except via hosts.allow, which is very restricted. Therefore homesite is more secure than NERSC.

Other: 4 responses

I am an LBNL employee, so I'm not familiar with the other sites.

Unclear on what this means

It is the only choice for us to have access to a highly parallel real production system. Our inhouse system has only 130 CPUs.

usually NERSC is less crowded

 

Flexible Work Option at NERSC

Beginning in May, NERSC started participating in Berkeley Lab's Flexible Work Option (FWO) Pilot. FWO means some staff work 9 hours a day and are off one day every two weeks. NERSC always has qualified staff on duty for all areas.

If you have noticed any changes specifically due to the FWO participation, describe your experience: 2 responses

The only problem I encountered was a consultant working at home (probably not FWO). Here the consultant re-routed calls to his home, which had a block on caller ID. This generated a confusing message that led me to believe that I had the wrong number - it wasted some of my time.

I don't use seaborg on a daily basis. But there were a couple of times when I really needed to run something on it and it was unexpectedly down.

 

All Satisfaction Topics and Changes from Previous Years

  • How Satisfied are you? (sorted by Average Score)
  • How Satisfied are you? (sorted by Number of Responses)

Legend

Satisfaction | Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
Significance of Change
significant increase
significant decrease
not significant

 

How Satisfied are you?

Sorted by average score.

Topics from the Overall Satisfaction with NERSC section are indicated in bold; they tend to be answered by more users and rated lower than the topics in the ensuing sections that rate specific areas in more detail.

Topic | No. of Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
HPSS Reliability 126 6.61 0.77 0.10 -0.02
Consult: Timely response 207 6.55 0.73 0.04 -0.01
Consult: Technical advice 200 6.54 0.69 0.08 0.08
HPSS Uptime 126 6.54 0.79 0.17 0.21
Network LAN 114 6.54 0.67 NA NA
Consult: Follow-up to initial questions 186 6.49 0.75 0.10 0.12
HPSS Overall 134 6.46 0.84 0.07 -0.04
HPSS Performance 126 6.46 0.88 0.11 0.10
PDSF C/C++ Compilers 54 6.44 0.79 -0.02 NA
SP Overall 192 6.43 0.78 0.05 0.61
SP Uptime 191 6.42 0.83 -0.14 0.89
PDSF Overall 68 6.41 0.87 0.15 NA
Account Support Services 245 6.39 1.05 0.01 -0.04
Overall satisfaction with NERSC 298 6.37 0.88 0.05 0.12
Consult: Time to solve problems 196 6.36 0.84 -0.04 NA
Consult: Response to special requests 126 6.35 1.06 -0.05 0.12
PDSF Uptime 62 6.35 1.04 -0.16 NA
Consulting Overall 233 6.34 1.01 0.04 0.04
SP Fortran Compilers 152 6.34 0.97 -0.02 0.38
PDSF User Environment 55 6.33 0.77 -0.05 NA
SP Libraries 131 6.27 0.86 0.18 0.27
New User's Guide 137 6.26 0.86 0.05 0.32
SP User Environment 169 6.24 0.90 0.12 0.17
Network Connectivity 241 6.23 0.94 0.16 0.20
SP C/C++ Compilers 103 6.22 1.10 0.11 0.50
SP Disk Configuration and I/O Performance 156 6.15 1.03 0.18 0.48
Available Computing Hardware 255 6.13 1.06 0.16 0.02
Network WAN 100 6.12 1.02 NA NA
Mass Storage Facilities 207 6.12 1.10 0.08 0.07
NERSC Online Tutorials 121 6.07 0.99 0.10 0.10
HW management and configuration 210 6.07 1.07 -0.03 0.25
Available software 242 6.05 1.09 0.07 0.24
SW maintenance and configuration 213 6.04 1.20 -0.13 0.12
PDSF Fortran Compilers 29 6.03 1.09 -0.42 NA
RightNowWeb interface 109 6.02 1.07 0.08 NA
HPCF Website 213 6.00 1.10 -0.09 -0.18
PDSF Queue Structure 59 6.00 0.96 0.03 NA
SP Applications 94 6.00 1.04 0.30 0.33
PDSF Libraries 43 6.00 1.07 -0.24 NA
SP General Tools and Utilities 111 5.98 1.04 0.18 0.26
HPSS User Interface 127 5.98 1.24 0.03 -0.04
PDSF General Tools and Utilities 44 5.93 1.07 -0.11 NA
PDSF Batch Wait Time 61 5.93 1.12 0.19 NA
PDSF Applications 39 5.87 1.06 -0.34 NA
PDSF Bug Resolution 34 5.85 1.21 -0.15 NA
NERSC Training Web Pages 98 5.83 1.06 -0.06 NA
PDSF Ability to Run Interactively 64 5.77 1.39 -0.41 NA
PDSF Disk Configuration and I/O Performance 59 5.69 1.15 0.06 NA
SP Queue Structure 177 5.69 1.22 -0.23 0.50
Allocation Process 196 5.69 1.26 -0.15 -0.31
SP Bug Resolution 81 5.64 1.15 0.05 0.19
SP Performance and Debugging Tools 117 5.57 1.31 0.08 0.88
SP Ability to Run Interactively 162 5.57 1.49 0.10 0.86
PDSF Performance and Debugging Tools 42 5.31 1.39 0.06 NA
SP Batch Wait Time 190 5.24 1.52 -0.17 0.32
Escher 13 5.23 1.30 -0.15 0.15
Newton 15 5.20 1.37 -0.24 -0.27
SP Viz Software 37 5.08 1.46 NA NA
Training 94 5.04 1.26 0.05 0.12
NERSC Training Classes 24 4.88 1.15 -0.25 -0.67
Visualization Services 97 4.81 1.17 -0.02 0.30
Escher Viz Software 8 4.75 1.39 NA NA
Access Grid classes 27 4.67 1.41 NA NA

 

 

How Satisfied are you? (sorted by Number of Responses)

This ordering helps to indicate which services are used most by users (and is probably a better indicator than the services clicked for the question "What NERSC resources do you use?").

Topic | No. of Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
Overall satisfaction with NERSC 298 6.37 0.88 0.05 0.12
Available Computing Hardware 255 6.13 1.06 0.16 0.02
Account Support Services 245 6.39 1.05 0.01 -0.04
Available software 242 6.05 1.09 0.07 0.24
Network Connectivity 241 6.23 0.94 0.16 0.20
Consulting Overall 233 6.34 1.01 0.04 0.04
HPCF Website 213 6.00 1.10 -0.09 -0.18
SW maintenance and configuration 213 6.04 1.20 -0.13 0.12
HW management and configuration 210 6.07 1.07 -0.03 0.25
Consult: Timely response 207 6.55 0.73 0.04 -0.01
Mass Storage Facilities 207 6.12 1.10 0.08 0.07
Consult: Technical advice 200 6.54 0.69 0.08 0.08
Allocation Process 196 5.69 1.26 -0.15 -0.31
Consult: Time to solve problems 196 6.36 0.84 -0.04 NA
SP Overall 192 6.43 0.78 0.05 0.61
SP Uptime 191 6.42 0.83 -0.14 0.89
SP Batch Wait Time 190 5.24 1.52 -0.17 0.32
Consult: Follow-up to initial questions 186 6.49 0.75 0.10 0.12
SP Queue Structure 177 5.69 1.22 -0.23 0.50
SP User Environment 169 6.24 0.90 0.12 0.17
SP Ability to Run Interactively 162 5.57 1.49 0.10 0.86
SP Disk Configuration and I/O Performance 156 6.15 1.03 0.18 0.48
SP Fortran Compilers 152 6.34 0.97 -0.02 0.38
New User's Guide 137 6.26 0.86 0.05 0.32
HPSS Overall 134 6.46 0.84 0.07 -0.04
SP Libraries 131 6.27 0.86 0.18 0.27
HPSS User Interface 127 5.98 1.24 0.03 -0.04
Consult: Response to special requests 126 6.35 1.06 -0.05 0.12
HPSS Performance 126 6.46 0.88 0.11 0.10
HPSS Reliability 126 6.61 0.77 0.10 -0.02
HPSS Uptime 126 6.54 0.79 0.17 0.21
NERSC Online Tutorials 121 6.07 0.99 0.10 0.10
SP Performance and Debugging Tools 117 5.57 1.31 0.08 0.88
Network LAN 114 6.54 0.67 NA NA
SP General Tools and Utilities 111 5.98 1.04 0.18 0.26
RightNowWeb interface 109 6.02 1.07 0.08 NA
SP C/C++ Compilers 103 6.22 1.10 0.11 0.50
Network WAN 100 6.12 1.02 NA NA
NERSC Training Web Pages 98 5.83 1.06 -0.06 NA
Visualization Services 97 4.81 1.17 -0.02 0.30
SP Applications 94 6.00 1.04 0.30 0.33
Training 94 5.04 1.26 0.05 0.12
SP Bug Resolution 81 5.64 1.15 0.05 0.19
PDSF Overall 68 6.41 0.87 0.15 NA
PDSF Ability to Run Interactively 64 5.77 1.39 -0.41 NA
PDSF Uptime 62 6.35 1.04 -0.16 NA
PDSF Batch Wait Time 61 5.93 1.12 0.19 NA
PDSF Disk Configuration and I/O Performance 59 5.69 1.15 0.06 NA
PDSF Queue Structure 59 6.00 0.96 0.03 NA
PDSF User Environment 55 6.33 0.77 -0.05 NA
PDSF C/C++ Compilers 54 6.44 0.79 -0.02 NA
PDSF General Tools and Utilities 44 5.93 1.07 -0.11 NA
PDSF Libraries 43 6.00 1.07 -0.24 NA
PDSF Performance and Debugging Tools 42 5.31 1.39 0.06 NA
PDSF Applications 39 5.87 1.06 -0.34 NA
SP Viz Software 37 5.08 1.46 NA NA
PDSF Bug Resolution 34 5.85 1.21 -0.15 NA
PDSF Fortran Compilers 29 6.03 1.09 -0.42 NA
Access Grid classes 27 4.67 1.41 NA NA
NERSC Training Classes 24 4.88 1.15 -0.25 -0.67
Newton 15 5.20 1.37 -0.24 -0.27
Escher 13 5.23 1.30 -0.15 0.15
Escher Viz Software 8 4.75 1.39 NA NA

 

DOE and NERSC Scaling Initiatives

Legend:

Usefulness | Average Score
Very Useful 2.5 - 3.0
Somewhat Useful 1.5 - 2.4

How useful were the following scaling initiatives?

Initiative | No. "don't use" Responses | No. of other Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
large scale job reimbursement program 9 24 2.75 0.61 NA NA
consulting scaling services 66 32 2.59 0.50 NA NA
poe+ 12 104 2.50 0.54 NA NA
new Seaborg batch class structure 48 115 2.17 0.72 NA NA

Comments on NERSC's scaling initiatives:

[Read all 39 comments]

20 Good scaling initiatives
11 Wrong approach
4 Didn't help / not ready to use / not interested
2 Startup projects weren't eligible
2 Users need more technical info
1 Sometimes hurts smaller jobs / sometimes OK

Did you participate in the Large Job Reimbursement Project?

No. of Responses
No 183 (90%)
Yes 21 (10%)

Have you or your project submitted information to the Applications Performance Matrix?

No. of Responses
No 118 (71%)
Yes 48 (29%)

Have you used poe+?

No. of Responses
Yes 104 (52%)
No 95 (48%)

Do you plan to submit information to the Applications Performance Matrix in the coming year?

No. of Responses
Yes 85 (64%)
No 48 (36%)

If you don't plan on submitting to the Matrix next year, why not?

[Read all 36 responses]

13 Don't know about the Matrix / not my role to do this
7 Codes don't scale well / don't have large codes
6 Question doesn't apply to PDSF users
5 It's inappropriate / don't have enough time
2 Have already submitted and no changes expected next year
3 Other

 

Comments on NERSC's scaling initiatives: 39 responses

Good scaling initiatives: 20 responses

Please push this project as much as you can. This type of consulting is very important if one goes to the limit of a system in terms of #processors and sustained performance.

I think, it's a good idea. It promotes more efficient use of the NERSC resources.

Good idea

We were not quite ready to take advantage of the initiatives, but they are a good idea.

- Ultimately, I think this is a good idea and will lead to better architectures in the future, as well as allowing us to make optimal use of the systems we have today.
- I don't think anything prompts a systems vendor to fix issues better than having a clear characterization of those issues.

always good to think about that.

The restructuring of the LoadLeveler classes on Seaborg has provided us a leap in our progress in high resolution climate simulations.

Thought the program was very successful and was very beneficial to our group.

These services are very great to get our work done quickly.

It provides more incentive to improve the scalability of our codes.

I favor the shift to large-CPU, large-RAM jobs.

the initiative is great!!! ...

My research particularly benefited from the Large Job Reimbursement Project, which helped us to: 1) test our code using a large number of processors, 2) run very long simulations at no cost.

Interesting, I appreciate it.

... That having been said, I am glad that the queues are becoming more friendly to large jobs.

It is great, ...

I think it is appropriate since NERSC is not in the workstation business, it is for serious users of computer power.

It is the right thing to do. Usage efficiency and proficiency are important.

Eventually, I will want to run jobs on more nodes than at present. I expect these initiatives to be very helpful. As a user of fewer nodes, I don't really notice a difference, positive or negative, in my productivity due to these initiatives.

It is a very important step in the development of scientific computing, since parallelization is the trend, especially for massive computing techniques.

NERSC response: Thanks for the feedback. The NERSC scaling initiative has thus far been positive for many projects.

 

Wrong approach: 11 responses

I believe that they are totally misguided. The emphasis should be on maximizing the SCIENTIFIC output from NERSC. If the best way to do this is for the user to run 100 1-node jobs at a time rather than 1 100-node job, every effort should be made to accommodate him/her. Even for codes which do run well on large numbers of nodes, it makes little sense for a user to use 1/8 or more of the system, unless he/she has an allocation of 1/8 or more of the available CPU time. Even then it might be more useful to run for example 4 jobs using 1/32 of the machine at a time rather than 1 job using 1/8 of the machine. The IBM SP is not designed for fine-grained parallelism requiring communication every 10-1000 floating-point operations. In the final analysis, it should be up to the users to decide how they use their allocations. Most, if not all of us, will choose a usage pattern which maximizes our scientific output. Remember that most of us are in computational science, not in computer science. We are interested in advancing our own fields of research, not in obtaining Gordon Bell awards. ...
If it were not for the fact that our FY2003 allocations are nearly exhausted, we would be complaining loudly because with the new Class (Queue) structure which favours "large" jobs, we can't get our work done.

Don't freeze out the small-to-moderate user --- the science/CPU hour is often higher for the moderate user

Although we successfully tested large jobs, I do not believe these jobs could serve our scientific goals well. I could easily see using 8 nodes of seaborg for our activation energy barrier determination jobs, but using more nodes than that would not be efficient or necessary. In fact, I see a significant threat to excellent quality supercomputing research by expanding in the direction of using more and more nodes per job. I suspect that for a good fraction of the work pursued at seaborg, although excellent, because of the very nature of the problems handled, one cannot expect linear scaling to very many nodes. We believe that 25% of the resources devoted to these super large jobs is already too much.

I have a negative view of NERSC's scaling initiatives. I understand that NERSC needs to propagate itself and to justify the purchase of a new, larger machine. But in my opinion not all good science is done on thousands of processors and I feel penalized both in time available and priority in the queues because I use hundreds of processors and not thousands.

There is always a tension between massive users and those who want to run smaller jobs. While many researchers use a single node (16 processors), I think it would not be cost effective for DOE to pay them to run on their own machines. ...

The new classes mean that jobs not using a lot of processors but still doing useful state-of-the-art physics calculations are not getting through as much as before. In fact, requesting fat nodes may mean a wait of a few weeks. Not all good physics is done by using 6000 processors.

NERSC response: The queue policies for how nodes are selected were changed this summer (2003) in an effort to improve access to large memory ("fat") nodes. The larger memory nodes are now the last to be selected for batch work unless the job specifically requests the large memory resource. If your work requires the 64 gigabyte memory nodes we encourage you to contact NERSC consultants in order to discuss your specific requirements.

For our work, this initiative has been counterproductive. We perform very challenging time-dependent computations for stiff systems of nonlinear PDEs, which require the solution of ill-conditioned matrices at every time-step. Although we are using state-of-the-art parallel linear algebra software (SuperLU_DIST), scaling to increase speed for a given problem has limits. Furthermore, when solving initial-value problems, the time dimension is completely excluded from any 'domain' decomposition. Our computations typically scale well to 100-200 processors. This leaves us in a middle ground, where the problems are too large for local Linux clusters and too small to qualify for the NERSC queues that have decent throughput. My opinion is that it is unfair for NERSC to have the new priority initiative apply to all nodes of the very large flagship machine, since it weights the "trivially parallel" applications above the more challenging computations, which have required a more serious effort to achieve parallelism.

Excessive emphasis is being placed on this.

It is detrimental to research groups like ours. We need NERSC resources to run 100-1000 jobs or so at a time (in serial mode with one processor per job) or 10-20 jobs with 2-3 nodes per job. There are no other resources available to us that would enable us to do this. On the other hand, our jobs are no longer as favored in the queue since they are smaller scale jobs.

I am unsatisfied with the scaling initiatives. Quite a lot of my calculations require a small number of nodes with a long wallclock time (due to the code I use), which is slow in the queue. Good science often comes from small, routine calculations, not from massive parallel jobs.

First, I understand the reasons for the initiatives. But it may not be the most cost-effective way to use the computing resources. For example, we only need 32 nodes for eight hours to complete a simulation. The 32 processors is most efficient because we simulate the collision of two beams and each beam is on a single node (16 processors). But we need to repeat the same simulation with thousands of different parameters to help optimize the performance of the collider. In this situation, if we are forced to use more processors for each simulation, it actually wastes more resources.

NERSC response: The focus of the scaling initiative is not meant to judge what is or is not good science. NERSC realizes that scaling up the parallelism in scientific codes is not always possible, can require significant effort, and is not in every case productive. Where it is possible and productive, we stand ready to assist and promote highly parallel computations which make full use of the resources we provide. Providing opportunities which encourage researchers to explore the parallel scaling of their codes has been useful to many, but not all, projects and codes.

It is the inevitable outcome of advances in computing that computations which were once difficult should become easier. As the domain of what is considered a workstation sized problem expands, so should the realm of capability computing. The best matching of compute resources to scientific problems is a moving target and we try our best at providing resources which meet NERSC users' needs.

Didn't help / not ready to use / not interested: 4 responses

Doesn't help us, we would need much faster I/O to scale up to more processors at good efficiency. That may change in the future with new codes.

we're not yet ready to use these, but we're gearing up.

I am more interested in getting the results than exploring the scalability. ...

... perhaps another person in my group used this but I did not. I plan to do this in the future, but currently my runs which scale to such large processor numbers take an extremely long time to run, hence we have not performed these long runs. There is the possibility that further performance and scaling work on our code could increase our ability to use larger processor numbers.

Startup projects weren't eligible: 2 responses

As a startup account, we run jobs on 2000 CPUs but could not be part of the Reimbursement Project.

I am currently trying to scale my code to 4096 processors; however, the sheer cost of start-up alone means that my small ERCAP allocation was exhausted rapidly when I began testing on 4096 processors. It would be useful to have funds available at all times for large scale testing.

Users need more technical info: 2 responses

... I would like consultants to know more about the software available at NERSC, e.g. compilers, parallelization libraries, and mathematical libraries.

... the minus is some lack of information (detailed surveys) about performance and scaling characteristics of currently available chemistry codes

Sometimes hurts smaller jobs / sometimes OK: 1 response

This month, with everyone trying to use up their computer time, the turnaround for smaller node jobs is now on the order of a number of days, while those using many nodes get reasonable turnaround. The rest of the year, letting them have much higher turnaround probably doesn't hurt those of us who can't use 32 or 64 nodes.

 

If you don't plan on submitting to the Matrix next year, why not? 36 responses

Don't know about the Matrix / not my role to do this: 13 responses

I am new to using NERSC and I do not know what this is about.

I don't really understand what it is.

Not familiar with the Applications Performance Matrix

I'm not one of the lead PI's on this project. They would have done this ... not me.

?

My repo may well do so. I'm not certain of our plans in that regard.

I don't understand what it is

I don't know what this is about. I do use poe but have no idea what poe+ is.

More exactly, probably not. The reason is that I just have not looked at Application Performance Matrix and don't know much about it.

I was not aware of that service

It would be more appropriate for others in my group to do this.

I'm unaware of what it is.

I don't know how to do it.

NERSC response: Submitting data to the Applications Performance Matrix is done via a submission form on the web.

Codes don't scale well / don't have large codes: 7 responses

our code scales badly using more than 16 nodes

We were somewhat surprised by the results of our benchmarks and would like help identifying and eliminating bottlenecks.

I have to learn more about the APM before I can give an answer. In any case, I believe my applications are too small.

I do not currently conduct large-scale simulations, just diagnostics on the output of previous simulations conducted by others (escher). Large scale simulations would require a shift in our funding paradigm.

I used hpmcount since I used a serial or one node job using 16 processors. I will try to do a combined OMP/MPI this coming year. We also have some molecular dynamics calculations to run that apparently scale well in MPI.

POE isn't necessary for optimizing performance for my codes.

Applications I am running right now are not large enough to make this relevant.

Question doesn't apply to PDSF users: 6 responses

I wasn't aware that such a system existed. Also, I have the feeling that these 'measurements' have a fundamental flaw: often things like the 'efficiency' of the application are measured by comparing the CPU time to the wall clock time. This makes codes that calculate theoretical models, which often do very little IO, appear very efficient, while codes that reconstruct experimental data and do much IO appear less efficient. Within experimental reconstruction software there is also a wide range of codes: some large experiments, for instance, have to do tracking on almost all events and require a lot of CPU, while other experiments need to scan through many background events without doing very much except IO in order to find a signal event.

Primary processing of our data is done; I am not involved in the microDST construction for individual analyses, so I am not running large jobs at the moment.

I don't think it applies to me personally - would be better suited to a study of the STAR simulations/reconstruction usage probably.

I think it applies to MPP and not to PDSF style computing.

Our applications are not large enough

The ATLAS code doesn't support true parallelization at this point, only breaking the jobs into pieces and submitting the pieces to multiple machines.

It's inappropriate / don't have enough time: 5 responses

We are benchmarking ourselves. The performance of our application could be made to vary from very bad to very good. Such results are not representative.

I would like to, but have very limited time available for this type of activity.

1. Life is too short 2. As a small-to-moderate user, I just do not have the time to do this --- my NERSC computing remains a small (<33%) fraction of my overall scientific responsibilities and I do not want to set aside the hours-to-days-to-weeks to get involved with this

I will submit to the Applications Performance Matrix because the application form seems to require it. However, I believe it is basically useless as a measure of code performance. I use an AMR code, with the result that the performance is highly dependent on the geometry of the problem, the size of the problem, and what physics is being simulated. Depending on the problem and the physics, my code may scale anywhere from 16 processors (for smooth collapse over a large dynamic range using radiation, gravity, and hydrodynamics) to hundreds of processors (for pure hydrodynamics with a fairly small dynamic range). A single measure like poe+ is not useful in this case.

lack of manpower

Have already submitted and no changes expected this year: 2 responses

I have already benchmarked the codes I intend to run in the coming year, or at least codes which should perform very similarly. If I gain access to a new machine with a large number of nodes in the coming year or there are major hardware/software improvements on those machines on which I plan to run, perhaps I will perform new benchmarks and submit the results to the Applications Performance Matrix. Similarly if we find new algorithms, or ways to improve the performance of existing codes then I might submit new results. However, I do not have any such plans, a priori.

No expected change in performance.

Other: 3 responses

Assuming a refereed publication describing the application is available.

If asked (and I assume we will be), our project will submit data.

I will not be at LBL the coming year


 

Web, NIM, and Communications

  • Legend
  • Satisfaction with the Web
  • How useful are these methods for keeping you informed?
  • Are you well informed of changes?
  • Summary of Information Technology Comments
  • Comments concerning the HPCF web site:   22 responses
  • Comments concerning the PDSF web site:   6 responses
  • Comments concerning NIM:   20 responses
  • Comments concerning how NERSC keeps you informed of changes:   19 responses

 

Legend

Satisfaction | Average Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
Usefulness | Average Score
Somewhat Useful 1.50 - 2.49
Not Useful / Not Used 0.50 - 1.49
Significance of Change
significant increase
significant decrease
not significant

 

Satisfaction with the Web

Question | No. of Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
Accuracy 181 6.25 0.93 0.00 0.10
NIM 132 6.08 1.18 0.08 NA
Seaborg job displays 157 6.07 1.06 NA NA
Timeliness of info 172 6.05 1.09 -0.15 -0.03
SP Pages 159 6.00 1.04 0.04 0.22
Help section 163 5.93 1.11 NA NA
General programming info 162 5.91 1.02 0.06 0.19
Software Section 160 5.87 1.04 -0.09 NA
PDSF 84 5.83 0.99 0.02 NA
Ease of finding info on web 216 5.80 1.10 -0.00 -0.08
File Storage Pages 125 5.74 1.14 -0.10 -0.05
IBM docs 142 5.70 1.18 NA NA
Search facilities 164 5.44 1.19 -0.12 -0.11

 

How useful are these methods for keeping you informed?

Question | No. of Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
Email 186 2.46 0.66 -0.00 0.02
Announcement web archive 174 2.40 0.65 0.06 0.35
MOTD 165 2.18 0.77 0.09 -0.09
Phone calls 142 1.77 0.90 0.09 -0.04

 

Are you well informed of changes?

Question | No. of Yes Responses | No. of No Responses | Percent Who Said Yes | Percent Who Said Yes in 2002 | Percent Who Said Yes in 2001
Do you feel you are adequately informed? 208 12 95 96 94
Are you aware of major changes at least 1 month in advance? 162 25 87 91 81
Are you aware of software changes at least 7 days in advance? 165 22 88 86 81
Are you aware of planned outages 24 hours in advance? 170 20 89 87 91

 

Summary of Information Technology Comments

Comments concerning the HPCF web site

[Read all 32 responses]

9   Improve navigation / organization / presentation
8   Good website
5   Provide additional or clearer information
5   Keep info up-to-date / correct errors
4   Improve searching
4   Other

Comments concerning the PDSF web site

[Read all 12 responses]

5   Improve navigation / organization / presentation
4   Keep info up-to-date / correct errors
4   Good website
1   Provide additional or clearer information
1   Other (certificates)

Comments concerning NIM

[Read all 26 responses]

8   Issues with ERCAP or the allocations process
8   Good interface
3   Problems with reliability
1   Difficult to use
1   Violates privacy

Comments concerning how NERSC keeps you informed of changes

[Read all 22 responses]

8   Satisfied / well informed
6   Comments on system outages
5   Comments on using email
2   Comments on the MOTD
1   Software changes

 

Comments concerning the HPCF web site:   32 responses

Improve navigation / organization / presentation:

I feel like I'm going through a gigantic maze to find things that should be simple to find, such as keywords that appear in load leveler submit scripts, or clear and concise descriptions of the specific capabilities of the different available software libraries. There is a lot of information on the site, however.

Probably very good for the amount of information, but sometimes it takes me a while to find what I am looking for.

sometimes it takes a fair amount of searching to find the information that I am looking for

Make the structure more transparent; currently one often has to guess where the required information may be found.

Very often, the same information is spread over several web pages with slightly different content.

Needs lots of improvement. Most pages cram lots of info into a single page, it is hard to find what you want, etc. Beyond the home page, the website has an 80's look. Just compare it to the Ohio, San Diego, or Pittsburgh supercomputing centers' web sites.

more compact, to-the-point info

There should be a condensed section with the basic commands, such as Fortran compiling (64 bit), starting/stopping/listing batch jobs, the location of scratch disks, the tape archive, queue info, and usage info.
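
The quick-reference this commenter asks for might look something like the sketch below. It is only an illustration of the kinds of commands involved; the exact compiler wrappers, environment variables, and options on seaborg may differ, so the NERSC documentation remains the authoritative source.

    # Compile Fortran in 64-bit mode with the thread-safe MPI wrapper:
    mpxlf90_r -q64 -o mycode mycode.f90
    # Submit, list, and cancel LoadLeveler batch jobs:
    llsubmit job.ll
    llq -u $USER
    llcancel <job_id>
    # Scratch space and the HPSS tape archive:
    cd $SCRATCH
    hsi ls
    # Allocation and usage information:
    getnim -u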

I think that the PDSF website in particular could use an overhaul, more accurate info on the status of machines, how to optimize the batch queues and common pitfalls etc could be included. ...

Good website:

It's absolutely perfect. :-))

... The NIM and HPSS websites are quite useful.

The Web site has improved significantly over the years.

It is the best site, and it has been very helpful.

This site is very important and is a great help to my work.

One of the best aspects of NERSC - your useful web site.

I've found it very useful - certainly better than those I have used at other supercomputing centers.

Keep up the good work.

Provide additional or clearer information:

The only part I do not like about the NERSC web pages is the HSI and HPSS documentation. My first reaction when I access the HPSS page is ... ah crap ... look at all this stuff. I'm never going to find what I need. (Note: on the left hand side "summary", the background is blue and so are the links ... they can be easy to miss when reading.) Then I go to HSI ... same thing. Go to the "tricks and tips" section ... hmmm. It all looks kind of complicated. Now I know what to do ... but I remember the first time looking at this crap and, well, after reading it all I felt very nervous that I was going to wipe out everything! What would have been nice was a simple command: backup /scratch/mywork, and it makes mywork.tar and dumps it on HPSS. Done, that's it. (I don't care if it uses hsi commands, pftp or whatever, it just makes a tar file of mywork and dumps it on HPSS. I guess I could write my own shell script for this ... :) Anyway, I don't want to sound like I'm complaining ... I just remember my first time using HSI and HPSS as being, well ... scary.
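
For what it is worth, the one-line backup this commenter wishes for is close to what the htar utility provides where it is installed; a minimal sketch (the paths are illustrative, and the NERSC HSI/HTAR documentation should be checked for the exact options supported):

    # Bundle a directory tree into a single tar file created directly in HPSS:
    htar -cvf mywork.tar /scratch/myuser/mywork
    # List the archive, or pull the tree back out of HPSS later:
    htar -tvf mywork.tar
    htar -xvf mywork.tar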

On the examples for the MPI tutorial, please define $EXAMPLES.

... I think that there should be more examples of how to use commands, particularly with respect to HPSS/HSI where only one not particularly general example is given for each command (get, cfget, etc...).

An archive of some more example loadleveler scripts might be useful. There are a few in the IBM manuals, but more would be nice for beginners like myself. (Maybe I just haven't found them, in which case I would suggest a more prominent link to them.)
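
For readers in the same position, a minimal LoadLeveler script for a parallel job on seaborg might look like the sketch below; the class name, node counts, and limits are illustrative, and the current queue documentation should be consulted for the valid values.

    #!/usr/bin/csh
    # Minimal LoadLeveler batch script sketch (illustrative values).
    #@ job_name         = example
    #@ output           = example.$(jobid).out
    #@ error            = example.$(jobid).err
    #@ job_type         = parallel
    #@ class            = regular
    #@ node             = 2
    #@ tasks_per_node   = 16
    #@ wall_clock_limit = 00:30:00
    #@ queue
    poe ./a.out

Such a script would be submitted with llsubmit and monitored with llq.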

The web site does not necessarily have relevant information to fix problems one might be having, i.e. a change in compiler causes a job not to run anymore: where does one go to the web site to get hints on what the differences might be and on how to go about fixing the problem?

Keep info up-to-date / correct errors:

Pretty user friendly but some of the tables seemed out of date such as the classes of running DL-POLY (this was in July).

Some information is quite out of date. While the web pages can still exist and be useful to some, a note should be on the page indicating that it is out of date.

keep it better updated with downtimes [PDSF user]

The MOTD is sometimes out of date with the current systems status on PDSF.

... I've also found that some examples of batch queue use are incorrect or at least misleading.

Improve searching:

I've had trouble finding straightforward information about some NERSC functionality. Often, the information pulled up by a search is either far too technical or too basic. ...

Search facilities are so bad. They seldom show what I want to find out from the website.

It would be better if the search facilities could be improved. Every time I use search to get help with some topic, I get news items and email notes about the topic.

I still find it difficult to find exactly the information I want, but I have no suggestions on how to improve it. I often find a google search with "site:nersc.gov" to be more useful than NERSC's own search engine.

Other:

I preferred when such information was kept on line on the platforms where I need to use it.

I don't really use the web sites. When I need information I ask someone in my group who already knows.

I seem to get along in a relative state of ignorance by consulting with code gurus rather than prowling around in the web site. Does this make me lazy?

Can't log in to get realtime status of the machine I use.

 

Comments concerning the PDSF web site:   12 responses

Improve navigation / organization / presentation:

It takes a little practice to find things.

... I find that some items on the PDSF website are quite hard to find.

It is organized probably as well as I can imagine something like this being and yet I occasionally find it hard to navigate and find what I need. Since I don't have any specific suggestions, I don't propose changing it. It works well enough for my purposes.

The front page looks a bit confusing if you want to find your way through the items you are looking for.

The inconsistencies between www-pdsf.nersc.gov and pdsf.nersc.gov, specifically for www-pdsf.nersc.gov/stats (but not pdsf.nersc.gov/stats), are a little irritating.

Keep info up-to-date / correct errors:

Include more accurate info on the status of machines, ...

FAQ could use updating.

The system configuration web pages sometimes get a bit old. When we talk to new projects about using PDSF, it would be nice to know what the projections are for future CPU/network/storage. I know this is hard to predict, but it would be nice.

I think the software section is a little out of date. ...

Good website:

I use PDSF and the web site is pretty good, I think. It does seem to get updated regularly and has what people need.

I have always been able to find what I needed to know in a timely manner.

no complaints, very well run facility

Things are easy to find and it is easy to submit trouble tickets -good site!

Provide additional or clearer information:

... how to optimize the batch queues and common pitfalls. I think that the details of optimizing HPSS<->PDSF communication could be included, particularly in a batch queue setting. ...

Other:

... It is very good that users can submit trouble tickets without having to login. I suggest that help.nersc.gov move toward X509 to eliminate the login problem, same comment below for NIM.

 

Comments concerning NIM:   26 responses

Issues with ERCAP or the allocations process:

ERCAP continues to get more complicated and demanding each year. Some Dilbertian manager must be in charge of this process at NERSC!

As I have indicated ever since NERSC went over to a web-based ERCAP, I felt that it was a great leap backwards. If you want a web-based ERCAP, let the PI prepare it in a format of his/her own liking, place it on his/her own web site, and point you at it.

For ERCAP review, show last year's allocation and actual usage (percentage).

It would be very helpful to have available a place to print the questions on ERCAP before starting on the web site. This year, it would have been nice to know that poe+ was strongly encouraged --- this was just buried somewhere in the application process.

Good interface:

I find NIM straightforward to use.

It worked well and was easy to use.

NIM has been improved in last couple years and serves me acceptably well.

An excellent resource!

The interface is very convenient.

Excellent tool for PI's!

In my role of PI and account manager I really appreciate having access to NIM. It is very convenient to monitor the use of our allocation, especially having so many accounts for my students. The NIM site is really useful.

This is wonderful - it makes submitting allocation requests easy and managing our account equally as easy.

Excellent. I found it extremely useful.

overall, pretty slick

Problems with reliability:
Difficult to use:

I have never been able to get it to work

A bit cumbersome. The new accounts procedure is ridiculous. The new user fills out a web page, which then generates an email with all the information and mails it to me to set up the new account. I then have to re-enter all the information into another web page. Why can't it be set up for me to just go to a web page with all the information entered so I just have to sign off on it? [PDSF user]

I have rarely used NIM and I don't have a basis for comparison; my impression was that it is not very straightforward.

I usually find it somewhat difficult to find what I need and navigate through the various pages...

A little hard to navigate and find out which buttons do what without going back to the NIM manual.

Authentication issues:

difficult to get a password from Europe

The NIM login is a problem. I suggest that you move toward X509 to automatically authenticate users with NIM based on their X509 certificate.

Difficult to access

Other:

A job-by-job report option would be nice.

Every once in a while there is a bug or glitch in NIM. NERSC is very responsive in helping with workarounds and fixes.

I would like to get my account information directly from my seaborg.nersc.gov account. The current command getnim -u gives too little information.

Sometimes it's a little slow, but that's not a big deal.

 

Comments concerning how NERSC keeps you informed of changes:   22 responses

Satisfied / well informed:

This is one area where NERSC beats all others. I always know when changes are coming. At least I think I know ...

I have been very pleased with the efforts of NERSC staff to keep me informed of changes and to explain things when I misunderstand the nature/effect of a change.

That's nice.

Current system provides enough information, and it seems ok. It is user's responsibility to check the status from available resources.

It has been adequate so far.

The planned outage announcements are very important and should be stated as they are now. It allows good planning around them.

Excellent communication service!

It seems to me that NERSC is doing the right thing already.

Comments on system outages:

Given the typical turnaround time of longer jobs, 24 hours advance notice for planned outages is really not enough advance notice.

Maybe NERSC should send an email 12 hr in advance of the regular maintenance shutdowns to remind users.

I haven't made enough use of the system to have any problems with outages. In the case of a sudden major failure what do you do? Is there some other site which monitors if you are up?

Sometimes when the SP is down the MOTD mentions it but has no information about why the computer is down or for how long.

I would like immediate email when PDSF experiences trouble that affects running jobs or goes down for any reason.

Set up an email list (optional subscription) with notifications of planned and unplanned seaborg downtime.

Comments on using email:

Email is fine

There are too many mass e-mails sent for help sessions and tutorials. There should be an option to not receive e-mails of this sort.

E-mail messages (or 'phone calls) are best. I consult the web site only infrequently.

I've never been able to successfully join announcement lists at NERSC. Generally, I have to ask colleagues about outages and software changes. Most likely, I've neglected to do something but information on how to sign up should be a little bit clearer.

Many lesser announcements not directly affecting usage are unnecessary.

Comments on the MOTD:

Due to its length, the MOTD is not useful at all. Some of the information is the standard federal notice, some is information pertaining to the cluster, then there is contact information, more cluster information, etc. The standard federal notice probably has to be included, but the contact information can be dropped - people can see on the website where to get information if necessary. The cluster status should stay. Why is there Seaborg info on the PDSF MOTD?

There is so much information on the message of the day, it scrolls by so fast I NEVER read it. Except when something goes really wrong, and then I occasionally remember to read it. It's worth having there as a reference but I don't think NERSC personnel can or should assume that users read it every day.

Software changes:

When software changes are made, someone should make sure that what was running before is running o.k. again before releasing this new software to the general population.


 

Hardware Resources

  • Legend
  • Satisfaction - Compute Platforms (sorted by Average Score)
  • Satisfaction - Compute Platforms (sorted by Platform)
  • Max Processors Used and Max Code Can Effectively Use
  • Satisfaction - HPSS
  • Satisfaction - Servers
  • Satisfaction - Networking
  • Summary of Hardware Comments
  • Comments on NERSC's IBM SP:   51 responses
  • Comments on NERSC's PDSF Cluster:   17 responses
  • Comments on NERSC's HPSS Storage System: 29 responses
  • Comments about NERSC's auxiliary servers:   5 responses

 

Legend:

Satisfaction | Average Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
Significance of Change
significant increase
significant decrease
not significant

 

Satisfaction - Compute Platforms

Sorted by average score

Question | No. of Responses | Average | Std. Dev. | Change from 2002 | Change from 2001
SP Overall 192 6.43 0.78 0.05 0.61
SP Uptime 191 6.42 0.83 -0.14 0.89
PDSF Overall 68 6.41 0.87 0.15 NA
PDSF Uptime 62 6.35 1.04 -0.16 NA
SP Disk Configuration and I/O Performance 156 6.15 1.03 0.18 0.48
PDSF Queue Structure 59 6.00 0.96 0.03 NA
PDSF Batch Wait Time 61 5.93 1.12 0.19 NA
PDSF Ability to Run Interactively 64 5.77 1.39 -0.41 NA
PDSF Disk Configuration and I/O Performance 59 5.69 1.15 0.06 NA
SP Queue Structure 177 5.69 1.22 -0.23 0.50
SP Ability to Run Interactively 162 5.57 1.49 0.10 0.86
SP Batch Wait Time 190 5.24 1.52 -0.17 0.32

 

Satisfaction - Compute Platforms

Sorted by Platform

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
SP Overall 192 6.43 0.78 0.05 0.61
SP Uptime 191 6.42 0.83 -0.14 0.89
SP Disk Configuration and I/O Performance 156 6.15 1.03 0.18 0.48
SP Queue Structure 177 5.69 1.22 -0.23 0.50
SP Ability to Run Interactively 162 5.57 1.49 0.10 0.86
SP Batch Wait Time 190 5.24 1.52 -0.17 0.32
PDSF Overall 68 6.41 0.87 0.15 NA
PDSF Uptime 62 6.35 1.04 -0.16 NA
PDSF Queue Structure 59 6.00 0.96 0.03 NA
PDSF Batch Wait Time 61 5.93 1.12 0.19 NA
PDSF Ability to Run Interactively 64 5.77 1.39 -0.41 NA
PDSF Disk Configuration and I/O Performance 59 5.69 1.15 0.06 NA

 

Max Processors Used and Max Code Can Effectively Use

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
SP Processors Can Use 139 609.41 1006.23 63.41 -141.59
Max SP Processors Used 161 444.84 733.31 273.84 242.84
Max PDSF Processors Used 35 13.06 43.25 -21.94 NA
PDSF Processors Can Use 34 10.26 21.37 -86.74 NA

 

Satisfaction - HPSS

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
Reliability 126 6.61 0.77 0.10 -0.02
Uptime 126 6.54 0.79 0.17 0.21
Performance 126 6.46 0.88 0.11 0.10
HPSS Overall 134 6.46 0.84 0.07 -0.04
User Interface 127 5.98 1.24 0.03 -0.04

 

Satisfaction - Servers:

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
Escher 13 5.23 1.30 -0.15 0.15
Newton 15 5.20 1.37 -0.24 -0.27

 

Satisfaction - Networking

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
LAN 114 6.54 0.67 NA NA
WAN 100 6.12 1.02 NA NA

 

Summary of Hardware Comments

Comments on NERSC's IBM SP

[Read all 51 responses]

16   Good machine
15   Queue issues
12   Scaling comments
8   Provide more interactive and debugging resources
5   Allocation issues
4   Provide more serial resources
3   User environment issues
2   Other (down times and need faster processors)

Comments on NERSC's PDSF Cluster

[Read all 17 responses]

10   Disk, I/O and file issues
6   Batch issues
4   Good system
4   Provide more interactive and debugging resources
2   Down time issues
2   Other (slow processors, utilitarian code)

Comments on NERSC's HPSS Storage System

[Read all 29 responses]

14   Good system
7   Hard to use / user interface issues
5   Performance improvements needed
3   Authentication is difficult
2   Don't like the down times
2   Network / file transfer problems
2   Other (Grid, SRUs)

Comments about NERSC's math and vis servers

[Read all 5 responses]

3   Network connection too slow
2   Good service
1   Remote licenses

 

Comments on NERSC's IBM SP:   51 responses

Good machine:

Great! Now where is the power 4 version? But really, this is a great machine.

... I appreciate the fact the machine is rarely down.

It is an amazing machine. It has enabled us to do work that would be entirely out of reach otherwise.

This is a very useful machine for us.

... The system is so good and so well managed in just about every other way [except that there are not enough interactive resources]. On a positive note, I am very happy that NERSC opted to expand the POWER3 system rather than moving to the POWER4 or another vendor. Seaborg is very stable and, the above comment notwithstanding, very well managed. It is also large enough to do some serious work at the forefront of parallel computing. This strategy is right in line with the aims of my research group and, I believe, in line with a path that will lead to advancements in supercomputing technology. ...

Great Machine!

SP is very user friendly.

It is doing its job as expected

Nice system when everything works.

NERSC's facilities are run in impressive fashion. ...

My code uses MPI-2 one-sided primitives and I have been pleasantly surprised by the dual-plane Colony switch performance. Previously we have worked extensively on Compaq SCs with Quadrics, and while the IBM performance is not quite equal, there are far fewer penalties for running n MPI processes on n processors of a node (on the SC it is often better to run only 3 processes per 4-processor node). Although the latency of the IBM is quite high (measured to be at least 3x that of Quadrics) we can accept this penalty and still achieve good performance. ...

Excellent! ...

The IBM SP is very powerful.

It is a very fast machine. ...

Great!

Excellent system with the most advanced performance!!!!!!!!!!

Queue issues:
long turnaround for smaller node jobs:

Batch can be very slow and seems to have gotten worse in the past year despite the increase in CPU # -- I often use the "debug" queue to get results when the job is not too big. Things seem to have worsened for users requesting 1 or 2 nodes --- perhaps because of the increasing emphasis on heavy-duty users.

Wait time for short runs with many processors sometimes takes too long.

The queue structure is highly tilted towards large jobs. While it is not as bad as at some other supercomputer centers, running jobs that only use 64-128 processors with reasonable wait times still requires extensive use of the premium queue. An 8 hour job on the regular queue using 64 processors generally requires several days of waiting time. Such low compute efficiencies are not only frustrating, they make it very difficult to use a reasonably sized allocation in a year.

Batch turnaround time is long, but this is unavoidable because of the number of users. The premium queue is necessary for running benchmark/diagnostic short-time jobs.

.... Also, the current queue structure favors many-node jobs which is a disadvantage for me. I am integrating equations of motion in time, I don't need many nodes but would like to see queue wait time go down. Currently my jobs may spend more time waiting in the queue than actually running.

The batch queues take much longer than last year (probably because there are more jobs submitted). The queue can also stall when a large long job (>128 nodes, >8 hours) is at the head of the queue while the machine waits for enough processors to free up. Is it possible to allow small jobs to start and then checkpoint so there are fewer wasted cycles? It would also be good if the pre_1 queue had higher priority, which could be offset by a higher cost for using it.

The job stays in the queue too long. Sometimes I need to wait 2-3 days to run a one-hour job.

Sometimes there are so many big node jobs running that it sort of locks out everyone else.

Long turnaround / increase wall limit for "reg_1l" class:

With the old queue structure, we seldom had to wait more than a day for a job to start. With the new queue structure the wait has increased to of order a week. If this continues into FY2004, we will not be able to get our work done. ...

Waiting time for small jobs is quite frequently unacceptably long; there should also be a queue for small but non-restartable jobs with a limit longer than 24 hours.

very long waits at low priority:

My students' complaints are usually about the waiting time in queues, especially for medium-size jobs when using priority 0.5, which is a must if we want to optimize the use of our allocation.

long waits in September:

I am very satisfied with the average batch job wait time, but I noticed that recently (as of 09/12/03) the waiting time is too long.

long wait time for large memory jobs:

... and waiting for a 64-node job on the 32 GB nodes is often not practical.

wants pre-emption capability:

... Alternatively [to providing a Linux cluster for smaller users] is there any kind of scheduling algorithm that would pre-empt long-running 4-processor jobs when the bigger users need 1000+ processors? ...

has to divide up jobs:

I have found that to get things to run, you have to break the job into smaller pieces. ...

Scaling comments:
limits to scaling:

Please inform me if the 4k MPI limit no longer exists.

... I have noticed a large increase in memory usage as the number of MPI processes is increased. This is quite a concern since very often we run in a memory-limited capacity. I would like to be able to provide further comment about the performance of the machine when using 4096 processors, but as yet I have only been able to partially run my code on this many processors. I'd like to see a focus on increasing network bandwidth and reducing latency in the next generation machine. I am not convinced that clusters of fat SMPs are an effective model unless the connecting pipes are made much larger.

I'd VERY MUCH like to see more large-memory nodes put into service, or memory upgrades for the existing nodes. Using 16 MPI tasks per node, the 1 GB/processor limit is a definite constraint for me.

... The IBM SP is capable of only coarse-grained parallelism. This limits the number of processors that it can use efficiently on our jobs. Its capabilities are now exceeded by large PC (Linux) clusters, which probably have a better price/performance ratio than the SP. Of course such comparisons are unfair, since the SP is no longer new. I should thus be comparing new clusters with Regatta class IBM machines. I am unable to do this since, on the only Regatta class machine to which I have access, I am limited to running on a single (32 processor) node.

I can't believe any one can actually make use of this machine with an efficiency level that's any more than pathetic.

general scaling comments:

We're improving our code to make use of many more processors - e.g. 300-600 within the next year.

Currently, our jobs do not yet require more than 64 processors. Our code scales quite well with number of processors up to 64.

The maximum number of processors my code can effectively use per job depends on the size of the problem I am dealing with.

The max. # of processors really depends on how big a given lattice is. For next year, we may have access to 40^3x96 lattices, so I expect the max number of nodes to increase to 32, maybe more.

We could use more processors, but our current simulation size dictates that this is the most efficient use of resources. In the upcoming year we plan to use more processors, and perform some larger simulations.

too much emphasis on large number of processors:

See earlier comments on promoting super large jobs for the reasoning behind our reservations. [Although we successfully tested large jobs, I do not believe these jobs could serve our scientific goals well. I could easily see using 8 nodes of Seaborg for our activation energy barrier determination jobs, but using more nodes than that would not be efficient or necessary. In fact, I see a significant threat to excellent-quality supercomputing research in expanding in the direction of using more and more nodes per job. I suspect that for a good fraction of the work pursued at Seaborg, although excellent, one cannot expect linear scaling to very many nodes because of the very nature of the problems handled. We believe that devoting 25% of the resources to these super large jobs is already too much.]

Please, see my comments regarding the scaling initiative. [... Although we are using state-of-the-art parallel linear algebra software (SuperLU_DIST), scaling to increase speed for a given problem has limits. Furthermore, when solving initial-value problems, the time-dimension is completely excluded from any 'domain' decomposition. Our computations typically scale well to 100-200 processors. This leaves us in a middle ground, where the problems are too large for local Linux clusters and too small to qualify for the NERSC queues that have decent throughput. My opinion is that it is unfair for NERSC to have the new priority initiative apply to all nodes of the very large flagship machine, since it weights the "trivially parallel" applications above the more challenging computations, which have required a more serious approach to achieve parallelism.]

Provide more interactive and debugging resources:

It is virtually impossible to do any interactive work on seaborg. This is a major shortcoming of its configuration, not of the IBM SP architecture. With a system of seaborg's size, it should be straightforward and not ultimately detrimental to system throughput to allow more interactive work. I usually debug interactively on a Linux cluster prior to moving an application to the SP. Often in moving it over, there are IBM-specific issues that I need to address prior to submitting a long run. Being forced to do this final bit of debugging by submitting a succession of batch jobs is not a good use of my time nor optimal for job throughput on seaborg itself. PLEASE....fix this.

I commented in the 2002 that interactive access to seaborg was terrible. It still is. YOU NEED TO HAVE DEDICATED _SHARED_ACCESS_ NODES TO SOLVE THIS PROBLEM!!!!! I am sick to death of trying to run Totalview and being DENIED because of "lack of available resources" error messages. GET WITH IT ALREADY!!

NERSC response: Please see the "-retry" and "-retrycount" flags to poe (man poe). These flags can set your interactive job to retry its submission automatically so that you won't have to resubmit manually after "lack of available resources" errors.
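
For example, a minimal illustrative invocation (the program name and the numeric values are placeholders, assuming the usual POE convention that -retry sets the number of seconds to wait between allocation attempts and -retrycount the maximum number of attempts; see man poe for the exact semantics on seaborg):

    poe ./myprog -procs 16 -retry 60 -retrycount 30

With these settings poe would re-attempt the interactive node allocation every 60 seconds, up to 30 times, instead of failing immediately with "lack of available resources".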

We are developers, so interactive and benchmarking time, even for very large node configurations, is often important.

Ability to run interactively after-hours is sometimes less than desirable. Because of the dedicated nodes, it is generally satisfactory during office hours, but grad students are not restricted to office hours... :-)

... (Except the ability for interactive runs)

For interactive jobs, 30 min limit is too short, I feel. Would you increase this limit to 1 or 2 hours?

Should reduce the waiting time for the debug queue

... Debug class should be given priority during the weekends and the nights.

NERSC response: One of the major constraints of the batch queue system on seaborg is the speed with which resources can be shifted from one use to another. This impacts how quickly we can take the system down and very directly impacts the speed at which we can provide interactive/debug resources. Resource demands for debug and interactive tend to be very spiky and our approach thus far has been to try to estimate demand based on past usage. In order to best meet future demand for debug and interactive we monitor the utilization of the resources currently devoted to the debug and interactive classes.

We also allow debug and interactive work to run anywhere in the machine if such an opportunity arises. This is somewhat at odds with the fact that utilization in the main batch pool is very high, but every little bit helps and we do what we can within the given constraints.

The hours prior to system downtimes are an excellent time to do debug cycles. The cycles lost during the period over which the machine is drained prior to the downtime can be in part recovered by debug and interactive work that is allowed to proceed during that time. This is a very limited time frame, but may be useful for users who can schedule a time to do development/debug work on their parallel codes.

Batch queue policies and system issues are often discussed at the NERSC Users Group meeting. If you feel debug/interactive classes are not working, we encourage you to participate in NUG's work to improve the system. Your ideas and suggestions for how to work within the constraints inherent in the machine are welcome.

Provide more serial resources:

I am still partly at the stage of porting the codes to the AIX system, and it would be good to have some nodes available as single processors.

For some time a number of users have been asking for you to set aside some processors for serial jobs that run for longer than 30 minutes. I also would find this useful for calculations that are not readily parallelizable and need the software environment provided by NERSC. Given the large number of processors that your system now has, this would seem to have a very minimal impact on your overall throughput. You need to be responsive to this need in your queue structure.

Need to run a parallel code (e.g., an MPI code) on one processor, and to be charged for only one processor. The problem is that many programs do not have serial versions, but nevertheless need to be run for small systems (on a single processor) from time to time. One solution is to have a special queue run on a few nodes (1-2) in a time-sharing fashion. That way an MPI job can always be run (no waiting time), the performance for a single-processor job will be okay, and the charge is based on a single processor.

I'd like to see a high priority queue with very long CPU time limit (days or weeks) [this project only ran 1 processor jobs]

User environment issues:

The IBM SP is not native 64-bit and this has created headaches which were nonexistent on Cray hardware.

My comment concerns the output, writing to a file. It happened that my program exceeded time limits. It was terminated (as it should be) and it returned no data in the output files. I lost many hours of valuable computing. Is there any way to do something about it? It would be useful to have an output even when the program exceeds the time limit. Other than that I am very satisfied with I/O performance.

... Finally, for such a large system, there is frighteningly little disk space for checkpoint files, graphics dump files, etc. I have been in a situation where a job of mine is going to generate more output than $SCRATCH can accommodate. This leaves me to sit there and quickly offload files interactively to HPSS. I would like to see this rectified in the current system, if possible. Additionally, I would like this problem to be kept in mind when budgeting is done for follow-on systems.

Other:

... But I don't understand the regular down time due to maintenance. Is it possible to shutdown only the troubled nodes and keep the rest of the machine running?

Processors are getting older. You need to get new, faster systems

 

Comments on NERSC's PDSF Cluster:   17 responses

Disk, I/O and file issues:

I've only had a little trouble with the disks; I've had jobs die with uninterruptible-sleep status that is probably due to disk access problems. Also the "effective" number of CPUs I can use for a job (well, a set of jobs) is limited by disk access.

... One of the main issues I have faced as a user is disk I/O for lots of jobs running against a common dataset. The dvio resource tool helps keep things running smoothly but it has a complicated syntax/interface and limits the number of active jobs, thus slowing down the performance of my work.

... Disk vaults seem to crash quite often and it takes a long time for them to come back. Since all my data is sitting on one vault, this causes delays in my analysis.

There is still work needed on overall data management, coupled with I/O performance issues.

Problems with the disk to which I was writing put constraints on the number of processors that I could use at a time to run my jobs.

... situation with NFS on the datavaults not very satisfactory ...

The commodity disk vaults are not as reliable as we would like them to be. The reliability has been markedly better this past year than previously, but we still lose a few days a year to disk vault failures. In addition, the simultaneous load caps for the disk vaults are starting to impact us as we scale up our processing. ...

I think the individual machines could have larger swap space. Also the disk vault system seems to be a bit unreliable; perhaps it is NFS.

... Slow IO

A "get file list" capability could be implemented.

Batch issues:

It sometimes takes over a day for a job to start. ...

... a new batch queue between short and medium, say 8 hours of CPU, would be nice.

... We would also like an intermediate queue on PDSF in between short and medium. The jump between 1-hour and 24-hours is a large jump and we have a number of jobs of a few hours or less but more than 1 hour which would benefit from such an intermediate queue.

... I also understand that it is not good to allow jobs to run for too long.

... The only reason I didn't put "Very" satisfied in some of my answers above is that the LSF software has some "features" which I don't like that well (you have to write a script if you want to selectively kill a large number of jobs, for example), but I don't think it's NERSC's fault. It's a pretty good batching system overall. ...

LSF sometimes terminates my job when the network bandwidth gets slow. This is unpredictable, because it depends not only on my jobs, but other people's jobs that share the network. I'm not sure if there's a fix for this, but I'd like to know about it if there is.

Good system:

Very nice system, well maintained and running smoothly. System admins try to maintain the system mostly 'behind the scenes', which is a great relief compared to some other large scale clusters. The smooth running of PDSF was essential in achieving our scientific goals.

A well-oiled operation!

great system. ...

pdsf is well maintained and very useful. ...

Provide more interactive and debugging resources:

We need better interactive response. ...

... Oftentimes when I am running interactively, the wait is very, very long to do anything (even if I just type "ls").

- interactive nodes overloaded - when someone is using HSI on an interactive node, the node is basically unusable ...

No problems except: - No working debugger for STAR software - Slow IO

Down time issues:

Recently I got the impression that a lot of nodes were down and therefore eating up my jobs.

... We also would like to decrease the down time.

Other:

The machines are somewhat too slow. I understand that there are cost considerations. ...

My own code is usually rather utilitarian and not run many times; KamLAND data processing is excluded from this statement.

 

Comments on NERSC's HPSS Storage System:   29 responses

Good system:

HPSS is the best mass storage I've ever used or heard of.

Better than it's ever been.

HPSS is very useful. ...

Just like with the PDSF system - very well run, with little visible interference. The Storage Group has gone out of their way to help us out doing our science. Staff has contacted us on various occasions to help optimize our usage.

Very fast and very useful.

I am very happy with the "unix-like" interface.

very good

Fast, efficient, and simple.

This system is great. Of all of the mass storage systems I have used at multiple sites around the world, this is the best.

It could be that I've been around long enough to have experienced the old storage system and thus have very low expectations, but I think this is a great system.

I have been impressed with the relative ease of use and efficiency of HPSS.

HPSS is great, ...

I like it.

I couldn't get much done without this.

Hard to use / user interface issues:

web documentation is overwhelming for a first time user of HPSS and HSI.

My only real substantive comment is that I find the hsi interface to be unnecessarily user-unfriendly. Why can it not be endowed with some basic UNIX shell functionality (examples: (1) a history mechanism, (2) command-line editing, (3) recall of previous commands with emacs and vi editing capability, (4) more mnemonic command names in parallel to UNIX)? None of these improvements is rocket science, and they could easily be implemented to make everybody's life easier.

HSI is a truly godawful tool, and useless for manipulation of data on mass storage by programs. Thus I am restricted to FTP for any work involving data on HPSS. Thank goodness there's a Perl module.

You can't backspace when you are inside hpss, which means I either have to be very slow and precise in how I type my commands (which is a pain for navigating through multiple directory levels) or I end up typing them more than once. Can this be fixed? ...

A transparent user interface, i.e. direct use of ls & cp, would be a nice feature.

I did not use this system for quite a while. I remember the user interface was not very good before. It is not easy to get multiple files using one command.

... I haven't gotten to spend enough time with HSI 2.8 to see if it offers improvements with its scheduling.

Performance improvements needed:

... It is a little slow in fetching data, but I guess that is to be expected given the amount of data stored there.

... Otherwise, I find hpss useful and even though transfer rates are slower than my patience would like, I understand the time limitations of transferring from tape to disk and can work around the speed issue.

... the only negative is the occasional long wait to access something I put on there a long time previously, but this is understandable. ...

... Faster file retrieval would always be nice. ...

KamLAND has worked with the HPSS people to improve throughput by taking advantage of the fact that we read out data in big chunks, but that could still be utilized further.

Authentication is difficult:

Very difficult to understand. I stopped using it because every time I have to use it again, I forget the whole Jazz of the login/password. For example, today I couldn't use it! Why not make it simpler?

Please think about the initial password setup; maybe it is possible to make this easier for the user. Once it is set up it works really well.

I have never used it because it sounds so complicated to use it for the first time.

Don't like the down times

HPSS's weekly maintenance occurs in the middle of the work day for both the East and the West Coast. To my eyes it would make more sense to take advantage of time zones and inconvenience fewer users. (And yes, I know this comes off as self-serving since the most likely alternative --- afternoon on the West Coast --- is a great help for those of us in the East. However, I still think the idea is sound.)

The Tuesday maintenance is always irritating, but I understand that if it needs to be done it's better to have it scheduled during the day when people will notice rather than at night when people's jobs will fail because they'll forget. ...

Network / file transfer problems:

I was having problems with not getting complete file transfers from HPSS to Seaborg sometimes, I think. I would have to reload from HPSS to get complete large files.

I transferred about 800Mb of data between the HPSS at NERSC and ORNL. Apart from some bugs in hsi which were eventually resolved, it worked well, except the link between NERSC and ORNL would die every few hours, which meant more intensive baby-sitting on my part. I don't know what would cause the link to die. It may just have been random hiccups.

Other:

Looking forward to Grid access tools.

... Further, the accounting for HPSS, i.e. SRU units, is a bit strange in that prior year information stored counts so much (Gb x 4).

 

Comments about NERSC's math and vis servers:   5 responses

Network connection too slow:

The only problem is that the network connection from Germany makes interactive visualization impractical. But I'm not sure you can do anything about this.

From my point of view as a remote user, the visualization resources are not always convenient to use or well projected to those of us in the outside world (network response times are too slow). ...

It is too slow, so I was not able to use them in the last year.

Good service:

The math server is very helpful for me.

I don't use escher as much as I'd like, but it has certainly been nice and easy to use when I have done anything on it.

Remote licenses:

... You need to better develop the floating license approach and make it easier to use for remote users.


 

Training

  • Legend
  • Satisfaction with Training
  • How Useful are these resources for training in HPC?
  • What training methods would you like NERSC to offer?   161 responses
  • Comments about training:   23 responses

 

Legend

SatisfactionAverage Score
Mostly Satisfied 5.5 - 6.4
Somewhat Satisfied 4.5 - 5.4
UsefulnessAverage Score
Very Useful 2.50 - 3
Somewhat Useful 1.50 - 2.49
Not Useful 1.00 - 1.49
Significance of Change
significant increase
significant decrease
not significant

 

Satisfaction with Training

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
New User's Guide 137 6.26 0.86 0.05 0.32
NERSC Online Tutorials 121 6.07 0.99 0.10 0.10
NERSC Training Web Pages 98 5.83 1.06 -0.06 NA
NERSC Training Classes 24 4.88 1.15 -0.25 -0.67
Access Grid classes 27 4.67 1.41 NA NA

 

How Useful are these resources for training in HPC?

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
Online Tutorials 54 2.44 0.63 -0.08 -0.11
Classes 33 1.76 0.75 -0.24 -0.31
Access Grid classes 30 1.63 0.72 0.00 0.00

 

What training methods would you like NERSC to offer?   161 responses

138 (85%) General online web documentation
83 (51%) Online web tutorials on specific topics
26 (16%) Live web broadcasts with teleconference audio
21 (13%) Live in-person classes at LBNL
18 (11%) Live in-person classes at your site
10 (6%) Live Access Grid classes

 

Comments about training:   23 responses

This year most of the comments concerned the training format rather than suggestions for topics. Seven users said that web documents are sufficient; six pointed out that training via the Access Grid is not a good solution; and five recommended webcasts.

NERSC response: As of October 2003 NERSC has captured all training sessions using RealPlayer. These are available for replay on the web. On the day of the training classes users can participate in any of the following ways:

  • Watch live media streaming using RealPlayer from their desktop.
  • Join an audio conference to interact with NERSC staff and ask questions.
  • Attend via the Access Grid.
  • Watch the slides in PowerPoint or HTML format.

The training topics users requested most frequently are performance and how to get started.

15   Recommended a training format
8   Suggested topics
2   General comments
  Recommended a training format:   15 responses

You need to explore other models for remote training than the access grid. Even at large national labs, there are a limited number of access grid rooms available. These are often either booked up or require trained people (who aren't always available) to operate them. Also, it requires users to be out of their offices. You should look for software videoconference options (RealPlayer, etc.) that any of us can run on our office computers. If that is not possible, then use teleconferencing with viewgraphs posted on the web.

I have not been able to access the Access Grid and don't know when OSU is going to provide these facilities, if ever, so this format is useless to me.

Working from a site that has no Access Grid node, the training lectures are effectively unavailable to me. I have tried several times to attend but been discouraged since I could not get it to work. A simple streamed webcast of the lectures would go a long way to making them more available.

Access Grid is a large mystery to me. Some sort of series at LBL would be interesting. The currently available online information is so useful perhaps I wouldn't use classes at all.

many people (in the universities) don't have DOE grid access... online web docs, web tutorials, and web broadcasts are more useful for us

I don't have access to the DOE Access Grid. NERSC should have better video conferencing allowing phone-in questions.

Make more video lectures. ...

Web is usually the easiest and most convenient way to get training info to the most people.

Since i have limited time to attend training classes, I tend to prefer Web-based tutorials.

I like to learn as I need to. Therefore, the web is the best resource for me.

My opinion in this matter is not representative because I am a self-learner. For me the NERSC online web documentation and tutorials (supplemented by other readings and "hands-on" exercises) was enough to get me started. Then I learned more as I progressed in my computational research projects. And when I was "in trouble" I contacted the NERSC consultants. I evaluate the NERSC documentation and the NERSC consulting services as EXCELLENT.

For the most part, I am very experienced with IBM SP POWER3 systems, so I am not making a great deal of use of training materials, other than web pages that describe local NERSC-specific configurations. When the time comes for a new system and I need training, nothing beats a classroom session, and I will probably try to attend one. Having material in ready reach on the web is important as well. The more modern distance-learning techniques with which you are experimenting work less well for me, but they are probably experiments that are still worth pursuing, especially as this technology continues to develop.

... Specific advice in person may be very helpful.

I haven't yet attended any access grid classes but would like to in the future.

It would be useful to get all users to visit NERSC, e.g. annually to make better use of the NERSC facilities.

  Suggested topics:   8 responses

How to improve performance and scalability. Visualization.

More tutorials on the code performance improvement based on the information obtained by profilers. ...

Would like to see more presentations on optimization techniques and parallel performance issues.

methods to use open MP and MPI together to get best multi node performance.

General guides on rethinking old codes to use multiple processors efficiently.

Getting people started; what's available, how (and why!) to use it.

Unix commands; debugging.

When a major hardware or software change occurs. For new users.

  General comments:   2 responses

I've only gone to one training class at NERSC (ACTS) and found it to be excellent. In general, I view this as a resource that I don't fully take advantage of due to time constraints.

I believe in training, even if I don't use it much myself.


 

User Services

Legend:

SatisfactionAverage Score
Very Satisfied 6.5 - 7
Mostly Satisfied 5.5 - 6.4
Significance of Change
significant increase
not significant

Satisfaction with User Services

QuestionNo. of ResponsesAverageStd. Dev.Change from 2002Change from 2001
Timely response 207 6.55 0.73 0.04 -0.01
Technical advice 200 6.54 0.69 0.08 0.08
Follow-up to initial questions 186 6.49 0.75 0.10 0.12
Time to solve problems 196 6.36 0.84 -0.04 NA
Response to special requests 126 6.35 1.06 -0.05 0.12
RightNowWeb interface 109 6.02 1.07 0.08 NA

 

Comments about Consulting and Account Support:   25 responses

19   Good service
4   Mostly happy, mixed evaluation
2   Unhappy
Good service:   19 responses

They are: OUTSTANDING, SUPERB, EXCELLENT :-))

Great. I've used NERSC consulting a few times and have always been satisfied. Fast friendly service. Give'm a raise!

I am extremely pleased with both the response time and the quality of the answers from the two consultants who have helped me. I have told other people that this is a great strength of NERSC.

I hadn't noticed the Online Consulting web site, but will use it in the future. Generally I think you respond pretty fast and with good answers.

The few times I've had serious problems, these folks have been great.

Support by Iwona Sakrejda in particular is excellent. I am very happy with this side of things.

They do a great job. I would have been lost without them

Francesca Verdier and David Skinner have been most helpful with my requests!

From my experience NERSC Consulting Services work much better than anywhere else! VERY GOOD JOB!

I was very satisfied with the quality of the NERSC consulting and account support. My queries were always very quickly answered and solved and I really appreciate that! My queries were also followed up on afterwards to ensure that I didn't have any more problems - all in all, excellent service!

Consultants are very customer-oriented and seem to be genuinely interested in trying to help.

The NERSC consulting services are exceptional

I just want to note how impressed I am with the yeoman's effort Jonathan Carter put in trying to track down a problem in the MPI I/O routines for large numbers of processors. Although he never came to a definite conclusion about the problem's source (though it was tracked to poor documentation by IBM), he stayed with it for several months and came up with a reasonable work-around.

I have nothing but wonderful things to say regarding NERSC consultants. They have been useful, friendly and competent. When they don't know answers to questions I always get a prompt response back. Our allocation has pushed our HPSS limits and the staff has been helpful in monitoring this and adjusting our allotment. Thanks.

I have found NERSC consulting to be nothing short of excellent. Responses have been prompt and at all times helpful. Without question this is the best user support I have worked with. It's been truly a pleasure to deal with all of the consultants who have responded to my questions.

Excellent service. I found it extremely helpful.

Excellent Job ! Thanks.

NERSC consulting and user services are excellent.

Overall they do a very good job. It was nice to meet some of them at the users meeting at Argonne.

Mostly happy, mixed evaluation:   4 responses

Support staff is friendly for the most part, especially on the telephone. Some e-mails received in response to questions posed were curt. That could be improved.

I have not used the consultants a great deal, but I am very happy to know they are available to help. On the few occasions, where I have sought help, I have come away with mixed feelings. In one memorable case, this was less a problem with the quality of consulting than with the quality and features of the product. (This was a GPFS/MPI-IO issue.)

Consulting: sometimes the consulting staff looks overloaded.

I'm generally happy with the consulting services. There are only a few problems I've had that have not been resolved.

Unhappy:   2 responses

I would like consultants to know more about software available to the users. Currently it sometimes seems that I can find out myself more than consultants can tell me.

NERSC should have a more flexible policy regarding quotas etc. for big users.


 

Comments about NERSC

What does NERSC do well?

[Read all 119 responses]

69 Good hardware management, good uptime, access to HPC resources
62 User support, good staff
51 Generally happy, well run center
17 Job scheduling / batch throughput
11 Documentation
10 Software / user environment
6 Good network access
5 Allocations process

What should NERSC do differently?

[Read all 75 responses]

The area of greatest concern is job scheduling; 14 users expressed concerns with favoring large jobs at the expense of smaller ones; six wanted more resources devoted to interactive computing and debugging. Next in concern is the need for more hardware: more compute power overall, different architectures, mid-range computing support, vector architectures. Seven users pointed out the need for better documentation and six wanted more training.

24 Seaborg job scheduling / job policies
16 Provide more/new hardware; more computing resources
7 Better documentation
6 General center policies
6 More/better training
6 PDSF improvements
5 Seaborg software improvements
4 Other Seaborg improvements
3 Network improvements
3 No need for change
2 HPSS improvements
1 Shorter survey

How does NERSC compare to other centers you have used?

[Read all 65 responses]

Reasons given for preferring NERSC include good hardware, networking and software management, good user support, and better job throughput. The most common reason for finding dissatisfaction with NERSC is job scheduling.

41 NERSC is the best / overall NERSC is better / positive response
11 NERSC is the same as / mixed response
7 NERSC is less good / negative response
6 No comparison made

 

What does NERSC do well?   119 responses

Note: individual responses often include several response categories, but in general appear only once (in the category that best represents the response). A few have been split across response categories (this is indicated by ...). The response categories are color-coded:

  • Good hardware management, good uptime, access to HPC resources   69 responses
  • User support, good staff   62 responses
  • Generally happy, well run center   51 responses
  • Job scheduling / batch throughput   17 responses
  • Documentation   11 responses
  • Software / user environment   10 responses
  • Good network access   6 responses
  • Allocations process   5 responses
  Generally happy, well run center   51 responses

Powerful and well maintained machines, great mass storage facility, and helpful and responsive staff. What more could you want?

Fast computers, infinite and accessible storage, very helpful staff. I think it is the good relationship between users and staffers that sets NERSC apart.

NERSC is a very high quality computing center with regard to hardware, available software and most important highly trained and motivated consulting staff.

Everything. Both the hardware, and the user support, as well as organization and management, are outstanding. I am very pleased with interactions with the NERSC personnel.

As Apple would put it .... "it just works". I get my work done and done fast. Seaborg is up and working nearly all the time. Network, storage, it's all there when I need it. That is what matters most and NERSC delivers.

NERSC simply is the best run centralized computer center on the planet. I have interacted with many central computer centers and none are as responsive, have people with the technical knowledge available to answer questions and have the system/software as well configured as does NERSC.

I am a very satisfied user of PDSF. The NERSC staff on PDSF are excellent: highly competent, very responsive to users, and forward thinking. NERSC is making major scientific contributions through PDSF. Don't mess with success!

NERSC offers a very fast and powerful computer in a professional and timely way. Uptime is excellent and the service is excellent.

Organization and accessibility.

NERSC has had a good tradition of catering to the computing needs of the scientific community. NERSC has managed to provide more than adequate archival storage. NERSC has been good in handling informal requests for supplemental allocations.

- provides reliable access to state-of-the-art parallel computing resources
- good training opportunities
- keeps their systems well stocked with software resources
- excellent web site
- excellent consulting help
- excellent public visibility of your management decisions and broad-based involvement of user group input

So far, NERSC has stayed far less oversubscribed than the NSF centers. Deficiencies notwithstanding (see below), seaborg is a very stable platform, and the total number of available nodes is sufficient for my current needs.

Overall, I think that NERSC does an excellent job in carrying out its mission. Both hardware and support are first-rate.

NERSC is the most important source of computational time that I have, which is very important in my research. And all the options that they offer make this work easy.

- keeping the systems up and stable. NERSC is absolutely the best in this category I have seen.
- getting response from vendors on issues. (The obstacle course at system acceptance time is exasperating for the vendors, but it ultimately leads to highly usable systems for the user community.) Please continue to stay vigilant!
- procuring systems that are capable of doing computing at the forefront. Although I have issues with the way prioritization of jobs takes place (see my previous comments), the systems are at least capable of doing leading science. This is important and sets it apart from most of its pretenders.

Consulting, storage, and basic computing capability is good. The ERCAP submission process continues to steadily improve.

I am most pleased with the quality of service offered by the support staff - they are very quick and efficient in solving problems. I was also very pleased with the consistency of pdsf and the minimal down-time. My work was never held up due to NERSC problems.

My experience with NERSC was positive in all respects.

The resources seem to be well matched to the demands on them.

Consulting services. Well maintained resource for extensive all-purpose computing. Advances in technology and current maximum limits of performance.

The PDSF cluster is well maintained, and the admins are aware of what's going on. The support staff are extremely helpful.

The large quota in the $SCRATCH system for temporary data; mass storage resources (quota, performance); consulting service; web pages with documentation.

For my purposes, I have no complaints with any of NERSC services.

Large facility is well maintained and reliable.

I haven't had any problems with NERSC with the exception of some recent issues with a disk going down.

I am overall satisfied with NERSC.

The user service and response is excellent and the quality of the computing resources offered is special.

Very similar to home Linux environment => can immediately compile and run at NERSC.

The facility is organized in a very professional manner. This makes it highly reliable.

NERSC has an excellent organization.

I like the facility as a whole, and have been very pleased with the little use I have made of it so far, through globus and its LSF batch manager.

The operation of the high end computing resources, archival storage facilities, consulting services and allocations process are all outstanding.

NERSC provides large scale computational resources with minimal pain and suffering to the end user. These resources are vital for our research.

Reliability, consulting, speed, storage ease.

Consulting. Emphasis on capability computing is really welcomed. Interest in architectures also.

I was happy with the ease of getting set-up and starting work. The systems were always capable of running my work with a reasonable wait.

Excellent facility. Excellent consulting service. Excellent communication with users.

I am most pleased with all aspects of NERSC facility as checked in all my answers above. I am grateful that I have access ( I am from Vancouver, B.C, CANADA) to a world-class facility which is second to none! I have been using NERSC for ~ 5 years and whatever I have achieved in my research ( and that is quite substantial) is totally due to the NERSC's access to me. I am sure thousands of overseas users of NERSC feel the same way as I and we all thank you for the golden opportunity the US DOE has offered to so many scientists for research which most of us could not even dream of carrying out anywhere else but at NERSC. Thank you again DOE and NERSC.

Catering to the high end user at the expense of less resource gobbling calculations which still yield important physics results.

Excellent computing facilities. Excellent website. Excellent Fortran-95 compiler.

Consulting and user services are excellent. Overall up time and quality of facilities are excellent. NERSC attitude is excellent.

I think that NERSC has a very powerful computer with a good web page and very good consultants.

very "customer oriented" - quick creation of accounts, easy to do things on-line, good machines & good turn-around time

Consulting, hardware & software configuration

Excellent center! The center handles large jobs effectively.

The performance of the hardware is very good, the available software and support is quite good.

The machine works! Web pages are very good. Telephone help is available.

Hardware (speed, uptime, storage), consulting, allocation procedure, informative website

The consulting help and the overall management of the computing environment are very good.

Well managed for such a big system, the administrators are always responsive. Can concentrate on getting work done rather than worrying about computing issues.

Powerful resources, account services, and web announcements of any changes applied.

  Good hardware management, good uptime, access to high performance computing resources:   69 responses

... HPSS is a great tool, which really makes difference for large projects like ours (the large-scale structure of the Universe).

mass storage; tech support

NERSC makes our analysis easier by providing a well-maintained and powerful computing system for our research use.

high performance and the fact that /scratch directories are not automatically deleted, gives the user more freedom to manage files!

It is up more often than RCF at Brookhaven

Provide fairly direct access to high performance computing resources

The uptime and reliability of SP2.

mass storage

It is very good that there is almost no down time.

NERSC consistently keeps its hardware up and efficiently running. ...

lots of machines

Provides good uptime, fast response to problem reports, reliable service.

Excellent configuration and maintenance.

Computing power; turnaround time.

great access to lots of data, with lots of computing power.

NERSC does well keeping its CPUs up and running. ...

Seaborg has a good uptime, and it is reliable. HPSS is excellent. ...

Performance computing. Large CPU and large RAM.

pdsf

faster than rcf and more reliable

pdsf !

Hardware operations, uptime, user support.

Hardware availability; consulting services.

PDSF

... Seaborg is nice hardware.

Provide computing resources for scientific work.

Ability to request specific computers for jobs. Fast--once program is running.

The power of the Seaborg machine

The SP speed is satisfactory (much better than the old T3E)

Hardware performance, technical consulting

  User support, good staff:   62 responses

Short response time, efficiency, high professional level of staff

Service and support are great!

The staff has always been very helpful in resolving problems and redirecting me to the correct non-NERSC person when appropriate.

I am most pleased with the timely support provided by the consultants at PDSF.

Consulting has been really efficient.

NERSC staff goes out of their way to provide the necessary tools for scientists to achieve their scientific objectives. I have experienced this with both the HPSS and the PDSF groups.

help response and network connection

Quality of technical advice from consultants

running large simulation and helping finding and solving problems

Support & CPU

Consulting services and the website

Excellent responsiveness to requests and stellar uptime performance.

Consulting staff are first class and have generally benefited our group

I really appreciate the help from consulting service.

Consultants are great.

NERSC has a good connection between hardware and consulting. I have found that the consultants can usually actually solve problems for me without too much bureaucratic overhead. It's good to give the consultants enough control to actually change things. The consultants have a good attitude and I can tell that they try to help.

Consultants are the greatest.

... Also, its consulting services have always been helpful even with the dumbest questions.

interaction with customers;

Consultants are extremely helpful.

Good user support. PDSF runs very reliably.

... The user support that helps with any problem.

Consulting service is very good. They reply very fast. They are very helpful to me. Can run job interactively. Fast network.

Consultants are great. ...

The consultants are quite personable and really try to help (even if you do something stupid, they are nice about it).

Especially pleased with consultants - excellent!

Reliable and fast support, problem resolution

  Job scheduling / batch throughput:   17 responses

NERSC has a very short turn-around time for small and very large jobs. This makes it easy to debug large jobs and then to run them. ...

NERSC has a very short turn-around time for small and very large jobs. The time required to run a job is quite adequate. The support staff is great too!

... I also think the queuing structure works effectively.

Big jobs start running quickly and run for a long time.

The queue structure seems nearly ideal. Short/small jobs can be run almost instantly, making debugging much easier than on some other systems. Large jobs (at least in terms of processors -- none of my jobs take very long) generally seem to start within 24 hours.

Queue throughput on Seaborg and the long queue time limit. This is what really matters to me most.

The queuing system works very well. Very few problems when porting a program to NERSC.

We are very pleased with the ability to use several 1000 CPUs ...

The queuing system and the waiting time before a job runs are excellent.

  Documentation:   11 responses

... I've also found that compiling my codes (F90, in my case) and getting up and running was very painless, due to the good online documentation concerning compiling and the load-leveler.

Web site

Well documented resources.

  Software / user environment:   10 responses

I was using g98 and g03 and they were running very well. It has been very useful.

... The selection of software and debugging tools is quite good.

  Good network access:   6 responses

It's easy to connect to NERSC, since there are less security hassles (in comparison to LLNL, say). ...

Openness of computing environment and network performance.

I think easy access and storage are the strongest features of NERSC.

  Other:

Too early to say

 

What should NERSC do differently?   75 responses

  Seaborg job scheduling / job policies   24 responses

NERSC's new emphasis favoring large (1024+ processor) jobs runs contrary to its good record of catering to the scientific community. It needs to remember the community it is serving --- the customer is always right. The queue configuration should be returned to a state where it no longer favours jobs using large numbers of processors. ...
[user's jobs use 16-1,152 processors; most typical use is 192]

As indicated previously, I'm not in favor of giving highest priority to the extremely large jobs on all nodes of seaborg. I think that NERSC must accommodate capacity computing for energy research that cannot be performed anywhere else, in addition to providing capability computing for the largest simulations.
[user's jobs use 16-1,600 processors; most typical use is 128]

My only concern is the steadily increasing focus on the heavy-duty "power" users --- small research groups and efforts may get lost in the shuffle and some of these could grow to moderately big users. This is more a political issue for DOE Office of Science --- the big users right now are most needed to keep Congress happy.
[user's jobs typically use 16-128 processors]

Smaller users are the lowest priority, and that can be (predictably!) frustrating for smaller users. That said, we know that NERSC exists for the larger users, and our requests for small amounts of additional time are always honored if the time is available. So things always seem to turn out OK in the end.
[user's jobs typically use 16-128 processors]

The queue is horrendously long, making the computer resources at NERSC essentially worthless. I've had to do most of my calculations elsewhere - on slower machines - due to the extremely long time spent waiting in the queue. (Over a week, on average!)
[user computes on 1 node]

queue time for "small" jobs (using a relatively small number of nodes)
[user computes on 1 node]

Alternative policy for management of batch jobs. [Wait time for short runs with many processors sometimes takes too long.]
[user's typical job uses 256 processors]

Change the queue structure to make it possible for smaller jobs to run in reasonable time. A 48 hour wait for 8 hours of computing time, which is typical for a 64 processor job on the regular queue, makes the machine extremely difficult to use. I wind up burning a lot of my allocation on the premium queue just so I can get more than 24 hours of compute time per job per week.
[user's jobs typically use 32-256 processors]

differentiate the queues [waiting time for small jobs is quite frequently unacceptably long; there should also be a queue for small but non-restartable jobs with a limit longer than 24 hours]
[58% of job time spent at 1,024 processors; 25% in "regular_long"]

Sometimes the only problem is the time that we have to wait for a job to start, especially if we submit it at low priority (sometimes more than one week).
[user's jobs typically use 32-80 processors]

It takes a very long time for a queued job to run on the SP. There are many things to take care of just to submit a job.
[user's typical job uses 96 processors]

Keep working on job queues so that both big-node users and those who can't use big-node jobs have fair turnaround for their jobs. ...
[user's jobs typically use 16-128 processors]

Some jobs such as climate models do not scale well so it is difficult to use large numbers of processors.
[user's jobs typically use 16-224 processors]

Recently the queue has been very long and jobs need a long wait to run. Need more nodes or to optimize the queue system.
[user's jobs typically use 16 (5%) to 2,048 (35%) processors]

It would be great if the interactive/debug queue response time can be improved.

... As mentioned earlier, allow more interactive use on seaborg

A higher memory limit for interactive jobs would be nice.

Queue waiting time; it would be even better if interactive jobs could run for 1 hour.

Estimate how long it will take for a job to be started once it is pending in the queue. Ensure that interactive sessions will not "hang" so much.

more flexible user policy ... [Should reduce the waiting time for debug queue]

The time limits placed on jobs are very restrictive. The short times mean that I have to checkpoint and restart after only a few iterations. This can increase the time it takes me to get results by an order of magnitude. My program also runs more efficiently if allowed to run for more iterations (as each new iteration is a refinement of previous steps) - continual stopping (due to time limits) and restarting can cause problems, sometimes resulting in incorrect results.
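
For context on the checkpoint/restart workflow this comment describes, here is a minimal, hypothetical sketch (not any user's actual code) of how an iterative MPI code can watch its own elapsed wall-clock time and save its state before a batch limit is hit, so a follow-up job can resume where the previous one stopped. The 8-hour limit, the file name, and the solver step are illustrative assumptions only.

    /* Hypothetical sketch: wall-clock-aware checkpointing so an iterative
       solver can stop cleanly before a batch time limit and resume later. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WALL_LIMIT_SEC    (8.0 * 3600.0)  /* assumed batch wall-clock limit */
    #define SAFETY_MARGIN_SEC 600.0           /* stop this long before the limit */

    static void load_checkpoint(double *x, int n, int *start_iter) {
        FILE *f = fopen("state.chk", "rb");   /* illustrative file name */
        *start_iter = 0;
        if (f) {
            fread(start_iter, sizeof(int), 1, f);
            fread(x, sizeof(double), n, f);
            fclose(f);
        }
    }

    static void save_checkpoint(const double *x, int n, int iter) {
        FILE *f = fopen("state.chk", "wb");
        if (f) {
            fwrite(&iter, sizeof(int), 1, f);
            fwrite(x, sizeof(double), n, f);
            fclose(f);
        }
    }

    int main(int argc, char **argv) {
        int rank, iter;
        const int n = 1000;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *x = calloc(n, sizeof(double));
        if (rank == 0) load_checkpoint(x, n, &iter);   /* resume if a checkpoint exists */
        MPI_Bcast(&iter, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast(x, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        double t0 = MPI_Wtime();
        for (; iter < 100000; iter++) {
            /* ... one refinement step of the iterative solver would go here ... */

            /* Rank 0 checks the remaining wall clock; all ranks obey its decision. */
            int stop = 0;
            if (rank == 0)
                stop = (MPI_Wtime() - t0) > (WALL_LIMIT_SEC - SAFETY_MARGIN_SEC);
            MPI_Bcast(&stop, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (stop) break;
        }

        if (rank == 0) save_checkpoint(x, n, iter);    /* the next job restarts from here */
        free(x);
        MPI_Finalize();
        return 0;
    }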

... Wallclock time limit is too short.

Limit users to those that run parallel jobs.

If you want to encourage big parallel jobs, you might consider giving a discount for jobs over 1024 processors.

  Provide more/new hardware; more computing resources:   16 responses

NERSC should move more aggressively to upgrade its high end computing facilities. It might do well to offer a wider variety of architectures. For example, the large Pentium 4 clusters about to become operational at NCSA provide highly cost-effective resources for some problems, but not for others. If NERSC had a greater variety of machines, it might be able to better serve all its users. However, the most important improvement would be to simply increase the total computing power available to users.

more computing power :-)

Computational research is becoming an essential need, and our need for computer time increases constantly, since we want to tackle increasingly complex problems. Keeping hardware and software up to date and increasing the hardware capacity will benefit the whole academic community.

Get new hardware.

I would like to see NERSC have some faster processors available. Also if NERSC had test boxes of the newest hardware available for benchmark purposes, it would be useful in helping me make my own purchasing decisions for local machines.

In addition to the IBM SP a system which allows combined vector and parallel computing would enable a challenge to the Japanese Earth Simulator. The program I mostly use would benefit from vector inner loops and parallel outer loops.

Stop putting all of your eggs in the IBM basket. If you want to compete with the Earth Simulator, you won't do it with SPs. Reason: poor (relative to Crays) scalability and poor (<15% of peak, sustained) single-CPU performance.

Not much - perhaps the next generation of SP would be nice.

Having more than one mainframe of one type is wise. That way when one is down one can still get some useful work done. A native double-precision mainframe would be nice. Having kept some Cray hardware would have resulted in less time wasted porting valuable codes to a less code-friendly platform like the SP.

It would be great if NERSC could again acquire a supercomputer with excellent vector-processing capability, like the CRAY systems which existed for many years. The success of the Japanese "Earth Simulator" will hopefully cause a re-examination of hardware purchase decisions. Strong vector processors make scientific programming easier and more productive.

I would like to have access to machines not specifically dedicated to parallel jobs.

I'd like to see some queues and/or machines to support legacy codes that have not or can not yet efficiently utilize multiple processors.

As mentioned earlier, please find a way of getting users who are not doing true supercomputing to find a more cost-effective solution to their computing needs. (I mentioned the possibility of smaller NERSC-managed Linux clusters.) Because of the clogging that occurs with these small jobs, the turnaround time for large runs can be unacceptably long. ...

Memory/processor upgrades should be considered.

These are more hardware issues, but: - More large-memory nodes ...

Have a machine with better usability of the memory and better bandwidth

  Better documentation:   7 responses

I would like to see better web docs. Sometimes I think what is there is overkill. Simpler is better. Make the full details available (deeper down the tree) to those who want and need it ... but try to keep everything concise and straightforward. Try to anticipate the basic user questions and make the answers easy to find and easy to understand. Always keep in mind, the user just wants to do blank ... he doesn't want to be an expert with HSI or loadleveler or whatever. The look and feel of NERSC web pages is a little stark ... and the pages all blend together. It helps to remember where you were (3 months later) if the page stands out. For example, when I was looking for blank I remember finding the answer on the bright green page with a big banner on top. Nearly all NERSC pages look the same. Imagine trying to find your way home in a city where all the streets look nearly identical.

The web pages can use improvement. There have been recent improvements on the HPCF website; I hope that other pages will improve as well (finding information not just by searching, but also by browsing).

Make it easier to find specific info on system commands/functionality.

I don't see any major problems, although I often have a hard time finding information on the website.

Web page. Should be more informative and make things easier to find.

... Provide information to help those with models that can't use 64 nodes to best use MPI and OpenMP to maximize the number of nodes while retaining efficiency. Using 64 nodes at 1 percent efficiency would be a big waste of computer time and nodes both.
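
To make the request above concrete, here is a minimal, hypothetical sketch of the mixed MPI + OpenMP style being asked about: a few MPI tasks, each spawning OpenMP threads, so a code that cannot scale to many MPI tasks can still occupy all 16 CPUs of a Seaborg node. The loop and the reduction are illustrative placeholders, not a real application.

    /* Hypothetical hybrid MPI + OpenMP sketch: MPI tasks split the work
       coarsely; OpenMP threads fill the CPUs within each task's node. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, nranks, i;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);                 /* only the master thread calls MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Round-robin split across MPI tasks; threads share each task's loop. */
        #pragma omp parallel for reduction(+:local)
        for (i = rank; i < 1000000; i += nranks)
            local += 1.0 / (1.0 + (double)i);

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f using %d MPI tasks x %d OpenMP threads\n",
                   total, nranks, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }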

Update the webpage! Make the information more accessible to less trained users.

  General center policies:   6 responses

Reduce the security level. Not have passwords expire.

The overhead on account managers still seems a bit much for what we're getting. I still find the ERCAP process onerous (i.e., more information requested than should be necessary). Also, most of the codes we are using are changing much more from year to year in a scientific sense than in a computational sense; it becomes repetitious to have to keep evaluating them computationally each year. You need to keep in mind that most of us are being funded to do science rather than computational research.

Measure success on science output and not on size of budgets or quantity of hardware.

Can't think of anything. Satisfying both capacity and capability missions remains a challenge.

NERSC should attune more towards the individuals that ask for help and less towards the masses. I receive too many e-mails!!!

The level of functionality in data handling / data management services is quite low.

  More/better training:   6 responses

It would be nice if NERSC can provide more tutorials.

I have not yet participated in a training session - perhaps that should be more strongly encouraged. We could also use advice in tuning our codes to make better use of the facilities.

Make training either more accessible and/or better publicize what is available. Possibly an index of resources could help.

... grid classes.

Please offer more on-line video/on-site courses

Make training lectures more accessible to users at remote sites. (Video lectures will be a big help to users who have no access to Live DOE grid lectures)

  PDSF improvements:   6 responses

Disk vault tricky to use

more interactive nodes on pdsf, ...

Would prefer faster CPUs at PC farm.

make it faster and give it bigger disk space

Better check on 'crashed' batch nodes, i.e. LSF shouldn't submit jobs to these nodes. Faster recovery from disk vault 'crashes' (= unavailability of data sitting on pdsfdvxx).

some of the disks we write to could be upgraded - it limits the number of processors I can use and my code could run a little faster

  Seaborg software improvements:   5 responses

possibly add simple visualization software to Seaborg (xv, xmgrace)

Software availability could be better, especially regarding C++ libraries and tools.

... The compilers and debuggers need to be improved.

I think the software environment should be broader, include python by default for example.

... more responsive to software bugs, etc.

  Other Seaborg improvements:   4 responses

Allow larger scratch space per user.

... Although its disk policies have improved since the days of the CRAYs, there is still room for improvement.

I would like to see the SP running for months without shutdown. Maintenance should only affect a few troubled nodes, not the whole machine. ...

Reliability of seaborg has been problematic over the summer, especially with GPFS. ...

  Network improvements:   3 responses

faster connection to East coast

... Faster WAN connectivity to the external world

better interactive connectivity

  No need for change:   3 responses

As Garfield would say, "Don't change a thing."

Nothing

All is well.

  HPSS improvements:   2 responses

access to HPSS without a special utility but with a migration/robot system

Hire a few programmers to write remote file system drivers to access HPSS as a native disk instead of the monstrosity that is hsi (keep the ftp access, though...it's invaluable for offsite use).

  Shorter survey:   1 response

I find this survey too detailed, especially after being told it would take "only a few minutes." You should strive to consolidate it to about 1/2 of its present length.

 

How does NERSC compare to other centers you have used?   65 responses

Parts of the responses have been color-coded using the same scheme as the What does NERSC do well? section.

  NERSC is the best / overall NERSC is better / positive response:   41 responses

NERSC is one of the best centers

I used computer facilities at the University of Western Ontario in Canada (1990-96), at Auburn University, AL (1997-99), at the University of South Florida, FL (2000), and at the Engineering Research Center at Mississippi State University, MS (2001-present). I think NERSC is an exemplary organization from which many centers can learn how to operate efficiently, facilitating progress in science.

Very well, number one compared with: Los Alamos, Mano/ETH Switzerland !

NERSC is the best. I'm comparing to SDSC, NCSA, and a few smaller places at various universities. Best uptime ever. Jobs get through the queues faster at NERSC. Bluehorizon is bogged down and takes forever to get through the queue. And, it seems to be up and down a lot. Seaborg is very steady. I personally don't like SGI machines ... so I'm staying away from places that have mostly SGI's. Machines are o.k. ... but the software (i.e. compilers) are not as nice as XLF. I'm probably not the best person to survey ... since I'm nearly 99.9% happy.

superior

NERSC does better at serving the customer than NPACI which introduced an emphasis on large jobs somewhat earlier, and it has better turnaround. NERSC allows external access to its HPSS, which NPACI doesn't. Other centres I use are much smaller operations, so a comparison makes little sense.

I've used NCSA for many years. I've tried to use the Pittsburgh Terascale system, which was not a good experience. I have limited access to centers in Italy (CINECA) and in Germany (LRZ, München). NERSC is an outstanding facility.

As compared to other centers, I like NERSC most because of the hpcf web site and the consulting service. They are making things really different.

NERSC is much better compared to Los Alamos ASCI

it's the best. Ever since NERSC got IBMs, it's been on top of the list of computing centers in terms of turnaround, speed and overall setup/performance/configuration. HLRN has a nice Power4, but the setup stinks and they seem to have massive setup/configuration problems; NERSC is doing a fabulous job with those. The people at NERSC seem to have a real idea about supercomputing. Most supercomputer facilities don't.

See above. NERSC works well. Most computer centers limp along. I compare NERSC with SDSC, NCSA, our local supercomputer consortium (OSCER) and supercomputer centers in France, Germany, and Sweden.

The computers appear to be run more effectively than my previous experience with Maui.

I think NERSC performs a little better than the ORNL facilities at present.

NERSC is doing the right job compared to the RHIC Computing Facility; it's hard to even compare the two. The staff at both facilities seem to have a vastly different approach to their job, and I hope that NERSC can keep up the good work. I have tried to use a 500-node Linux farm at LSU, but gave up because of the large amount of downtime (~8hrs/wk or more) and taking the whole cluster down with little advance warning, making running long-term jobs very painful. PDSF on the other hand is of similar size, yet has hardly any downtime.

Quick response, easy access. Brookhaven National Lab. (tight security access, firewall)

Better than RCF (Brookhaven lab): much better support and more CPU

usually it's faster than RHIC computing at BNL

I use Eagle and Cheetah at Oak Ridge. They are smaller machines. So the throughput is not good. The jobs wait for too long. The satisfaction from NERSC is great!

NERSC is bigger (in terms of computer size) and faster (in terms of job turnaround time) than SDSC, and I always seem to receive prompt, personal service when I have a problem with my account. Very nice.

I've used "super"computers in Los Alamos, Leeds and Spain. NERSC is by far the best.

NERSC is one of the very best, both in terms of the amount of work I can accomplish and the responsiveness of the staff. Other places I have run are: PSC, SDSC, NCSA, Minnesota, and Oak Ridge.

NERSC does an excellent job. We have also had some time in San Diego.

Improvement of single-CPU performance and an increase in memory, if possible. Overall, the SP at NERSC is the best for large-scale simulations, code debugging and profiling, or trying out software, as compared with other computational facilities (ASCI Blue, Intel Linux Cluster, TC2k, MCR...) at LLNL.

NERSC is enormously better than RCF (see previous comments about heavy-handed security at that facility). The other large facility I have worked at is CERN but these are hardly comparable. NERSC via PDSF is doing fine.

I have also used LSU's new super-cluster; it is a miserable wreck compared to NERSC, due almost entirely to poor administration. I also use a home-grown cluster, but the queuing system at NERSC is much better than the queuing system on this local cluster.

NERSC is as good and probably better than other centers I have used.

As I said in a previous section NERSC beats RCF hands down. I like working at pdsf. All the resources I need are available. Even when I submit a trouble ticket I am confident in the people who are responding to them and my issues always get resolved and followed up.

Compares very well to Oklahoma State University and to commercial software vendors.

I think NERSC is doing much better compared to a number of NPACI and DoD sites.

superior to bnl/rcf.

Much better in terms of reliability and easier accessibility (no freaking out about security, which messes up the whole system). I'm comparing to RCF at BNL.

It compares very well with other centers I have used (NCSA, ASC, ARL)

NERSC compares very favorably to the Pittsburgh Supercomputing Center, the National Center for Supercomputing Applications, and the San Diego Supercomputer Center.

Blows their doors off! PSC (Pitt. SC Center) was so challenging to use we spent weeks just trying to finish one simulation.

Very stable environment to work with.

The other farms I've used don't provide any consulting support at all.

I found NERSC easier to use than some of the other sites.

I mainly compare NERSC with the CERN computing facilities, and several other university computing centres. NERSC's system of sharing resources and accounting for usage seems to be logical and works well. [PDSF user]

NERSC is a very good computing center. I haven't used other similar centers, so I don't have a comparison.

I use BCPL (Bergen Computational Physics Lab.), CSC (Frankfurt Center of Scientific Computing), and the GSI Linux cluster. NERSC performs nicely.

I have also used Mike at LSU (a very young system that makes me appreciate how smoothly NERSC operates), so I have little basis for comparison.

  NERSC is the same as / mixed response:   11 responses

The machines are better, faster, and seem to be well maintained (more uptime, fewer killed jobs). But the wait time in the queue is absolutely horrible. This is in comparison to the GPS and ILX clusters at LLNL and local beowulf clusters here at UC Davis. I'd love to use NERSC more, but the time I spend in the queue makes the resource impractical.

NERSC is better in almost all categories, with the exception of interactive capabilities on seaborg. The lack of disk space and scratch space that I complain about seems to be a problem almost everywhere. I am comparing NERSC to Oak Ridge, SDSC, NCSA, as well as some secure computing facilities and vendor systems.

ANL LCRC - JAZZ, APAC in Australia, VPAC in Australia. The online documentation of NERSC is better, but the others are more flexible with wall-time limits and/or queuing/prioritizing of large parallel jobs.

SDSC. The performance of NERSC and SDSC is comparable. Both are excellent!

Most of the other centers I use have POWER4 based machines (NAVO, MHPCC) which are nice. The quality of the consulting seems to be comparable. The websites at the DOD sites are worse than the NERSC site.

NERSC has more software and consulting support, and is also more stable than the ccsl at ORNL. However, the ccsl at ORNL provides access to the file system even when the big machine is down. Can NERSC also implement this type of system? Meanwhile, for the queue system, can NERSC also let small jobs using a few nodes run for a maximum of 48 hrs even though the policy encourages big scalable jobs? This is because not all of our codes scale to large numbers of processors under all physical conditions. You could give low priority to small, long-running jobs.

I have used the San Diego Supercomputer Center and the Texas Supercomputer Center. Compared to them, NERSC has a higher performance machine, generally better uptime, a better selection of programming and debugging tools, and a more useful help system. NERSC's queueing system, in contrast, is worse because it is less flexible. While SDSC also favors large jobs in its queue, it has tools to allow users with smaller jobs to work around the large jobs. Thus, for example, there is a tool called "showbf" to allow users to determine when the next job is scheduled to start, and to fit their job into the backfill window created by processors going idle before the large job starts. Similarly, the queue system gives users an estimate of how long it will take for jobs to start, allowing them to adjust the number of nodes and amount of time requested to make jobs start sooner. The flexibility that tools like this provide makes it possible to be a small-node user without resorting to the premium queue. NERSC lacks comparable facilities.

All centers, such as PSC, NCSA, NERSC, and ORNL are managed well.

It is very complicated to start with. Once the batch files are set up and one gets used to the structure, it becomes much more accessible.

NERSC and LLNL LC compare very favorably.

NERSC is on a par with Oak Ridge and NCAR; however, it does suffer from very large numbers of users. This leads to very long waiting times.

  NERSC is less good / negative response:   7 responses

I think seaborg should be accompanied by a cluster of serial machines such as the AMD cluster morpheus at the University of Michigan.

NIC Juelich, Germany has the above-mentioned robot system to migrate data to tapes (on its Cray cluster, and planned for the new IBM SP)

ok (NCSA), although the queueing time at NERSC seems longer

1. Eagle: ORNL; 2. IBM SP: North Carolina Supercomputing Center (now decommissioned). The only major thing I dislike at NERSC is the inability to run jobs interactively for even a short period of time, which can be quite frustrating if I am trying to debug a code.

I'm comparing to our IBM-SP computer centers here at Oak Ridge (Eagle and Cheetah) that require much less of a regular application process for resources. The trade-off is that I don't always get the same regular access to resources as the higher priority projects (i.e., SciDAC, etc.). But even being a spare cycle user can give me significant computing on these systems over time.

I said this a year ago: PLEASE do what the San Diego Supercomputing Center does and provide shared-access nodes for interactive use. Debugging a code on Seaborg can take an order of magnitude more time than it should because of 'access denied due to lack of available resources' errors. SDSC discovered a simple solution -- WHY HAVEN'T YOU IMPLEMENTED IT???

HPCC of USC. No limit on walltime.

  No comparison made:   6 responses

No previous experience.

RCF, SLAC, FNAL

I haven't really used any other center, besides the OU OSCER facility here, so I can't really compare NERSC to other big centers.

Edinburgh Parallel Computing Centre, Pittsburgh Supercomputing Center, NCSA

Variety of mainframes, like at Pittsburgh

I have used resources at Cornell and SDSC, but that was quite a while ago.
