- Good scaling initiatives: 20 responses
-
Please push this project as much as you can. This type of consulting is very
important if one goes to the limit of a system in terms of #processors and
sustained performance.
I think it's a good idea. It promotes more efficient use of the NERSC
resources.
Good idea
We were not quite ready to take advantage of the initiatives, but they are a
good idea.
- Ultimately, I think this is a good idea and will lead to better architectures
in the future, as well as allowing us to make optimal use of the systems we
have today.
- I don't think anything prompts a systems vendor to fix issues better than
having a clear characterization of those issues.
always good to think about that.
The restructuring of the LoadLeveler classes on Seaborg has provided us a leap
in our progress in high resolution climate simulations.
Thought the program was very successful and was very beneficial to our group.
These services are very great to get our work done quickly.
It provides more incentive to improve the scalability of our codes.
I favor the shift to large-CPU, large-RAM jobs.
the initiative is great!!! ...
My research particularly benefited from the Large Job Reimbursement Project,
which helped us to: 1) test our code using a large number of processors,
2) run very long simulations at no cost.
Interesting, I appreciate it.
... That having been said, I am glad that the queues are becoming more
friendly to large jobs.
It is great, ...
I think it is appropriate since NERSC is not in the workstation business, it is
for serious users of computer power.
It is the right thing to do. Usage efficiency and proficiency are
important.
Eventually, I will want to run jobs on more nodes than at present. I expect
these initiatives to be very helpful. As a user of fewer nodes, I don't really
notice a difference, positive or negative, in my productivity due to these
initiatives.
It is a very important step in the development of scientific computing, since
parallelization is the trend, especially for massive computing techniques.
NERSC response:
Thanks for the feedback. The NERSC scaling initiative has thus far been
positive for many projects.
- Wrong approach: 11 responses
-
I believe that they are totally misguided. The emphasis should be on maximizing
the SCIENTIFIC output from NERSC. If the best way to do this is for the user to
run 100 1-node jobs at a time rather than 1 100-node job, every effort should be
made to accommodate him/her. Even for codes which do run well on large numbers
of nodes, it makes little sense for a user to use 1/8 or more of the system,
unless he/she has an allocation of 1/8 or more of the available CPU time. Even
then it might be more useful to run for example 4 jobs using 1/32 of the machine
at a time rather than 1 job using 1/8 of the machine. The IBM SP is not designed
for fine-grained parallelism requiring communication every 10-1000
floating-point operations. In the final analysis, it should be up to the users
to decide how they use their allocations. Most, if not all of us, will choose a
usage pattern which maximizes our scientific output. Remember that most of us
are in computational science, not in computer science. We are interested in
advancing our own fields of research, not in obtaining Gordon Bell awards. ...
If it were not for the fact that our FY2003 allocations are nearly exhausted,
we would be complaining loudly because with the new Class (Queue) structure
which favours "large" jobs, we can't get our work done.
Don't freeze out the small-to-moderate user --- the science/CPU hour is often
higher for the moderate user
Although we successfully tested large jobs, I do not believe these jobs could
serve our scientific goals well. I could easily see using 8 nodes of seaborg
for our activation energy barriers determination jobs, but using more nodes
than that would not be efficient or necessary. In fact, I see a significant
threat to excellent quality supercomputing research by expanding in the
direction of using more and more nodes per job. I suspect that for a good
fraction of the work pursued at seaborg, although excellent, one cannot expect
linear scaling to very many nodes, because of the very nature of the problems
handled. We believe that 25% of the resources devoted to these super large jobs
is already too much.
I have a negative view of NERSC's scaling initiatives. I understand that NERSC
needs to propagate itself and to justify the purchase of a new, larger machine.
But in my opinion not all good science is done on thousands of processors and I
feel penalized both in time available and priority in the queues because I use
hundreds of processors and not thousands.
There is always a tension between massive users and those who want to run
smaller jobs. While many researchers use a single node (16 processors), I think
it would not be cost effective for DOE to pay them to run on their own
machines. ...
The new classes mean that jobs not using a lot of processors but still doing
useful state-of-the-art physics calculations are not getting through as much as
before. In fact, requesting fat nodes may mean a wait of a few weeks.
Not all good physics is done by using 6000 processors.
NERSC response:
The queue policies for how nodes are selected were changed this summer (2003)
in an effort to improve access to large memory ("fat") nodes. The larger memory
nodes are now the last to be selected for batch work unless the job
specifically requests the large memory resource. If your work requires the 64
gigabyte memory nodes we encourage you to contact NERSC consultants in order to
discuss your specific requirements.
For our work, this initiative has been counterproductive. We perform very
challenging time-dependent computations for stiff systems of nonlinear PDEs,
which require the solution of ill-conditioned matrices at every time-step.
Although we are using state-of-the-art parallel linear algebra software
(SuperLU_DIST), scaling to increase speed for a given problem has limits.
Furthermore, when solving initial-value problems, the time-dimension is
completely excluded from any 'domain' decomposition. Our computations typically
scale well to 100-200 processors. This leaves us in a middle ground, where the
problems are too large for local Linux clusters and too small to qualify for
the NERSC queues that have decent throughput. My opinion is that it is unfair
for NERSC to have the new priority initiative apply to all nodes of the very
large flagship machine, since it weights the "trivially parallel" applications
above the more challenging computations, which have required a more serious
effort to achieve parallelism.
Excessive emphasis is being placed on this.
It is detrimental to research groups like ours. We need NERSC resources to run
100-1000 jobs or so at a time (in serial mode with one processor per job) or
10-20 jobs with 2-3 nodes per job. There are no other resources available to us
that would enable us to do this. On the other hand, our jobs are no longer as
favored in the queue since they are smaller scale jobs.
I am unsatisfied with the scaling initiatives. Quite a lot of my calculations
require a small number of nodes with long wallclock time (due to the code I
used), which is slow in the queue. Good science often comes from small, routine
calculations, not from massive parallel jobs.
First, I understand the reasons for the initiatives. But it may not be the most
cost-effective way to use the computing resources. For example, we only need 32
nodes for eight hours to complete a simulation. The 32 processors is most
efficient because we simulate the collision of two beams and each beam is on a
single node (16 processors). But we need to repeat the same simulation with
thousands of different parameters to help optimize the performance of the
collider. In this situation, if we are forced to use more processors for each
simulation, it actually wastes more resources.
NERSC response:
The scaling initiative is not meant to judge what is or is not good science.
NERSC realizes that scaling up the parallelism in scientific codes is not
always possible, can require significant effort, and is not in every case
productive. Where it is possible and productive we stand ready to assist and
promote highly parallel computations which make full use of the resources we
provide. Providing opportunities which encourage researchers to explore the
parallel scaling of their codes has been useful to many but not all projects
and codes.
It is the inevitable outcome of advances in computing that computations
which were once difficult should become easier. As the domain of what is
considered a workstation sized problem expands, so should the realm of
capability computing. The best matching of compute resources to scientific
problems is a moving target and we try our best at providing resources which
meet NERSC users' needs.
- Didn't help / not ready to use / not interested: 4 responses
-
Doesn't help us, we would need much faster I/O to scale up to more processors
at good efficiency. That may change in the future with new codes.
we're not yet ready to use these, but we're gearing up.
I am more interested in getting the results than exploring the scalability.
...
... perhaps another person in my group used this but I did not. I plan to do
this in the future, but currently my runs which scale to such large processor
numbers take an extremely long time to run, hence we have not performed these
long runs. There is the possibility that further performance and scaling work
on our code could increase our ability to use larger processor numbers.
- Startup projects weren't eligible: 2 responses
-
As a startup account, we run jobs on 2000 CPUs but could not be part of the
Reimbursement Project
I am currently trying to scale my code to 4096 P, however the sheer cost of
start-up alone means that my small ERCAP allocation was exhausted rapidly when
I began testing on 4096 P. It would be useful to have funds available at all
times for large scale testing.
- Users need more technical info: 2 responses
-
... I would like consultants to know more about the software available at
NERSC, e.g. compilers, parallelization libraries, and mathematical libraries.
... the minus is some lack of information (detailed surveys) about performance
and scaling characteristics of currently available chemistry codes
- Sometimes hurts smaller jobs / sometimes OK: 1 response
-
This month with everyone trying to use up their computer time the turn around
for smaller node jobs is now on the order of a number of days while those
using many nodes get reasonable turn around. The rest of the year letting
them have much higher turn around probably doesn't hurt those of us who
can't use 32 or 64 nodes.
- Don't know about the Matrix / not my role to do this: 13 responses
-
I am new to using NERSC and I do not know what this is about.
I don't really understand what it is.
Not familiar with the Applications Performance Matrix
I'm not one of the lead PI's on this project. They would have done this ...
not me.
?
My repo may well do so. I'm not certain of our plans in this regard.
I don't understand what it is
I don't know what this is about. I do use poe but have no idea what poe+ is.
More exactly, probably not. The reason is that I just have not looked at
Application Performance Matrix and don't know much about it.
I was not aware of that service
It would be more appropriate for others in my group to do this.
I'm unaware of what it is.
I don't know how to do it.
NERSC response:
Submitting data to the Applications Performance Matrix is done via a
submission form on the web.
- Codes don't scale well / don't have large codes: 7 responses
-
our code scales badly using more than 16 nodes
We were somewhat surprised by the results of our benchmarks and would like help
identifying and eliminating bottlenecks.
I have to learn more about the APM before I can give an answer.
In any case, I believe my applications are too small.
I do not currently conduct large-scale simulations, just diagnostics on the
output of previous simulations conducted by others (escher). Large scale
simulations would require a shift in our funding paradigm.
I used hpmcount since I used a serial or one node job using 16 processors. I
will try to do a combined OMP/MPI this coming year. We also have some molecular
dynamics calculations to run that apparently scale well in MPI.
POE isn't necessary for optimizing performance for my codes.
Applications I am running right now are not large enough to make this relevant.
- Question doesn't apply to PDSF users: 6 responses
-
I wasn't aware that such a system existed. Also, I have the feeling that
these 'measurements' have a fundamental flaw: often things like
'efficiency' of the application are measured by comparing the CPU time to the
wall clock time. This makes codes that calculate theoretical models, which
often do very little IO, appear very efficient, while codes that reconstruct
experimental data and do much IO appear less efficient. Within experimental
reconstruction software there is also a wide range of codes: some large
experiments for instance have to do tracking on almost all events and require
a lot of CPU, while other experiments need to scan through many background
events without doing very much except for IO in order to find a signal event.
Primary processing of our data is done; I am not involved in the microDST
construction for individual analyses, so I am not running large jobs at the
moment.
I don't think it applies to me personally - would be better suited to a study
of the STAR simulations/reconstruction usage probably.
I think it applies to MPP and not to PDSF style computing.
Our applications are not large enough
The ATLAS code doesn't support true parallelization at this point, only
breaking the jobs into pieces and submitting the pieces to multiple machines.
- It's inappropriate / don't have enough time: 5 responses
-
We are benchmarking ourselves. The performance of our application could be made
to vary from very bad to very good. Such results are not representative.
I would like to, but have very limited time available for this type of
activity.
1. Life is too short 2. As a small-to-moderate user, I just do not have the
time to do this --- my NERSC computing remains a small (<33%) fraction of my
overall scientific responsibilities and I do not want to set aside the
hours-to-days-to-weeks to get involved with this
I will submit using the Applications Performance Matrix because the application
form seems to require it. However, I believe it is basically useless as a
measure of code performance. I use an AMR code, the result of which is that the
performance is highly dependent on the geometry of the problem, the size of the
problem, and what physics is being simulated. Depending on the problem and the
physics my code may scale to anywhere from 16 processors (for smooth collapse
over a large dynamic range using radiation, gravity, and hydrodynamics) to
hundreds of processors (for pure hydrodynamics with a fairly small dynamic
range). A single measure like poe+ is not useful in this case.
lack of manpower
- Have already submitted and no changes expected this year: 2 responses
-
I have already benchmarked the codes I intend to run in the coming year, or at
least codes which should perform very similarly. If I gain access to a new
machine with a large number of nodes in the coming year or there are major
hardware/software improvements on those machines on which I plan to run,
perhaps I will perform new benchmarks and submit the results to the
Applications Performance Matrix. Similarly if we find new algorithms, or ways
to improve the performance of existing codes then I might submit new results.
However, I do not have any such plans, a priori.
No expected change in performance.
- Other: 3 responses
-
Assuming a refereed publication describing the application is available.
If asked (and I assume we will be), our project will submit data.
I will not be at LBL the coming year