NERSC: Powering Scientific Discovery for 50 Years

2006 User Survey Results

Visualization and Data Analysis

Where do you perform data analysis and visualization of data produced at NERSC?

Location   Responses   Percent
All at NERSC 10 3.9%
Most at NERSC 35 13.8%
Half at NERSC, half elsewhere 40 15.7%
Most elsewhere 90 35.4%
All elsewhere 71 28.0%
I don't need data analysis or visualization 8 3.1%

Are your data analysis and visualization needs being met? In what ways do you make use of NERSC data analysis and visualization resources? In what ways should NERSC add to or improve these resources?

 

Requests for additional services / problems:   21 responses

Requests for additional software

Improve the support of add-on Python libraries.

It would be nice to have R (the open-source counterpart to S-PLUS).

... The only thing I can think of that I would like to see added is the Mapping Toolbox for Matlab.

I would like to see CDAT (the Climate Data Analysis Tools) working on Seaborg (with the GUI).

3D visualization software such as AVS, VisIt, etc., is hard to learn and to use for the typical researcher. That's why gnuplot is still the preferred tool for analysis and vis for many. NERSC's resources are extremely good for analysis and vis, but the thing missing is a closer working relationship between members of the vis group and the researchers. Tailored analysis and visualization tools for specific applications would be great, but researchers usually don't know what they want or what they are missing... Maybe the vis group should take the initiative of building a few of those vis tools for chosen applications and publicizing them.

I know NERSC works at improving viz (which for me means both data analysis and visualization), but the codes we currently run at NERSC don't need any high-end viz. Someday we may be doing long MD (versus QMC) calculations. Then I will want NERSC to have or install our real-time multi-resolution analysis software that allows us to detect stable structures and transitions, currently over 2^12 time scales.

... Sometimes I use XmakeMol, compiled by myself in my home directory, which is lightweight and good for atomistic structural analysis. Can you offer similar higher-end visualization/analysis software?

So far I have been enjoying the various options of visualization software (mostly AVS and IDL) available on DaVinci. However, one of the major simulation codes I have been recently using, the NIMROD code, has been designed to have its data output format work mostly with the visualization package Tecplot. Tecplot is a commonly used commercial visualization package that is well known for its easy accessibility and short learning curve. Unfortunately it is not available on DaVinci. I requested consideration of installing Tecplot on DaVinci about a year ago, based not only on the need from my own project, but also on the more important fact that the installation of Tecplot will benefit a large pool of NERSC users who are also users of the NIMROD code, which is one of the two major fusion MHD codes supported by the DOE Office of Fusion Energy Sciences. Yet my request is still under "evaluation" after nearly a year. I would like to take the opportunity of this annual survey to restate my request and my concern about it.

We use our own software for data analysis and do not rely on external, commercial software; our needs are satisfied by using the CERN ROOT package.
However, we currently lack any basic graphics visualization tools on PDSF. By this I mean a tool to look at PDF, GIF, PNG, etc. We often create graphs in batch mode, and these can only be viewed by copying them back to the desktop machine. We would like to see some basic graphics package installed on SL302 on PDSF.

Requests for more resources

We do most of our visualization in house with IDL, on serial machines. We have begun working with the visualization group for advanced visualization. The best addition would be stabilization and increased capacity of the shared file systems to make interoperation between code-running machines and analysis machines easier. Added capacity in scratch and other file systems would also be very helpful; we often need to store and analyze large data sets, which often requires special arrangement.

I write my own analysis codes which must be run on the large machines (mainly Bassi) since DaVinci is not large enough. I move results from these to local resources where I visualize them and process further. I am happy with the situation. It would be nice to have a larger post-processing machine though, since post-processing development is quite iterative for me and this doesn't fit with the long queue times on the production machines.

I think the current way that queues are structured has a very significant and adverse effect on the ability of users to do vis/analysis on NERSC resources. The difficulty is that for large data sets, massive computational power is needed for analysis and vis. Currently the only way of getting that power is by using the production batch queues on the big machines. The problem with this is that it almost entirely eliminates the possibility of doing actual interactive viz and data analysis. In one recent set of runs we were creating multiple 60 GB data dumps and needed to run complicated algorithms to analyze the data, and then we wanted to do viz. The problem is that we have to run using either:
1. DaVinci
2. Interactive queues on the big machines.
I realize that it is an extremely difficult problem to schedule jobs that i) require many nodes and ii) need to be executed on demand. But this is a huge limitation currently when it comes to data viz and analysis.

It is important that a visualization server is available for dedicated data analysis and visualization, along with software that can leverage the server.

Requests for consulting help

At present, most of our visualization is done at DoD, but we intend to switch to doing more at DaVinci. We will then request considerable help from the Visualization Group at NERSC.

My group relied on help from NERSC visualization consultants in the past. But it seems too hard for us as regular users to do all of it ourselves.

I am mostly satisfied with the data analysis and visualization support at NERSC. 3D visualization might be a direction to pursue.

consulting help

Other

NWChem is not working completely well. Certain modules, like PMF, do not work (at least on Jacquard). The task shell command used in NWChem does not work on Jacquard either.

I use Matlab on DaVinci almost daily. Occasionally, I can't start it because of a license shortage.

No.

It is very cumbersome to use PDSF when you use modern tools. For instance, I edit files and want to use the code management tools found on Mac OS X. However, I do not have enough disk space on AFS. Also, I cannot run batch jobs on PDSF using AFS.
What I want is to mount my PDSF files on my local computer. NERSC does not allow it. As a result, I use my own desktop most of the time. It is simply too hard to use NERSC.

 

Yes, data analysis needs are being met / positive comments:   14 responses

My viz needs at NERSC currently have to do with the LLNL VisIt tool. This past year we (LLNL researchers using VisIt to analyze data from NIMROD runs) came to the NERSC viz group requesting help interfacing NIMROD & VisIt and received *excellent* support.

I use visualization software and had collaborations with the visualization group who have always been very helpful.

My needs are mostly satisfied. I use mostly IDL on daVinci or other platforms. NGF made things easier in that respect for me.

My needs are currently met well by the data analysis capabilities of DaVinci.

I'm happy with DaVinci.

DaVinci for large-scale data analysis

My analysis and visualization needs are being met. I use DaVinci a lot with very large data sets. Most often I use Matlab, Grads, and NCL. ...

I use matlab and mathematica, and I am satisfied with the current level of resources.

Yes. Serial queues on Jacquard or Bassi with my own software.

I use xmgr and gnuplot routinely. But that's about it.

I have not worked with the visualization group yet. My approach so far has been to use IDL and python/gnuplot to run where the data is. I have not explored the use of DaVinci, or whether that would require moving large dump files (which would likely be less efficient than postprocessing where the data is).

Satisfied.

They are met.

Seems OK

 

Do vis locally:   14 responses

I do the data analysis on my own PC. ...

I use my desktop for visualization and data analysis.

I do all post-processing and visualization off-site.

We export our produced data to other non-NERSC machines for final analysis and visualization where we have better X connections, better control of software configuration, better uptime, etc. I have not explored non-PDSF options at NERSC for these things; PDSF is simply not stable enough or designed for this kind of work. For the most part our final analysis and visualization needs are fairly modest and are well served by a mini cluster under our own control rather than having to submit proposals, share a cluster with other users, etc. to use NERSC resources for these needs.

Most of my visualization is done in Matlab, requiring moving large blocks of data to my local computer. This can sometimes be time consuming.

I usually do elsewhere, so not important to me.

I do not use the data analysis and visualization resources on NERSC. All of that is handled on local machines.

I don't use data analysis and visualization resources on machines at NERSC. I use local machines instead.

I do all visualization and data analysis elsewhere, because I have everything set up and I do not need a lot of resources.

It is easier for me to process data on a local machine because, for data analysis, I don't have to wait in a queue. For visualization, where X windows must be manipulated, it is much better to be local.

I do analysis and visualization at our facility. I'm not sure that's the best solution for us, but it's the way we do it now.

We analyze our data locally. Data analysis is inexpensive for our projects. We don't use data analysis and visualization software at NERSC.

I have verified that Matlab graphics works on Jacquard. However, I have used the software for real work at OSC; it is closer and in the same time zone if I have to do a phone consultation.

Most of my data analysis and visualization take place off site.

 

Network speed is an inhibitor:   9 responses

I use simple visualization tools such as gnuplot to do quick checks of data. More complex visualization is performed elsewhere. Typically you do not want to attempt to perform complex visualizations on a remote resource at NERSC because of slow internet connectivity. You would not be able to interactively work with the visualization.

Data transfer from NERSC to NREL in Colorado is so slow that I cannot use any visualization software at a production level. ...

Improve the speed of network connectivity so that remote visualization will be more convenient.

I'm satisfied with most of the services, hardware, and software. But I'm using my account from China mostly. Sometimes when I connect to PDSF through ssh, the transfers are so slow that I can hardly work. Can this be improved?

I try to use Mathematica and Maple on DaVinci, but forwarding X services is quite slow and tedious. Perhaps it's my network connection as well, but using X windows remotely is too slow for me.

Overall, our network connection is too slow to even use X windows easily, so I usually just use a dumb terminal window.

Sometimes I just want to do simple visualization using tools such as matlab. But the connection is very slow from my PC.

I've noticed that network response for IDLDE (the graphical UI with IDL) is very slow. It's typically been quicker to just copy everything to my local machine and work here. This isn't any great inconvenience for me, since I have most of what I need here.

I use python and pytables to access HDF5 data, then gnuplotpy. I do mostly batch generation of 2D plots, as network connectivity is too poor to do more. Also, the idea of moving data around NERSC to get it onto the right machine is clumsy.
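The batch plotting workflow this respondent describes can be sketched with a short, self-contained example: compute or load the data on the remote machine, then drive gnuplot non-interactively so that no X connection is needed and the resulting PNG can simply be copied back. The filenames here are illustrative assumptions, not actual NERSC paths, and a data-file-plus-script approach stands in for the gnuplotpy bindings the respondent mentions.

```python
import math

# Write the data points to a plain-text file that gnuplot can read directly.
with open("signal.dat", "w") as f:
    for i in range(100):
        x = i * 0.1
        f.write(f"{x} {math.sin(x)}\n")

# Write a gnuplot command script that renders straight to a PNG file,
# so the plot can be generated inside a batch job with no X forwarding.
script = """set terminal png size 800,600
set output 'signal.png'
set xlabel 'x'
set ylabel 'sin(x)'
plot 'signal.dat' using 1:2 with lines title 'signal'
"""
with open("plot.gp", "w") as f:
    f.write(script)

# On the batch host one would then run:  gnuplot plot.gp
# and copy signal.png back to the desktop for viewing.
```

Because the PNG is produced entirely in batch mode, only the small image file crosses the slow network link, rather than an interactive X session.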

 

Need more information / training:   8 responses

Make a tutorial webpage on using the visualization software.

We would like to use these services more. Providing more information of the form 'Getting Started with Analyzing your Data on DaVinci, serial queues on Seaborg' would be helpful.

I know that the information is available, but I don't have time to learn new software. My position, then, is that there is not enough information available to easily access the software. There may be a tutorial, but I am not aware of one; having a tutorial with some examples of how to use the software would help me start using such software and machines.

I don't know how to use that software, so I have to download my data to my local computer and use some Windows software.

More frequent on-site user training. The problem is that we do not have the resources to come to NERSC for such training. We would love to use the NERSC facility for visualization and data analysis.

I am not familiar with the NERSC data analysis and visualization resources available to me. It would be helpful to better understand what resources are available.

It would be great for the visualization resources to be more visible; e.g., I don't really know what is available for users. Maybe you should publicize this more?

My basic problem is to get up to speed with what is available. I am reluctant to learn new things when I want to get something achieved. This is my problem and not NERSC. Of the three software problems I have had, the staff has been extremely competent and helpful on two of these. The current problem is still on-going and is something I need to better understand.
I guess it would be difficult to identify what would be required to improve services. I have gone through the manuals, but I always find it easier to talk to a human being. For visualization capabilities, I am unaware whether a manual or sample case exists. This would be helpful. I have stored the IBM manual on my desktop to help debug problems and understand system usage. Does a similar capability exist for visualization?

 

Don't use (yet):   8 responses

I have not started to use the NERSC data analysis and visualization resources. But, these are very important and we will begin to realize and utilize these resources as much as possible.

I really should do more with visualization. It is becoming increasingly important.

I have not had time this year to really explore use of DaVinci --- in FY07 I hope to really get to use it.

I am a new user and I have several students using the facilities. We are getting up to speed on the systems and that is taking somewhat longer than we thought. This problem is one that is local.

I don't use the available visualization tools.

I don't use this

I do not use data analysis and visualization on NERSC machines

I do not use those tools.

 

Yes, I can do data analysis, but poor performance:   3 responses

Not sure what 'data analysis' means in this context. 80% of what I do is called 'data analysis' and all is done on PDSF. And mostly fine, except slowness/outages. I use ROOT at PDSF for 'visualization' (making plots).

Yes, it is good. But this year I experienced PDSF inefficiency more often than last year; i.e., sometimes PDSF is terribly slow.

I mainly use Matlab on Jacquard for data analysis. DaVinci's performance in running Matlab is very poor. I also tried to use VisIt for data visualization, but its speed is below my expectations.

 

Other:   2 responses

... I only use the IPM module, and I'm not really satisfied with it, since the results are displayed on the web one day later.

I usually check the websites before submitting large numbers of jobs. Given that I don't submit jobs on a regular basis, this has been very helpful.