NERSC: Powering Scientific Discovery Since 1974

STAT

Description

STAT (the Stack Trace Analysis Tool) is a highly scalable, lightweight tool that gathers and merges stack traces from all processes of a parallel application. Running the STAT command creates a STAT_results directory in your current working directory. This directory contains a subdirectory, named after your parallel application's executable, that holds the merged stack traces in DOT format.

How to Use STAT

STAT needs to be run on a MOM (batch management) node.  The steps are:

1. Find the MOM node your job is on:
% qstat -f jobid | grep exec_host

For example:
% qstat -f 1356655 | grep exec_host
exec_host = nid05417/6
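
The node name is the part of the exec_host value before the slash. As a minimal sketch, assuming the exec_host line has the form shown above, the name can be extracted with sed. The function name mom_node_from_qstat is hypothetical and is not part of STAT or Torque.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# node name, i.e. the text between "exec_host = " and the first "/".
mom_node_from_qstat() {
    sed -n 's|^[[:space:]]*exec_host = \([^/]*\)/.*$|\1|p' | head -n 1
}

# Demonstration with the sample line from the example above.
printf '    exec_host = nid05417/6\n' | mom_node_from_qstat
```

On a system with the batch tools installed, the same function could be fed directly: qstat -f 1356655 | mom_node_from_qstat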

2. Log in to any MOM node first, then log in to the MOM node your job is on:

For example:
% ssh hmom05 (or any MOM node from hmom01 to hmom23)
% ssh nid05417

Now you should be on MOM node nid05417, where your job is running.

3. Load the stat module:
% module load stat

4. Create a directory on $SCRATCH2 to store the STAT output:
% cd $SCRATCH2
% mkdir stat_info_job_jobid
% cd stat_info_job_jobid

For example:
% cd $SCRATCH2
% mkdir stat_info_job_1356655
% cd stat_info_job_1356655
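
The directory naming in this step can be wrapped in a small function so the job ID is typed only once. This is a sketch only: make_stat_dir is a hypothetical name, and the base directory is passed explicitly so the example runs outside NERSC systems.

```shell
# Hypothetical helper: create <base>/stat_info_job_<jobid> and print its path.
make_stat_dir() {
    # $1 = job ID, $2 = base directory (for example "$SCRATCH2")
    dir="$2/stat_info_job_$1"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Demonstration in a throwaway directory so the sketch runs anywhere.
demo_base=$(mktemp -d)
make_stat_dir 1356655 "$demo_base"
```

On the MOM node, the equivalent of the three commands above would be: cd "$(make_stat_dir 1356655 "$SCRATCH2")"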

5. Find the process ID (PID) of the aprun command for your batch job:
% ps aux | grep my_login_name

For example:
% ps aux | grep yyyy
yyyy 24252 0.0 0.0 12968 1772 ? Ss 12:25 0:00 -bash
yyyy 24519 0.0 0.0 14668 1616 ? S 12:25 0:00 /bin/sh /var/spool/torque/mom_priv/jobs/1369726.sdb.SC
yyyy 27045 0.0 0.0 14100 2072 ? S 13:05 0:00 aprun -n 1400 ./my_executable
yyyy 28568 0.0 0.0 5536 876 pts/0 S+ 13:24 0:00 grep yyyy

The second column is the PID, so the PID of the aprun command is 27045.
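
Picking the PID out by eye can also be done with awk. The sketch below assumes the column layout shown in the listing above, where the PID is field 2 and the command name is field 11. The function name aprun_pid_from_ps is hypothetical.

```shell
# Hypothetical helper: read ps output on stdin and print the PID (field 2)
# of every line whose command (field 11) is exactly "aprun".
aprun_pid_from_ps() {
    awk '$11 == "aprun" { print $2 }'
}

# Demonstration with two lines from the sample listing above.
aprun_pid_from_ps <<'EOF'
yyyy 27045 0.0 0.0 14100 2072 ? S 13:05 0:00 aprun -n 1400 ./my_executable
yyyy 28568 0.0 0.0 5536 876 pts/0 S+ 13:24 0:00 grep yyyy
EOF
```

Matching on the command field rather than the whole line avoids picking up the grep process itself. Where pgrep is available, pgrep -u "$USER" -x aprun is a shorter alternative.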

6. Run STAT on this PID to gather the stack traces:
% STAT 27045
Attaching to job launcher and launching tool daemons...
....
Results written to /scratch2/scratch/...

7. On a regular login node, use the STATview command to visualize the stack backtrace information in the generated *.dot files. X11 must be enabled for the GUI to display.

% module load stat
% STATview [file.dot ....]
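
To find the files to pass to STATview, the STAT_results directory can be searched for *.dot files. This is a minimal sketch that assumes only what the Description states, namely that the merged traces are .dot files somewhere under STAT_results. The function name list_dot_files and the file name example.dot are hypothetical.

```shell
# Hypothetical helper: list every .dot file under a results directory.
list_dot_files() {
    # $1 = results directory (defaults to ./STAT_results)
    find "${1:-STAT_results}" -type f -name '*.dot' | sort
}

# Demonstration with a mock results tree; example.dot is a placeholder name.
demo=$(mktemp -d)
mkdir -p "$demo/STAT_results/my_executable"
: > "$demo/STAT_results/my_executable/example.dot"
list_dot_files "$demo/STAT_results"
```

From the directory created in step 4, the output of list_dot_files gives the paths to supply as the file.dot arguments.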

Availability

Package  Platform  Category               Version  Module        Install Date  Date Made Default
stat     edison    libraries/ debugging   1.2.1.3  stat/1.2.1.3  2012-12-05
stat     hopper    applications/ general  1.2.1.1  stat/1.2.1.1  2012-01-17    2012-02-17
stat     hopper    applications/ general  1.2.1.3  stat/1.2.1.3  2012-11-29    2013-02-27