<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title> blog</title>
		<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/</link>
		<atom:link href="http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/" rel="self" type="application/rss+xml" />
		<description></description>

		
		<item>
			<title>Unable to allocate hugepages in running jobs</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/unable-to-allocate-hugepages-in-running-jobs/</link>
			<description>&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Symptom&lt;/h2&gt;
&lt;p&gt;User jobs sometimes get an error message similar to the following, usually at the start of a batch job, which causes the job to abort:&lt;/p&gt;
&lt;p&gt;MPICH2 ERROR [Rank 7436] [job id 14638087] [Sat Jan 12 04:56:54 2013] [c11-2c1s3n1] [nid04487] - MPIU_nem_gni_get_hugepages(): Unable to mmap 4194304 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.0.5841.kvs_14638087, err Cannot allocate memory&lt;/p&gt;
&lt;p&gt;This error occurs when there is not enough huge page memory available on one or more of the allocated compute nodes.  It happens more often with jobs that use the &quot;-ss&quot; option of the aprun command. We have confirmed that the available hugepages are not evenly distributed among the 4 NUMA nodes on a compute node.&lt;/p&gt;
&lt;h2&gt;Workaround&lt;/h2&gt;
&lt;p&gt;The first workaround is to resubmit your batch job so that it launches on a different set of compute nodes.  We monitor failed jobs and manually reboot the problem nodes. The second workaround is to remove the &quot;-ss&quot; option from your batch script; this option sometimes has a negative performance impact, especially for hybrid MPI/OpenMP applications.&lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;Several bugs regarding hugepages have been opened with Cray. &lt;/p&gt;
</description>
			<pubDate>Mon, 14 Jan 2013 13:48:29 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/unable-to-allocate-hugepages-in-running-jobs/</guid>
		</item>
		
		<item>
			<title>Resolved -- Error Message: &quot;ModuleCmd_Switch.c(172):ERROR:152: Module &#39;PrgEnv-cray&#39; is currently not loaded&quot;</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-error-message-modulecmd-switch-c-172-error-152-module-prgenv-cray-is-currently-not-loaded/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;User batch jobs with &quot;#PBS -V&quot; in the script see the error message &quot;ModuleCmd_Switch.c(172):ERROR:152: Module 'PrgEnv-cray' is currently not loaded&quot;.  This is caused by changing the default programming environment from Cray to PGI.  The message can also appear with &quot;script&quot; on the login nodes and with &quot;nodestat&quot; on the MOM nodes.  User jobs without &quot;#PBS -V&quot; do not trigger it.  The message is harmless and can be ignored.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;Cray has resolved this issue by modifying the PrgEnv module load sequences.&lt;/p&gt;</description>
			<pubDate>Thu, 28 Jun 2012 10:19:30 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-error-message-modulecmd-switch-c-172-error-152-module-prgenv-cray-is-currently-not-loaded/</guid>
		</item>
		
		<item>
			<title>pgi/12.4.0 has link error with OpenMP when perftools/5.3.x is loaded</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/pgi-12-4-0-has-link-error-with-openmp-when-perftools-5-3-x-is-loaded/</link>
			<description>&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;Using the default pgi/12.4.0 compiler, when perftools/5.3.x is loaded, OpenMP codes have linking errors similar to the following: &lt;/p&gt;
&lt;div&gt;
&lt;div&gt;/global/homes/y/yunhe/.craypat/a.out/13554/pgccPJtd7jk8ihft.o: In function `main':&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;/global/u1/y/yunhe/shared/./xthi.c:46: undefined reference to `_mp_trace_parallel_enter'&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;/global/u1/y/yunhe/shared/./xthi.c:52: undefined reference to `_mp_trace_parallel_begin'&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;/global/u1/y/yunhe/shared/./xthi.c:60: undefined reference to `_mp_trace_parallel_end'&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;/global/u1/y/yunhe/shared/./xthi.c:60: undefined reference to `_mp_trace_parallel_exit'&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;/usr/bin/ld: link errors found, deleting executable `a.out'&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;Workaround&lt;/h2&gt;
&lt;p&gt;Use the older pgi/12.2.0 instead for OpenMP codes when perftools/5.3.x is loaded.&lt;/p&gt;
&lt;p&gt;% module swap pgi pgi/12.2.0&lt;/p&gt;
&lt;p&gt;There is another workaround which will enable you to use pgi 12.4.0 and later versions.&lt;/p&gt;
&lt;p&gt;Create a file called stubs.c which contains this:&lt;/p&gt;
&lt;p&gt;void _mp_trace_loop_chunk_begin () {}&lt;br/&gt;void _mp_trace_loop_chunk_end () {}&lt;br/&gt;void _mp_trace_loop_enter () {}&lt;br/&gt;void _mp_trace_loop_exit () {}&lt;br/&gt;void _mp_trace_master_enter () {}&lt;br/&gt;void _mp_trace_master_exit () {}&lt;br/&gt;void _mp_trace_parallel_begin () {}&lt;br/&gt;void _mp_trace_parallel_end () {}&lt;br/&gt;void _mp_trace_parallel_enter () {}&lt;br/&gt;void _mp_trace_parallel_exit () {}&lt;br/&gt;void _mp_trace_section_begin () {}&lt;br/&gt;void _mp_trace_section_end () {}&lt;br/&gt;void _mp_trace_sections_enter () {}&lt;br/&gt;void _mp_trace_sections_exit () {}&lt;br/&gt;void _mp_trace_single_enter () {}&lt;br/&gt;void _mp_trace_single_exit () {}&lt;br/&gt;void _mp_trace_task_begin () {}&lt;br/&gt;void _mp_trace_task_end () {}&lt;br/&gt;void _mp_trace_task_enter () {}&lt;br/&gt;void _mp_trace_task_exit () {}&lt;br/&gt;void _mp_trace_workshare_begin () {}&lt;br/&gt;void _mp_trace_workshare_end () {}&lt;br/&gt;void _mp_trace_workshare_enter () {}&lt;br/&gt;void _mp_trace_workshare_exit () {}&lt;br/&gt;&lt;br/&gt;Include this file in the command line and the code will compile and link without problems:&lt;/p&gt;
&lt;p&gt;ftn -o omphello -mp=nonuma omphello.f stubs.c&lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;A bug has been opened with Cray.  The bug is confirmed to be fixed in pgi/12.5.0.&lt;/p&gt;
</description>
			<pubDate>Thu, 14 Jun 2012 15:04:46 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/pgi-12-4-0-has-link-error-with-openmp-when-perftools-5-3-x-is-loaded/</guid>
		</item>
		
		<item>
			<title>Job dependency broken between xfer and regular jobs</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/job-dependency-broken-between-xfer-and-regular-jobs/</link>
			<description>&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;A job in the xfer queue that depends on a job in the regular queue remains held in the queue after the job it depends on completes.&lt;/p&gt;
&lt;p&gt;(Note: job dependencies between regular jobs still work.)&lt;/p&gt;
&lt;h2&gt;Workaround&lt;/h2&gt;
&lt;p&gt;Users can submit their xfer jobs to the &quot;regular&quot; queue if the jobs depend on other regular jobs.&lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;Cray is working with Adaptive Computing on the job dependency issue between the xfer and regular jobs.&lt;/p&gt;
</description>
			<pubDate>Tue, 12 Jun 2012 11:58:04 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/job-dependency-broken-between-xfer-and-regular-jobs/</guid>
		</item>
		
		<item>
			<title>Resolved: Reports of Hanging Jobs on Hopper</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-reports-of-hanging-jobs-on-hopper/</link>
			<description>&lt;h2&gt;Issue:&lt;/h2&gt;
&lt;p&gt;A number of users have reported large jobs hanging intermittently on Hopper.  A job appears to start and then hangs shortly afterward, producing no output.  The job stops when the wall clock limit is reached.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;Cray has identified a few bad nodes in the system. After these nodes were rebooted, no new hung jobs have been reported since Mar 12. A new xt-mpich2/5.4.4 has been installed and set as the default, with a system-wide MPI environment setting so that a job is aborted if it is detected to be hung.  A kernel patch was installed on Apr 3 to address the underlying issue.&lt;/p&gt;
</description>
			<pubDate>Thu, 01 Mar 2012 09:56:49 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-reports-of-hanging-jobs-on-hopper/</guid>
		</item>
		
		<item>
			<title>Resolved -- &quot;cannot find -lhdf5_hl_cpp&quot; compiler error with C++ code using hdf5</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-cannot-find-lhdf5-hl-cpp-compiler-error-with-c-code-using-hdf5/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;After the 1/18 system maintenance, C++ code compilation gets an error if the default hdf5/1.8.5.0 module is loaded: &quot;/usr/bin/ld: cannot find -lhdf5_hl_cpp&quot;. &lt;/p&gt;
&lt;h2&gt;Workaround&lt;/h2&gt;
&lt;p&gt;Users can do any one of the following module swaps and then recompile: 1) module swap xt-asyncpe xt-asyncpe/5.01; or 2) module swap hdf5 hdf5/1.8.7; or 3) if the default netcdf/4.1.1.0 is loaded, then do &quot;module swap netcdf netcdf/4.1.3&quot;.&lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;The problem is resolved by making the newer netcdf/4.1.3 and hdf5/1.8.7 the default versions. The underlying problem has been reported to Cray.&lt;/p&gt;
</description>
			<pubDate>Tue, 24 Jan 2012 10:00:48 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-cannot-find-lhdf5-hl-cpp-compiler-error-with-c-code-using-hdf5/</guid>
		</item>
		
		<item>
			<title>&quot;module: command not found&quot; in batch jobs</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/module-command-not-found-in-batch-jobs/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;Users whose default login shell is csh or tcsh get this error when they use bash syntax in their batch scripts.  The following batch script fails with the &quot;module: command not found&quot; error at run time.&lt;/p&gt;
&lt;p&gt;#!/bin/bash&lt;br/&gt;#PBS -q debug&lt;br/&gt;#PBS -l mppwidth=24&lt;br/&gt;#PBS -l walltime=00:10:00&lt;br/&gt;cd $PBS_O_WORKDIR&lt;br/&gt;module load acml&lt;/p&gt;
&lt;h2&gt;Workaround:&lt;/h2&gt;
&lt;p&gt;Use the &quot;#PBS -S&quot; keyword to specify bash as the shell for the batch script.  The following batch script works successfully:&lt;/p&gt;
&lt;p&gt;#PBS -S /bin/bash&lt;br/&gt; #PBS -q debug&lt;br/&gt; #PBS -l mppwidth=24&lt;br/&gt; #PBS -l walltime=00:10:00&lt;br/&gt; cd $PBS_O_WORKDIR&lt;br/&gt; module load acml&lt;/p&gt;
</description>
			<pubDate>Fri, 06 Jan 2012 09:08:32 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/module-command-not-found-in-batch-jobs/</guid>
		</item>
		
		<item>
			<title>Resolved -- Job cannot be executed</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-job-cannot-be-executed/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;The problem happens mostly to interactive batch jobs, but it happens to regular batch jobs as well.  It usually happens when the user presses Ctrl-C at the &quot;qsub -I&quot; command, but it also happens without any user action.&lt;/p&gt;
&lt;p&gt;The job simply fails: showq cannot locate the job (although qstat can), and the user then receives an email similar to:&lt;/p&gt;
&lt;p&gt;PBS Job Id: 1095290.sdb&lt;br/&gt;Job Name: STDIN&lt;br/&gt;Exec host: nid03934/9&lt;br/&gt;Aborted by PBS Server&lt;br/&gt;Job cannot be executed&lt;br/&gt;See Administrator for help&lt;/p&gt;
&lt;p&gt;This is a bug related to the connection timeout between MOM nodes and compute nodes with the Moab scheduler. NERSC is working with Cray and Adaptive Computing to resolve the issue.&lt;/p&gt;
&lt;h2&gt;Workaround:&lt;/h2&gt;
&lt;p&gt;The problem is transient. Resubmitting the job usually works.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;The problem was resolved by increasing the connection timeout setting in the MOM node configuration.&lt;/p&gt;
</description>
			<pubDate>Tue, 03 Jan 2012 11:25:53 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-job-cannot-be-executed/</guid>
		</item>
		
		<item>
			<title>Pathscale/4.0.9 not compatible with default libraries</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/pathscale-4-0-9-not-compatible-with-default-libraries/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;New xt-libsci/11.x.x does not have pathscale support. Other default libraries (netcdf, hdf5, fftw, petsc, etc.) have no pathscale/4.0.9 support.&lt;/p&gt;
&lt;h2&gt;Workaround:&lt;/h2&gt;
&lt;p&gt;-- The pathscale/4.0.9 module swaps the current default (xt-libsci/11.0.01) to xt-libsci/10.5.01 (no action from users needed).&lt;/p&gt;
&lt;p&gt;-- To use pathscale/3.2.99, users need to swap xt-libsci to xt-libsci/10.5.01 first.&lt;/p&gt;
&lt;p&gt;-- User codes that need hdf5, netcdf, fftw, petsc, etc. will have to stay with pathscale/3.2.99 for now.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;NERSC is working with Cray to provide pathscale/4.0.9 support for these libraries.&lt;/p&gt;
</description>
			<pubDate>Fri, 14 Oct 2011 10:45:48 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/pathscale-4-0-9-not-compatible-with-default-libraries/</guid>
		</item>
		
		<item>
			<title>Resolved -- Linking error when mixing C++ and Fortran using PGI under xt-asyncpe/4.9</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/linking-error-when-mixing-c-and-fortran-using-pgi-under-xt-asyncpe-4-9/</link>
			<description>&lt;h2&gt;Symptom&lt;/h2&gt;
&lt;p&gt;Mixed C++ and Fortran codes get the following linking error when built with the PGI compiler wrapper under xt-asyncpe/4.9:&lt;/p&gt;
&lt;pre class=&quot;code-plain&quot;&gt;In function `std::uncaught_exception(void)': eh_util.c:(.text+0x317): undefined reference to `__zceh_uncaught_exception'&lt;/pre&gt;
&lt;p&gt;Linking works with the PGI compiler wrapper under xt-asyncpe/4.8. Under xt-asyncpe/4.9, it works with the GNU compiler wrapper and with the native PGI compiler.&lt;/p&gt;
&lt;p&gt;The workaround is either to use the older xt-asyncpe/4.8, or to keep xt-asyncpe/4.9, use &quot;ftn -v&quot; to find the exact link line, remove &quot;-lstdc++&quot; from it, and relink.&lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;Problem resolved with the new xt-asyncpe/5.01 installed and set to default on Hopper as of 09/24/11.&lt;/p&gt;
</description>
			<pubDate>Fri, 01 Jul 2011 13:49:37 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/linking-error-when-mixing-c-and-fortran-using-pgi-under-xt-asyncpe-4-9/</guid>
		</item>
		
		<item>
			<title>Resolved -- Incorrect path with bash under csh</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-incorrect-path-with-bash-under-csh/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;With csh or tcsh as the default shell, entering bash (as a login or non-login shell) corrupts the path, so that the qsub or ftn command is either not found or resolves to the wrong path.  Unloading and reloading the PrgEnv-xxx module may partially address the problem.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;pre class=&quot;code-plain&quot;&gt;Under csh:&lt;br/&gt;% echo $PATH&lt;br/&gt;/usr/common/usgbin:...&lt;br/&gt;% which qsub&lt;br/&gt;/usr/common/nsg/bin/qsub&lt;br/&gt;% which ftn&lt;br/&gt;/opt/cray/xt-asyncpe/4.7/bin/ftn&lt;br/&gt;&lt;br/&gt;Now enter bash:&lt;br/&gt;% bash&lt;br/&gt;..&lt;br/&gt;% which qsub&lt;br/&gt;/opt/torque/2.4.8-201004261413/bin/qsub&lt;br/&gt;% which ftn&lt;br/&gt;no ftn in ... &lt;br/&gt;% module unload PrgEnv-pgi &lt;br/&gt;% module load PrgEnv-pgi&lt;br/&gt;% which qsub&lt;br/&gt;/opt/torque/2.4.8-201004261413/bin/qsub&lt;br/&gt;% which ftn&lt;br/&gt;/opt/cray/xt-asyncpe/4.7/bin/ftn&lt;br/&gt;% exit&lt;br/&gt;&lt;br/&gt;Now enter bash as a login shell:&lt;br/&gt;% bash -l&lt;br/&gt;% which qsub&lt;br/&gt;/usr/common/nsg/bin/qsub&lt;br/&gt;% which ftn&lt;br/&gt;no ftn in ..&lt;br/&gt;% module unload PrgEnv-pgi &lt;br/&gt;% module load PrgEnv-pgi&lt;br/&gt;% which qsub&lt;br/&gt;/usr/common/nsg/bin/qsub&lt;br/&gt;% which ftn&lt;br/&gt;/opt/cray/xt-asyncpe/4.7/bin/ftn&lt;br/&gt;% exit&lt;/pre&gt;
&lt;p&gt;There are similar problems for using #!/bin/bash in your batch script if your default shell is csh/tcsh. &lt;/p&gt;
&lt;pre class=&quot;code-plain&quot;&gt;For a csh/tcsh user, running the following batch script generates a&lt;br/&gt;&quot;module command not found&quot; error because the module command is not in the default path.&lt;br/&gt;&lt;br/&gt;#!/bin/bash&lt;br/&gt;#PBS -l mppwidth=4&lt;br/&gt;#PBS -l walltime=8:00&lt;br/&gt;#PBS -V&lt;br/&gt;cd $PBS_O_WORKDIR&lt;br/&gt;module list&lt;br/&gt;&lt;br/&gt;While the following script runs well:&lt;br/&gt;&lt;br/&gt;#PBS -S /bin/bash&lt;br/&gt;#PBS -l mppwidth=4&lt;br/&gt;#PBS -l walltime=8:00&lt;br/&gt;#PBS -V&lt;br/&gt;cd $PBS_O_WORKDIR&lt;br/&gt;module list&lt;/pre&gt;
&lt;h2&gt;Workaround&lt;/h2&gt;
&lt;p&gt;Change your default shell to bash through the NIM interface so that you use bash directly.  Avoid invoking bash under csh/tcsh.&lt;/p&gt;
&lt;p&gt;For batch scripts, use &quot;#PBS -S /bin/bash&quot; instead of &quot;#!/bin/bash&quot;.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;This bug has been resolved as of 9/14/2011.&lt;/p&gt;
</description>
			<pubDate>Fri, 01 Jul 2011 13:38:43 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-incorrect-path-with-bash-under-csh/</guid>
		</item>
		
		<item>
			<title>Variation in application runtime</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/variation-in-application-runtime/</link>
			<description>&lt;p&gt;Some users have reported variation in application runtime while Hopper is running in production mode with many applications running concurrently.  NERSC also sees runtime variability in some benchmarking codes.  NERSC is exploring the causes of runtime variability hypothesizing it could be caused by the placement of an application across the network,  contention for resources or a from a particular mix of jobs on the system.  &lt;/p&gt;
&lt;p&gt;We will put our variability numbers online and analyze some of the hypotheses described above and report back at the next NUG call.&lt;/p&gt;
</description>
			<pubDate>Thu, 09 Jun 2011 14:50:29 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/variation-in-application-runtime/</guid>
		</item>
		
		<item>
			<title>&quot;Unable to open kgni version file /sys/class/gemini/kgni0/version&quot; error</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/unable-to-open-kgni-version-file-sys-class-gemini-kgni0-version-error/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;Dynamic executables built with compiler wrappers running directly on the external login nodes are getting the following error message:&lt;/p&gt;
&lt;pre class=&quot;code-shell&quot;&gt;% ftn -dynamic -o testf testf.f&lt;br/&gt;% ./testf &lt;br/&gt;./testf: /opt/pgi/10.9.0/linux86-64/10.9/libso/libnuma.so.1: no version information available (required by ./testf) &lt;br/&gt;Unable to open kdreg version file: No such file or directory &lt;br/&gt;Warning: Unable to open kgni version file /sys/class/gemini/kgni0/version errno 2 at line 599 in file cdm.c &lt;br/&gt;LIBDMAPP ERROR: Unable to open kgni version file /sys/class/gemini/kgni0/version errno 2 &lt;br/&gt;&lt;br/&gt;aborting job: &lt;br/&gt;LIBDMAPP ERROR: Unable to open kgni version file &lt;br/&gt;Aborted&lt;/pre&gt;
&lt;h2&gt;Workaround:&lt;/h2&gt;
&lt;p&gt;Please use the native compilers such as pgf90, pathcc, etc instead of the Cray compiler wrappers ftn, cc, or CC to build the executable. It will then run successfully on the external login nodes.&lt;/p&gt;
&lt;pre class=&quot;code-shell&quot;&gt;% pgf90 -o testf testf.f&lt;br/&gt;% file testf&lt;br/&gt;testf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), not stripped&lt;br/&gt;% ./testf &lt;br/&gt;Test ok&lt;br/&gt; &lt;br/&gt;&lt;/pre&gt;
&lt;p&gt;Dynamic executables compiled with the compiler wrappers can also run successfully on the MOM nodes (via &quot;qsub -I -V -lmppwidth=24 -q interactive&quot;).  However, only short jobs (a few minutes) are appropriate to run on the MOM nodes, since these are shared resources.&lt;/p&gt;
&lt;pre class=&quot;code-shell&quot;&gt;% qsub -I -V -q interactive -lmppwidth=24&lt;br/&gt;... wait for a new session ...&lt;br/&gt;&lt;br/&gt;% cd $PBS_O_WORKDIR&lt;br/&gt;% ftn -dynamic -o testf testf.f&lt;br/&gt;% ./testf &lt;br/&gt;Test ok&lt;/pre&gt;
</description>
			<pubDate>Wed, 13 Apr 2011 10:15:26 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/unable-to-open-kgni-version-file-sys-class-gemini-kgni0-version-error/</guid>
		</item>
		
		<item>
			<title>Resolved -- Default version not shown in &quot;module avail module_name&quot; command</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-default-version-not-shown-in-module-avail-module-name-command/</link>
			<description>&lt;h2&gt;Symptom:&lt;/h2&gt;
&lt;p&gt;The default software version is not shown when &quot;module avail module_name&quot; is issued.  For example:&lt;/p&gt;
&lt;pre class=&quot;code-shell&quot;&gt;% module avail pgi&lt;br/&gt;&lt;br/&gt;---------------------------------------------- /opt/modulefiles -----------------------------------------------&lt;br/&gt;pgi/10.9.0 pgi/11.0.0 pgi/11.1.0 pgi/11.2.0 pgi/9.0.4&lt;/pre&gt;
&lt;h2&gt;Workaround:&lt;/h2&gt;
&lt;p&gt;Issue the &quot;module avail&quot; command without a specific module name. The output gives the complete list of available modules, including the default version of each module. For example:&lt;/p&gt;
&lt;pre class=&quot;code-shell&quot;&gt;% module avail&lt;br/&gt;&amp;lt;snippet&amp;gt;&lt;br/&gt;PrgEnv-pathscale/3.1.49                   pgi/10.9.0(default)&lt;br/&gt;PrgEnv-pathscale/3.1.49A                  pgi/11.0.0&lt;br/&gt;PrgEnv-pathscale/3.1.61(default)          pgi/11.1.0&lt;br/&gt;PrgEnv-pgi/3.1.27A                        pgi/11.2.0&lt;br/&gt;PrgEnv-pgi/3.1.35                         pgi/9.0.4&lt;br/&gt;&amp;lt;snippet&amp;gt;&lt;/pre&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;This problem was fixed on June 28, 2011.&lt;/p&gt;
</description>
			<pubDate>Wed, 13 Apr 2011 08:06:25 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-default-version-not-shown-in-module-avail-module-name-command/</guid>
		</item>
		
		<item>
			<title>Job dependencies do not work as expected. Require additional string in job name</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/job-dependencies-do-not-work-as-expected-require-additional-string-in-job-name/</link>
			<description>&lt;p&gt;Symptom: Submitting jobs that depend on other jobs does not work.&lt;/p&gt;
&lt;p&gt;Workaround: The additional string &quot;@sdb&quot; needs to be appended to the job ID.  We are working to resolve this issue.  See the example below: submit &quot;jobA&quot;, and &quot;jobB&quot; will run after &quot;jobA&quot; has completed.&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;div&gt;
&lt;pre class=&quot;code-basic&quot;&gt;&amp;gt;&amp;gt; cat dependency_jobA.scr&lt;br/&gt;#!/bin/csh&lt;br/&gt;#PBS -q debug&lt;br/&gt;#PBS -l mppwidth=1&lt;br/&gt;#PBS -l walltime=00:05:00&lt;br/&gt;#PBS -N jobA&lt;br/&gt;#PBS -j oe&lt;br/&gt;cd $PBS_O_WORKDIR&lt;br/&gt;&lt;br/&gt;echo &quot;submitting jobB&quot;&lt;br/&gt;&lt;br/&gt;qsub -W depend=afterany:${PBS_JOBID}@sdb dependency_jobB.scr&lt;br/&gt;sleep 60&lt;br/&gt;echo &quot;hi from jobA, ${PBS_JOBID}&quot;&lt;br/&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt; &lt;/p&gt;
&lt;div&gt;
&lt;pre class=&quot;code-basic&quot;&gt;&amp;gt;&amp;gt; cat dependency_jobB.scr&lt;br/&gt;#!/bin/csh&lt;br/&gt;#PBS -q debug&lt;br/&gt;#PBS -l mppwidth=1&lt;br/&gt;#PBS -l walltime=00:02:00&lt;br/&gt;#PBS -N jobB&lt;br/&gt;#PBS -j oe&lt;br/&gt;cd $PBS_O_WORKDIR&lt;br/&gt;echo &quot;hi from jobB, ${PBS_JOBID}&quot;&lt;br/&gt;&lt;/pre&gt;
&lt;/div&gt;</description>
			<pubDate>Fri, 08 Apr 2011 15:53:17 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/job-dependencies-do-not-work-as-expected-require-additional-string-in-job-name/</guid>
		</item>
		
		<item>
			<title>DVS scalability issue on GPFS file system when reading an initialization file with a small IO size</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/dvs-scalability-issue-on-gpfs-file-system-when-reading-an-initialization-file-with-a-small-io-size/</link>
			<description>&lt;h2&gt;Description:&lt;/h2&gt;
&lt;p&gt;Some users have observed that the initial IO at job startup takes a very long time when the input files are on GPFS file systems, such as /home, /project, or /global/scratch.  IO for files on the Lustre /scratch or /scratch2 file systems is very fast. This is a DVS bug related to GPFS file systems.&lt;/p&gt;
&lt;p&gt;Here are some of our test results: with a 4KB IO size, 32 nodes took over 11 minutes on /global/scratch but only 7 seconds on /scratch. With the file IO size changed to 4MB, it still took about 100 seconds on /global/scratch.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;Cray and NERSC are looking into DVS performance tuning.&lt;/p&gt;
&lt;h2&gt;Recommendation:&lt;/h2&gt;
&lt;p&gt;Put application initialization/startup input files on the /scratch or /scratch2 Lustre file systems instead of on GPFS file systems.&lt;/p&gt;
</description>
			<pubDate>Tue, 05 Apr 2011 10:14:07 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/dvs-scalability-issue-on-gpfs-file-system-when-reading-an-initialization-file-with-a-small-io-size/</guid>
		</item>
		
		<item>
			<title>Resolved: some OpenMP flags ignored in PGI C/C++ compiler</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-some-openmp-flags-ignored-in-pgi-c-c-compiler/</link>
			<description>&lt;h2&gt;Description:&lt;/h2&gt;
&lt;p&gt;OpenMP flags other than -mp=nonuma are ignored with the PGI C and C++ wrapper on Hopper.  The PGI Fortran wrapper behaves correctly.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;This problem has been fixed in xt-asyncpe/4.8 and later versions.&lt;/p&gt;
</description>
			<pubDate>Tue, 29 Mar 2011 12:16:27 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-some-openmp-flags-ignored-in-pgi-c-c-compiler/</guid>
		</item>
		
		<item>
			<title>&quot;gni_pub.h&quot; not found in compilation</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/gni-pub-h-not-found-in-compilation/</link>
			<description>&lt;h2&gt;Description:&lt;/h2&gt;
&lt;p&gt;After the OS upgrade to CLE3.1UP03, codes that include &quot;gni_pub.h&quot; get a &quot;gni_pub.h not found&quot; error at compile time.  The workaround is to specify the location of this include file explicitly in CFLAGS or FCFLAGS. For example (csh syntax):  set CFLAGS = &quot;$CFLAGS -I/opt/cray/gni-headers/default/include&quot;.&lt;/p&gt;
&lt;h2&gt;Status:&lt;/h2&gt;
&lt;p&gt;Problem has been reported to Cray.&lt;/p&gt;</description>
			<pubDate>Tue, 29 Mar 2011 11:50:24 -0700</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/gni-pub-h-not-found-in-compilation/</guid>
		</item>
		
		<item>
			<title>Resolved - Segfaults when mixing C/C++ and Fortran under PrgEnv-gnu</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-segfaults-when-mixing-c-c-and-fortran-under-prgenv-gnu/</link>
			<description>&lt;h2&gt;Description&lt;/h2&gt;
&lt;p&gt;If a code mixes C/C++ and Fortran routines, performs I/O in the Fortran code, and is built with the GNU compiler, the resulting executable can segfault at runtime with the following error message:&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;pre style=&quot;padding-left: 30px;&quot;&gt;[NID 03978] 2011-02-12 11:10:53 Apid 599991: initiated application termination&lt;/pre&gt;
&lt;pre style=&quot;padding-left: 30px;&quot;&gt;Application 599991 exit signals: Segmentation fault, Killed&lt;/pre&gt;
&lt;pre style=&quot;padding-left: 30px;&quot;&gt;Application 599991 resources: utime ~1s, stime ~0s&lt;br/&gt;&lt;/pre&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;This happens with xt-asyncpe/4.7. Cray has provided the following workaround:&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;% CC -c main.cpp&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;% ftn -c sub.f&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;% CC main.o sub.o -u pthread_mutex_trylock -u pthread_mutex_destroy -u pthread_create&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;or&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;% CC -dynamic main.o sub.o&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Status&lt;/h2&gt;
&lt;p&gt;The bug is fixed in xt-asyncpe/4.9 which was released on March 17, 2011 and installed as the default version on March 24, 2011. The workaround doesn't need to be applied any more.&lt;/p&gt;
</description>
			<pubDate>Thu, 24 Feb 2011 10:36:35 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-segfaults-when-mixing-c-c-and-fortran-under-prgenv-gnu/</guid>
		</item>
		
		<item>
			<title>Resolved - Trouble scheduling jobs on large memory nodes</title>
			<link>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-trouble-scheduling-jobs-on-large-memory-nodes/</link>
			<description>&lt;p&gt;We are having some trouble scheduling jobs on the large memory nodes.  This weekend jobs requesting large memory nodes will be held until we can look at the issue more closely next week.&lt;/p&gt;</description>
			<pubDate>Sat, 05 Feb 2011 09:34:19 -0800</pubDate>
			
			
			<guid>http://www.nersc.gov/users/computational-systems/hopper/updates-and-status/open-issues/resolved-trouble-scheduling-jobs-on-large-memory-nodes/</guid>
		</item>
		

	</channel>
</rss>