blog
Connection error: Unable to connect to **.***.**.***, port 607, socket fd **
<p>This problem has been fixed as of 17:45 PDT on 8/14.</p>
<p>Some user jobs are running into connection errors on Edison. Cray engineers are investigating this issue. If your job runs into either of the following errors,</p>
<p><em>Unable to connect to **.***.**.***, port 607, socket fd *, Connection timed out</em></p>
<p>or</p>
<p><em>Unable to connect to **.***.**.***, port 607, socket fd *, No route to host</em></p>
<p>please resubmit it. However, be aware that your job may run into the same error again until the problem is fixed.</p>
Wed, 14 Aug 2013 09:47:58 -0700
Please do not use the torque directive mppnppn in your job scripts
<p>As of 8/15 12:30 PDT, the submit filter was updated to allow the #PBS -l nodes directive in jobs submitted through batch scripts. We are still experimenting with this directive; please report any problems you see with it. 8/21/2013</p>
<p>Edison has some issues with the explicit use of #PBS -l mppnppn=48 to enable Hyper-Threading (HT). Please do not use it. To run with Hyper-Threading, request nodes as you normally would for a non-HT job, i.e., set mppwidth to 24 times the number of nodes you need, and then use the aprun option -j2 to use all the logical cores on the nodes. For example, the following job script uses 2 nodes and all 96 logical cores on those two nodes.</p>
<pre class="code-basic">#PBS -l mppwidth=48<br/><br/>...<br/><br/>aprun -j 2 -n 96 ./a.out</pre>
<p>Note that mppnppn is not set explicitly, so the default value of mppnppn=24 is used.</p>
<p>Alternatively, you can use the <em><strong>nodes</strong></em> directive to request nodes.</p>
<pre class="code-basic">qsub -I -l nodes=2 -q debug<br/><br/>...<br/><br/>aprun -j2 -n96 ./a.out</pre>
Fri, 02 Aug 2013 09:47:48 -0700
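<p>The arithmetic behind the Hyper-Threading instructions above can be sketched as a small shell snippet. It only prints the directive and the aprun line for a given node count and does not submit anything. The variable names are hypothetical, and the values of 24 physical cores per node and 2 hardware threads per core are assumptions taken from the example in this post.</p>

```shell
#!/bin/sh
# Sketch only: derive mppwidth and the aprun task count from a node count.
# Assumed from the post above: 24 physical cores per node, and 2 hardware
# threads per core when Hyper-Threading is enabled with aprun -j 2.
NODES=2               # hypothetical: number of nodes you want to use
CORES_PER_NODE=24     # physical cores per node (assumption from the post)
THREADS_PER_CORE=2    # matches the aprun -j 2 option

# mppwidth counts physical cores, exactly as for a non-HT job.
MPPWIDTH=$((NODES * CORES_PER_NODE))

# With -j 2, aprun can place one task on every logical core.
NTASKS=$((MPPWIDTH * THREADS_PER_CORE))

echo "#PBS -l mppwidth=${MPPWIDTH}"
echo "aprun -j ${THREADS_PER_CORE} -n ${NTASKS} ./a.out"
```

<p>With NODES=2 this reproduces the mppwidth=48 and -n 96 values used in the job script in this post. mppnppn is deliberately left unset.</p>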