Franklin Quad Core Upgrade Plan
NERSC upgraded Franklin to a quad-core XT4 between July and October 2008. The 2.6 GHz AMD Opteron dual-core compute nodes will be replaced with 2.3 GHz single socket quad-core nodes (Budapest) with improved 128-bit floating point units. The theoretical peak for each compute core is 9.2 GFlop/sec (4 flops/cycle). The memory on each node will also double to 8 GB, keeping the same average of 2 GB/core. The new memory speed will be 800 MHz, an improvement over the old 667 MHz chips. The theoretical peak performance of Franklin after the upgrade will be about 356 TFlop/sec.
The upgrade will be done in phases in order to maximize system availability and job throughput. During the transition period all users will have access to the Franklin "production environment," which will be a mixture of dual- and quad-core nodes. A job can run on either set of nodes, but a single job cannot run on a mixture of dual-core and quad-core nodes. The production environment will experience brief periods of system unavailability while nodes are migrated into a separate "test environment" system where the hardware will be physically replaced. Access to the test system will be limited to selected users, who will stress-test the nodes. After a period of testing, those nodes will be integrated back into the production system.
Please note that the Franklin inter-node network topology will not be a complete 3D torus during the course of the quad core upgrade. Some applications may experience performance slowdowns and variation depending on job placement.
The upgrade schedule is detailed below. The precise dates and times may change, based on the progress of testing and installation. There is also a 7-day "production stabilization" time between phases built into the upgrade plan. If production problems are encountered under a new configuration, the system can revert to the previous production environment within 7 days until the problem can be resolved.
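The peak figures quoted above follow from simple arithmetic, which the sketch below reproduces in POSIX shell with awk. The function names are hypothetical, and the node count of 9,660 is an assumption: it is not stated in the text above but is derived from the Phase 1 numbers on this page (14,712 cores / 2 cores per node + 2,304 removed nodes).

```shell
# Peak GFlop/sec for one core: clock rate (GHz) x flops per cycle.
core_peak_gflops() {
    awk -v ghz="$1" -v fpc="$2" 'BEGIN { printf "%.1f\n", ghz * fpc }'
}

# Peak TFlop/sec for the system: nodes x cores per node x per-core GFlop/sec.
system_peak_tflops() {
    awk -v nodes="$1" -v cpn="$2" -v gf="$3" \
        'BEGIN { printf "%.1f\n", nodes * cpn * gf / 1000 }'
}

core_peak_gflops 2.3 4           # 2.3 GHz quad-core Opteron, 4 flops/cycle
system_peak_tflops 9660 4 9.2    # 9,660 nodes is an assumed (derived) count
```

Under these assumptions the two calls print 9.2 and 355.5, which is in line with the per-core and whole-system figures given above.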
Important Notice for Publishing Performance Results During Quad Core Upgrade
Until the Franklin quad core upgrade is completed and the whole quad core system is officially accepted, Franklin performance data on quad cores may be published starting from phase 3b, but only AFTER you have reviewed your results, especially any quad core performance penalty, with NERSC. Please write to consult@nersc.gov before publishing or presenting such results. Science results obtained from the quad core nodes are OK to publish, as are performance (and science) results obtained from the dual core nodes.
Franklin Production Environment Summary
Configuration Before the Upgrade
Phase 1: July 15 - Aug 12
User Environment Changes for Phase 1
Nothing major has changed except that the total number of available compute nodes has been decreased by 2,304 nodes. Franklin is still a pure dual core system with 14,712 compute cores. Please refer to the following table for the maximum number of nodes and job sizes.
A related change in the queue structure is that the maximum number of available nodes for the reg_xbig queue is decreased accordingly, to 7,128 nodes (14,256 cores). Please refer to the following table for the maximum number of nodes and job sizes.
Phase 2: Aug 13 - Sept 9
User Environment Changes for Phase 2
During this phase, the Franklin production system has both quad core and dual core nodes available for users. Please refer to the following table for the maximum number of nodes and job sizes.
The default programming environment for this phase is still the dual core environment. In other words, users do not need to make any changes to run on the dual core compute nodes. Please read the Important Notice for Publishing Results before you start to run on quad core nodes. To run on the quad core nodes, you must explicitly load the "xtpe-quadcore" module and then recompile. This module sets the default quad core programming environment under PGI, Pathscale, or GNU if the corresponding PrgEnv-xxx module (where xxx is pgi, pathscale, or gnu) is loaded.
franklin% module load xtpe-quadcore
franklin% ftn ...
or
franklin% cc ...
or
franklin% CC ...
Then add the following lines to the job submission script:
#PBS -l feature=quad
#PBS -l mppnppn=4
To run on a packed quad core node, please make sure to set both the Torque keyword "#PBS -l mppnppn=" and the "aprun -N" option to 4. The following is an example batch script submitting to the debug queue, requesting 2 quad core nodes using 8 processors total with a 10 minute wall clock limit.
#PBS -q debug
#PBS -l feature=quad
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out
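The relation between "mppwidth", "mppnppn", and the number of nodes in the example above can be sketched as a small POSIX shell helper. The function name nodes_needed is hypothetical, and rounding up when the task count is not a multiple of the tasks per node is an assumption rather than documented scheduler behavior.

```shell
# Nodes needed for a job: ceiling of mppwidth / mppnppn.
# Arguments: $1 = total MPI tasks (mppwidth), $2 = tasks per node (mppnppn).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 8 4   # packed quad core nodes, as in the script above
nodes_needed 8 2   # the same 8 tasks packed on dual core nodes
```

The first call prints 2, matching the 2 quad core nodes requested above; the second prints 4.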
Note that code compiled for dual core will run on the quad core nodes, but it will not take advantage of the quad core architecture. Code compiled for quad core may or may not run successfully on the dual core nodes, depending on whether your code uses any Barcelona optimization.
libsci/10.2.1 and gcc/4.2.0.quadcore have been installed on Franklin and set to default. These two modules work for both the quad core and dual core environments. The message passing toolkit version xt-mpt/3.0.2 has been installed since phase 2a, and has been the default version since phase 2b. Users need to recompile to take advantage of the new xt-mpt.
Users are encouraged to test code performance on the quad core nodes with mixed MPI/OpenMP applications. A sample job script including the compile line is as follows (using 8 MPI tasks and 4 OpenMP threads per MPI task):
#PBS -N jac
#PBS -q debug
#PBS -l feature=quad
#PBS -l mppwidth=8
#PBS -l mppnppn=1
#PBS -l walltime=00:10:00
#PBS -e jacobijob.out
#PBS -j eo
cd $PBS_O_WORKDIR
ftn -o jac -mp=nonuma -Minfo=mp jac-openmp.f
setenv OMP_NUM_THREADS 4
time aprun -n 8 -N 1 ./jac
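The mixed MPI/OpenMP settings in the script above can be derived from the task and thread counts. The following POSIX shell sketch assumes one OpenMP thread per core with no oversubscription; the function name hybrid_layout and that layout rule are illustrative assumptions, not a NERSC tool or policy.

```shell
# Print the settings for a hybrid MPI/OpenMP job.
# Arguments: $1 = MPI tasks, $2 = OpenMP threads per task, $3 = cores per node.
hybrid_layout() {
    tasks=$1; threads=$2; cores_per_node=$3
    if [ "$threads" -gt "$cores_per_node" ]; then
        echo "error: ${threads} threads do not fit on a ${cores_per_node}-core node" >&2
        return 1
    fi
    nppn=$(( cores_per_node / threads ))       # MPI tasks per node
    nodes=$(( (tasks + nppn - 1) / nppn ))     # round up to whole nodes
    echo "mppwidth=$tasks mppnppn=$nppn OMP_NUM_THREADS=$threads nodes=$nodes"
}

hybrid_layout 8 4 4
```

For 8 tasks with 4 threads each on 4-core nodes, this prints "mppwidth=8 mppnppn=1 OMP_NUM_THREADS=4 nodes=8", which agrees with the "mppnppn=1" and "aprun -n 8 -N 1" settings in the sample script.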
Charging during Phase 2
Use of the quad core nodes is free during phase 2. The charge factor on the dual core nodes during phase 2 remains 6.5.
Phase 3: Sept 10 - Oct 16
User Environment Changes for Phase 3
During this phase, the Franklin production system has both quad core and dual core nodes available for users. Please refer to the following table for the maximum number of nodes and job sizes.
The major difference in phase 3 is that the default programming environment is now set to be the quad core environment. This means that the module xtpe-quadcore will be loaded by default and the compiler wrappers will include quad core specific compiler options (-tp barcelona-64) by default. Executables built in this default environment are targeted to run on the quad core nodes, and will not run on dual core nodes. NERSC recommends that codes not explicitly compiled for quad-core be recompiled to run on these nodes. Codes built in the old default environment will run on the quad-core nodes, but probably at lower performance. To compile for the quad core nodes:
franklin% ftn ...
or
franklin% cc ...
or
franklin% CC ...
Note: If you specify -tp options, such as "-tp amd64e" or "-tp k8-64", in your original dual core Makefiles, please make sure to remove them first in order to compile correctly for the quad core.
"#PBS -l feature=quad" is now set by default. Below is a sample job script to run on the quad core nodes:
#PBS -q debug
#PBS -l mppwidth=8
# The feature=quad line below is optional in phase 3.
#PBS -l feature=quad
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out
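The note above about removing dual core -tp options can be illustrated with a small filter. This is a minimal POSIX shell sketch that strips the two targets named in the note from a compiler flags string; the function name strip_dual_core_tp is hypothetical, and real Makefiles should still be checked by hand because flags may be spread over several variables.

```shell
# Remove the dual core "-tp amd64e" and "-tp k8-64" target flags from a
# flags string, leaving all other flags untouched.
strip_dual_core_tp() {
    printf '%s\n' "$1" |
        sed -E -e 's/(^|[[:space:]])-tp[[:space:]]+(amd64e|k8-64)([[:space:]]|$)/\3/g' \
               -e 's/^[[:space:]]+//'
}

strip_dual_core_tp "-fast -tp k8-64 -O3"
```

With the flags shown, the call prints "-fast -O3". The cleaned flags can then be used with the ftn, cc, or CC wrappers, which add the quad core target by default in phase 3.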
The quad core compiled executables will not run on the dual core nodes. To compile for the dual core nodes, issue "module unload xtpe-quadcore" first, then recompile.
To compile for the dual core nodes:
franklin% module unload xtpe-quadcore
franklin% ftn ...
or
franklin% cc ...
or
franklin% CC ...
To run on the dual core nodes, make sure you have "#PBS -l feature=dual" and "#PBS -l mppnppn=2" lines in the job script. Below is a sample job script to run on the dual core nodes:
#PBS -q debug
#PBS -l mppwidth=8
#PBS -l feature=dual
#PBS -l mppnppn=2
#PBS -l walltime=00:10:00
#PBS -j eo
cd $PBS_O_WORKDIR
aprun -n 8 -N 2 ./a.out
OpenMP or mixed MPI/OpenMP jobs are still encouraged. See the Phase 2 section above for a sample mixed MPI/OpenMP script.
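The quad core and dual core scripts above differ only in the "feature" value and the tasks-per-node settings. A minimal POSIX shell sketch that prints the node-type-specific lines is shown here; the function name node_directives is hypothetical, and it assumes fully packed nodes (4 tasks per quad core node, 2 per dual core node) as in the samples.

```shell
# Print the node-type specific lines of a Franklin job script.
# Arguments: $1 = "quad" or "dual", $2 = total MPI tasks (mppwidth).
node_directives() {
    case "$1" in
        quad) nppn=4 ;;
        dual) nppn=2 ;;
        *) echo "error: node type must be quad or dual" >&2; return 1 ;;
    esac
    echo "#PBS -l feature=$1"
    echo "#PBS -l mppwidth=$2"
    echo "#PBS -l mppnppn=$nppn"
    echo "aprun -n $2 -N $nppn ./a.out"
}

node_directives dual 8
```

Called with "dual 8", this prints the same feature, mppwidth, mppnppn, and aprun lines as the dual core sample script; "quad 8" gives the quad core equivalents.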
Charging during Phase 3
Charging for quad core nodes will start with phase 3, but jobs will be charged at the dual core rate, i.e., wall clock hours x num_nodes_used x 2 cores/node (instead of 4 cores/node here) x 6.5 machine charge factor x queue priority. This means that the effective rate per core for jobs that run on quad-core nodes is one-half that on the dual-core nodes. There will be no new allocations from DOE for the remainder of allocation year 2008.
The Franklin queue structure will be modified so that the min and max cores for each execution queue will be doubled from the current values (see https://www.nersc.gov/nusers/systems/franklin/running_jobs/classes.php). This also means that the entry point for the 50% discount for reg_big will be doubled. The reg_xbig and reg_xblg queues are not available during this phase because there are not enough cores.
Phase 4: Oct 17 - Oct 28
User Environment Changes for Phase 4
The Franklin production system becomes a pure quad core system. There are no more dual core compute nodes available, and the total number of quad core nodes is increased by 4,042 nodes. Please refer to the following table for the maximum number of nodes and job sizes.
To compile for the quad core nodes:
franklin% ftn ...
or
franklin% cc ...
or
franklin% CC ...
Below is a sample job script to run on the quad core nodes:
#PBS -q debug
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out
Charging during Phase 4
Charging for quad core nodes in phase 4 remains the same as in phase 3, i.e., jobs are charged at the dual core rate: wall clock hours x num_nodes_used x 2 cores/node (instead of 4 cores/node here) x 6.5 machine charge factor x queue priority.
Final Configuration: Oct 29
The Franklin quad core upgrade is now complete, and Franklin is a pure quad core system. Please refer to the following table for the maximum number of nodes and job sizes.
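As a worked example of the charging rule quoted for phases 3 and 4 (wall clock hours x num_nodes_used x 2 cores/node x 6.5 x queue priority), the following POSIX shell sketch computes the charge for a quad core job. The function name quad_charge is hypothetical, and the queue priority factor of 1 used in the example call is an assumed value for illustration only; this page does not list the actual factors or state whether the rule continues after phase 4.

```shell
# Charge for a quad core job under the phase 3 and phase 4 rule:
# wall clock hours x nodes x 2 cores/node x 6.5 x queue priority factor.
# Arguments: $1 = wall clock hours, $2 = nodes used, $3 = queue priority factor.
quad_charge() {
    awk -v hours="$1" -v nodes="$2" -v prio="$3" \
        'BEGIN { printf "%.2f\n", hours * nodes * 2 * 6.5 * prio }'
}

# 1 hour on 2 quad core nodes (8 cores), assumed priority factor of 1.
quad_charge 1 2 1
```

The example prints 26.00. Spread over the 8 cores actually used, that is 3.25 per core-hour, one-half of the 6.5 dual core factor, as stated in the phase 3 charging text.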
Page last modified: Mon, 10 Nov 2008 22:21:02 GMT
Page URL: http://www.nersc.gov/nusers/systems/franklin/quadcore_upgrade.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov