National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Franklin Quad Core Upgrade Plan

Phase                Dates                         Default Franklin Environment  Quad Core Charging
Phase 1              July 15 - Aug 12 (completed)  Dual Core                     No quad core nodes on Franklin.
Phase 2              Aug 13 - Sept 9 (completed)   Dual Core                     No charging for Franklin quad core nodes.
Phase 3              Sept 10 - Oct 16              Quad Core                     Charging starts for Franklin quad core nodes.
Phase 4              Oct 17 - Oct 28               Quad Core
Final Configuration  Oct 29 and after              Quad Core

NERSC upgraded Franklin to a quad-core XT4 between July and October 2008. The 2.6 GHz AMD Opteron dual-core compute nodes will be replaced with 2.3 GHz single-socket quad-core nodes (Budapest) with improved 128-bit floating point units. The theoretical peak for each compute core is 9.2 GFlop/sec (4 flops/cycle at 2.3 GHz). The memory on each node will also double to 8 GB, keeping the same average of 2 GB/core. The new memory speed will be 800 MHz, an improvement over the old 667 MHz chips. The theoretical peak performance of Franklin after the upgrade will be about 356 TFlop/sec.
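The peak figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The 38,640-core total is taken from the final configuration listed later on this page; the function and constant names are illustrative only and are not part of any NERSC tool.

```python
# Reproduce the theoretical peak figures quoted in the text.
CLOCK_GHZ = 2.3        # clock rate of the quad-core Opteron nodes
FLOPS_PER_CYCLE = 4    # per core, as stated in the text
TOTAL_CORES = 38_640   # final configuration: 9,660 nodes x 4 cores


def peak_gflops_per_core(clock_ghz, flops_per_cycle):
    """Theoretical peak of one core in GFlop/sec."""
    return clock_ghz * flops_per_cycle


def peak_tflops(cores, gflops_per_core):
    """Theoretical peak of the whole system in TFlop/sec."""
    return cores * gflops_per_core / 1000.0


per_core = peak_gflops_per_core(CLOCK_GHZ, FLOPS_PER_CYCLE)
system = peak_tflops(TOTAL_CORES, per_core)
print(f"{per_core:.1f} GFlop/sec per core")    # 9.2 GFlop/sec per core
print(f"{system:.1f} TFlop/sec system peak")   # 355.5 TFlop/sec system peak
```

The computed system peak of roughly 355.5 TFlop/sec is the value the text quotes as "about 356".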

The upgrade will be done in phases in order to maximize system availability and job throughput. During the transition period all users will have access to the Franklin "production environment," which will be a mixture of dual- and quad-core nodes. A job can be run on either set of nodes, but a single job cannot run on a mixture of dual-core and quad-core nodes. The production environment will experience brief periods of system unavailability while nodes are migrated into a separate "test environment" system where the hardware will be physically replaced. Access to the test system will be limited to selected users, who will stress-test the nodes. After a period of testing, those nodes will be integrated back into the production system. Please note that the Franklin inter-node network topology will not be a complete 3D torus during the course of the quad core upgrade. Some applications may experience performance slowdowns and variation depending on job placement.

The upgrade schedule is detailed below. The precise dates and times may change, based on the progress of testing and installation. A 7-day "production stabilization" period between phases is also built into the upgrade plan. If production problems are encountered under a new configuration, the system can revert to the previous production environment within 7 days until the problem can be resolved.

Important Notice for Publishing Performance Results During Quad Core Upgrade

Until the Franklin quad core upgrade is completed and the whole quad core system is officially accepted, Franklin performance data obtained on quad core nodes may be published starting from phase 3b, but only AFTER you have reviewed your results, especially any quad core performance penalty, with NERSC. Please write to consult@nersc.gov before publishing or presenting such results.

Science results obtained from the quad core nodes are OK to publish, as are performance (and science) results obtained from the dual core nodes.

Franklin Production Environment Summary

Dates            Quad-Core Nodes  Dual-Core Nodes  Total Cores
7/14 and before  0                9,660            19,320
7/15 - 8/12      0                7,356            14,712
8/13 - 8/20      1,728            7,932            22,776
8/21 - 9/9       1,728            5,052            17,016
9/10 - 9/17      4,588            5,052            28,456
9/18 - 10/16     4,588            1,020            20,392
10/17 - 10/28    8,630            0                34,520
10/29 and after  9,660            0                38,640
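Each "Total Cores" entry is four cores per quad-core node plus two cores per dual-core node. The Python sketch below checks that relationship for a few rows copied from the table above; the helper name `total_cores` is made up for this illustration.

```python
# Total cores = 4 per quad-core node + 2 per dual-core node.
QUAD_CORES_PER_NODE = 4
DUAL_CORES_PER_NODE = 2


def total_cores(quad_nodes, dual_nodes):
    """Compute cores available for a given mix of node types."""
    return quad_nodes * QUAD_CORES_PER_NODE + dual_nodes * DUAL_CORES_PER_NODE


# (quad-core nodes, dual-core nodes, total cores) for selected table rows
rows = {
    "7/14 and before": (0, 9660, 19320),
    "8/13 - 8/20": (1728, 7932, 22776),
    "9/10 - 9/17": (4588, 5052, 28456),
    "10/29 and after": (9660, 0, 38640),
}

for dates, (quad, dual, expected) in rows.items():
    assert total_cores(quad, dual) == expected, dates
    print(f"{dates:16s} {total_cores(quad, dual):>6,d} cores")
```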

Configuration Before the Upgrade

  • Franklin production environment: 17 columns (0-16) dual-core, total of 19,320 cores.

Phase 1: July 15 - Aug 12

  • Franklin production environment: 13 columns (0-11 and 13) dual core, total of 14,712 compute cores.
  • Test environment: columns 12,14,16 quad core, column 15 dual core, total of 8,032 compute cores.

User Environment Changes for Phase 1

Nothing major has changed except that the total number of available compute nodes has been decreased by 2,304 nodes. Franklin is still a pure dual core system with 14,712 compute cores. Please refer to the following table for the maximum number of nodes and job sizes.

Available dual core nodes
Duration  max dual core nodes  max dual core processors
Phase 1   7,356                14,712

A related change in the queue structure is that the maximum number of nodes available to the reg_xbig queue is decreased accordingly, to 7,128 nodes (14,256 cores).

Phase 2: Aug 13 - Sept 9

  • Aug 13, 7am - 9pm
    -- System maintenance and dedicated IO testing. Test system quad core nodes brought back to Franklin.
  • Phase 2a: Aug 13 - Aug 20
    -- Franklin production environment: columns 12,14,16 quad core, columns 0-10,11,13,15 dual core, total of 22,776 compute cores.
  • Aug 21:
    -- Franklin shut down and reconfigured. Even columns 2-10 to be removed for upgrade.
  • Phase 2b: Aug 21 - Sept 9
    -- Franklin production environment: columns 12,14,16 quad core, columns 0,1,3,5,7,9,11,13,15 dual core, total of 17,016 compute cores.
    -- Test environment: columns 2,4,6,8,10 quad core, total of 11,424 compute cores.

User Environment Changes for Phase 2

During this phase, the Franklin production system has both quad core and dual core nodes available for users. Please refer to the following table for the maximum number of nodes and job sizes.

Available quad core and dual core nodes
Duration  max quad core nodes  max quad core processors  max dual core nodes  max dual core processors
Phase 2a  1,728                6,912                     7,932                15,864
Phase 2b  1,728                6,912                     5,052                10,104

The default programming environment for this phase is still the dual core environment. In other words, users do not need to make any changes to run on the dual core compute nodes.

Please read the Important Notice for Publishing Results before you start to run on quad core nodes. To run on the quad core nodes, you must load the "xtpe-quadcore" module specifically and then recompile. This module sets the default quad core programming environment under PGI, Pathscale, or GNU if the corresponding PrgEnv-xxx (where xxx here is pgi, pathscale or gnu) is loaded.

 
   franklin% module load xtpe-quadcore
   franklin% ftn ... 
or franklin% cc ... 
or franklin% CC ... 
 

Then add the following lines to the job submission script:

#PBS -l feature=quad
#PBS -l mppnppn=4

To run on a packed quad core node, please make sure to set both the Torque keyword "#PBS -l mppnppn=" and "aprun -N" option to be 4. The following is an example batch script submitting to the debug queue, requesting 2 quad core nodes using 8 processors total with a 10 minute wall clock limit.

#PBS -q debug
#PBS -l feature=quad
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
 
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out
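The number of nodes such a job occupies follows from the two settings above: the total task count (mppwidth, aprun -n) divided by the tasks per node (mppnppn, aprun -N), rounded up. The Python sketch below illustrates that arithmetic only; the function name `nodes_needed` is hypothetical, and the rounding-up behavior for uneven task counts is an assumption rather than something this page states.

```python
def nodes_needed(mppwidth, mppnppn):
    """Nodes occupied by mppwidth tasks placed mppnppn per node.

    Assumes a partially filled last node still counts as a whole node.
    """
    if mppwidth < 1 or mppnppn < 1:
        raise ValueError("mppwidth and mppnppn must be positive")
    return -(-mppwidth // mppnppn)  # integer ceiling division


# The example script above: 8 tasks, 4 per quad core node
print(nodes_needed(8, 4))  # 2
# The same 8 tasks packed 2 per dual core node
print(nodes_needed(8, 2))  # 4
```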

Note that code compiled for dual core will run on the quad core nodes, but it will not take advantage of the quad core architecture. Code compiled for quad core may or may not run successfully on the dual core nodes, depending on whether it uses any Barcelona-specific optimizations.

libsci/10.2.1 and gcc/4.2.0.quadcore have been installed on Franklin and set as the defaults. These two modules work for both the quad core and dual core environments. The message passing toolkit version xt-mpt/3.0.2 has been installed since phase 2a, and has been the default version since phase 2b. Users need to recompile to take advantage of the new xt-mpt.

Users are encouraged to test code performance on the quad core nodes with mixed MPI/OpenMP applications. A sample job script including the compile line is as follows (using 8 MPI tasks and 4 OpenMP threads per MPI task):

#PBS -N jac
#PBS -q debug
#PBS -l feature=quad
#PBS -l mppwidth=8
#PBS -l mppnppn=1
#PBS -l walltime=00:10:00
#PBS -e jacobijob.out
#PBS -j eo
 
cd $PBS_O_WORKDIR
 
ftn -o jac -mp=nonuma -Minfo=mp jac-openmp.f
 
setenv OMP_NUM_THREADS 4
time aprun -n 8 -N 1 ./jac
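As a sanity check on a mixed MPI/OpenMP layout like the one above, the Python sketch below computes how many nodes and cores a job occupies and rejects layouts that would put more threads on a node than it has cores. The function name `hybrid_layout` and the oversubscription check are illustrative assumptions, not part of the batch system.

```python
CORES_PER_QUAD_NODE = 4


def hybrid_layout(mpi_tasks, tasks_per_node, omp_threads,
                  cores_per_node=CORES_PER_QUAD_NODE):
    """Return (nodes, cores_in_use) for a mixed MPI/OpenMP job."""
    threads_per_node = tasks_per_node * omp_threads
    if threads_per_node > cores_per_node:
        raise ValueError("more threads per node than cores per node")
    nodes = -(-mpi_tasks // tasks_per_node)  # integer ceiling division
    return nodes, mpi_tasks * omp_threads


# The sample script: mppwidth=8, mppnppn=1, OMP_NUM_THREADS=4
nodes, cores = hybrid_layout(mpi_tasks=8, tasks_per_node=1, omp_threads=4)
print(nodes, cores)  # 8 32
```

With one MPI task per node and four threads per task, each of the 8 quad core nodes is fully used.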

Charging during Phase 2

Use of the Franklin quad core nodes is not charged during phase 2. The charge factor on the dual core nodes during phase 2 remains 6.5.

Phase 3: Sept 10 - Oct 16

  • Sept 10
    -- System maintenance. Test system quad core nodes brought back to Franklin.
  • Phase 3a: Sept 10 - Sept 17
    -- Franklin production environment: even columns 2-16 quad core, column 0 and odd columns 1-15 dual core, total of 28,456 compute cores.
  • Sept 17
    -- Franklin shut down and reconfigured. Odd columns 3-15 removed for upgrade.
  • Phase 3b: Sept 17 - Oct 16
    -- Franklin production environment: even columns 2-16 quad core, columns 0,1 dual core, total of 20,392 compute cores.
    -- Test environment: odd columns 3-15 quad core, total of 16,128 cores.

User Environment Changes for Phase 3

During this phase, the Franklin production system has both quad core and dual core nodes available for users. Please refer to the following table for the maximum number of nodes and job sizes.

Available quad core and dual core nodes
Duration  max quad core nodes  max quad core processors  max dual core nodes  max dual core processors
Phase 3a  4,588                18,352                    5,052                10,104
Phase 3b  4,588                18,352                    1,020                2,040

The major difference in phase 3 is that the default programming environment is now set to be the quad core environment. This means that the module xtpe-quadcore will be loaded by default and the compiler wrappers will include quad core specific compiler options (-tp barcelona-64) by default. Executables built in this default environment are targeted to run on the quad core nodes, and will not run on dual core nodes. NERSC recommends that codes not explicitly compiled for quad-core be recompiled to run on these nodes. Codes built in the old default environment will run on the quad-core nodes, but probably at lower performance.

To compile for the quad core nodes:

 
   franklin% ftn ... 
or franklin% cc ... 
or franklin% CC ... 
 

Note: If you specify -tp options, such as "-tp amd64e" or "-tp k8-64", in your original dual core Makefiles, please make sure to remove them first in order to compile correctly for the quad core.

"#PBS -l feature=quad" is now set by default. Below is a sample job script to run on the quad core nodes:

#PBS -q debug
#PBS -l mppwidth=8
# The next line is optional, since feature=quad is now the default.
#PBS -l feature=quad
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
 
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out

The quad core compiled executables will not run on the dual core nodes. To compile for the dual core nodes, issue "module unload xtpe-quadcore" first, then recompile.

To compile for the dual core nodes:

 
   franklin% module unload xtpe-quadcore 
   franklin% ftn ... 
or franklin% cc ... 
or franklin% CC ... 
 

To run on the dual core nodes, make sure you have "#PBS -l feature=dual" and "#PBS -l mppnppn=2" lines in the job script. Below is a sample job script to run on the dual core nodes:

#PBS -q debug
#PBS -l mppwidth=8
#PBS -l feature=dual
#PBS -l mppnppn=2
#PBS -l walltime=00:10:00
#PBS -j eo
 
cd $PBS_O_WORKDIR
aprun -n 8 -N 2 ./a.out

OpenMP or mixed MPI/OpenMP jobs are still encouraged. Please refer to the sample mixed MPI/OpenMP script in the Phase 2 section above.

Charging during Phase 3

Charging for quad core nodes starts with phase 3, but jobs are charged at the dual core rate, i.e., wall clock hours x num_nodes_used x 2 cores/node (instead of 4 cores/node) x 6.5 machine charge factor x queue priority. Effectively, the rate per core for jobs that run on quad-core nodes is one-half that on the dual-core nodes. There will be no new allocations from DOE for the remainder of allocation year 2008.
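The charging formula above can be worked through with a small Python sketch. The function name `quad_node_charge` is hypothetical, and the queue priority factor of 1.0 used in the example is an assumed value chosen only to keep the arithmetic simple; actual factors depend on the queue.

```python
MACHINE_CHARGE_FACTOR = 6.5
CHARGED_CORES_PER_NODE = 2    # quad core nodes are charged at the dual core rate
PHYSICAL_CORES_PER_NODE = 4


def quad_node_charge(wall_hours, nodes_used, queue_priority):
    """Charge for a job on quad core nodes, following the formula in the text."""
    return (wall_hours * nodes_used * CHARGED_CORES_PER_NODE
            * MACHINE_CHARGE_FACTOR * queue_priority)


# Example: 1 wall clock hour on 2 quad core nodes, assumed queue priority 1.0
charge = quad_node_charge(wall_hours=1.0, nodes_used=2, queue_priority=1.0)
per_core_hour = charge / (2 * PHYSICAL_CORES_PER_NODE)
print(charge)         # 26.0
print(per_core_hour)  # 3.25
```

The per-core figure of 3.25 is half of the 6.5 charge factor, which matches the statement that the effective per-core rate on quad-core nodes is one-half that on dual-core nodes.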

The Franklin queue structure will be modified so that the min and max cores for each execution queue are doubled from the current values (see https://www.nersc.gov/nusers/systems/franklin/running_jobs/classes.php). This also means that the entry point for the 50% discount for reg_big will be doubled. The reg_xbig and reg_xblg queues are not available during this phase because there are not enough cores.

Phase 4: Oct 17 - Oct 28

  • Oct 17, 7:30am PDT - Oct 19, 6pm PDT
    -- Franklin will be shut down.
    -- 7 columns of quad core modules from the test system will be merged to the production system.
    -- Modules from columns 14 and 16 will be swapped with those in columns 0 and 1.
  • Franklin production environment: columns 0-13 and 15 quad core, total of 34,032 compute cores.
  • Test environment: columns 14,16 quad core, total of 4,576 compute cores.

User Environment Changes for Phase 4

The Franklin production system becomes a pure quad core system. There are no more dual core compute nodes available, and the total number of quad core nodes is increased by 4,042 nodes. Please refer to the following table for the maximum number of nodes and job sizes.

Available quad core nodes
Duration  max quad core nodes  max quad core processors
Phase 4   8,630                34,520

To compile for the quad core nodes:

 
   franklin% ftn ... 
or franklin% cc ... 
or franklin% CC ... 
 

Below is a sample job script to run on the quad core nodes:

#PBS -q debug
#PBS -l mppwidth=8
#PBS -l mppnppn=4
#PBS -l walltime=00:10:00
#PBS -j eo
 
cd $PBS_O_WORKDIR
aprun -n 8 -N 4 ./a.out

Charging during Phase 4

Charging for quad core nodes in phase 4 remains the same as in phase 3, i.e., jobs are charged at the dual core rate: wall clock hours x num_nodes_used x 2 cores/node (instead of 4 cores/node) x 6.5 machine charge factor x queue priority.

Final Configuration: Oct 29

  • Franklin production environment: columns 0-16 quad core, total of 38,640 compute cores.

The Franklin quad core upgrade is now complete, and Franklin is a pure quad core system. Please refer to the following table for the maximum number of nodes and job sizes.

Available quad core nodes
Duration             max quad core nodes  max quad core processors
Final Configuration  9,660                38,640

Page last modified: Mon, 10 Nov 2008 22:21:02 GMT
Page URL: http://www.nersc.gov/nusers/resources/franklin/quadcore_upgrade.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
