
MILC

Description

The benchmark code MILC represents part of a set of codes written by the MIMD Lattice Computation (MILC) collaboration to study quantum chromodynamics (QCD), the theory of the strong interactions of subatomic physics. It performs simulations of four-dimensional SU(3) lattice gauge theory on MIMD parallel machines. "Strong interactions" are responsible for binding quarks into protons and neutrons and for holding them together in the atomic nucleus.

The MILC collaboration has produced application codes to study several different QCD research areas; only one of them, ks_dynamical (simulations with conventional dynamical Kogut-Susskind quarks), is used here.

Lattice QCD discretizes space-time and evaluates field variables on the sites and links of a regular hypercubic lattice in four dimensions. Each link between nearest-neighbor sites is associated with a 3x3 complex SU(3) matrix for a given field. The version of MILC used here runs on lattices ranging in size from 8^4 to 128^4 sites.

Download

TrN8MILC7May30.tar   (Updated May 31, 2013)

How to Build

Note regarding Makefiles: at least two makefiles are involved, although separate compiles are not required. In the libraries subdirectory the makefile currently used is "Make_opteron"; other possibilities include Make_RS6K, Make_alpha, Make_t3e, Make_SSE_nasm, and Make_vanilla. The makefile used in this subdirectory includes compiler options that affect the libraries only. The C compiler used to create objects in this subdirectory can be a serial (i.e., non-MPI) compiler.
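For example, the serial compiler and optimization settings used for the libraries can be located and edited directly in that makefile (the macro names below are typical of the MILC library makefiles; verify them against your copy):

cd libraries
grep -nE "^(CC|OPT)" Make_opteron    # serial C compiler and optimization flags; these affect the libraries only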

NB: Despite the existence of flags enabling OpenMP in Make_opteron, there are currently no OpenMP directives in any library routine.

Compiler flags that have been tested as working and providing correct results are listed in section 2 under 'compiler optimization level'. Compilers tested on this version of the code include PGI, Intel, Cray, and GNU.

In the ks_imp_dyn subdirectory the makefile currently used is called "Makefile". You can edit this file to change compiler options (the "OPT" variable, around line 53). The PRECISION variable should remain 'single'. The makefile used in this subdirectory includes compiler options that affect code in the ks_imp_dyn, generic, and generic_ks subdirectories only. The C compiler used to create objects in this subdirectory must be an MPI-aware compiler (typically an MPI wrapper such as mpicc).
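For example, to locate the OPT macro before editing it (the optimization level mentioned in the comment is only an illustration; use flags that have been verified for your compiler, as noted below):

cd ks_imp_dyn
grep -n "^OPT" Makefile    # find the OPT macro (around line 53) and edit it, e.g. OPT = -O3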

Compiler flags that have been tested as working and providing correct results are listed in sections 5 and 6 under 'compiler optimization level'. Compilers tested on this version of the code include PGI, Intel, Cray, and GNU. Other compilers have not been tested, so code execution, performance, and correctness are not guaranteed with them.

NB: OpenMP directives currently exist only in source code in the generic_ks directory (specifically, in the files d_congrad5_fn.c and dslash_fn2.c). Also, note that the inlined SSE instructions available in MILC have been disabled because they do not work reliably across different compilers.

In several subdirectories there is a file called "Make_template" that should not be changed.

Building the code involves first building two libraries. The library complex.a contains routines for operations on complex numbers; see "complex.h" for a summary. The library su3.a contains routines for operations on SU(3) matrices, 3-element complex vectors, and Wilson vectors (12-element complex vectors); see "su3.h" for a summary. None of the library routines involve communication, so a sequential compiler, i.e., one not invoked through MPI "wrapper" scripts, can be used.

To build the code, cd to the ks_imp_dyn subdirectory and type "gmake su3_rmd". This command builds two libraries (complex.1.a and su3.1.a) in the ../libraries subdirectory and the target program in ks_imp_dyn, transferring object files from the ../generic and ../generic_ks subdirectories to ks_imp_dyn as needed.
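In other words, a complete build from the top of the source tree looks like this:

cd ks_imp_dyn
gmake su3_rmd    # also builds ../libraries/complex.1.a and su3.1.a, then links su3_rmd in this directory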

NB: The module commands in the build script are a simple means of swapping programming environments and, hence, compilers that share a common compiler wrapper. If these module commands are irrelevant for your system, comment them out and edit the appropriate macros in the makefiles described above.
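As an illustration only (the module and programming-environment names below are assumptions and vary between systems):

module swap PrgEnv-pgi PrgEnv-gnu    # e.g. switch the compiler behind the Cray cc/CC/ftn wrappers from PGI to GNU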

Typing "make clean" in the ks_imp_dyn subdirectory will eliminate object files and the executable in that directory only (i.e., the libraries will remain unchanged).

If your compiler is not ANSI compliant, try using the GNU C compiler gcc instead. Note that if the library code is compiled with gcc, the application directory code must also be compiled with gcc, and vice versa. This is because gcc understands prototypes and some other C compilers do not, so they pass float arguments differently. We recommend gcc. The code can also be compiled with a C++ compiler, although it uses no C++-specific constructs.

The build does not perform any automatic detection of the operating system.

How to Run

Move the executable to the benchmark_n8 directory and invoke the application with syntax similar to the following:

export OMP_NUM_THREADS=2
mpirun -np #mpi_tasks -d #threads su3_rmd < n8_small.in
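For example, on a Cray XE6 such as Hopper the equivalent launch would look something like the following (the task and thread counts are illustrative only):

export OMP_NUM_THREADS=2
aprun -n 12 -d 2 ./su3_rmd < n8_small.in    # 12 MPI tasks x 2 threads = 24 cores, i.e. one Hopper node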

Required Runs

Input decks and sample batch submission scripts for two problem sizes are provided in the "benchmark_n8" directory: "single_node" and "large". The single_node case is designed to have 8x8x8x8 sites per MPI task and is sized to fit on a single node of NERSC's Hopper system (a 24-core-per-node Cray XE6). The large test case has the same number of sites per core but is sized to fit on 1024 nodes of Hopper.
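As a consistency check, 1024 nodes x 24 cores per node gives 24,576 MPI ranks, and 24,576 x 8^4 = 100,663,296 sites, which matches the 64 x 64 x 128 x 192 lattice specified in the large-case input deck (see the Capability Improvement Runs section below).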

For the single_node case, two runs are required:

(a) MPI-only. Un-optimized code using only MPI as the execution model is allowed (see RFP documentation). Concurrency and affinity (the number of MPI tasks and their placement) that produce the minimum execution time are at the discretion of the vendor. NB: OpenMP compilation is on by default in the tested compilers, so please ensure that this feature is turned off for these runs.

(b) MPI+X: Using an additional parallelism-enabling API (such as OpenMP) and any other permissible optimizations (see RFP documentation), the vendor will return the result of a run that minimizes execution time, with concurrency (the number of MPI tasks and threads) and affinity at the discretion of the vendor.

For the large case, two runs are required:

(a) MPI-only. As per (a) for the single node test.

(b) MPI+X: Same rules as for the optimized single node case, but for a large problem that spans many nodes. This problem has been tested at 24,576 MPI ranks in MPI-only mode.

In both of the large case runs, the target concurrency is approximately 10,000 nodes (see RFP).

To run a case, go to one of the directories listed above. A script called 'runit' is provided that prepares a batch job script for submission and then launches the job. The script takes two input parameters: the number of MPI tasks and the number of threads. Thus, to run with 2 MPI tasks and 6 threads, issue the command

./runit 2 6
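For example, the two required run types might be launched as follows (the task and thread counts are illustrative, not prescriptive):

./runit 24 1    # MPI-only (executable built with OpenMP disabled): 24 tasks, 1 thread per task
./runit 4 6     # MPI+OpenMP: 4 tasks x 6 threads per task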

Capability Improvement Runs

For the capability improvement runs, you will need to scale up the large problem (which is itself a weak-scaled version of the small problem). For the 24,576-rank large problem, the size of the four-dimensional space-time lattice is controlled by the following parameters in the input deck:

nx 64
ny 64
nz 128
nt 192

As an example, to weak scale an 8x8x8x8 (nx x ny x nz x nt) problem, one can begin by multiplying nt by 2, then nz, then ny, then nx, so that all variables are scaled in round-robin fashion. As mentioned, the large problem is a weak-scaled version of the small problem (16x16x16x24). Decomposing the large problem and following the rule just mentioned, we can see that the last variable to be doubled was nz. Thus, to continue scaling the large problem, the next step is to multiply ny by 2, then nx, then nt, then nz, and so forth, as illustrated below.
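For example, following this ordering, the first few scaled-up lattices beyond the 24,576-rank problem would be (each step doubles the total number of sites and, at a fixed number of sites per rank, the number of MPI ranks):

64 x 128 x 128 x 192    (ny doubled)
128 x 128 x 128 x 192   (nx doubled)
128 x 128 x 128 x 384   (nt doubled)
128 x 128 x 256 x 384   (nz doubled)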

Verification

Use "checkout_n8" to check correctness for all cases. This script prints either
"OK" or "Failed."
Usage: checkout_n8 name_of_output_file
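For example (the output file name here is illustrative):

./checkout_n8 su3_rmd_large.out    # prints "OK" on success and "Failed" otherwise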

Reporting

For the electronic submission, include all source files and the makefiles used to build on the target platform, as well as all standard output files.

Authorship

This code was developed by the MILC collaboration.

Change Log

  • 05/30/2013
    • Sample output for Capability Improvement test on Hopper added
    • Fixed missing variable declaration in generic_ks/f_meas.c when compiling without OpenMP
  • 02/15/2013
    • README.MILC updated
    • updated test cases, verification script
    • updated source code to include additional OpenMP work
    • added sample outputs in sample_outputs subdirectory
    • fixed single node run script
    • tar file v.0.9.1
  • 12/17/2012
    • Initial tar file MILC-NERSC8.tar