MiniDFT is a plane-wave density functional theory (DFT) mini-app for modeling materials. Given a set of atomic coordinates and pseudopotentials, MiniDFT computes self-consistent solutions of the Kohn-Sham equations using either the LDA or the PBE exchange-correlation functional. For each iteration of the self-consistent field (SCF) cycle, the Fock matrix is constructed and then diagonalized. To build the Fock matrix, Fast Fourier Transforms are used to transform orbitals from the plane-wave basis (where the kinetic energy is most readily computed) to real space (where the potential is evaluated) and back. Davidson diagonalization is used to compute the orbital energies and update the orbital coefficients.
The MiniDFT mini-app was excised from the general-purpose Quantum Espresso (QE) code. Quantum Espresso is licensed per the GNU General Public License (GPL). A copy of the GPL is provided in the distribution's 'License' file.
The MiniDFT distribution presently consists of four top-level directories:
- src: source code and makefiles for MiniDFT
- test: small input files for build validation
- benchmark: input files for performance measurement
- espresso: mirrors the test and benchmark directories, but with QE-compatible input files
The MiniDFT application is hosted on the qe-forge.org website.
MiniDFT-1.04 tar file (from qe-forge site, Updated April 25, 2013)
How to Build MiniDFT
Dependencies: MiniDFT requires the ScaLAPACK, BLAS and FFT libraries. MiniDFT calls FFT routines through the FFTW3 interface.
- Define the MiniDFT root directory
> export MDFT_ROOT=/path/to/mini_dft
- Move to the src directory.
> cd $MDFT_ROOT/src
- Configure manually...
a) Select the Makefile.system.compiler that most closely resembles your own.
b) Edit the chosen makefile to your liking.
NB: The first line of the makefile can be uncommented to enable OpenMP support.
c) Create a link to the selected makefile.
- Initiate the build
How to Run
MiniDFT accepts three mutually compatible command line options.
The -in flag (required) is used to identify the input file read by MiniDFT.
The -ntg flag (optional) enables task-group parallelism to improve the parallel scaling of the FFTs. The number of task groups must be a divisor of the number of MPI ranks. The default value for ntg is 1.
The -ndiag flag (optional) sets the number of MPI ranks used for diagonalization. The value of ndiag must be a square integer. The default value for ndiag is the largest square integer less than half the total number of MPI ranks.
To run the code, do something like:
mpirun -np 32 ./mini_dft -in Si_333.in -ntg 4 -ndiag 25 > Si_333.out
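The flag constraints above can be checked before submitting a job. The following is a minimal Python sketch, not part of the MiniDFT distribution: the function names default_ndiag and check_flags are hypothetical, "less than half" is read as strictly less than, and the requirement that ndiag not exceed the number of MPI ranks is an assumption.

```python
import math


def default_ndiag(nranks: int) -> int:
    """Largest perfect square strictly below nranks / 2 (never less than 1).

    Assumption: "less than half" is read as a strict inequality.
    """
    best = 1
    k = 1
    while 2 * k * k < nranks:  # same as k*k < nranks/2, without floats
        best = k * k
        k += 1
    return best


def check_flags(nranks: int, ntg: int = 1, ndiag=None) -> list:
    """Return a list of problems found; an empty list means the flags look fine."""
    problems = []
    # ntg must be a divisor of the number of MPI ranks.
    if ntg < 1 or nranks % ntg != 0:
        problems.append(f"ntg={ntg} is not a divisor of {nranks} MPI ranks")
    if ndiag is None:
        ndiag = default_ndiag(nranks)
    # ndiag must be a square integer.
    root = math.isqrt(ndiag) if ndiag > 0 else 0
    if ndiag < 1 or root * root != ndiag:
        problems.append(f"ndiag={ndiag} is not a square integer")
    elif ndiag > nranks:
        # Assumption: the diagonalization group cannot exceed the rank count.
        problems.append(f"ndiag={ndiag} exceeds {nranks} MPI ranks")
    return problems


if __name__ == "__main__":
    print(default_ndiag(32))
    print(check_flags(32, ntg=4, ndiag=25))
    print(check_flags(32, ntg=5, ndiag=24))
```

Under the strict reading, 32 MPI ranks give a default ndiag of 9. If MiniDFT treats "less than" as "no greater than", the default would be 16 instead, so the value reported in the program output should be taken as authoritative.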
Move to the test directory. The two jobs in this directory are significantly smaller than the required benchmark runs, and can be used to quickly confirm that your build gives correct results. These jobs can be run at any concurrency below about 70 MPI tasks. The preceding section, "How to Run", provides more detail about how to run the code.
> cd $MDFT_ROOT/test
The first test simulates a 3x3x3 super-cell for bulk silicon, using the LDA functional and a 30 Ry plane-wave cutoff.
mpirun -np 8 ./mini_dft -in Si_333.in > Si_333.out
The second test is a 2x2x2 super-cell for TiO2, using the PBE GGA functional and a 100 Ry plane-wave cutoff.
The output from these two runs can be compared to the Quantum Espresso output in the test directory. After the SCF has converged, the final value of the "total energy" should agree with the QE output to within 1e-6 Ry, regardless of the number of MPI tasks used.
diff QE_Si_333.out.ref Si_333.out
diff QE_TiO2_222.out.ref TiO2_222.out
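Because diff reports every differing line (timings, rank counts and so on), it can be convenient to compare only the final total energy. The sketch below is one way to do that in Python. It assumes the energy appears on lines of the form "total energy = <value> Ry", as in typical Quantum Espresso output; the function names are hypothetical, and this is not the validate_minidft.py script shipped with the distribution.

```python
import re

# Assumed line format: "... total energy   =   -123.45678901 Ry"
_ENERGY = re.compile(r"total energy\s*=\s*(-?\d+\.\d+)\s*Ry")


def final_total_energy(text: str) -> float:
    """Return the last total energy (in Ry) found in an output file's text."""
    matches = _ENERGY.findall(text)
    if not matches:
        raise ValueError("no 'total energy' line found")
    return float(matches[-1])


def energies_agree(ref_text: str, run_text: str, tol: float = 1e-6) -> bool:
    """True if the final total energies differ by no more than tol Ry."""
    diff = abs(final_total_energy(ref_text) - final_total_energy(run_text))
    return diff <= tol


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python compare_energy.py QE_Si_333.out.ref Si_333.out
    with open(sys.argv[1]) as ref, open(sys.argv[2]) as run:
        ok = energies_agree(ref.read(), run.read())
    print("PASS" if ok else "FAIL")
```

Taking the last match reflects the statement above that the final value after SCF convergence is the one to compare.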
Input decks for two problem sizes (single-node and large) are provided in the $MDFT_ROOT/benchmark directory.
The single-node input file (single-node.in) performs one SCF cycle for a 3x3x3 super-cell for TiO2, using the PBE GGA functional and a 120 Ry plane-wave cutoff. Because only one SCF iteration is performed, the jobs will exit before the SCF cycle converges. Two runs are required:
- MPI-only: Limited-optimized code using only MPI as an execution model is allowed (see RFP documentation). Concurrency and affinity (the number of MPI tasks and their placement) are at the discretion of the vendor. The -ndiag and -ntg flags may not be used. Please ensure that OpenMP is turned off for these runs.
- MPI+X: Using an additional parallelism-enabling API (such as OpenMP) and any other possible optimizations (see RFP documentation), the vendor will return the result of a run that minimizes execution time. Concurrency (the number of MPI tasks and threads) and affinity are at the discretion of the vendor. The -ndiag and -ntg flags may also be tuned by the vendor.
The large test case (large.in) performs one SCF cycle for a 10x10x10 super-cell of MgO, using the LDA functional and a 130 Ry plane-wave cutoff. Because only one SCF iteration is performed, the jobs will exit before the SCF cycle converges. Two runs are required:
- MPI-only: Same as for the single-node test.
- MPI+X: Same rules as for the optimized single-node case, but for a large problem that spans many nodes. This problem has been tested at 10000 MPI ranks in MPI-only mode.
Capability improvement measurements are enabled by increasing the number of k-points used in the large test case. The k-point grid is specified on the last two lines of the file 'large.in':
nk1 nk2 nk3 1 1 1
To increase the number of k-points, adjust the (integer) parameters nk1, nk2 and nk3, which determine the size of the k-point integration grid. The number of k-points increases (roughly) linearly with the product nk1 * nk2 * nk3, though a significant fraction of these points are excluded due to symmetry. Grep for "number of k points" to determine the actual number of k-points used. The increase in capability for the capability improvement calculation is the increase in the number of k-points relative to that used for the large problem (1).
The rules for the capability improvement measurement are the same as the MPI+X case (D.1.b), but require 10000 MPI ranks per k-point. The -npool command line argument should be set to the number of k-points.
Advice to vendors: The number of k-points is printed at the beginning of a MiniDFT run, but cannot be easily counted beforehand to set -npool. A reasonable solution is to initiate a trial-run with -npool=1, determine the number of k-points, cancel the trial-run, and restart with an appropriate value for npool.
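The trial-run procedure can be partly automated. The sketch below, in Python, reads the k-point count from the output of a trial run and derives the -npool value and the total MPI rank count from the 10000-ranks-per-k-point rule. It assumes the output contains a line of the form "number of k points= <n>" (the exact spacing is an assumption), and the function names are hypothetical.

```python
import re

# Assumed line format: "     number of k points=     4"
_KPOINTS = re.compile(r"number of k points\s*=\s*(\d+)")


def count_kpoints(text: str) -> int:
    """Return the k-point count reported in a (possibly partial) output file."""
    match = _KPOINTS.search(text)
    if match is None:
        raise ValueError("no 'number of k points' line found")
    return int(match.group(1))


def capability_run_plan(text: str, ranks_per_kpoint: int = 10000) -> dict:
    """Derive -npool and the total MPI rank count for the capability run."""
    nk = count_kpoints(text)
    return {
        "npool": nk,                         # one pool per k-point
        "mpi_ranks": nk * ranks_per_kpoint,  # 10000 ranks per k-point
        "capability_increase": nk,           # relative to the 1 k-point large case
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python plan_npool.py trial_run.out
    with open(sys.argv[1]) as handle:
        print(capability_run_plan(handle.read()))
```

Since the k-point count is printed near the start of a run, the trial run can be cancelled as soon as that line appears in the output.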
The benchmark runs are validated based on the total energy after one SCF cycle. The total energy should agree with the reference value to within 1e-6 Ry. The script validate_minidft.py should be used to extract the energies and perform the comparison. The script requires the name of the output file. Example usage:
The time measured by the MiniDFT benchmark excludes the initialization and finalization stages. It is labeled "Benchmark_Time" (without quotes) and will be on the final line of output. The Benchmark_Time will also be printed by validate_minidft.py.
When running QE instead of MiniDFT, report both times (CPU and WALL) for the line in the output labelled 'electrons'. This time is equivalent to the Benchmark_Time that will be reported by MiniDFT.
For the electronic submission, include all the source, makefiles, and a log of the entire build on the target platform. Include all standard output files from executions of the code.
The MiniDFT mini-app was excised from the general-purpose Quantum Espresso (QE) code (http://www.quantum-espresso.org/). QE is an open-source program licensed per the GNU General Public License (GPL). A copy of the GPL is provided in the 'License' file in the MiniDFT distribution.