Hopper: Improving I/O performance to GSCRATCH and PROJECT
What are GSCRATCH/PROJECT?
GSCRATCH and PROJECT are two file systems at NERSC that can be accessed from most of the center's computational systems. Both are based on the IBM GPFS file system and are backed by multiple racks of dedicated servers and disk arrays.
How are GSCRATCH/PROJECT connected to Hopper?
As shown in the figure below, GSCRATCH and PROJECT are each connected to several Private NSD Servers (PNSDs; for details, please refer to the GPFS documentation). These PNSDs are connected to the Cray Data Virtualization Service (DVS) nodes via an InfiniBand fabric, and the DVS nodes in turn connect to the Gemini interconnect that links all Hopper nodes. Because GPFS file systems cannot be natively mounted on Hopper compute nodes, the DVS nodes act as a bridge between the GPFS file systems and the compute nodes. In total there are 14 DVS nodes, shared by PROJECT and GSCRATCH; this number is important for performance tuning.
Note that the above description/configuration is very different from the Hopper SCRATCH file systems, which use Lustre. The DVS nodes are not involved in connecting Hopper's $SCRATCH and $SCRATCH2 file systems to the compute nodes.
The DVS_MAXNODES Environment Variable
The DVS_MAXNODES environment variable defines how many DVS nodes are used when accessing a single file on GPFS from a Hopper compute node. For example, if DVS_MAXNODES=2, then when reading or writing the file /project/foo, at most 2 DVS nodes will be used, no matter how many processes access the file. If you read a different file, /project/bar, at most 2 DVS nodes (not necessarily the same 2) will be used; if you read both /project/foo and /project/bar, at most 4 DVS nodes will be used in total.
The default value of DVS_MAXNODES is 1, i.e., any single GPFS file is served by only one DVS node. When many processes read from and write to the same file on PROJECT/GSCRATCH, this setting can become a performance bottleneck. So if you are reading and writing intensively on PROJECT/GSCRATCH, set the DVS_MAXNODES variable as shown below. To set the variable, put the following line in your PBS script before you launch the executable with aprun.
setenv DVS_MAXNODES N
where N is an integer from 1 to 14. If a number larger than 14 is used, it has the same effect as N=1.
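For example, here is a minimal PBS script sketch; the queue name, core count, walltime, and executable name are placeholders, not recommendations:

```shell
#!/bin/csh
#PBS -q regular
#PBS -l mppwidth=48
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR

# Use up to 14 DVS nodes per file; replace 14 with the value of N
# that fits your I/O pattern.
setenv DVS_MAXNODES 14

aprun -n 48 ./my_io_app
```

If your job script uses bash rather than csh, the equivalent line is `export DVS_MAXNODES=14`.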
File per Process (Set DVS_MAXNODES to 1)
If each process of your program opens and reads/writes its own individual file, your program is running in "File per Process" mode. In this mode you will get the best performance with the default DVS_MAXNODES=1: each file is served by a single DVS node, and because the processes access many different files, the load is spread across all available DVS nodes and you can use their aggregate bandwidth.
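As a sketch, the job-script fragment for this mode simply leaves DVS_MAXNODES alone (the executable name here is a placeholder):

```shell
# File-per-process mode: keep the default DVS_MAXNODES=1.
# Each process opens its own file, e.g. rank 7 writes output.00007.
aprun -n 48 ./file_per_process_app
```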
Using POSIX Shared File or MPIIO (Set DVS_MAXNODES to 14)
If multiple processes in your program use POSIX-style calls (such as fopen, fread, and fwrite) to access the same file, your program is using "POSIX Shared File" mode. In this case you should set DVS_MAXNODES=14, so that the maximum number of DVS nodes is used when accessing the shared file.
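A sketch of the corresponding job-script fragment (the executable name is a placeholder):

```shell
# POSIX shared-file mode: let the shared file be served by all 14 DVS nodes.
setenv DVS_MAXNODES 14
aprun -n 48 ./shared_file_app
```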
If your program supports it, you can also use MPIIO to read and write files on PROJECT/GSCRATCH. The default environment variables are currently set for all users so that the MPIIO library on Hopper can efficiently utilize the DVS nodes and achieve relatively good performance without complicated tuning. So if you are on Hopper and using MPIIO, NO EXTRA SETTING is necessary.