I/O Benchmarking Details
These benchmarks measure the transfer rate for copying a set of files from an eliza file system to $TMPDIR on a batch node. Each transfer runs as its own batch job in the debug queue, and these jobs run sequentially, i.e., only one job runs at a time. The measured metric (the transfer rate) is calculated from the size of the files transferred and the transfer time, not the wall time of the batch job. The same set of files is used for each eliza.
Note that the size of the transfer is the product of the transfer rate and the transfer time, both of which are shown on the detailed page for each eliza. The "notes" column on the detailed page also shows the number of jobs running on PDSF and the number of io units in use for that eliza at the time of the transfer.
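The relationship between size, time, and rate can be sketched in a few lines. This is not the benchmark's actual script, which is not shown here; it is a minimal illustration that assumes 1 MB = 10^6 bytes and uses Python's `shutil.copy` for the transfer. The function names `transfer_rate_mb_s` and `timed_copy` are hypothetical. Only the copy loop is timed, matching the use of the transfer time rather than the job's wall time.

```python
import os
import shutil
import tempfile
import time


def transfer_rate_mb_s(total_bytes: int, seconds: float) -> float:
    """Transfer rate in MB/s (assumes 1 MB = 10**6 bytes) from size and time."""
    if seconds <= 0:
        raise ValueError("transfer time must be positive")
    return total_bytes / 1e6 / seconds


def timed_copy(files, dest_dir=None):
    """Copy `files` into `dest_dir` and return (total_bytes, seconds).

    Only the copy loop is timed, not any job setup or teardown.
    """
    if dest_dir is None:
        # Default to $TMPDIR, falling back to the system temp directory.
        dest_dir = os.environ.get("TMPDIR", tempfile.gettempdir())
    total_bytes = sum(os.path.getsize(f) for f in files)
    start = time.perf_counter()
    for f in files:
        shutil.copy(f, dest_dir)
    seconds = time.perf_counter() - start
    return total_bytes, seconds
```

For example, 250MB copied in 2.5 s gives `transfer_rate_mb_s(250_000_000, 2.5)`, i.e. 100MB/s. Conversely, multiplying that rate by the 2.5 s transfer time recovers the 250MB transfer size.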
Although the elizas all have 10Gb network connections, the batch nodes have 1Gb connections, so these transfers have a theoretical maximum rate of 1Gb/s (125MB/s). In practice, factors such as network sharing and protocol overhead make this unattainable, but rates of 75 to 100MB/s are typical for an eliza that is not heavily loaded. Slower rates indicate that something else is degrading performance, such as many batch jobs accessing that eliza or other data management tasks running on it. Reading and writing at the same time is particularly likely to load down an eliza, and a volume that is very close to its quota also makes it difficult for GPFS to manage the space.
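The 125MB/s ceiling follows from the slower of the two links: 1Gb/s is 10^9 bits/s, divided by 8 bits per byte, or 125 x 10^6 bytes/s. The sketch below does that conversion and compares a measured rate against the ceiling. The helper names are hypothetical, and the 75MB/s cut-off is simply the low end of the typical range quoted above, used as a rough threshold rather than an official one.

```python
def link_rate_mb_s(gigabits_per_s: float) -> float:
    """Convert a link speed in Gb/s to MB/s (10**9 bits -> 10**6 bytes)."""
    return gigabits_per_s * 1e9 / 8 / 1e6


# The slower link on the path sets the ceiling: the eliza has 10 Gb,
# but the batch node only has 1 Gb.
CEILING_MB_S = link_rate_mb_s(min(10, 1))  # 125.0 MB/s

# Low end of the typical range for a lightly loaded eliza (rough threshold).
TYPICAL_LOW_MB_S = 75.0


def assess(rate_mb_s: float):
    """Return (fraction of the 1 Gb/s ceiling, rough label) for a measured rate."""
    fraction = rate_mb_s / CEILING_MB_S
    label = "typical" if rate_mb_s >= TYPICAL_LOW_MB_S else "possibly degraded"
    return fraction, label
```

A measured 100MB/s, for instance, is 80% of the ceiling and falls in the typical range; a rate well below 75MB/s suggests looking at load on the eliza.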
If you encounter poor performance, there are a number of things to consider:
- Have the total number of elizaio units for the eliza in question reduced. This will lower overall throughput, but your jobs may run more efficiently.
- Clear out some space if you are within 5% of the quota.
- If a dataset gets very heavy use, consider duplicating it in another location or "striping" it across multiple elizas.
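Two of the suggestions above involve simple arithmetic that can be sketched. The quota check below takes the used space and quota as plain numbers; how you obtain them (for example from a quota reporting tool) is outside this sketch. For "striping", the text does not say how files should be split, so the sketch assumes file-level distribution, not GPFS block striping: a greedy heuristic that places the largest files first, each on the destination with the fewest bytes assigned so far. Both function names and any destination directories you pass in are hypothetical.

```python
from __future__ import annotations


def near_quota(used_bytes: int, quota_bytes: int, margin: float = 0.05) -> bool:
    """True if usage is within `margin` (default 5%) of the quota."""
    if quota_bytes <= 0:
        raise ValueError("quota must be positive")
    return used_bytes >= (1 - margin) * quota_bytes


def stripe_plan(file_sizes: dict[str, int], destinations: list[str]) -> dict[str, list[str]]:
    """Spread files across destinations so each holds a similar number of bytes.

    Greedy heuristic: take files largest first and place each on the
    destination with the fewest bytes assigned so far.
    """
    plan = {d: [] for d in destinations}
    load = {d: 0 for d in destinations}
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = min(destinations, key=lambda d: load[d])
        plan[target].append(name)
        load[target] += size
    return plan
```

With files of 100, 60, 50 and 10 units spread over two destinations, each destination ends up holding 110 units, so reads of the dataset are split roughly evenly between the two elizas.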


