Optimizing the sizes of the files you store in HPSS and minimizing the number of tapes they are on will lead to the most efficient use of NERSC HPSS:
- File sizes of about 1 GB or larger will give the best network performance (see graph below)
- File sizes greater than about 500 GB can be more difficult to work with and lead to longer transfer times. Files larger than 15 TB cannot be uploaded to HPSS.
- Aggregate groups of small files with HTAR (or other aggregation methods such as tar, cpio, etc.). This will reduce the number of tapes that get used.
- If streaming via a pipe is unavoidable, use PFTP with ALLO64 <bytes> as shown below. HPSS then knows the size of the file in advance and can store it accordingly.
bash-4.0$ pftp archive <<EOF
> quote allo64 7706750976
> put "|tar cf - ./joeuser" /home/j/joeuser/joeuser.tar
> EOF
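One way to obtain a byte count for ALLO64 is to run the same tar command once beforehand and count the bytes it writes. The helper below is a minimal sketch; the function name tar_stream_bytes is hypothetical and is not part of PFTP or HPSS. It assumes the directory does not change between this dry run and the actual transfer, since the data is read twice.

```shell
# Hypothetical helper (not part of HPSS or PFTP): print the number of bytes
# in the stream that "tar cf - DIR" would produce for the given directory.
tar_stream_bytes() {
    # wc -c counts the bytes arriving on the pipe; tr strips the padding
    # that some wc implementations put around the number.
    tar cf - "$1" | wc -c | tr -d '[:space:]'
}
```

For example, running tar_stream_bytes ./joeuser from the same working directory as the transfer prints a number that can be supplied as the ALLO64 argument.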
In general, when you retrieve a file from HPSS, the system must mount the tape the file is stored on, move to the beginning of the file on the tape, and then read the file. Each of these steps takes time, so when retrieving many files it is worthwhile to minimize the number of tape mounts and seeks. You can do this by using "hsi ls" with the "-P" option to find out which tape each file is on and where on that tape it is located, then sorting the output so that the transfers are ordered by tape and by the position of the file on the tape. An example of a script that does this can be found in the Usage Examples at HSI Tape Ordering Script.
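The sorting step can be sketched as follows. This is an illustration only, not the NERSC script. It assumes the tape, position, and path of each file have already been extracted from the "hsi ls -P" output into lines of the form "TAPE POSITION PATH", that POSITION is either a plain integer or two integers joined by "+", and that paths contain no spaces. The function name order_by_tape and this input layout are hypothetical; check the actual "-P" output on your system before adapting it.

```shell
# Hypothetical helper: read "TAPE POSITION PATH" records on stdin and print
# the paths ordered by tape, then by position on that tape.
order_by_tape() {
    # Split POSITION on "+" into two numeric fields; a missing second part
    # becomes 0, so a plain integer position also works.
    awk '{ split($2, p, "+"); print $1, p[1] + 0, p[2] + 0, $3 }' |
        # Group by tape label, then sort numerically by both position parts.
        sort -k1,1 -k2,2n -k3,3n |
        # Keep only the path column for the retrieval list.
        awk '{ print $4 }'
}
```

The ordered list of paths can then be used to issue the retrievals in sequence, so that each tape is mounted once and read front to back.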