New Users Guide

HPSS stands for High Performance Storage System and is the general term for software that can be used to store data on robotic tape libraries. At NERSC the primary HPSS system for data storage is called "HPSS User" or just "HPSS" and is accessed at archive.nersc.gov. By default every NERSC user has an account on the HPSS system. There is also another, smaller HPSS system at NERSC called "HPSS Backup" (or "regent") that is used primarily by NERSC staff for system backups and occasionally by users in special cases.

Accessing HPSS

You can access HPSS from any NERSC system. Within NERSC, files can be archived to HPSS individually with the "hsi" command or in groups with the "htar" command (which works much like "tar"). HPSS is also accessible via ftp, pftp, gridFTP, and Globus. Please see the Accessing HPSS page for a list of all possible ways to access HPSS and details on their use.
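As a rough sketch of the two command-line routes described above, the shell functions below wrap a single-file "hsi put" / "hsi get" and a directory-level "htar" create and list. The function names and any paths passed to them are illustrative assumptions, not NERSC conventions; the Accessing HPSS page remains the authoritative reference for these tools.

```shell
# Illustrative wrappers around hsi and htar.
# The function names are hypothetical; only the hsi/htar invocations matter.

# Store one local file in HPSS.  $1 = local path, $2 = HPSS path.
hpss_put_file() {
    hsi put "$1" : "$2"
}

# Retrieve one file from HPSS.  $1 = local path, $2 = HPSS path.
hpss_get_file() {
    hsi get "$1" : "$2"
}

# Bundle a directory into a tar archive written directly to HPSS.
# $1 = archive name in HPSS, $2 = local directory to bundle.
hpss_bundle_dir() {
    htar -cvf "$1" "$2"
}

# List the members of an htar archive without retrieving it.
hpss_list_bundle() {
    htar -tvf "$1"
}
```

For example, `hpss_bundle_dir run01.tar run01` would bundle the local directory "run01" into a single HPSS archive, and `hpss_list_bundle run01.tar` would show what it contains.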

HPSS uses NIM to create an "hpss token" for user authentication. On a NERSC system, typing "hsi" or "htar" will usually be enough to create this token. If you are accessing HPSS remotely (using ftp, pftp, or gridFTP), you may need to generate a token manually. Please see the HPSS Passwords page for more details.

Best Practices

HPSS is intended for long-term storage of data that is not frequently accessed.

The best guide for how files should be stored in HPSS is how you might want to retrieve them. If you are backing up against accidental directory deletion or failure, you would want to use htar to bundle each directory separately. On the other hand, if you are archiving data files, you might want to bundle them by the month the data was taken, by detector run characteristics, or by a similar grouping. The optimal size for htar bundles is 100 - 500 GB, so you may need to create several htar bundles for each set, depending on the size of the data.
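The per-directory backup layout described above can be sketched as a small dry-run helper that prints one htar command per top-level subdirectory instead of running it. The function name and the "<directory>.tar" naming scheme are assumptions made for illustration, and the printed commands are meant to be run from inside the parent directory.

```shell
# Print (do not run) one htar command per top-level subdirectory.
# plan_dir_bundles and the <name>.tar scheme are hypothetical.
plan_dir_bundles() {
    parent=$1
    for dir in "$parent"/*/; do
        [ -d "$dir" ] || continue      # skip if the glob matched nothing
        name=$(basename "$dir")        # drop the parent path and trailing slash
        echo "htar -cvf ${name}.tar ${name}"
    done
}
```

Reviewing the printed commands before executing them makes it easier to check that each bundle corresponds to a unit you would later want to retrieve on its own.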

Group Small Files Together

HPSS is optimized for file sizes of 100 - 500 GB. If you need to store many files smaller than this, please use htar to bundle them together before archiving. HPSS is a tape system and behaves differently from a typical file system. If you upload large numbers of small files, they will be spread across dozens or hundreds of tapes, and retrieving them requires loading each tape into a drive and positioning it. Storing many small files in HPSS without bundling them together will result in extremely long retrieval times for these files and will slow down the HPSS system for all users.

Please see the Htar Usage page for more details on how to use htar.
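To judge how many bundles a data set needs, a ceiling division by the upper end of the recommended range is usually enough. The helper below is a minimal sketch; its name and the choice of whole gigabytes as the unit are assumptions for illustration.

```shell
# Number of bundles needed so that none exceeds max_gb (default 500).
# bundles_needed is a hypothetical helper; sizes are whole gigabytes.
bundles_needed() {
    total_gb=$1
    max_gb=${2:-500}
    # Integer ceiling division: round up so no bundle exceeds max_gb.
    echo $(( (total_gb + max_gb - 1) / max_gb ))
}
```

For instance, `bundles_needed 1800` prints 4, meaning a 1800 GB data set would be split into four bundles of about 450 GB each, which stays inside the 100 - 500 GB range.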

Very Large Files

File sizes greater than 1 TB can be difficult for HPSS to work with and lead to longer transfer times, which increases the possibility of transfer interruptions. Generally it's best to aim for file sizes in the 100 - 500 GB range. You can use "tar" and "split" to break up large aggregates or large files into 500 GB chunks:

tar cvf - myfiles* | split -d --bytes=500G - my_output_tarname.tar.

This will generate a number of files with names like "my_output_tarname.tar.00", "my_output_tarname.tar.01", and so on, which you can archive into HPSS with "hsi put". When you retrieve these files, you can recombine them with:

cat my_output_tarname.tar.* | tar xvf -
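Before applying this pattern to terabytes of data, it can be reassuring to try the round trip at small scale. The sketch below uses the same tar, split, and cat pipeline with a 1 MB chunk size in place of 500G and compares checksums before and after. All file and directory names are made up for the demonstration, and it assumes GNU split (for the -d and --bytes options) and sha256sum are available.

```shell
# Small-scale round trip of the tar + split + cat pattern.
# Chunk size is 1M instead of 500G; all names here are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create roughly 3 MB of sample data in three files.
for i in 1 2 3; do
    head -c 1000000 /dev/urandom > "myfiles_$i.dat"
done
before=$(cat myfiles_*.dat | sha256sum)

# Stream a tar archive into split, producing numbered 1 MB pieces.
tar cf - myfiles_* | split -d --bytes=1M - demo.tar.
pieces=$(ls demo.tar.* | wc -l)

# Recombine the pieces in name order and unpack into a separate directory.
mkdir restored
cat demo.tar.* | tar xf - -C restored
after=$(cat restored/myfiles_*.dat | sha256sum)

# Report whether the restored data matches the original.
if [ "$before" = "$after" ]; then
    echo "round trip OK ($pieces pieces)"
else
    echo "checksum mismatch"
fi
```

The numeric suffixes produced by "split -d" sort in the order the pieces were written, which is what lets the shell glob in the "cat" step reassemble the archive correctly.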

Accessing HPSS Data Remotely

We recommend a two-stage process to move data between HPSS and a remote site. Use Globus to transfer the data between NERSC and the remote site (your scratch directory makes a useful temporary staging point), and use hsi or htar to move the data between that staging point and HPSS.

When connecting with HPSS via ftp or pftp, it is not uncommon to encounter problems due to firewalls at the client site. Often you will have to configure your client firewall to allow connections to HPSS. See the HPSS firewall page for more details.

HPSS Usage Charging

In order to provide a balanced computing environment with appropriate amounts of storage and adequate bandwidth to keep the compute engines fed with data, HPSS usage is tracked using Storage Resource Units (SRUs). SRUs are reported and managed through the NERSC Information Management (NIM) system. For details on how the SRUs are calculated and managed please see the HPSS Charging page.

Troubleshooting and Further Questions

Some of the more common issues encountered by users accessing HPSS are described here. If you run into any issues or have any questions, please don't hesitate to ask for help.