NERSC: Powering Scientific Discovery Since 1974

HTAR Usage

HTAR is a command-line utility that creates and manipulates HPSS-resident tar-format archive files.  It is ideal for storing groups of files in HPSS.  Because the tar file is created directly in HPSS, this is generally faster and uses less local space than creating a local tar file and then storing it in HPSS.  Furthermore, HTAR creates an index file that (by default) is stored along with the archive in HPSS.  This allows you to list the contents of an archive without first retrieving it to local storage.

HTAR is installed and maintained on all NERSC production systems and can be installed on remote workstations and servers by downloading pre-compiled binaries from the NERSC Software Downloads page.  Note also that if you need to access the member files of an HTAR archive from a system that does not have the HTAR utility installed, you can retrieve the tar file to a local file system and extract the member files using the local tar utility.

Examples of when to use HTAR

HTAR is useful for storing groups of related files that you will probably want to access as a group in the future.  Examples include:

  • archiving a source code directory tree
  • archiving output files from a code simulation run
  • archiving files generated by the run of an experiment

If stored individually, the files will likely be distributed across a collection of tapes, requiring possibly long delays (due to multiple tape mounts) when fetching them from HPSS.  On the other hand, an HTAR archive file will likely be stored on a single tape, requiring only a single tape mount when it comes time to retrieve the data.

 

Accessing NERSC Systems with HTAR

The HTAR utility installed on the NERSC production systems (as well as the versions available from the NERSC Software Downloads page) will, by default, connect to the NERSC Archive HPSS system.  So, for example, to store all the files in the local directory named nova into an HTAR archive named nova.tar in your HPSS home directory, type:

% htar -cf nova.tar nova

This will also create the 'index file' nova.tar.idx and store it in the same HPSS directory.

To access the NERSC Backup system, named hpss (not all users have access to this system), you must specify the hostname explicitly with the '-H' option:

% htar -H server=hpss.nersc.gov -cf nova.tar nova

Alternatively, you can set the HPSS_SERVER_HOST environment variable and then run the HTAR command without the '-H' option:

% env HPSS_SERVER_HOST=hpss.nersc.gov  htar -cf nova.tar nova

To specify which HPSS username to use, set the HPSS_PRINCIPAL environment variable:

% env HPSS_PRINCIPAL=username  htar -cf nova.tar nova

To list the member files of nova.tar, type:

% htar -tf nova.tar

To extract all the files in nova.tar to the local file system, type:

% htar -xf nova.tar

To extract only the member file 'nova/sn1987a', type:

% htar -xf nova.tar nova/sn1987a

 

Commonly Used HTAR Options

The basic syntax of HTAR is similar to the standard tar utility:

 htar -{c|K|t|x|X} -f tarfile [directories] [files]

As with the standard Unix tar utility, the "-c", "-x", and "-t" options create, extract, and list tar archive files, respectively.  The "-K" option verifies an existing tarfile in HPSS, and the "-X" option can be used to re-create the index file for an existing archive.  Please note that you cannot add or append files to an existing archive.

Note: when HTAR creates an archive, it places an additional file (with a strange name) at the end of the archive.  You can ignore this file; it is for HTAR internal use and will not be retrieved when you extract the files from the archive.

# Create an archive with directory "nova" and file "simulator"
% htar -cvf nova.tar nova simulator
HTAR: a   nova/                                                                   
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
HTAR: a   nova/sn2005e
HTAR: a   simulator
HTAR: a   /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR Create complete for nova.tar. 28,396,544 bytes written for 4 member files, max threads: 4 Transfer time: 0.420 seconds (67.534 MB/s)
HTAR: HTAR SUCCESSFUL      

# Now list the contents
% htar -tf nova.tar
HTAR: drwx------  joeuser/joeuser          0 2010-09-24 14:24  nova/
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  joeuser/joeuser     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  joeuser/joeuser        256 2010-09-24 17:36  /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL

# Now, as an example, use hsi to remove the nova.tar.idx index file from HPSS
# (Note: you generally do not want to do this)
% hsi "rm nova.tar.idx"
...
rm: /home/j/joeuser/nova.tar.idx (2010/09/24 17:36:53 3360 bytes)

# Now try to list the archive contents without the index file:
% htar -tf nova.tar
ERROR: No such file: nova.tar.idx           
ERROR: Fatal error opening index file: nova.tar.idx
HTAR: HTAR FAILED

# Here is how to rebuild the index file if it is accidentally deleted
% htar -Xvf nova.tar
HTAR: i nova                         
HTAR: i nova/sn1987a
HTAR: i nova/sn1993j
HTAR: i nova/sn2005e
HTAR: i simulator
HTAR: i /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: Build Index complete for nova.tar, 5 files 6 total objects, size=28,396,544 bytes
HTAR: HTAR SUCCESSFUL

# With the index rebuilt, listing the archive works again:
% htar -tf nova.tar
HTAR: drwx------  joeuser/joeuser          0 2010-09-24 14:24  nova/
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  joeuser/joeuser     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  joeuser/joeuser        256 2010-09-24 17:36  /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL

 

Less Used Options

Option                      Meaning
-?                          Print the help menu
-D, -U                      Soft delete and undelete of archive member files
-P                          Create intermediate subdirectories of archive path during creation
-I {IndexFile | .suffix}    Specify an alternate pathname or suffix for naming of index file
-L listfile                 Specify a file that contains all the files and directories to index, one per line
-p                          When extracting, restore files with their original mode (ignoring umask)
-K                          Verify the contents of an archive in HPSS
-H<opt>                     A collection of HTAR specific options (see manual page)

Soft Delete and Undelete

The "-D" option can be used to "soft delete" one or more member files or directories from an HTAR archive.  The files are not really deleted, but simply marked in the index file as deleted.  A file that is soft-deleted will not be retrieved from the archive during an extract operation.  If you list the contents of the archive, soft deleted files will have a 'D' character after the mode bits in the listing:

# Soft delete a file from the archive:
% htar -Df nova.tar nova/sn1993j
HTAR: d  nova/sn1993j                                     
HTAR: HTAR SUCCESSFUL

# Now list the files and note that sn1993j is marked as deleted:
% htar -tf nova.tar
HTAR: drwx------  joeuser/joeuser          0 2010-09-24 14:24  nova/
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------ D joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn2005e
. . .

# To undelete the file, use the -U option:
% htar -Uf nova.tar nova/sn1993j
HTAR: u  nova/sn1993j                                     
HTAR: HTAR SUCCESSFUL
# List the file and note that the 'D' is missing
% htar -tf nova.tar nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j

Using ListFiles to Create an HTAR Archive

Rather than specifying the list of files and directories on the command line when creating an HTAR archive, you can place the list of file and directory pathnames into a ListFile and use the "-L" option.  The ListFile must contain exactly one pathname per line.

# Create a ListFile containing all the filenames in the nova subdirectory that match a pattern:
% find nova -name 'sn19*' -print > novalist

% cat novalist
nova/sn1987a
nova/sn1993j

# Now create an archive containing only these files
% htar -cvf nova19.tar -L novalist
HTAR: a   nova/sn1987a                                                            
HTAR: a   nova/sn1993j
. . .

% htar -tf nova19.tar
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
. . .
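
Because a ListFile is read one pathname per line, it can help to confirm that every line names something that exists before handing the file to HTAR.  The following is a minimal sketch and not part of HTAR: the function name listfile_missing is hypothetical, and the check uses only standard shell features.

```shell
#!/bin/sh
# Hypothetical helper (not an HTAR feature): print every line of a
# ListFile that does not name an existing file or directory.
listfile_missing() {
    # "|| [ -n "$p" ]" also handles a last line with no trailing newline.
    while IFS= read -r p || [ -n "$p" ]; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done < "$1"
}

# Example use (prints nothing when every entry exists):
#   listfile_missing novalist
```

If the function prints nothing, every entry in the ListFile was found relative to the current directory.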

Archive Verification

You can request that HTAR compute and save checksum values for each member file during archive creation.  The checksums are saved in the corresponding HTAR index file.  You can then request that HTAR recompute the checksums as you extract the files from the archive and compare them with the values stored in the index file.

# Request that HTAR create checksum values for each file stored
% htar -Hcrc -cvf nova.tar nova
HTAR: a   nova/                                                                   
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
HTAR: a   nova/sn2005e
. . .
# Now, in another directory, extract the files and request verification
% htar -Hverify=crc -xvf nova.tar
HTAR: x nova/                        
HTAR: x nova/sn1987a, 9331200 bytes, 18226 media blocks
HTAR: x nova/sn1993j, 9331200 bytes, 18226 media blocks
...

 

Limitations

HTAR has several limitations to be aware of:

Member File Path Length

File path names within an HTAR aggregate of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
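
The path limits above can be checked before an archive is created.  The following is a minimal sketch and not part of HTAR: the function names htar_path_ok and htar_long_paths are hypothetical, and the sketch assumes that the prefix is everything before the last "/" and the name is the final path component.  Under that assumption it may flag some paths that HTAR would in fact accept.

```shell
#!/bin/sh
# Hypothetical helper (not an HTAR feature): succeed if a pathname fits
# the limits quoted above (prefix <= 154 characters, name <= 99).
# Assumption: the split between prefix and name is at the last "/".
htar_path_ok() {
    path=$1
    name=${path##*/}                  # final path component
    case $path in
        */*) prefix=${path%/*} ;;     # everything before the last "/"
        *)   prefix= ;;               # no directory part
    esac
    [ "${#prefix}" -le 154 ] && [ "${#name}" -le 99 ]
}

# Print every pathname under a directory that exceeds the limits.
htar_long_paths() {
    find "$1" -print | while IFS= read -r p; do
        htar_path_ok "$p" || printf '%s\n' "$p"
    done
}

# Example use:
#   htar_long_paths nova
```

Any pathname printed by htar_long_paths is a candidate for renaming or restructuring before the directory is archived.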

Member File Size

The maximum file size the NERSC archive will support is approximately 20 TB. However, we recommend that you aim for HTAR aggregate sizes of around 1 TB. Member files within an HTAR aggregate are limited to approximately 68 GB.
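
Oversized member files can be located before an archive is created.  The following is a minimal sketch and not part of HTAR: the function name htar_big_members is hypothetical, and the default threshold of 68000000000 bytes is an assumption standing in for "approximately 68 GB".

```shell
#!/bin/sh
# Hypothetical helper (not an HTAR feature): list regular files under a
# directory whose size exceeds a byte limit.
# Assumption: 68000000000 bytes approximates the 68 GB member limit.
htar_big_members() {
    dir=$1
    limit=${2:-68000000000}
    # "-size +Nc" matches files larger than N bytes.
    find "$dir" -type f -size +"${limit}c" -print
}

# Example use:
#   htar_big_members nova
```

Files reported here would need to be stored by other means or split before the directory can be archived with HTAR.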

Member File Limit

HTAR aggregates have a default soft limit of 1,000,000 (1 million) member files. Users can increase this limit to a maximum hard limit of 5,000,000 member files.
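
The number of prospective member files can be counted before an archive is created.  The following is a minimal sketch and not part of HTAR: the function names htar_member_count and htar_count_ok are hypothetical.  It assumes that directories do not count as member files, which matches the sample output above where the "4 member files" total excludes the nova/ directory entry, and it assumes that no filename contains a newline.

```shell
#!/bin/sh
# Hypothetical helpers (not HTAR features).
# Assumption: directories are not counted as member files.

# Print the number of non-directory entries under a directory.
htar_member_count() {
    find "$1" ! -type d -print | wc -l | tr -d ' '
}

# Succeed if the count is within the limit (default: the 1,000,000
# soft limit quoted above).
htar_count_ok() {
    [ "$(htar_member_count "$1")" -le "${2:-1000000}" ]
}

# Example use:
#   htar_count_ok nova || echo "too many member files for one archive"
```

A directory tree that fails this check could be split into several archives, for example one per subdirectory.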