File storage and I/O
There are 5 main user file spaces available on Genepool/Phoebe (soon to be 4 once House is retired on December 20, 2013).
Your home directory on Genepool is mounted across all NERSC systems. Refer to it as $HOME wherever possible. If you had an old home directory at JGI, on /house or on the netapps, we recommend referring to it as $OLD_HOME. Do not change the $HOME environment variable. For details on quotas, dotfile initialization and backups, please see our global homes page.
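The same advice applies inside scripts: build paths from $HOME (or $OLD_HOME, if your environment defines it) instead of hardcoding an absolute directory. The following is a minimal Python sketch of that habit. The helper name user_path and the file name notes.txt are hypothetical and used only for illustration; they are not NERSC-provided tools.

```python
import os
from pathlib import Path


def user_path(*parts: str, var: str = "HOME") -> Path:
    """Build a path under the directory named by environment variable `var`
    (for example HOME or OLD_HOME) instead of hardcoding an absolute path."""
    base = os.environ.get(var)
    if base is None:
        raise KeyError(f"${var} is not set in this environment")
    return Path(base, *parts)


if __name__ == "__main__":
    # "notes.txt" is a hypothetical file name used only for illustration.
    for name in ("HOME", "OLD_HOME"):
        if name in os.environ:
            print(user_path("notes.txt", var=name))
```

Because the base directory comes from the environment, the same script keeps working if the underlying mount point changes.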
The house file system is mounted on Genepool as well as on JGI systems, and users can access data on /house as usual. However, /house is being retired in December 2013, so archiving or migrating data to other file systems should begin immediately. Note that /house is heavily used and is often more than 90% full. Please back up data to HPSS and delete data you don't need, and consider using the projectb file system described below. Genepool is the only NERSC computational resource with access to /house; if you want to use hopper or carver, please migrate your data to projectb.
projectb is a 2.7PB GPFS-based file system for the JGI's active projects. Besides project directories, it contains two distinct user spaces: projectb/sandbox and projectb/scratch. projectb is mounted only on NERSC computer systems: genepool, hopper and carver.
| | projectb Projects (migrating to DnA in 2014) | projectb Scratch | projectb Sandbox |
|---|---|---|---|
| Quota | 5TB default | 20TB, 5M inodes by default; 40TB upon request | Defined by agreement with the JGI Management |
| Backups | Daily, only for projectdirs with quota <= 5TB | Not backed up | Not backed up |
| File Purging | Files are not automatically purged | Files not accessed for 90 days are automatically deleted | Files are not automatically purged |
The projectb "Project" space is intended for data needed by multiple people collaborating on a project, so that it can be shared easily. Because of space constraints and demand for both Scratch and Sandbox space, the project directories will migrate to DnA once the /house shutdown is complete.
If you would like a project directory, please use the Project Directory Request Form.
The projectb "Scratch" and "Sandbox" spaces are intended for staging and running JGI calculations on NERSC systems, including genepool, hopper and carver. On genepool, projectb scratch is the recommended file system for all file I/O during your calculations. If you have access to genepool, you should have space on projectb scratch; if you don't, please file a ticket at http://help.nersc.gov. Sandbox areas were allocated by program; if you have questions about your program's space, please see your group lead.
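The projectb table above lists a default scratch quota of 20TB and 5M inodes. A rough Python sketch for estimating your own usage against those numbers follows. It assumes the quotas are decimal terabytes, counts every file, directory and symlink as one inode, and sums apparent file sizes, so its totals can differ from what the file system's own quota accounting charges; treat it as an estimate, not an authoritative quota report. The function names are hypothetical.

```python
import os
from typing import Tuple

TB = 1000 ** 4  # assumption: the quoted quotas are decimal terabytes

# Default projectb scratch quota, taken from the table above.
DEFAULT_SCRATCH_BYTES = 20 * TB
DEFAULT_SCRATCH_INODES = 5_000_000


def tally_usage(root: str) -> Tuple[int, int]:
    """Return (apparent bytes, entry count) for everything under root.

    Every file, directory and symlink uses an inode, so the entry count is a
    rough stand-in for inode usage. Sizes are apparent sizes (st_size), which
    can differ from the blocks the file system actually charges."""
    total_bytes = 0
    entries = 1  # count the root directory itself
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                pass  # file disappeared while we were walking
    return total_bytes, entries


def over_quota(used_bytes: int, used_inodes: int,
               quota_bytes: int = DEFAULT_SCRATCH_BYTES,
               quota_inodes: int = DEFAULT_SCRATCH_INODES) -> bool:
    """True if either the space limit or the inode limit is exceeded."""
    return used_bytes > quota_bytes or used_inodes > quota_inodes


if __name__ == "__main__":
    root = os.environ.get("BSCRATCH")
    if root:
        used, inodes = tally_usage(root)
        print(f"{used / TB:.2f} TB in {inodes} entries; "
              f"over default quota: {over_quota(used, inodes)}")
```

Checking the inode count as well as the byte count matters because many small files can exhaust the 5M inode limit long before the 20TB space limit is reached. Walking a large tree this way can take a long time.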
DnA (Data n' Archive)
DnA is a 1PB GPFS based file system for the JGI's archive, shared databases and project directories.
| | DnA Projects (migrating from projectb in 2014) | DnA Shared | DnA DM Archive |
|---|---|---|---|
| Quota | 5TB default | Defined by agreement with the JGI Management | Defined by agreement with the JGI Management |
| Backups | Daily, only for projectdirs with quota <= 5TB | Backed up by JAMO | Backed up by JAMO |
| File Purging | Files are not automatically purged | Purge policy set by users of the JAMO system | Files are not automatically purged |
The DnA "Project" and "Shared" spaces are intended for data needed by multiple people collaborating on a project, so that it can be shared easily. The "Project" space is owned and managed by the JGI; the "Shared" space is a collaborative effort between the JGI and NERSC.
The "DM Archive" is a data repository maintained by the JAMO system. Files are stored here when migrated using the JAMO system. The files can remain in this space for as long as the user specifies. Any file that is in the "DM Archive" has also been placed in the HPSS tape archive. This section of the file system is owned by the JGI data management team.
Each user has a "scratch" directory. Scratch directories are NOT backed up, and files are purged if they have not been accessed for 90 days. Access your scratch directory through the environment variable $SCRATCH, for example with cd $SCRATCH.
Scratch environment variables:
| Environment Variable | Value | NERSC Systems |
|---|---|---|
| $SCRATCH | Best-connected file system | All NERSC computational systems |
| $BSCRATCH | /global/projectb/scratch/<username> | genepool, hopper, carver |
| $GSCRATCH | /global/scratch/sd/<username> | All NERSC computational systems |
$GSCRATCH points to your global scratch space, and $BSCRATCH points to your projectb scratch space if you have a BSCRATCH allocation. $SCRATCH always points to the best-connected scratch space available on the NERSC machine you are using: on genepool, for example, $SCRATCH points to $BSCRATCH, whereas on carver it points to $GSCRATCH.
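To check which space $SCRATCH resolves to in a given session, compare it against $BSCRATCH and $GSCRATCH. Below is a small Python sketch, assuming your environment defines these variables as in the table above. The function name scratch_kind is hypothetical, and so is the username alice used in the test values.

```python
import os
from typing import Mapping, Optional


def _same_dir(a: Optional[str], b: Optional[str]) -> bool:
    """True if both paths are set and resolve to the same location."""
    return bool(a) and bool(b) and os.path.realpath(a) == os.path.realpath(b)


def scratch_kind(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Report which scratch space $SCRATCH resolves to in this environment."""
    scratch = env.get("SCRATCH")
    if not scratch:
        return None
    if _same_dir(scratch, env.get("BSCRATCH")):
        return "projectb scratch ($BSCRATCH)"
    if _same_dir(scratch, env.get("GSCRATCH")):
        return "global scratch ($GSCRATCH)"
    return "other"


if __name__ == "__main__":
    print(scratch_kind())
```

Following the description above, this should report projectb scratch on genepool and global scratch on carver. Resolving both sides with realpath keeps the comparison correct if one variable is set through a symlink.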
Scratch space is intended for staging, running and completing your calculations on NERSC systems, so these file systems are designed for large-scale reading and writing from many compute nodes at once. They are not intended for long-term storage or archiving: data is not backed up, and files not accessed for 90 days are automatically purged.
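Because of the 90-day purge, it helps to know which of your scratch files are approaching that age. The Python sketch below lists regular files whose last-access time is at least 90 days old. It is only an approximation of the purge policy. The exact criteria used by the purge itself are not described here, and access times may be updated lazily depending on how the file system is mounted. The scan uses stat only and never opens the files. The function name purge_candidates is hypothetical.

```python
import os
import stat
import time
from typing import Iterator, Optional, Tuple

PURGE_DAYS = 90  # purge threshold stated above
SECONDS_PER_DAY = 86_400


def purge_candidates(root: str, days: int = PURGE_DAYS,
                     now: Optional[float] = None) -> Iterator[Tuple[str, float]]:
    """Yield (path, days_since_last_access) for regular files under root whose
    access time is at least `days` days old. Uses stat only and never opens
    the files."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed while walking
            if stat.S_ISREG(st.st_mode) and st.st_atime <= cutoff:
                yield path, (now - st.st_atime) / SECONDS_PER_DAY


if __name__ == "__main__":
    scratch = os.environ.get("SCRATCH")
    if scratch:
        for path, age in purge_candidates(scratch):
            print(f"{age:6.0f} days  {path}")
```

Anything this reports that you still need should be moved to project space or archived to HPSS, not "touched" to reset its access time.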
/jgi/tools is the legacy software installation environment. As of May 21, 2013, /jgi/tools can no longer be written to, and all new software installations should go into the modules system; please contact NERSC consulting to request new installations. On October 1, 2013, /jgi/tools will be removed from the default path, and on November 15, 2013, it will no longer be a valid path. Users should adapt their scripts to the modules system as soon as possible.
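Adapting scripts starts with finding where they still reference /jgi/tools. The following minimal Python sketch scans a directory tree of your own scripts and lists every line that mentions /jgi/tools. It does not know which module replaces each tool, so rewriting each hit to use the modules system is left to you. The function name find_legacy_refs is hypothetical.

```python
import os
import sys
from typing import Iterator, Tuple

LEGACY_PREFIX = "/jgi/tools"


def find_legacy_refs(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under root that mentions
    the legacy /jgi/tools tree."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets the scan pass over binary files.
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if LEGACY_PREFIX in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file, broken symlink, etc.


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, lineno, line in find_legacy_refs(sys.argv[1]):
        print(f"{path}:{lineno}: {line}")
```

If saved under a name such as find_legacy_refs.py (hypothetical), running it with your scripts directory as the argument prints one path:line: text entry per match.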
Other file systems
Other file systems are also mounted on Genepool:
- SeqFS - file system used exclusively by the Illumina sequencers, SDM and Instrumentation groups at the JGI.
- /usr/common (/global/common/genepool) - a file system where NERSC staff build software for user applications. This is the principal site for the module-based software installations.
- /global/scratch - a GPFS-based file system accessible on almost all of NERSC's other compute systems and shared by all other NERSC users. JGI users should favor the scratch/sandbox portions of projectb over /global/scratch.
- /global/project - a GPFS-based file system accessible on almost all of NERSC's other compute systems and shared by all other NERSC users. JGI users should favor the project directory portion of projectb over /global/project.