NERSC: Powering Scientific Discovery Since 1974

Sharing Data

Data sharing naturally divides into three different categories:  a single user accessing data from multiple platforms, multiple users accessing data from a single platform, or multiple users accessing data from multiple platforms.

A Note About Security and Data Integrity

Sharing data with other users must be done carefully. The risk of data loss increases with the number of users who can access the data. Permissions should be set to the minimum necessary to achieve the desired access. Be sure to have archived backups of any critical shared data. It is also important to ensure that SSH private keys are not shared with other users, either intentionally or accidentally.

Accessing Your Own Data From Multiple NERSC Platforms

The NERSC Global Filesystem (NGF) facilitates data access from multiple compute platforms. Global Homes is appropriate for small files (source code files, configuration files, ASCII input files, etc.). For large data, use Global Scratch and/or Global Project. Global Homes, Global Scratch, and Global Project are available on most NERSC compute platforms. Care must be taken to ensure that data activities on one platform do not interfere with activities on a different platform. For example, if a batch job on Carver is writing a file in Global Scratch, it is safe to read that file on Euclid (for job monitoring, or for some real-time data analysis). However, writing to that same file from Euclid (or even worse, deleting the file from Euclid) would most likely lead to data corruption and/or batch job failure on Carver.
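As a rough illustration of read-only monitoring, the sketch below inspects a file with commands that never open it for writing. The temporary directory and the file name `job.out` are hypothetical stand-ins for a job's output file in Global Scratch; the "job" is simulated with a single `printf`.

```shell
#!/bin/sh
# Hypothetical stand-in for a job output file in Global Scratch.
workdir=$(mktemp -d)
outfile="$workdir/job.out"

# Simulate the batch job having written three lines so far.
printf 'step 1\nstep 2\nstep 3\n' > "$outfile"

# Read-only inspection: tail and wc only open the file for reading.
last_line=$(tail -n 1 "$outfile")
line_count=$(wc -l < "$outfile" | tr -d ' ')

echo "latest: $last_line (lines so far: $line_count)"
```

Anything that modifies the file from the second platform (an editor, an output redirection with `>` or `>>`, `rm`, `mv`) falls on the unsafe side of the distinction drawn above.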

Note that the scratch file systems on any NERSC Cray system are completely local to that Cray system.  In order to access such data from a different platform, the data must be copied to one of the globally-available file systems.
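A minimal sketch of such a copy follows. Both directories are temporary placeholders created by `mktemp` so that the script runs anywhere; on a real system `SRC` would be a directory in the Cray-local scratch and `DEST` a directory in Global Scratch or Global Project. The file name `results.dat` is likewise hypothetical.

```shell
#!/bin/sh
# SRC and DEST are hypothetical placeholders for a local-scratch directory
# and a globally available destination directory.
SRC=$(mktemp -d)
DEST=$(mktemp -d)

# Create a small sample file so the sketch has something to copy.
echo "sample results" > "$SRC/results.dat"

# -R copies recursively; -p preserves modes and timestamps.
cp -pR "$SRC/." "$DEST/"

# Confirm the copy is byte-identical before relying on it.
if cmp -s "$SRC/results.dat" "$DEST/results.dat"; then
    copy_status=ok
else
    copy_status=mismatch
fi
echo "copy status: $copy_status"
```

Checking the copy before removing the original is a cheap safeguard, in keeping with the note on data integrity above.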

Sharing Your Data With Other NERSC Users

Using NERSC Give and Take

NERSC provides two commands, give and take, that let users share files and directories with other users at NERSC. The giver uses the give command to share a file:

give -u <recipient-username> <file-or-directory>

The recipient will receive an email notifying them of the file available to them. They can then use the take command to copy the file to their current directory or (optionally) a specific destination folder. 

take -u <sender-username> (-d <destination folder>) <filename>

The "-a" flag can be used to take all files from a given user.

take -a -u <sender-username> (-d <destination folder>)

To see all files available to you from a given user (without taking them) you can use the following command:

take -u <sender-username>

Using Group Permissions to Share Data

This type of data sharing is based on Unix file permissions, in particular the appropriate use of group permissions.  Most data sharing needs can be achieved with careful use of group permissions.  For an overview of Unix file groups and their various functions at NERSC, please see File Groups at NERSC.  For a brief overview of Unix file permissions, please see Unix File Permissions.  The following sections assume some familiarity with file groups and permissions.

For the following example, assume there are two NERSC users named Elvis and Jimi (NERSC usernames "elvis" and "jimi"), who have default file groups "elvis" and "jimi", respectively. They are both members of the "Big Science" project, and belong to the "bigsci" repository, and therefore both belong to the "bigsci" file group as well. This file group has many other members.

Example 1:  Generating data that can be read by other users

Elvis is running batch jobs (in his global scratch directory) that generate data that Jimi needs to read.  In order to share his data with Jimi, but not with other members of bigsci, Elvis asks his PI to create a file group called "ejdata".  The only members of this file group are elvis and jimi.  Elvis has other data in his scratch directory to which he does not want Jimi to have access.  Here are the commands Elvis should use to create an area in his scratch directory to share data with Jimi.

% cd /global/scratch/sd/elvis
% ls -ld .
drwx------ 5 elvis elvis 131072 Jul 17 08:40 .

The above permissions are the default values provided by NERSC when creating scratch directories.  In order for other users to traverse this directory into lower-level directories, it is necessary to add execute permissions to the directory (note that read permission is not needed, unless you wish to allow other users to see the entire contents of the scratch directory):

% chmod o+x .
% ls -ld .
drwx-----x 5 elvis elvis 131072 Jul 17 08:40 .
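The effect of this step on the mode bits can be reproduced in a throwaway directory. The sketch below assumes GNU `stat` (for the `-c` option) and uses a temporary directory as a stand-in for the scratch directory. It shows only the permission bits; verifying actual access by another user would require a second account.

```shell
#!/bin/sh
# Throwaway directory standing in for the scratch directory (hypothetical).
top=$(mktemp -d)
chmod 700 "$top"                 # owner-only, like the default shown above
before=$(stat -c '%A' "$top")    # GNU stat assumed

chmod o+x "$top"                 # allow others to traverse, but not to list
after=$(stat -c '%A' "$top")

echo "before: $before"
echo "after:  $after"
```

With only the execute bit set for "other", another user can reach a subdirectory whose name they already know, but `ls` on the top-level directory itself is still denied.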

Now Elvis creates the top-level shared directory:

% mkdir BatchRuns
% ls -ld BatchRuns
drwxr-x--- 2 elvis elvis 4096 Nov 20 09:17 BatchRuns
% umask
0027

Elvis's umask value (027) provides group read (and execute for directories) permissions.  However, this is not really useful for his (default) personal file group.  Therefore the next step is to change the group ownership:

% chgrp ejdata BatchRuns
% ls -ld BatchRuns
drwxr-x--- 2 elvis ejdata 4096 Nov 20 09:17 BatchRuns
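The umask arithmetic mentioned above can be checked in a throwaway directory. This is a small sketch assuming GNU `stat`; the names `demo_dir` and `demo_file` are made up for the illustration.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical stand-in for a scratch area).
cd "$(mktemp -d)" || exit 1

umask 027          # the umask value from the example
mkdir demo_dir     # mkdir requests 777; masked by 027 this becomes 750
touch demo_file    # touch requests 666; masked by 027 this becomes 640

dir_mode=$(stat -c '%a' demo_dir)     # GNU stat assumed
file_mode=$(stat -c '%a' demo_file)
echo "directory: $dir_mode  file: $file_mode"
```

The umask only removes bits from what a program requests, which is why regular files end up without execute permission even though the mask itself would allow it for the group.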

Elvis wants any file created in the BatchRuns directory to be owned by file group ejdata, so he sets the sgid bit:

% chmod g+s BatchRuns
% ls -ld BatchRuns
drwxr-s--- 2 elvis ejdata 4096 Nov 20 09:17 BatchRuns
% cd BatchRuns
% touch testfile
% ls -l testfile
-rw-r----- 1 elvis ejdata 0 Nov 20 09:20 testfile

Jimi is able to read (but not modify) the above testfile. Elvis wants to run four batch jobs to share with Jimi:

% mkdir Run1 Run2 Run3 Run4
% ls -l
total 1024
drwxr-s--- 2 elvis ejdata 131072 Nov 20 09:22 Run1
drwxr-s--- 2 elvis ejdata 131072 Nov 20 09:22 Run2
drwxr-s--- 2 elvis ejdata 131072 Nov 20 09:22 Run3
drwxr-s--- 2 elvis ejdata 131072 Nov 20 09:22 Run4

Note that the sgid bit is set on each of these directories automatically (because it is set on the parent directory). The group ownership is correct. Elvis can now run his batch jobs in each of these directories, and Jimi will be able to see all the results.
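The inheritance of the sgid bit can be reproduced without a special file group. The sketch below assumes Linux and GNU `stat`, and uses hypothetical names (`shared`, `Run1`, `output.dat`) in a temporary directory. It omits the chgrp step, since a group such as ejdata exists only where a PI has created it, so it demonstrates the propagation of the bit and the group rather than sharing between two real accounts.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical stand-in for a scratch area).
cd "$(mktemp -d)" || exit 1
umask 027

mkdir shared
chmod g+s shared                 # set the sgid bit on the parent only
mkdir shared/Run1                # created afterwards, inherits the bit
touch shared/Run1/output.dat

parent_group=$(stat -c '%G' shared)            # GNU stat assumed
sub_mode=$(stat -c '%A' shared/Run1)
file_group=$(stat -c '%G' shared/Run1/output.dat)

echo "shared/Run1 mode: $sub_mode"
echo "file group: $file_group (parent group: $parent_group)"
```

Directories that already existed before `chmod g+s` was applied to the parent do not pick up the bit retroactively; it would have to be set on them explicitly.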

Outside NERSC - Sharing Data with Your Collaborators over the Web

You can easily share your data with the rest of the world by creating a "www" directory in your Project directory. This directory then becomes publicly visible through the science gateway nodes at:

http://portal.nersc.gov/project/<yourproject>/

The "www" directory must be readable and executable by "other" (chmod -R o+rX). For more information on how to share your data, or on building more advanced data sharing portals, visit the science gateways page.
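The capital X in that chmod command adds execute permission only to directories (and to files that are already executable by someone), so ordinary data files become readable without becoming executable. The following sketch shows the effect in a temporary directory; the file names are hypothetical and GNU `stat` is assumed.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical stand-in for a Project directory).
cd "$(mktemp -d)" || exit 1
umask 077                        # start with nothing granted to group or other

mkdir -p www/plots
echo "<html></html>" > www/index.html
echo "data" > www/plots/run1.dat

chmod -R o+rX www                # read for everything, execute for directories

dir_mode=$(stat -c '%a' www/plots)        # GNU stat assumed
file_mode=$(stat -c '%a' www/index.html)
echo "directories: $dir_mode  files: $file_mode"
```

Files added to "www" later are created with whatever the current umask allows, so the chmod command may need to be repeated after new data is copied in.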