NERSC: Powering Scientific Discovery Since 1974

Data Transfer Examples

Moving data to Projectb

Projectb is where jobs running on the cluster or the gpints should write their data.  Runs often leave behind intermediate files, or bad results from a run that didn't work out, that don't need to be saved.  If you run these jobs in the SCRATCH areas, the purge will delete these files for you.  If you run in the SANDBOX, you will have to clean up after yourself.

Batch Scheduled Transfers

Use any queue to schedule jobs that move data to Projectb.  A basic transfer script looks like this:

kmfagnan@genepool12 ~ $ cat data_to_projb.sh
#!/bin/bash -l
#$ -N data2projb
 
<cp,rsync> <my_files_for_analysis> /projectb/scratch/<myscratch>/<dir>

kmfagnan@genepool12 ~ $  qsub data_to_projb.sh
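
The script above leaves the copy command and paths as placeholders. As a rough sketch of how they might be filled in, the version below wraps the copy in a helper that checks the source exists and creates the destination first. The function name stage_to_projb and the commented example path are hypothetical, and the fallback to cp when rsync is missing is an assumption rather than NERSC guidance.

```shell
#!/bin/bash -l
#$ -N data2projb

# Hypothetical helper: copy SRC into DEST, creating DEST if needed.
# Uses rsync when it is available and falls back to cp -a otherwise.
stage_to_projb() {
    local src="$1" dest="$2"
    if [ ! -e "$src" ]; then
        echo "source not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # -a recurses into directories and preserves permissions and timestamps
        rsync -a "$src" "$dest/"
    else
        cp -a "$src" "$dest/"
    fi
}

# Example call; replace both placeholder paths with your own:
# stage_to_projb "$HOME/my_files_for_analysis" "/projectb/scratch/<myscratch>/<dir>"
```

Because the helper returns a non-zero status when the source is missing, a failed transfer shows up in the job's exit status instead of passing silently.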

Interactive Data Transfer

Log in to any node (gpint, genepool*, dtn), either directly via ssh or through qlogin, to transfer data to Projectb.

 

Use GlobusOnline

Finally, you can set up transfers to Projectb using Globus Online.  This is a web-based utility that lets you start a transfer, close the browser window, and get an email when the transfer has finished.  For more information on how to use this resource, please see http://www.globusonline.org or contact consult at nersc dot gov.

 

Moving data to DnA

Data moved to DnA should be finished products, ready to archive or ready to be shared between groups at the JGI.  DnA is not for intermediate data.

Batch Scheduled Transfers

Use the xfer queue to schedule jobs that move the data.  A basic transfer script is here: 

kmfagnan@genepool12 ~ $ echo $DNAFS
/global/dna
kmfagnan@genepool12 ~ $ cat projb_to_dna.sh
#!/bin/bash -l
#$ -N projb2dna
#$ -l xfer.c

rsync /house/groupdirs/my_group/my_shared_files $DNAFS/projectdirs/<my_group_dir>
kmfagnan@genepool12 ~ $  qsub projb_to_dna.sh
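
A batch job cannot ask you what went wrong, so it can help to fail early when the target is not what you expect. The sketch below is one possible way to do that: it resolves the DnA directory from $DNAFS and refuses to continue if the variable is unset or the group directory does not exist. The function name dna_target and the argument my_group_dir are hypothetical.

```shell
#!/bin/bash -l
#$ -N projb2dna
#$ -l xfer.c

# Hypothetical check: print the DnA target directory for a group, or fail
# if DNAFS is unset or the group directory does not exist.
dna_target() {
    local group_dir="$1"
    if [ -z "${DNAFS:-}" ]; then
        echo "DNAFS is not set" >&2
        return 1
    fi
    local target="$DNAFS/projectdirs/$group_dir"
    if [ ! -d "$target" ]; then
        echo "no such directory: $target" >&2
        return 1
    fi
    echo "$target"
}

# Example use inside the job; my_group_dir is a placeholder:
# target=$(dna_target my_group_dir) && \
#     rsync -a /house/groupdirs/my_group/my_shared_files "$target/"
```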

Interactive Data Transfers

If you would like to interactively transfer data, you can log in to one of the Data Transfer Nodes.

kmfagnan@genepool10 ~ $ ssh dtn03 

Then run your data transfer commands, and don't forget to take advantage of the environment variables that are set up for you, such as $DNAFS.

kmfagnan@dtn03 ~ $ echo $DNAFS
/global/dna
kmfagnan@dtn03 ~ $ rsync /house/groupdirs/my_group/my_shared_files $DNAFS/projectdirs/<my_group_dir>
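
When transferring interactively, it can be useful to preview what rsync would copy before copying it. The following is a minimal sketch of that pattern using rsync's standard -n (dry-run) option. The function name preview_then_sync is hypothetical, and the fallback value for DNAFS is only there so the sketch can be tried outside NERSC.

```shell
# DNAFS is set for you on the NERSC systems; the fallback below is an
# assumption so that the sketch also works where it is not defined.
DNAFS="${DNAFS:-/global/dna}"

# Hypothetical helper: list what would be transferred, then transfer it.
preview_then_sync() {
    local src="$1" dest="$2"
    # -n (--dry-run) lists the files without copying anything
    rsync -avn "$src" "$dest/" || return 1
    # -a recurses into directories and preserves permissions and timestamps
    rsync -av "$src" "$dest/"
}

# Example call with the placeholder paths used in the text:
# preview_then_sync /house/groupdirs/my_group/my_shared_files "$DNAFS/projectdirs/<my_group_dir>"
```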

Use GlobusOnline

Finally, you can set up transfers to DnA using Globus Online.  This is a web-based utility that lets you start a transfer, close the browser window, and get an email when the transfer has finished.  For more information on how to use this resource, please see http://www.globusonline.org or contact consult at nersc dot gov.

Moving data to WebFS

There have been a few questions about how to get data to WebFS from the compute nodes or gpints, given that WebFS is only mounted on the gpweb systems.

First, set up passphraseless ssh keys.

Then run the following rsync command from either a batch script or the command line. 

rsync -v -e ssh files gpweb08.nersc.gov:/webfs/<dir>
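
For the key setup step, the sketch below shows one generic way to create a key pair with an empty passphrase using standard OpenSSH tools. It is an illustration under assumptions, not NERSC's documented procedure: the function name make_passphraseless_key, the key file name id_webfs, and the ed25519 key type are all choices made here, and your site's key policy takes precedence.

```shell
# Hypothetical helper: create a key pair with an empty passphrase,
# unless a key already exists at the given path.
make_passphraseless_key() {
    local keyfile="$1"
    [ -f "$keyfile" ] && return 0
    mkdir -p "$(dirname "$keyfile")" || return 1
    # -N "" sets an empty passphrase; -q suppresses the progress output
    ssh-keygen -q -t ed25519 -N "" -f "$keyfile"
}

# Example (commented out because it contacts a remote host):
# make_passphraseless_key "$HOME/.ssh/id_webfs"
# ssh-copy-id -i "$HOME/.ssh/id_webfs.pub" gpweb08.nersc.gov
# rsync -v -e "ssh -i $HOME/.ssh/id_webfs" files "gpweb08.nersc.gov:/webfs/<dir>"
```

A key without a passphrase can be used by anyone who can read the file, so keep it readable only by you.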