National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

Grid Computing at NERSC: Data Transfer

How to transfer data to and from NERSC using grid client tools

GridFTP provides a convenient, high-performance transfer mechanism for moving data into and out of NERSC. GridFTP is available on the following systems:

System        GridFTP hosts                                Notes
PDSF          pdsfgrid.nersc.gov (or pdsfgrid4.nersc.gov)
              pdsfgrid1.nersc.gov
              pdsfgrid3.nersc.gov
              pdsfgrid5.nersc.gov
Datatran      dtn01.nersc.gov                              Recommended host for NGF access
Carver        carvergrid.nersc.gov
Franklin      franklingrid.nersc.gov                       For access to Franklin /scratch
Archive HPSS  garchive.nersc.gov                           *NEW* Supports full GridFTP access

We suggest using one of the following clients to move your data:

1. globus-url-copy

Syntax: globus-url-copy [-help | -usage] [-version[s]] [-vb] [-dbg] [-b | -a]
                        [-q] [-r] [-rst] [-f <filename>]
                        [-s <subject>] [-ds <subject>] [-ss <subject>]
                        [-tcp-bs <size>] [-bs <size>] [-p <parallelism>]
                        [-notpt] [-nodcau] [-dcsafe | -dcpriv]
                        <sourceURL> <destURL>
In the examples below, we assume that you have installed the Globus client package on your workstation. All commands are run from the client machine, i.e., your workstation.

Initialize your proxy cert:

% grid-proxy-init 
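If you run transfers from scripts, you may prefer to create a new proxy only when the current one is missing or about to expire. The bash sketch below assumes the Globus client's grid-proxy-info command with its -exists and -valid H:M options; the function name ensure_proxy and the two-hour default are illustrative choices, not NERSC recommendations.

```shell
#!/usr/bin/env bash
# ensure_proxy (hypothetical helper): run grid-proxy-init only if there is no
# proxy that stays valid for at least the given hours:minutes (default 2:00).
ensure_proxy() {
  local min_valid=${1:-2:00}
  # grid-proxy-info -exists exits 0 when a proxy meeting the -valid requirement exists
  if grid-proxy-info -exists -valid "$min_valid" >/dev/null 2>&1; then
    return 0
  fi
  grid-proxy-init   # prompts for your certificate pass phrase
}

# Example: make sure the proxy lasts at least 6 hours before a long copy.
# ensure_proxy 6:00 && globus-url-copy file:///path/to/file gsiftp://dtn01.nersc.gov//path/file
```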

Copy a file from your workstation to datatran (dtn01):

% globus-url-copy file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file

Copy a file from the HPSS archive to your workstation:

% globus-url-copy \
gsiftp://garchive.nersc.gov/path/file file:///path/to/file

Copy a file from PDSF to dtn01 as a "third-party copy", without logging in to either system:

% globus-url-copy gsiftp://pdsfgrid.nersc.gov/path/to/file \
gsiftp://dtn01.nersc.gov/path/to/file 
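To move many files in one invocation, globus-url-copy can read source and destination URLs from a file given with -f (see the syntax summary above), one "sourceURL destURL" pair per line as we understand the option. The bash sketch below builds such a list for every regular file in a local directory. The function name make_url_list and the example paths are hypothetical; the remote path uses the same double-slash form as the dtn01 example above, and file names containing spaces are not handled.

```shell
#!/usr/bin/env bash
# make_url_list (hypothetical helper): print one "file:///... gsiftp://..." pair
# per regular file in LOCAL_DIR, for use with globus-url-copy -f.
#   $1 LOCAL_DIR   local directory to scan (not recursive)
#   $2 HOST        GridFTP host, e.g. dtn01.nersc.gov
#   $3 REMOTE_DIR  absolute destination directory on HOST
make_url_list() {
  local local_dir=$1 host=$2 remote_dir=$3
  local abs_dir f name
  abs_dir=$(cd "$local_dir" && pwd) || return 1   # absolute path for file:// URLs
  for f in "$abs_dir"/*; do
    [ -f "$f" ] || continue      # skip subdirectories (and the literal glob if empty)
    name=${f##*/}                # file name without its directory
    printf 'file://%s gsiftp://%s/%s/%s\n' "$f" "$host" "$remote_dir" "$name"
  done
}

# Example:
# make_url_list ./results dtn01.nersc.gov /path/dir > transfers.txt
# globus-url-copy -f transfers.txt
```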
For more information on globus-url-copy refer to the Globus GridFTP documentation.

2. uberftp

UberFTP provides a rich interactive client for GridFTP. It behaves like a standard FTP client and adds some features of its own.

To initialize your proxy and connect to dtn01:

% grid-proxy-init
% uberftp dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp>

To list files in a directory:

uberftp> ls
drwxr-xr-x   2  shreyas  shreyas       27 Apr 26 12:28  .
drwxr-xr-x  19  shreyas  shreyas     4096 Jun 20 15:57  ..
-rw-r--r--   1  shreyas  shreyas   692224 Apr 26 12:28  zebu
-rw-r--r--   1  shreyas  shreyas  2097153 Apr 26 12:28  gnu

To get a file:

uberftp> get dtn01
dtn01:  107 bytes in 0.05 seconds. 2.30 KB/sec

To put a file:

uberftp> put localfile
localfile:  107 bytes in 0.05 seconds. 2.30 KB/sec

To do a third-party copy between PDSF and dtn01, use the lopen command, which makes uberftp treat the lopen'ed host as the local filesystem. In the session below, put copies a file from PDSF to dtn01 and get copies it back:

% grid-proxy-init
% uberftp
uberftp> lopen pdsfgrid.nersc.gov
220 pdsfgrid4.nersc.gov GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> open dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> put pdsffile dtn01
pdsffile:  107 bytes in 0.05 seconds. 2.17 KB/sec
uberftp> get dtn01 pdsffile
dtn01:  107 bytes in 0.05 seconds. 2.30 KB/sec

For more details on how to use uberftp, refer to the UberFTP user documentation.


GridFTP Performance Optimization and Firewall Considerations

For optimal data transfer performance, you may need to tune certain parameters for your network. We have found that 4 parallel streams with a TCP buffer size of 1 MB work well for moving medium-sized and large files across the WAN. However, the best values for any particular network may require further tuning.

Here is an example that uses these parameters for globus-url-copy:

% globus-url-copy -p 4 -tcp-bs 1MB file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file 
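If you use these settings often, a small wrapper keeps them in one place while still letting you experiment. The bash sketch below is hypothetical: the function name tuned_copy and the GUC_STREAMS, GUC_TCP_BS and DRY_RUN variables are illustrative, and the defaults simply mirror the 4-stream, 1MB example above.

```shell
#!/usr/bin/env bash
# tuned_copy (hypothetical wrapper): run globus-url-copy with tunable
# parallelism and TCP buffer size. Defaults mirror the example above.
#   GUC_STREAMS  number of parallel streams passed to -p   (default 4)
#   GUC_TCP_BS   TCP buffer size passed to -tcp-bs          (default 1MB)
#   DRY_RUN      if non-empty, print the command instead of running it
tuned_copy() {
  if [ $# -ne 2 ]; then
    echo "usage: tuned_copy SOURCE_URL DEST_URL" >&2
    return 2
  fi
  local cmd=(globus-url-copy -p "${GUC_STREAMS:-4}" -tcp-bs "${GUC_TCP_BS:-1MB}" "$1" "$2")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # show the exact command line
  else
    "${cmd[@]}"                 # run it, preserving argument boundaries
  fi
}

# Example: try 8 streams on a long, fast path.
# GUC_STREAMS=8 tuned_copy file:///path/to/file gsiftp://dtn01.nersc.gov//path/file
```

Running with DRY_RUN=1 first is a cheap way to confirm the URLs and options before starting a large transfer.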

UberFTP supports the same options through its tcpbuf and parallel commands:

% uberftp
uberftp> open dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> parallel 4
uberftp> tcpbuf 1048576
TCP buffer set to 1048576 bytes
uberftp> put file

The equivalent settings in the two clients are:

Parameter                    globus-url-copy flag          UberFTP command
TCP buffer size              -tcp-bs SIZE                  tcpbuf SIZE
                             SIZE is a value and a unit,   SIZE is a number of bytes,
                             e.g. -tcp-bs 256KB            e.g. tcpbuf 262144
Number of parallel streams   -p N                          parallel N
                             N is the number of streams,   N is the number of streams,
                             e.g. -p 4                     e.g. parallel 4
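Because globus-url-copy takes a size with a unit while uberftp's tcpbuf takes a plain byte count, a small converter helps keep the two clients consistent. The bash sketch below is a hypothetical helper (to_bytes is not part of either client) and assumes binary units, 1KB = 1024 bytes, which matches the pairings used on this page (256KB with 262144, 1MB with 1048576).

```shell
#!/usr/bin/env bash
# to_bytes (hypothetical helper): convert a globus-url-copy style size such as
# 256KB or 1MB into the plain byte count that uberftp's tcpbuf command takes.
# Assumes binary units (1KB = 1024 bytes).
to_bytes() {
  local size num unit
  size=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # accept 1mb as well as 1MB
  num=${size%%[!0-9]*}          # leading digits
  unit=${size#"$num"}           # everything after the digits
  if [ -z "$num" ]; then
    echo "bad size: $1" >&2
    return 1
  fi
  case $unit in
    ''|B)  echo "$((10#$num))" ;;                        # 10# avoids octal parsing
    K|KB)  echo "$((10#$num * 1024))" ;;
    M|MB)  echo "$((10#$num * 1024 * 1024))" ;;
    G|GB)  echo "$((10#$num * 1024 * 1024 * 1024))" ;;
    *)     echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, to_bytes 1MB prints 1048576, the value passed to tcpbuf in the UberFTP session above.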

Firewall Considerations

If you have problems using GridFTP across a firewall (e.g., your transfer hangs without moving any data), ask your network administrator to open a range of ports in the firewall. Once this is done, set that range in your environment so that your GridFTP client knows to use it.

For example, to use ports 60000 through 60064, set the following environment variable before starting your client:

% export GLOBUS_TCP_PORT_RANGE=60000,60064  
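A typo in this variable can look exactly like the firewall hang it is meant to fix, so it may be worth validating the range before exporting it. The bash sketch below is a hypothetical helper (is_port and set_port_range are illustrative names); it only checks that the values are ports in "min,max" order and cannot know which ports your administrator actually opened. Source the file containing it (". ./file") so the export reaches the shell that starts the client.

```shell
#!/usr/bin/env bash
# is_port: succeed if $1 is a decimal integer between 1 and 65535.
is_port() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$((10#$1))" -ge 1 ] && [ "$((10#$1))" -le 65535 ]
}

# set_port_range (hypothetical helper): validate MIN and MAX, then export
# GLOBUS_TCP_PORT_RANGE in the "min,max" form shown above.
set_port_range() {
  local min=$1 max=$2
  if ! is_port "$min" || ! is_port "$max" || [ "$((10#$min))" -gt "$((10#$max))" ]; then
    echo "invalid port range: $min,$max" >&2
    return 1
  fi
  export GLOBUS_TCP_PORT_RANGE="$min,$max"
  echo "GLOBUS_TCP_PORT_RANGE=$GLOBUS_TCP_PORT_RANGE ($((10#$max - 10#$min + 1)) ports)"
}

# Example:
# set_port_range 60000 60064
```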

Page last modified: Fri, 02 Jul 2010 02:19:27 GMT
Page URL: http://www.nersc.gov/nusers/services/Grid/data.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
