IBM General Parallel File System for AIX: Administration and Programming Reference
Name
gpfsDataShipStart_t - Initiates data shipping mode.
Structure
typedef struct
{
  int structLen;
  int structType;
  int numInstances;
  int reserved;
} gpfsDataShipStart_t;
Description
Once all participating threads have issued this directive for a file, GPFS
enters a mode where it logically partitions the blocks of the file among a
group of agent nodes. The agents are those nodes on which one or more
threads have issued the GPFS_DATA_SHIP_START directive. Together, the
threads that have issued a GPFS_DATA_SHIP_START directive and the
associated agent nodes are referred to as the data shipping collective.
In data shipping mode:
- All file accesses result in GPFS messages to the appropriate agent(s) to
read or write the requested data.
- The GPFS_DATA_SHIP_START directive is a blocking collective
operation. That is, every thread that intends to access the file
through data shipping must issue the GPFS_DATA_SHIP_START directive
with the same value of numInstances. These threads all block
within their gpfs_fcntl() call until all numInstances threads have
issued the GPFS_DATA_SHIP_START directive.
- The number of threads that issue the GPFS_DATA_SHIP_START
directive does not have to be the same on all nodes. However, each
thread must use a different file handle. The default agent mapping can
be overridden using the GPFS_DATA_SHIP_MAP
directive.
- Applications that perform fine-grained write sharing across several
nodes should benefit most from data shipping. The reason for this is
that the granularity of GPFS cache consistency is an entire file block, which
rarely matches the record size of applications. Without using data
shipping, when several nodes simultaneously write into the same block of a
file, even non-overlapping parts of the block, GPFS serially grants, and then
releases, permission to write into the block to each node in turn. Each
permission change requires dirty cached data on the relinquishing node to be
flushed to disk, yielding poor performance. Data shipping avoids this
overhead by moving data to the node that already has permission to write into
the block rather than migrating access permission to the node trying to write
the data. However, since most data accesses are remote in data shipping
mode, clients benefit less from caching than they would without data
shipping. The cost to send a message to another instance of GPFS to fetch
or write data is much higher than the cost of accessing that data through
the local GPFS buffer cache.
Thus, whether or not a particular application benefits from data shipping is
highly dependent on its access pattern and its degree of block sharing.
- Another case where data shipping can help performance is when multiple
nodes must all append data to the current end of the file. If all of
the participating threads open their instances with the O_APPEND flag
before initiating data shipping, one of the participating nodes will be chosen
as the agent to which all appends are shipped. The aggregate
performance of all the appending nodes will be limited to the throughput of a
single node in this case, but should still exceed what the performance would
have been for appending small records without using data shipping.
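The block-sharing argument above can be made concrete with a little arithmetic. The following sketch is illustrative only and is not part of the GPFS API. It assumes a hypothetical access pattern in which fixed-size records are stored back to back and record r is written by node r mod numNodes; the function name and its parameters are invented for this example. It counts how many distinct nodes write into one file block, that is, how many nodes would otherwise take turns acquiring write permission for that block.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a GPFS interface.
 * Assumed pattern: records of recordSize bytes are stored back to back,
 * and record r is written by node (r % numNodes).
 * Returns how many distinct nodes write into block blockIndex.
 */
long long writers_in_block(long long blockSize, long long recordSize,
                           long long numNodes, long long blockIndex)
{
    long long firstByte, lastByte, firstRecord, lastRecord, records;

    assert(blockSize > 0 && recordSize > 0 && numNodes > 0 && blockIndex >= 0);

    firstByte = blockIndex * blockSize;
    lastByte = firstByte + blockSize - 1;
    firstRecord = firstByte / recordSize;   /* first record touching the block */
    lastRecord = lastByte / recordSize;     /* last record touching the block */
    records = lastRecord - firstRecord + 1;

    /* Consecutive records rotate through the nodes, so the number of
       distinct writers is capped by the number of nodes. */
    return records < numNodes ? records : numNodes;
}
```

Under these assumptions, a 262144-byte block with 4096-byte records and 8 nodes has all 8 nodes writing into every block, the kind of sharing for which data shipping is intended. When the record size equals the block size, each block has a single writer and data shipping is unlikely to help.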
Data shipping mode imposes several restrictions on file usage:
- Because an application-level read or write may be split
across several agents, POSIX read and write file atomicity
is not enforced while in data shipping mode.
- A file in data shipping mode cannot be written through any file handle
that was not associated with the data shipping collective through a
GPFS_DATA_SHIP_START directive.
- Calls that are not allowed on a file that has data shipping enabled:
- chacl
- fchacl
- chmod
- fchmod
- chown
- fchown
- chownx
- fchownx
- link
Note: The GPFS_DATA_SHIP_START directive exits cleanly only when cancelled
by a GPFS_DATA_SHIP_STOP directive. If all threads issue a
close for the file, the file will be taken out of data shipping mode
but errors will also be returned.
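The loss of atomicity in the first restriction above follows from how a single request maps onto file blocks. As a rough illustration, the sketch below counts the blocks touched by one read or write. The function name and parameters are hypothetical and are not part of GPFS. Because blocks are partitioned among agents, the result is an upper bound on the number of agents that could each handle a piece of the request.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a GPFS interface.
 * Returns the number of file blocks touched by a request of length bytes
 * starting at offset. Each touched block may belong to a different agent,
 * so this is an upper bound on the number of agents involved.
 */
long long blocks_touched(long long offset, long long length,
                         long long blockSize)
{
    assert(offset >= 0 && length >= 0 && blockSize > 0);

    if (length == 0)
        return 0;

    /* index of last block touched - index of first block touched + 1 */
    return (offset + length - 1) / blockSize - offset / blockSize + 1;
}
```

A request that stays inside one block is served by a single agent. A request that crosses a block boundary, even a small one, may be split into independently executed pieces, which is why another thread can observe it partially applied.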
Members
- structLen
- Length of the gpfsDataShipStart_t structure.
- structType
- The directive identifier GPFS_DATA_SHIP_START.
- numInstances
- The number of open file instances, on all nodes, collaborating to operate
on the file.
- reserved
- This field is currently unused. For compatibility with future versions
of GPFS, this field should be set to 0.
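A minimal sketch of filling in these members follows. So that it compiles on its own, it repeats the structure definition shown above and defines GPFS_DATA_SHIP_START with a placeholder value. In a real program both would come from the GPFS header (assumed here to be gpfs_fcntl.h, which this section does not name). The helper functions total_instances and init_data_ship_start are hypothetical names introduced for illustration. How the directive is then packaged and passed to gpfs_fcntl() is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from the Structure section so the sketch is self-contained;
   a real program would take this from the GPFS header instead. */
typedef struct
{
  int structLen;
  int structType;
  int numInstances;
  int reserved;
} gpfsDataShipStart_t;

#ifndef GPFS_DATA_SHIP_START
#define GPFS_DATA_SHIP_START 0 /* PLACEHOLDER, not the real identifier value */
#endif

/* Hypothetical helper: numInstances counts open file instances on ALL
   nodes, so add up the per-node counts. */
int total_instances(const int instancesPerNode[], size_t numNodes)
{
    int total = 0;
    size_t i;

    for (i = 0; i < numNodes; i++) {
        assert(instancesPerNode[i] >= 0);
        total += instancesPerNode[i];
    }
    return total;
}

/* Hypothetical helper: fill every member as described in Members. */
void init_data_ship_start(gpfsDataShipStart_t *d, int numInstances)
{
    assert(d != NULL && numInstances > 0);

    d->structLen = (int)sizeof(*d);
    d->structType = GPFS_DATA_SHIP_START;
    d->numInstances = numInstances;  /* must be identical in every thread */
    d->reserved = 0;                 /* unused; keep 0 for compatibility */
}
```

Every participating thread would need to arrive at the same total, for example by computing it once and distributing it, since a mismatch in numInstances is reported as EINVAL.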
Recovery
Since GPFS_DATA_SHIP_START directives block their calling threads
until all participants respond accordingly, there needs to be a way to recover
if the application program uses the wrong value for numInstances or
one of the participating nodes crashes before issuing its
GPFS_DATA_SHIP_START directive. While a gpfs_fcntl() call is blocked waiting for other
threads, the call can be interrupted by any signal. If a signal is
delivered to any of the waiting calls, all waiting calls on every node will be
interrupted and will return EINTR. GPFS will not establish
data shipping if such a signal occurs. It is the responsibility of the
application to mask off any signals that might normally occur while waiting
for another node in the data shipping collective. Several libraries use
SIGALRM; the thread that makes the gpfs_fcntl() call should use sigthreadmask
to mask off delivery of this signal while inside the call.
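The masking described above might look like the following sketch. It deliberately substitutes the POSIX call pthread_sigmask for the AIX sigthreadmask named in the text, on the assumption that the two play the same per-thread role. The blocking call is passed in as a callback so that the example does not depend on GPFS being installed. The names call_with_sigalrm_blocked, blocking_call_fn, and sigalrm_is_blocked are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Stand-in for the blocking call; a real program would wrap its
   gpfs_fcntl() invocation in a function of this shape. */
typedef int (*blocking_call_fn)(void *arg);

/* Block SIGALRM in the calling thread, run the call, then restore the
   previous signal mask. Returns the call's result, or -1 if the mask
   could not be changed. */
int call_with_sigalrm_blocked(blocking_call_fn call, void *arg)
{
    sigset_t block, saved;
    int rc, savedErrno;

    assert(call != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (pthread_sigmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    rc = call(arg);

    savedErrno = errno;                      /* keep the call's errno */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    errno = savedErrno;
    return rc;
}

/* Demonstration callback: reports whether SIGALRM is currently blocked
   in the calling thread (1 = blocked, 0 = not blocked). */
int sigalrm_is_blocked(void *unused)
{
    sigset_t current;

    (void)unused;
    if (pthread_sigmask(SIG_SETMASK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGALRM) == 1;
}
```

Restoring the saved mask rather than unconditionally unblocking SIGALRM keeps the wrapper safe to use from a thread that had the signal blocked already. Other signals an application expects during the wait would need the same treatment.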
Error status
- EINTR
- A signal was delivered to a blocked gpfs_fcntl() call. All waiting
calls, on every node, are interrupted.
- EINVAL
- The file mode has been changed since the file was opened to include or
exclude O_APPEND.
The value of numInstances is inconsistent with the value issued by
other threads intending to access the file.
An attempt has been made to issue a GPFS_DATA_SHIP_START directive
on a file that is already in use in data shipping mode by other
clients.
- ENOMEM
- The available data space in memory is not large enough to allocate the
data structures necessary to establish and/or run in data shipping
mode.
- EPERM
- An attempt has been made to open a file in data shipping mode that is
already open in write mode by some thread that did not issue the
GPFS_DATA_SHIP_START directive. GPFS will not initiate data
shipping.
- ESTALE
- A node in the data shipping collective has gone down.
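An application might turn these error values into diagnostics along the following lines. This is only a sketch: the function name data_ship_start_error is hypothetical, the messages paraphrase the list above, and it assumes the error value has already been obtained from errno after a failed call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: map an error value from a failed
   GPFS_DATA_SHIP_START request to a short explanation. */
const char *data_ship_start_error(int err)
{
    assert(err != 0);  /* 0 would mean success, not an error */

    switch (err) {
    case EINTR:
        return "signal delivered to a blocked call; data shipping was not established";
    case EINVAL:
        return "O_APPEND mode changed, numInstances mismatch, or file already in data shipping mode";
    case ENOMEM:
        return "not enough memory for the data shipping structures";
    case EPERM:
        return "file already open for write by a thread outside the collective";
    case ESTALE:
        return "a node in the data shipping collective has gone down";
    default:
        return "error not listed for GPFS_DATA_SHIP_START";
    }
}
```

Because EINVAL covers three distinct causes, a caller that needs to tell them apart has to check its own state, such as the open flags and the numInstances value it passed.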