NERSC 3rd Party IPI-3 Project
Phase III
3/5/2000
Summary of Changes
and New Development
I. Introduction
This document summarizes the modifications that were made to the HPSS source code in support of the 3rd Party IPI-3 project. The changes were intentionally kept to a minimum to make it more likely that they would be accepted as part of the HPSS baseline; the majority of the code to support this capability on Unicos platforms at NERSC was external to HPSS, primarily in the development of the Unicos IPI-3 libraries, the architecture-independent thread libraries, and the complete redesign and rewrite of the internal HSI I/O subsystem. These libraries are described in sections III through VI, below.
The HSI I/O subsystem was rewritten to fully support the HPSS parallel I/O capabilities, as described in section VII. Changes were also made to the non-DCE Client API library to implement the hpss_ReadList and hpss_WriteList calls.
II. HPSS Modifications
As described in the Phase II Feasibility Study, in order to use the HIPPI Framing Protocol at the user level for 3rd Party IPI-3 transfers, the Unicos client process must first reserve an Upper Layer Protocol (ULP) identifier to associate inbound HIPPI messages with a particular opened device. The ANSI standard reserves low-numbered IDs; for example, ULP IDs 6 and 7 are reserved for IPI-3 Master and Slave messages. Applications normally reserve ULP IDs between 128 and 255, but this range is a configurable parameter (on AIX it can be changed via a SMIT panel, and it is settable at compile time for the Unicos IPI-3 ULP reservation library).
The HPSS mover and IPI-3 code changes were made with the following primary design goals in mind:
IPI3_INTERFACE_STRUCT
The IPI3_INTERFACE_STRUCT was originally defined as:
typedef struct
{
short interface; /* interface type flag */
char *name; /* client name for data transfers */
} IPI3_INTERFACE_STRUCT;
The interface member originally had only one defined value:
/* supported values for interface type flag */
#define IPI3_HIPPI 0 /* IPI3 Third Party over HIPPI */
The format of the "interface" member was redefined to contain 3 logical fields as follows (bit 0 is the rightmost, least-significant bit):
bits 2-0 (3 bits): "Interface type" (values 0-7). A value of 0 continues to mean IPI3_HIPPI (i.e., no change is necessary to existing code).
bits 7-3 (5 bits): Flags to support special features.
bits 15-8 (8 bits): Contents are defined differently, based upon the flag bits. For the IPI-3 project, if bit 7 is set to 1, then the ULP ID to use when sending HIPPI FP messages to the client is contained in bits 15-8.
The coding to support this is shown below:
/* supported values for interface type flag (values are 0-7) */
#define IPI3_IFTYPE_MASK 0x0007 /* mask for extracting the transport type field */
#define IPI3_HIPPI 0 /* IPI3 Third Party over HIPPI */
#define IPI3_FIBERCHANNEL 1 /* IPI3 Third Party over FIBERCHANNEL */
#define IPI3_ETHERNET 2 /* IPI3 Third Party over ETHERNET (e.g. GBit) */
/* flag bit definitions in the "interface" type flag */
#define IPI3_IFTYPE_ULP_PRESENT 0x0080 /* 1: ULP ID present in upper 8 bits */
#define IPI3_IFTYPE_BIT6 0x0040 /* unused */
#define IPI3_IFTYPE_BIT5 0x0020 /* unused */
#define IPI3_IFTYPE_BIT4 0x0010 /* unused */
#define IPI3_IFTYPE_BIT3 0x0008 /* unused */
/* right shift count for extracting the ULP ID from the upper bits of the interface value */
#define IPI3_IFTYPE_ULP_SHIFT 8
typedef struct
{
unsigned short interface; /* interface type flag */
char *name; /* client name for data transfers */
} IPI3_INTERFACE_STRUCT;
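As an illustration of the bit layout, the following C sketch builds and decodes an "interface" value with the macros above (copied into the sketch so it compiles on its own). The helper names ipi3_iftype_make, ipi3_iftype_get and ipi3_ulp_get are hypothetical and are not part of the HPSS source; using a negative ULP ID to mean "no ULP ID" is likewise an assumption of this sketch.

```c
#include <assert.h>

/* Macros copied from the definitions above. */
#define IPI3_IFTYPE_MASK        0x0007
#define IPI3_HIPPI              0
#define IPI3_IFTYPE_ULP_PRESENT 0x0080
#define IPI3_IFTYPE_ULP_SHIFT   8

/* Hypothetical helper: build an "interface" value from a transport type
 * (bits 2-0) and an optional ULP ID (bits 15-8, flagged by bit 7).
 * A negative ulp_id means "no ULP ID" (an assumption of this sketch). */
static unsigned short ipi3_iftype_make(unsigned short type, int ulp_id)
{
    unsigned short value;

    assert(type <= IPI3_IFTYPE_MASK);   /* only 3 bits are available */
    assert(ulp_id < 256);               /* ULP ID must fit in 8 bits */

    value = type & IPI3_IFTYPE_MASK;
    if (ulp_id >= 0)
        value |= IPI3_IFTYPE_ULP_PRESENT
               | (unsigned short)(ulp_id << IPI3_IFTYPE_ULP_SHIFT);
    return value;
}

/* Hypothetical helper: extract the transport type from bits 2-0. */
static unsigned short ipi3_iftype_get(unsigned short value)
{
    return value & IPI3_IFTYPE_MASK;
}

/* Hypothetical helper: return the client's ULP ID from bits 15-8,
 * or -1 when bit 7 says that no ULP ID is present. */
static int ipi3_ulp_get(unsigned short value)
{
    if (!(value & IPI3_IFTYPE_ULP_PRESENT))
        return -1;
    return (value >> IPI3_IFTYPE_ULP_SHIFT) & 0xFF;
}
```

For example, a HIPPI client that reserved ULP ID 200 would send 0xC880 (type 0, bit 7 set, 200 in the upper byte), while ipi3_iftype_make(IPI3_HIPPI, -1) yields 0, exactly the value that unmodified clients already send.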
Note that a value of 0 for "interface" has the same meaning as it did previously, so existing code and executables will continue to work without requiring any changes.
Slave I/O Transfer Protocol Addition
The IPI-3 Slave interface in the AIX HIPPI driver has the same restriction as the Max Strat; namely, it expects ULP IDs 6 and 7 to be the only ULP IDs used for IPI-3 master/slave messages. This interface does not provide the "Alternate HIPPI-FP" parameter which was used to circumvent the restriction in the Max Strat. Instead, a new Slave capability was added which makes use of the direct HIPPI Framing Protocol user-level interface provided by the AIX HIPPI driver. When this transport method is used, the slave IPI-3 code opens a direct HIPPI FP connection and obtains an arbitrary ULP ID assigned by the driver. For complex writes, this ULP ID must be transmitted to the client so that the client can direct the data back to the mover using the correct ULP ID.
For Slave I/O transfers, a new IPI-3 protocol parameter was defined:
parameter name: IPI3_ALT_ULPID_PARM
parameter ID: 0xD4
parameter length: 3
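The byte layout of IPI3_ALT_ULPID_PARM is not spelled out here, so the following C sketch is only illustrative. It assumes the usual IPI-3 parameter framing (a leading length byte that counts the bytes after it, followed by the parameter ID), which would leave two data bytes for a length of 3; placing the ULP ID in the last byte with a reserved zero byte before it is purely an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define IPI3_ALT_ULPID_PARM     0xD4  /* parameter ID, from the text */
#define IPI3_ALT_ULPID_PARM_LEN 3     /* parameter length, from the text */

/* Hypothetical encoder. ASSUMED layout:
 *   byte 0: length (bytes that follow it) = 3
 *   byte 1: parameter ID 0xD4
 *   byte 2: reserved, zero
 *   byte 3: ULP ID
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t ipi3_put_alt_ulpid(unsigned char *buf, size_t buflen,
                                 unsigned char ulp_id)
{
    assert(buf != NULL);
    if (buflen < 1 + IPI3_ALT_ULPID_PARM_LEN)
        return 0;
    buf[0] = IPI3_ALT_ULPID_PARM_LEN;
    buf[1] = IPI3_ALT_ULPID_PARM;
    buf[2] = 0;
    buf[3] = ulp_id;
    return 1 + IPI3_ALT_ULPID_PARM_LEN;
}

/* Hypothetical decoder: walk a list of length-prefixed parameters and
 * return the ULP ID carried by IPI3_ALT_ULPID_PARM, or -1 if absent. */
static int ipi3_find_alt_ulpid(const unsigned char *parms, size_t len)
{
    size_t pos = 0;

    assert(parms != NULL || len == 0);
    while (pos < len) {
        size_t plen = parms[pos];          /* bytes after the length byte */
        if (plen == 0 || pos + 1 + plen > len)
            break;                         /* malformed or truncated list */
        if (parms[pos + 1] == IPI3_ALT_ULPID_PARM
            && plen == IPI3_ALT_ULPID_PARM_LEN)
            return parms[pos + 3];
        pos += 1 + plen;                   /* advance to the next parameter */
    }
    return -1;
}
```

When the parameter is absent the decoder returns -1, which corresponds to the existing behavior of using the standard IPI-3 ULP IDs.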
All of the remaining HPSS changes described below were to support the use of the alternate ULP ID, if specified by the client. Additional detail regarding the use of the Alternate HIPPI-FP Header Parameter (0xD7) can be found in the Phase II Feasibility Study.
HPSS Changes – Detail
The following source files were modified (starting at the root of the HPSS source tree).
./Makefile.macros
./include/ipi3/ipi3.h
./include/ipi3/ipi3cmd.h
./include/ipi3/ipi3defs.h
./include/ipi3/ipi3en.h
./include/ipi3/ipi3slave.h
./include/hpss_iod.idl
./src/ipi3/ipi3.c
./src/ipi3/ipi3drvr.c
./src/ipi3/ipi3en.c
./src/ipi3/ipi3struct.c
./src/ipi3/ipi3log.c
./src/ipi3/ipi3slave.c
./src/mvr/mvr_clientif.c
./src/mvr/pre.ksh
The changes for each of these files are as follows:
Makefile.macros
include/ipi3/ipi3.h
include/ipi3/ipi3cmd.h
include/ipi3/ipi3defs.h
short interface;
to
unsigned short interface;
include/ipi3/ipi3en.h
include/ipi3/ipi3slave.h
include/hpss_iod.idl
signed16
to
unsigned16
src/ipi3/ipi3.c
src/ipi3/ipi3drvr.c
src/ipi3/ipi3en.c
src/ipi3/ipi3struct.c
src/ipi3/ipi3log.c
src/ipi3/ipi3slave.c
src/mvr/mvr_clientif.c
src/mvr/pre.ksh
III. Non-DCE Library Changes
The Gleicher Enterprises-supported non-DCE client library (NDAPI) was augmented with the following capabilities:
IV. Threads Library
In order to support the new multi-threaded HSI on architectures other than Unicos, an architecture-independent thread library was developed to wrap the native thread implementation on each architecture for which HSI is supported. The names of all data types and functions in the library are prefixed by "ai_" (for Architecture Independent), e.g. "ai_thread_abort()". The thread library includes wrappers for:
Only a subset of the pthreads capability, as required for HSI, was implemented. This basic capability includes:
The threads library was adapted from the library developed for the Feasibility Study during Phase II of the NERSC IPI-3 Project. Only functions which were available in Unicos were implemented, since Unicos was the primary platform to be supported by the new HSI.
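A minimal sketch of the wrapping approach on a POSIX platform is shown below. Only the "ai_" prefix (and ai_thread_abort, not shown) comes from the text; the names ai_thread_create, ai_thread_join, ai_mutex_*, the choice of POSIX pthreads as the native layer, and the counter example are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical "ai_" types, mapped onto POSIX threads on this platform. */
typedef pthread_t       ai_thread_t;
typedef pthread_mutex_t ai_mutex_t;
typedef void *(*ai_thread_func_t)(void *);

/* Each wrapper returns 0 on success or an error number, like pthreads. */
static int ai_thread_create(ai_thread_t *t, ai_thread_func_t fn, void *arg)
{
    return pthread_create(t, NULL, fn, arg);
}

static int ai_thread_join(ai_thread_t t, void **result)
{
    return pthread_join(t, result);
}

static int ai_mutex_init(ai_mutex_t *m)    { return pthread_mutex_init(m, NULL); }
static int ai_mutex_lock(ai_mutex_t *m)    { return pthread_mutex_lock(m); }
static int ai_mutex_unlock(ai_mutex_t *m)  { return pthread_mutex_unlock(m); }
static int ai_mutex_destroy(ai_mutex_t *m) { return pthread_mutex_destroy(m); }

/* Example worker: add 1000 to a shared counter, one locked step at a time. */
struct counter {
    ai_mutex_t lock;
    long value;
};

static void *add_thousand(void *arg)
{
    struct counter *c = arg;
    int i;

    assert(c != NULL);
    for (i = 0; i < 1000; i++) {
        ai_mutex_lock(&c->lock);
        c->value++;
        ai_mutex_unlock(&c->lock);
    }
    return NULL;
}
```

Because callers see only the ai_ names, porting HSI to another platform would mean reimplementing these thin wrappers rather than touching the transfer code.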
V. Unicos IPI-3 Libraries
The Unicos version of the IPI-3 library, as developed for Phase II of this project, was modified to support the IPI-3 interface struct changes described above. These changes were nearly identical to the changes made in the HPSS version of the IPI-3 library.
Administrative Controls
A requirement for this project was the ability to disable IPI-3 reads or writes if a software problem was suspected, or if hardware problems required disabling the HIPPI channel or one of the HIPPI switches.
For the Unicos implementation, a new module, called ipi3AdminCtl.c, was added to the Unicos library. This module contains administrative control functions that allow the Unicos application to determine whether IPI-3 transfers have been disabled. The following control flags are defined:
These controls are implemented by dynamically checking for the presence of a file, which can be created with the Unix touch(1) command. Unless the file(s) are present, IPI-3 reads and writes are both assumed to be enabled. The filenames are defined at compile time in the Makefile for the IPI-3 library, as follows:
IPI3_ADMIN_MASTER_DISABLE_FILE: if this file exists, both IPI-3 reads and writes are disabled.
IPI3_ADMIN_READ_DISABLE_FILE: if this file exists, IPI-3 reads are disabled.
IPI3_ADMIN_WRITE_DISABLE_FILE: if this file exists, IPI-3 writes are disabled.
To re-enable reads or writes, it is sufficient to rename or remove the administrative control file.
The administrative capability is conditionally compiled into the IPI-3 library for the Unicos operating system, and the code to check the state of the administrative controls is conditionally compiled in the Parallel I/O file transfer code within HSI. Prior to the beginning of each new file transfer, the code checks the state of the controls, and will only perform IPI-3 transfers if the appropriate control (read or write) is still enabled.
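The file-presence checks can be sketched in C as follows. The three macro names come from the text, but the default paths and the function names ipi3_admin_read_enabled and ipi3_admin_write_enabled are hypothetical, as is the use of access(2) for the existence test; the actual ipi3AdminCtl.c may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* In the real library these names are set in the Makefile; the default
 * paths below are hypothetical placeholders. */
#ifndef IPI3_ADMIN_MASTER_DISABLE_FILE
#define IPI3_ADMIN_MASTER_DISABLE_FILE "/tmp/ipi3_master_disable"
#endif
#ifndef IPI3_ADMIN_READ_DISABLE_FILE
#define IPI3_ADMIN_READ_DISABLE_FILE   "/tmp/ipi3_read_disable"
#endif
#ifndef IPI3_ADMIN_WRITE_DISABLE_FILE
#define IPI3_ADMIN_WRITE_DISABLE_FILE  "/tmp/ipi3_write_disable"
#endif

/* A control file counts as present only if access() succeeds; any
 * failure (e.g. ENOENT) is treated as "absent", i.e. enabled. */
static int admin_file_present(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/* Hypothetical: 1 if IPI-3 reads may be used for the next file. */
static int ipi3_admin_read_enabled(void)
{
    return !admin_file_present(IPI3_ADMIN_MASTER_DISABLE_FILE)
        && !admin_file_present(IPI3_ADMIN_READ_DISABLE_FILE);
}

/* Hypothetical: 1 if IPI-3 writes may be used for the next file. */
static int ipi3_admin_write_enabled(void)
{
    return !admin_file_present(IPI3_ADMIN_MASTER_DISABLE_FILE)
        && !admin_file_present(IPI3_ADMIN_WRITE_DISABLE_FILE);
}
```

Following the text, HSI would call the appropriate check once before each new file transfer and skip IPI-3 for that file when it returns 0. Because the check is repeated per file, creating or removing a control file takes effect at the next file without restarting the application.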
AIX Library Compatibility
A goal of this project was to implement a Unicos library using the same IPI-3 function names and parameters as the existing AIX library. This was accomplished by developing a new module which implements the "ipiMasterXXX" functions in a separate thread, which is started when an IPI-3 device is first opened. The new thread assumes the role of the AIX OS HIPPI driver, processing requests passed to it by the main thread via a signaling mechanism.
A restriction added for the Unicos implementation is that the I/O buffer which the application uses for IPI-3 I/O MUST be allocated via a call to an internal IPI-3 library function. This function allocates additional space to accommodate the I-field and FP Header, as described in the Phase II report. The new functions which were added are:
ipi3_buf_malloc - allocates a buffer
ipi3_buf_free - deallocates a buffer
ipi3_buf_ok - internal function to verify whether a buffer was allocated by a call to ipi3_buf_malloc()
With the exception of the requirement to call ipi3_buf_malloc/ipi3_buf_free, the Unicos IPI-3 library is compatible with the AIX version. The HPSS PFTP client uses only a subset of the available AIX IPI-3 library calls, and only that subset was implemented in the Unicos version of the library. The remaining functions were implemented as stubs that can be expanded in the future if necessary.
VI. ULP Library
The Upper Layer Protocol library, used to manage the assignment of ULP IDs (using advisory locks on a globally-shared file in the Unicos file system), as developed in phase II of this project, was used without modification. No code changes to PFTP or HSI were necessary to make use of this library; all calls are internal to the IPI-3 master thread.
Two goals of this library were:
These goals were met by implementing an advisory lock on the ULP reservation file, which is a small fixed-length file containing one entry for each possible ULP assignment (max of 256). This file is created, if it does not exist, by the first program which attempts to open a HIPPI device. Each time a new ULP reservation is requested, the library verifies that each reserved ULP ID is owned by a process that is still active. If it is determined that an entry belongs to a process that no longer exists, the entry is reassigned to the requesting process.
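The reservation scheme can be sketched in C as shown below. This is not the Phase II ULP library: the function names, the 32-bit PID record format, the use of whole-file fcntl() locks, and the kill(pid, 0) liveness test are all assumptions chosen to illustrate the technique described (a fixed 256-entry file, an advisory lock, and reclamation of entries whose owning process has exited). The 128-255 range follows the default application range described in section II.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ULP_MAX_ENTRIES 256   /* one slot per possible ULP ID */
#define ULP_APP_FIRST   128   /* application range (configurable) */
#define ULP_APP_LAST    255

/* Take (F_WRLCK) or drop (F_UNLCK) an advisory lock on the whole file. */
static int ulp_lock(int fd, short type)
{
    struct flock fl = { 0 };
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* l_start = l_len = 0 covers the whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* A slot is reusable if it is empty or its owning process no longer exists. */
static int ulp_slot_free(int32_t owner)
{
    if (owner == 0)
        return 1;
    return kill((pid_t)owner, 0) == -1 && errno == ESRCH;
}

/* Hypothetical: reserve an application ULP ID for the calling process.
 * Returns the ULP ID, or -1 if none is available or an I/O error occurs. */
static int ulp_reserve(const char *path)
{
    int fd, id, result = -1;
    struct stat st;
    const off_t size = ULP_MAX_ENTRIES * (off_t)sizeof(int32_t);

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0666);   /* first user creates it */
    if (fd < 0)
        return -1;
    if (ulp_lock(fd, F_WRLCK) == -1) {
        close(fd);
        return -1;
    }
    /* Extend a new file to its fixed length; zero-filled slots are free. */
    if (fstat(fd, &st) == 0 && st.st_size < size && ftruncate(fd, size) == -1)
        goto out;

    for (id = ULP_APP_FIRST; id <= ULP_APP_LAST; id++) {
        int32_t owner;
        off_t off = id * (off_t)sizeof owner;

        if (pread(fd, &owner, sizeof owner, off) != (ssize_t)sizeof owner)
            goto out;
        if (ulp_slot_free(owner)) {            /* empty or stale entry */
            owner = (int32_t)getpid();
            if (pwrite(fd, &owner, sizeof owner, off) == (ssize_t)sizeof owner)
                result = id;
            goto out;
        }
    }
out:
    ulp_lock(fd, F_UNLCK);
    close(fd);
    return result;
}
```

Holding the lock for the whole scan keeps two processes from claiming the same free slot. One limitation of this sketch is that a PID reused by an unrelated process would make a stale entry appear active.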
VII. HSI Changes
The HSI I/O subsystem was completely rewritten, involving approximately 7000 lines of new or changed code (including comments). The existing file transfer module (fileio.c) was replaced by modules to support the following new capabilities:
VIII. Issues
During the testing of the new code, two problems were encountered which must be resolved before the new HSI can be put into production. Both of these problems are currently being investigated.
It is unknown at this time whether the problem is caused by a malfunctioning HIPPI switch, or by a problem in the Cray Gigaring software in the HPN.
Conclusion
The IPI-3 interface has been tested, using the new version of HSI, on both the J90 and T3E Unicos systems at NERSC. Significant improvements in I/O to and from HPSS have been observed on both of these architectures, particularly on the J90, where transfer rate improvements on the order of 10X have been achieved despite heavy loading on the system. In addition, the new HSI parallel I/O features have been tested on DCE and non-DCE AIX platforms as well as on Apple Mac OS X, HP-UX, Sun Solaris, and SGI IRIX; all of these architectures have demonstrated significantly improved I/O performance.
The HPSS modifications may be included in the HPSS baseline. An informal peer review of the changes has taken place with one of the primary HPSS mover developers, with positive results. To include the changes as part of the baseline, it will be necessary to go through the formal HPSS requirements and review/test process.
The primary goals, as set forth in the initial phase I requirements, have been met. In addition, other significant benefits have been realized as a result of this project:
References