The Cray Scientific Libraries package, LibSci, is a collection of numerical routines optimized for best performance on Cray systems. When possible, you should use calls to the Cray LibSci routines in your code in place of calls to public-domain or user-written versions.
The Cray LibSci collection contains the following libraries:
- BLAS (Basic Linear Algebra Subroutines, including routines from the University of Texas 64-bit libGoto library);
- BLACS (Basic Linear Algebra Communication Subprograms);
- LAPACK (Linear Algebra PACKage, including routines from the University of Texas 64-bit libGoto library);
- ScaLAPACK (Scalable LAPACK);
- IRT (Iterative Refinement Toolkit), linear solvers using 32-bit factorizations that preserve accuracy through mixed-precision iterative refinement;
- CRAFFT (Cray Adaptive Fast Fourier Transform Routines);
- FFT (Fast Fourier Transform Routines);
- FFTW2 (the Fastest Fourier Transform in the West, release 2)*; and
- FFTW3 (the Fastest Fourier Transform in the West, release 3)*.
Access to LibSci
The LibSci modulefile is loaded by default, so no user action is required. This is true for all programming environments (PGI, PathScale, GNU, and Cray) as long as you use the Cray compiler wrappers (ftn, cc, and CC).
The Cray compiler wrappers link against LibSci by default. No additional flags or options are required for compiling or linking.
Here is an example Fortran code using ScaLAPACK that you can try. To compile this code on Hopper simply use
When linking a code against more than one library, you can have the linker print which library a given routine comes from by using the -Wl,-y option. For example:
cc d.c -Wl,-ydgemm_
pgccSTvgejljebQS.o: reference to dgemm_
/opt/xt-libsci/11.0.01/pgi/109/mc12/lib/libsci_pgi_mp.a(dgemm.o): definition of dgemm_
Cray XT-LibSci for Hybrid MPI / ScaLAPACK Applications
Cray XT-LibSci also supports hybrid MPI / ScaLAPACK applications, which use threaded BLAS on a compute node and MPI between nodes. To use ScaLAPACK in a hybrid application:
- Adjust the process grid dimensions in ScaLAPACK to account for the decrease in BLACS nodes.
- Ensure that the number of BLACS processes required is equal to the number of nodes required, not the number of cores.
- Set the OMP_NUM_THREADS environment variable to a value greater than one. (NOTE: Use of GOTO_NUM_THREADS is now outmoded.)
To run a ScaLAPACK application in MPI-only mode (that is, 1 MPI process per core) with 16 BLACS processes on a 4x4 computational grid, use the #PBS -l mppwidth option to specify the number of processing elements (PEs) needed (16) and the #PBS -l mppnppn option to specify the number of PEs per node (2). Example:
aprun -n 16 -N 2 ./my_scalapack_app
To run the same job using a hybrid application, first reduce the number of BLACS processes from 16 to 8 (by specifying either a 2x4 or possibly a 4x2 computational grid). The additional parallelism within a node is provided through use of the threaded BLAS. In the PBS script, only those tasks actually recognized are requested, so set mppwidth equal to the number of nodes required (8) and mppnppn equal to the number of PEs per node (1).
#PBS -l mppwidth=8
#PBS -l mppnppn=1
#PBS -l mppdepth=2
setenv OMP_NUM_THREADS 2
aprun -n 8 -N 1 -d 2 ./my_scalapack_app
See the LibSci manual for more information about this.
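The grid arithmetic in the hybrid example above can be sketched in C. This is a minimal illustration, not part of LibSci: the function name hybrid_grid and its parameters are hypothetical. It divides the total core count by the number of threads per process to get the number of BLACS processes, then picks the most nearly square grid with that many processes.

```c
/* Hypothetical helper (not a LibSci routine): choose an nprow x npcol
 * process grid for a hybrid MPI / threaded-BLAS run.
 *
 * total_cores         cores used by the whole job
 * threads_per_process value you would give OMP_NUM_THREADS
 *
 * Returns 0 on success, -1 if the inputs are not usable. */
int hybrid_grid(int total_cores, int threads_per_process,
                int *nprow, int *npcol)
{
    if (total_cores <= 0 || threads_per_process <= 0 ||
        total_cores % threads_per_process != 0)
        return -1;

    /* One BLACS process per group of threads, not one per core. */
    int nprocs = total_cores / threads_per_process;

    /* Largest divisor of nprocs that does not exceed sqrt(nprocs). */
    int rows = 1;
    for (int r = 1; r * r <= nprocs; ++r)
        if (nprocs % r == 0)
            rows = r;

    *nprow = rows;
    *npcol = nprocs / rows;
    return 0;
}
```

With 16 cores and 1 thread per process this yields the 4x4 grid of the MPI-only run; with 2 threads per process it yields the 2x4 grid of the hybrid run.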
If you require a C interface to BLAS and LAPACK but want to use Cray LibSci BLAS or LAPACK routines, use the Fortran interfaces but make sure to order arrays in Fortran column-major order. You can access the Fortran interfaces from a C program by adding an underscore to the respective routine names and passing arguments by reference (rather than by value). For example, you can call the dgetrf() function as follows:
dgetrf_(&m, &n, a, &lda, ipiv, &info);
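Because C stores two-dimensional arrays in row-major order, a matrix usually has to be copied into column-major order before it is handed to a Fortran routine. The following is a minimal sketch of that step. The helper name to_column_major is hypothetical, and the dgetrf_ prototype is written out by hand on the assumption that Fortran INTEGER corresponds to C int (the default for most builds).

```c
#include <stddef.h>

/* Hand-written prototype for the Fortran LAPACK routine DGETRF as seen
 * from C: trailing underscore, every argument passed by reference.
 * Assumes Fortran INTEGER is a C int. */
void dgetrf_(const int *m, const int *n, double *a, const int *lda,
             int *ipiv, int *info);

/* Hypothetical helper (not a LibSci routine): copy a row-major m x n
 * matrix into a column-major buffer with leading dimension ld >= m.
 * Element (i, j) of the matrix lands at dst[i + j * ld]. */
void to_column_major(const double *src, int m, int n, double *dst, int ld)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            dst[(size_t)i + (size_t)j * (size_t)ld] =
                src[(size_t)i * (size_t)n + (size_t)j];
}

/* Typical use, when linked against LibSci through the cc wrapper:
 *
 *     to_column_major(a_row_major, m, n, a, lda);
 *     dgetrf_(&m, &n, a, &lda, ipiv, &info);
 *
 * where ipiv has room for min(m, n) ints and info is 0 on success. */
```

The call itself is left in a comment so that the helper compiles and can be tested without a LAPACK library present.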
Using LibSci's Iterative Refinement Toolkit
IRT (Iterative Refinement Toolkit) consists of linear solvers using 32-bit factorizations that preserve accuracy through mixed-precision iterative refinement. Single-precision solvers can be up to twice as fast as double-precision solvers, so IRT factors the matrix in single precision and then uses an iterative refinement process to obtain solutions accurate to double precision. IRT provides two interfaces, "Benchmarking" and "Expert," that are controlled via the environment variable IRT_USE_SOLVERS. Setting this variable to 1 selects the Benchmarking interface, with which it might be possible to obtain speedups on codes using LAPACK or ScaLAPACK without any source code changes. See the "intro_irt" man page for details and caveats about the safety of using iterative refinement.
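To make the idea of mixed-precision iterative refinement concrete, here is a small self-contained sketch in C. It is not Cray's IRT implementation and uses none of its interfaces. All function names are hypothetical, and the dense row-major LU code is a textbook version chosen for brevity. The matrix is factored once in single precision, and each refinement step computes the residual in double precision and solves for a correction with the cheap single-precision factors.

```c
#include <stdlib.h>

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* LU-factor the row-major n x n matrix lu in place, in single precision,
 * with partial pivoting (P A = L U). Returns -1 on a zero pivot. */
static int lu_factor_f32(int n, float *lu, int *piv)
{
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (dabs(lu[i * n + k]) > dabs(lu[p * n + k]))
                p = i;
        if (lu[p * n + k] == 0.0f)
            return -1;
        piv[k] = p;
        for (int j = 0; p != k && j < n; ++j) {   /* swap full rows k and p */
            float t = lu[k * n + j];
            lu[k * n + j] = lu[p * n + j];
            lu[p * n + j] = t;
        }
        for (int i = k + 1; i < n; ++i) {
            lu[i * n + k] /= lu[k * n + k];
            for (int j = k + 1; j < n; ++j)
                lu[i * n + j] -= lu[i * n + k] * lu[k * n + j];
        }
    }
    return 0;
}

/* Solve A z = r with the single-precision factors; z overwrites r. */
static void lu_solve_f32(int n, const float *lu, const int *piv, float *r)
{
    for (int k = 0; k < n; ++k) {                 /* apply all row swaps */
        float t = r[k]; r[k] = r[piv[k]]; r[piv[k]] = t;
    }
    for (int k = 0; k < n; ++k)                   /* forward substitution */
        for (int i = k + 1; i < n; ++i)
            r[i] -= lu[i * n + k] * r[k];
    for (int i = n - 1; i >= 0; --i) {            /* back substitution */
        for (int j = i + 1; j < n; ++j)
            r[i] -= lu[i * n + j] * r[j];
        r[i] /= lu[i * n + i];
    }
}

/* Solve A x = b (A row-major, n x n). Returns the number of correction
 * steps applied once max|b - A x| <= tol, or -1 if that is not reached
 * within max_iter steps or the factorization fails. */
int solve_mixed_precision(int n, const double *a, const double *b,
                          double *x, int max_iter, double tol)
{
    float *lu = malloc((size_t)n * (size_t)n * sizeof *lu);
    float *r32 = malloc((size_t)n * sizeof *r32);
    int *piv = malloc((size_t)n * sizeof *piv);
    int steps = 0, converged = 0;

    if (!lu || !r32 || !piv)
        goto done;
    for (int i = 0; i < n * n; ++i)
        lu[i] = (float)a[i];                      /* demote to 32 bits */
    if (lu_factor_f32(n, lu, piv) != 0)
        goto done;
    for (int i = 0; i < n; ++i)
        x[i] = 0.0;

    for (;;) {
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {             /* residual in double */
            double r = b[i];
            for (int j = 0; j < n; ++j)
                r -= a[i * n + j] * x[j];
            r32[i] = (float)r;
            if (dabs(r) > rnorm)
                rnorm = dabs(r);
        }
        if (rnorm <= tol) { converged = 1; break; }
        if (steps == max_iter)
            break;
        lu_solve_f32(n, lu, piv, r32);            /* cheap correction */
        for (int i = 0; i < n; ++i)
            x[i] += (double)r32[i];
        ++steps;
    }
done:
    free(lu); free(r32); free(piv);
    return converged ? steps : -1;
}
```

The sketch returns -1 when refinement does not converge. A production solver would normally fall back to a full double-precision factorization in that case, which is one reason the man page caveats matter for ill-conditioned systems.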
The Cray Application Developer's Environment User's Guide contains little more than a list of the library's contents. However, many man pages are available, including intro_libsci(3s), intro_blas1(3s), intro_blas2(3s), intro_blas3(3s), intro_blacs(3s), intro_lapack(3s), intro_scalapack(3s), intro_irt(3), intro_crafft(3s), intro_fft(3s), intro_fftw2(3), and intro_fftw3(3). For questions about using LibSci at NERSC, please contact the consultants at firstname.lastname@example.org
Cray recently (Nov. 2011) renamed the multi-threaded LibSci libraries to improve the performance of applications that use dynamic libraries on Hopper. As a result, the compiler wrapper now links against -lsci_pgi_mp (for the PGI compiler) instead of -lsci_mc12_mp. If your makefile references -lsci_mc12_mp explicitly, either remove it (the wrapper links -lsci_pgi_mp by default) or replace it with -lsci_pgi_mp; the code should then link without a problem.