SystemsFranklin Home
About Franklin
Software
Status & StatsNERSC MOTD Announcements Known Problems Current Queue Look Completed Jobs List Job Stats
|
Using LibSci on FranklinThe Cray Scientific Libraries package, LibSci, is available on Franklin. Libsci includes BLAS, LAPACK, ScaLAPACK, BLACS, and SuperLU routines. SuperLu User's Guide, LAPACK, ScaLAPACK, BLAS, and BLACS man pages The LibSci module is automatically loaded upon login. The compiler wrappers (ftn, CC, cc) will automatically load and link to the LibSci library. BLAS and LAPACK functionality included in LibSci contains Cray custom routines combined with the University of Texas' libGoto library. BLAS and LAPACK from LibSci are preferred and default functions than the corresponding BLAS and LAPAC functions provided in ACML. Threaded BLAS in LibSci can be used by setting the GOTO_NUM_THREADS for Franklin and by using the appropriate Torque options for using less than 4 MPI tasks per node for your job. Scalapack in LibSci is also supported in mixed mode, i.e. MPI parallelism across compute nodes with threaded BLAS level parallelism on cores within a node. The code needs to be linked with libsci_mp library via "-lsci_mp". This is performed via compile and runtime options, and environment variables, as follows: 1) adjust the process grid dimensions in ScaLAPACK to account for the decrease in BLACS nodes; 2) ensure that the number of BLACS processes required is equal to the number of nodes required, NOT equal to the number of cores; 3) set GOTO_NUM_THREADS in the PBS job script used to launch the job. Here is a sample job script for using threaded BLAS or mixed mode Scalapack: #PBS -q debug #PBS -l mppwidth=64 #PBS -l mppnppn=1 #PBS -l mppdepth=4 #PBS -l walltime=00:30:00 #PBS -j eo cd $PBS_O_WORKDIR setenv GOTO_NUM_THREADS 4 ftn atest.f90 aprun -n 64 -N 1 -d 4 ./a.out LibSci also includes the automatic benchmarking interfaces for IRT, the Iterative Refinement Toolkit. Iterative Refinement can be used 'under-the-covers' in your application by setting the environment variable IRT_USE_SOLVERS to 1. You can therefore obtain significant speed-ups on applications that are performing LAPACK or ScaLAPACK linear solvers (Cholesky or LU) just by setting this environment variable. It is particularly useful for benchmarkers who cannot change their codes. Please refer to "intro_irt" man page for details about the explicit interface to IRT, as well as the caveats and safety of using iterative refinement. You can use IRT under-the-covers if your code using the following routine or combination o routines:
dgetrf and dgetrs (LAPACK)
dpotrf and dpotrs (LAPACK)
dgesv (LAPACK)
dposv (LAPACK)
pdgetrf and pdgetrs (ScaLAPACK)
pdpotrf and pdpotrs (ScaLAPACK)
pdgesv (ScaLAPACK)
pdposv (ScaLAPACK)
zgetrf and zgetrs (LAPACK)
zpotrf and zpotrs (LAPACK)
zgesv (LAPACK)
zposv (LAPACK)
pzgetrf and pzgetrs (ScaLAPACK)
pzpotrf and pzpotrs (ScaLAPACK)
pzgesv (ScaLAPACK)
pzposv (ScaLAPACK)
Example Fortran Code using ScaLAPACK from LibSciBelow is the compile and run of a Fortran that uses ScaLAPACK from LibSci.Notice that there is no explicit reference to LibSci in the compiler line. It is included in the default user environment, and the compiler wrappers perform the linking automatically.
% cat ABCp.f90
! Matrix-Matrix Multiply, AB = C using LibSci ScaLAPACK
! filename: ABCp.f90
! compile: ftn -fast -o ABCp ABCp.f90
! input: ABCp.dat
! prow number of rows in proc grid
! pcol number of columns in proc grid
! n number of rows/columns in matrix A
! nb matrix distribution block size
! oputput: fort.u, where u=10+processor number, and stdout
implicit none
integer :: n, nb ! problem size and block size
integer :: myunit ! local output unit number
integer :: myArows, myAcols ! size of local subset of global array
integer :: i,j, igrid,jgrid, iproc,jproc, myi,myj, p
real*8, dimension(:,:), allocatable :: myA,myB,myC
integer :: numroc ! blacs routine
integer :: me, procs, icontxt, prow, pcol, myrow, mycol ! blacs data
integer :: info ! scalapack return value
integer, dimension(9) :: ides_a, ides_b, ides_c ! scalapack array desc
! Read problem description
open(unit=1,file="ABCp.dat",status="old",form="formatted")
read(1,*)prow
read(1,*)pcol
read(1,*)n
read(1,*)nb
close(1)
if (((n/nb) < prow) .or. ((n/nb) < pcol)) then
print *,"Problem size too small for processor set!"
stop 100
endif
! Initialize blacs processor grid
call blacs_pinfo (me,procs)
call blacs_get (0, 0, icontxt)
call blacs_gridinit(icontxt, 'R', prow, pcol)
call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)
myunit = 10+me
write(myunit,*)"--------"
write(myunit,*)"Output for processor ",me," to unit ",myunit
write(myunit,*)"Proc ",me,": myrow, mycol in p-array is ", &
myrow, mycol
! Construct local arrays
! Global structure: matrix A of n rows and n columns
! matrix B of n rows and n column
! matrix C of n rows and n column
myArows = numroc(n, nb, myrow, 0, prow)
myAcols = numroc(n, nb, mycol, 0, pcol)
write(myunit,*)"Size of global array is ",n," x ",n
write(myunit,*)"Size of block is ",nb," x ",nb
write(myunit,*)"Size of local array is ",myArows," x ",myAcols
! Initialize local arrays
allocate(myA(myArows,myAcols))
allocate(myB(myArows,myAcols))
allocate(myC(myArows,myAcols))
do i=1,n
call g2l(i,n,prow,nb,iproc,myi)
if (myrow==iproc) then
do j=1,n
call g2l(j,n,pcol,nb,jproc,myj)
if (mycol==jproc) then
myA(myi,myj) = real(i+j)
myB(myi,myj) = real(i-j)
myC(myi,myj) = 0.d0
! write(myunit,*)"A(",i,",",j,")", &
! " --> myA(",myi,",",myj,")=",myA(myi,myj), &
! "on proc(",iproc,",",jproc,")"
endif
enddo
endif
enddo
! Prepare array descriptors for ScaLAPACK
ides_a(1) = 1 ! descriptor type
ides_a(2) = icontxt ! blacs context
ides_a(3) = n ! global number of rows
ides_a(4) = n ! global number of columns
ides_a(5) = nb ! row block size
ides_a(6) = nb ! column block size
ides_a(7) = 0 ! initial process row
ides_a(8) = 0 ! initial process column
ides_a(9) = myArows ! leading dimension of local array
do i=1,9
ides_b(i) = ides_a(i)
ides_c(i) = ides_a(i)
enddo
! Call ScaLAPACK library routine
call pdgemm('T','T',n,n,n,1.0d0, myA,1,1,ides_a, &
myB,1,1,ides_b,0.d0, &
myC,1,1,ides_c )
! Print results
call g2l(n,n,prow,nb,iproc,myi)
call g2l(n,n,pcol,nb,jproc,myj)
if ((myrow==iproc) .and. (mycol==jproc)) &
write(*,*) 'c(',n,n,')=',myC(myi,myj)
! Deallocate the local arrays
deallocate(myA, myB, myC)
! End blacs for processors that are used
call blacs_gridexit(icontxt)
call blacs_exit(0)
end
! convert global index to local index in block-cyclic distribution
subroutine g2l(i,n,np,nb,p,il)
implicit none
integer :: i ! global array index, input
integer :: n ! global array dimension, input
integer :: np ! processor array dimension, input
integer :: nb ! block size, input
integer :: p ! processor array index, output
integer :: il ! local array index, output
integer :: im1
im1 = i-1
p = mod((im1/nb),np)
il = (im1/(np*nb))*nb + mod(im1,nb) + 1
return
end
! convert local index to global index in block-cyclic distribution
subroutine l2g(il,p,n,np,nb,i)
implicit none
integer :: il ! local array index, input
integer :: p ! processor array index, input
integer :: n ! global array dimension, input
integer :: np ! processor array dimension, input
integer :: nb ! block size, input
integer :: i ! global array index, output
integer :: ilm1
ilm1 = il-1
i = (((ilm1/nb) * np) + p)*nb + mod(ilm1,nb) + 1
return
end
% cat runit
#PBS -N scalapackjob
#PBS -q debug
#PBS -l mppwidth=64
#PBS -l walltime=00:05:00
#PBS -e run.out
#PBS -j eo
cd $PBS_O_WORKDIR
ftn -o ABCp -fast ABCp.f90
aprun -n 64 ./ABCp
% cat ABCp.dat
8
8
20000
64
% cat run.out
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/opt/xt-asyncpe/1.0c/bin/ftn: INFO: linux target is being used
c( 20000 20000 )= 5333133330000.000
Application 243414 resources: utime 0, stime 0
-------------------------- Batch Job Report ------------------------------
Job Id: 5570270.nid00003
User Name: yunhe
Group Name: yunhe
Job Name: scalapackjob
Session ID: 31162
Resource List: walltime=00:05:00
Queue Name: debug
Account String: mpccc
Total Tasks: 64
Tasks per Node: 4
Node Count: 16
Job Exit Summary:
APINFO_SUCCESS: application completed with no detectable system errors
|
![]() |
Page last modified: Wed, 01 Jul 2009 22:02:46 GMT Page URL: http://www.nersc.gov/nusers/systems/franklin/software/libsci.php Web contact: webmaster@nersc.gov Computing questions: consult@nersc.gov Privacy and Security Notice |
![]() |