Research projects
Research Interests
- Scientific computing, domain decomposition methods
- Linear solvers for sparse matrices
- Computational plasma physics
- Grid generation techniques
- GPU computing
Current Research
PDSLin: A hybrid linear solver for large-scale highly indefinite linear systems
The Parallel Domain decomposition Schur complement based Linear solver (PDSLin) implements a hybrid (direct and iterative) linear solver based on a non-overlapping domain decomposition technique called the Schur complement method. It has two levels of parallelism: a) solving independent subdomains in parallel and b) applying multiple processors per subdomain. In such a framework, load imbalance and excessive communication are the main performance bottlenecks, and several techniques have been developed to address them: exploiting the sparsity of the right-hand sides during the triangular solutions with the interfaces; load balancing the sparse matrix-matrix multiplication that forms the update matrices; and designing an efficient asynchronous point-to-point communication scheme for the update matrices.

The scalability and speedup study of PDSLin.
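The benefit of a sparse right-hand side in a triangular solve can be illustrated with a short sketch. This is our own illustration under stated assumptions, not PDSLin code: the function name `sparse_lower_solve` and the column-list storage format are hypothetical. Column-oriented forward substitution only needs to visit the columns of L that are reachable from the nonzeros of b, so a heap of pending indices (always popped in increasing order, because updates only flow to higher rows) replaces the usual loop over all n columns.

```python
import heapq


def sparse_lower_solve(L_cols, b):
    """Solve L x = b by column-oriented forward substitution, touching only
    the columns reachable from the nonzeros of a sparse right-hand side.

    L_cols[j]: list of (row, value) pairs for column j of a lower-triangular
               L; the diagonal pair (j, L[j][j]) must be present.
    b:         dict {row: value} holding only the nonzeros of b.
    Returns (x, touched): x as a dict of the entries that can be nonzero,
    and the number of columns actually processed.
    """
    x = dict(b)
    heap = list(x)
    heapq.heapify(heap)
    queued = set(heap)
    touched = 0
    while heap:
        # Updates only go to higher rows, so popping the smallest pending
        # index visits columns in the same order as dense substitution.
        j = heapq.heappop(heap)
        touched += 1
        diag = next(v for i, v in L_cols[j] if i == j)
        x[j] /= diag
        for i, v in L_cols[j]:
            if i == j:
                continue
            x[i] = x.get(i, 0.0) - v * x[j]
            if i not in queued:
                queued.add(i)
                heapq.heappush(heap, i)
    return x, touched


# Example: 6x6 lower-bidiagonal L with 2 on the diagonal and -1 below it.
n = 6
L_cols = [[(j, 2.0), (j + 1, -1.0)] if j < n - 1 else [(j, 2.0)]
          for j in range(n)]
x, touched = sparse_lower_solve(L_cols, {4: 2.0})
print(x, touched)  # prints {4: 1.0, 5: 0.5} 2: only columns 4 and 5 touched
```

Production sparse solvers usually find the reachable set with a depth-first search on the graph of L (for example `cs_reach` in CSparse) instead of a heap, which avoids the logarithmic factor.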
Dynamically adaptive grids
Adaptive solution refinement through grid motion, with regions of high solution variation serving as attractors for grid density or magnetic flux surfaces serving as constant-coordinate alignment targets, is a capability of increasing interest for continuum-based models of fusion devices. The MHD equations are transformed from Cartesian coordinates to solution-defined curvilinear coordinates. Convergence and accuracy studies show that the curvilinear solution requires less computational effort, due both to the better placement of the grid points and to the improved nonlinear and linear convergence of the implicit solver. Because of the regular memory references inherited from the logically regular curvilinear coordinate grid, r-type refinement (relocating a fixed number of grid points rather than adding new ones) is expected to have advantages over unstructured adaptive schemes on the multicore and SIMDized processor architectures of the future.

The adaptive grids for the four-field extended MHD equations. From left to right: the stream function for the in-plane component of the ion velocity, the z component of the ion velocity, the magnetic flux, the z component of the magnetic field, the out-of-plane current density, and the adaptive grid. From top to bottom: t = 0.0, 10.0, 20.0, 30.0, and 40.0.
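The text does not state which mesh equations drive the grid motion. As a minimal one-dimensional sketch of "high solution variation as an attractor for grid density", the code below applies the classic equidistribution principle: nodes are placed so that every cell holds an equal share of the integral of the monitor function w(x) = sqrt(1 + alpha u'(x)^2). The function `equidistribute`, this monitor, the weight `alpha`, and the tanh test profile are all assumptions for illustration; in two dimensions, r-refinement usually solves a mesh PDE for the coordinate mapping instead of inverting an integral directly.

```python
import bisect
import math


def equidistribute(dudx, a, b, n_cells, alpha=1.0, n_fine=4000):
    """Place n_cells + 1 nodes on [a, b] so that every cell carries the same
    integral of the (assumed) monitor w(x) = sqrt(1 + alpha * u'(x)**2).

    dudx:  callable returning the solution derivative u'(x).
    alpha: hypothetical weight; larger values pull more nodes into regions
           of steep solution variation.
    """
    xs = [a + (b - a) * k / n_fine for k in range(n_fine + 1)]
    w = [math.sqrt(1.0 + alpha * dudx(x) ** 2) for x in xs]
    # Cumulative integral W(x_k) of the monitor by the trapezoidal rule.
    W = [0.0]
    for k in range(n_fine):
        W.append(W[-1] + 0.5 * (w[k] + w[k + 1]) * (xs[k + 1] - xs[k]))
    nodes = [a]
    for i in range(1, n_cells):
        target = W[-1] * i / n_cells        # equal share of W per cell
        k = bisect.bisect_left(W, target)   # W[k-1] < target <= W[k]
        t = (target - W[k - 1]) / (W[k] - W[k - 1])
        nodes.append(xs[k - 1] + t * (xs[k] - xs[k - 1]))  # invert W linearly
    nodes.append(b)
    return nodes


# Steep internal layer u(x) = tanh(20 (x - 0.5)), a stand-in for a
# high-variation region; its derivative is 20 / cosh^2(20 (x - 0.5)).
def dudx(x):
    return 20.0 / math.cosh(20.0 * (x - 0.5)) ** 2


nodes = equidistribute(dudx, 0.0, 1.0, n_cells=20)
h = [x1 - x0 for x0, x1 in zip(nodes, nodes[1:])]
print(f"smallest cell {min(h):.4f}, largest cell {max(h):.4f}")
```

Recomputing the nodes as the solution evolves turns this static placement into grid motion; the logically regular node ordering is preserved throughout, which is the property the paragraph above credits for regular memory access.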
Preconditioning techniques for magnetic reconnection simulation
Magnetic reconnection is a fundamental process in magnetized plasmas at high magnetic Lundquist number. It occurs in a wide variety of laboratory and space plasmas, e.g., magnetic fusion experiments, the solar corona, and the Earth's magnetotail. An implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large, so that whistler waves, which drive the fast reconnection, dominate the physics.
For small system sizes, a direct solver such as SuperLU can be employed to obtain an accurate solution as long as the condition number is bounded by the reciprocal of the floating-point machine precision. However, SuperLU scales effectively only to hundreds of processors or fewer. For larger system sizes, it has been shown that physics-based or other preconditioners can provide adequate solver performance.
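The condition-number criterion can be made concrete with a standard rule of thumb: a backward-stable direct solve in IEEE double precision loses roughly log10(kappa) of its approximately 16 significant digits, so the computed solution carries no reliable digits once kappa approaches 1/eps. The sketch below (the helper `cond_inf_2x2` and the test matrix are hypothetical) computes the infinity-norm condition number of the nearly singular matrix [[1, 1], [1, 1 + delta]], which grows like 4/delta.

```python
import math
import sys


def cond_inf_2x2(a, b, c, d):
    """Infinity-norm condition number of [[a, b], [c, d]], using the explicit
    inverse (1/det) * [[d, -b], [-c, a]]."""
    det = a * d - b * c
    norm_a = max(abs(a) + abs(b), abs(c) + abs(d))
    norm_inv = max(abs(d) + abs(b), abs(c) + abs(a)) / abs(det)
    return norm_a * norm_inv


eps = sys.float_info.epsilon  # machine epsilon of IEEE double, 2**-52
for delta in (1e-2, 1e-6, 1e-10):
    kappa = cond_inf_2x2(1.0, 1.0, 1.0, 1.0 + delta)
    # Rule of thumb: roughly -log10(kappa * eps) correct digits survive;
    # kappa * eps >= 1 means none can be trusted.
    digits = -math.log10(kappa * eps)
    print(f"delta={delta:.0e}  kappa={kappa:.2e}  ~{digits:.1f} digits left")
```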
Recently, we have been developing a new algebraic hybrid linear solver, PDSLin. This is based on a non-overlapping domain decomposition technique called the Schur complement method, whereby subdomain problems can be solved by the direct solver SuperLU and the Schur complement system corresponding to the interface equations is solved using a preconditioned iterative solver.
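The Schur complement method can be sketched on a toy problem. Assumptions, stated plainly: this is not PDSLin code; `lu_solve` is a dense Gaussian elimination standing in for SuperLU; and because the 1D Poisson matrix used here is symmetric positive definite, the interface system is solved with unpreconditioned conjugate gradients, whereas an indefinite system would need a method such as GMRES together with a preconditioner. Writing D for subdomain unknowns and G for interface unknowns, the sketch forms S = A_GG - sum over D of A_GD A_DD^{-1} A_DG and g = b_G - sum over D of A_GD A_DD^{-1} b_D, solves S x_G = g, and then recovers x_D = A_DD^{-1}(b_D - A_DG x_G) independently in each subdomain.

```python
import math


def lu_solve(A, b):
    """Gaussian elimination with partial pivoting; a dense stand-in for the
    sparse direct solver. Refactors on every call, which is fine for a toy."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def sub(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(S, rhs, tol=1e-12, maxit=100):
    """Unpreconditioned conjugate gradients; valid only for SPD S."""
    x = [0.0] * len(rhs)
    r, p = rhs[:], rhs[:]
    rr, iters = dot(r, r), 0
    while math.sqrt(rr) > tol and iters < maxit:
        Sp = matvec(S, p)
        alpha = rr / dot(p, Sp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, Sp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr, iters = rr_new, iters + 1
    return x, iters


def schur_solve(A, b, domains, iface):
    """Non-overlapping Schur complement method for A x = b."""
    S = sub(A, iface, iface)                 # start from A_GG
    g = [b[i] for i in iface]                # start from b_G
    for d in domains:
        A_dd, A_gd = sub(A, d, d), sub(A, iface, d)
        # Columns of A_DD^{-1} A_DG: one subdomain solve per interface column.
        cols = [lu_solve(A_dd, [A[i][j] for i in d]) for j in iface]
        for r in range(len(iface)):
            for c in range(len(iface)):
                S[r][c] -= dot(A_gd[r], cols[c])
        y = lu_solve(A_dd, [b[i] for i in d])
        g = [gi - dot(row, y) for gi, row in zip(g, A_gd)]
    x_g, iters = cg(S, g)                    # iterative interface solve
    x = [0.0] * len(b)
    for i, v in zip(iface, x_g):
        x[i] = v
    for d in domains:
        # Back-substitute in each subdomain: x_D = A_DD^{-1}(b_D - A_DG x_G).
        rhs = [b[i] - sum(A[i][j] * x[j] for j in iface) for i in d]
        for i, v in zip(d, lu_solve(sub(A, d, d), rhs)):
            x[i] = v
    return x, S, iters


# Toy: 11x11 tridiag(-1, 2, -1), three 3-node subdomains, interface {3, 7}.
n = 11
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, S, iters = schur_solve(A, b, [[0, 1, 2], [4, 5, 6], [8, 9, 10]], [3, 7])
x_direct = lu_solve(A, b)
print(max(abs(u - v) for u, v in zip(x, x_direct)), iters)
```

For this toy, S works out to [[0.5, -0.25], [-0.25, 0.5]], with eigenvalues 0.25 and 0.75, so its 2-norm condition number is 3 while that of the 11x11 matrix A is about 58; this is one small instance of the Schur system being better conditioned than the original system.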
PDSLin's enhanced scalability is attributed to its ability to employ hierarchical parallelism, namely, solving independent subdomain problems in parallel and using a subset of processors per subdomain. This requires only modest parallelism from SuperLU for each subdomain, and only a handful of iterations for the Schur complement system, because the Schur system is often better conditioned than the original system.
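The first level of the hierarchical parallelism amounts to partitioning the available processes into one group per subdomain. A minimal sketch, assuming contiguous rank blocks of near-equal size; the helpers `subdomain_groups` and `color_of` are hypothetical, and PDSLin's actual assignment may differ:

```python
def subdomain_groups(n_procs, n_subdomains):
    """Split ranks 0..n_procs-1 into contiguous groups, one per subdomain,
    with group sizes differing by at most one."""
    if n_procs < n_subdomains:
        raise ValueError("need at least one process per subdomain")
    base, extra = divmod(n_procs, n_subdomains)
    groups, start = [], 0
    for k in range(n_subdomains):
        size = base + (1 if k < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def color_of(rank, groups):
    """Index ("color") of the subdomain group that a rank belongs to."""
    for color, ranks in enumerate(groups):
        if rank in ranks:
            return color
    raise ValueError(f"rank {rank} is not assigned to any group")


groups = subdomain_groups(10, 4)
print([len(g) for g in groups])  # [3, 3, 2, 2]
```

With MPI (for example through mpi4py), such a color would typically be passed to `comm.Split(color, key)` so that each group gets its own communicator for its subdomain's direct solve. Equal group sizes assume subdomains of similar cost; weighting the groups by subdomain work is one way to reduce the load imbalance noted earlier.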


