稀疏矩阵高性能求解器PARDISO的使用
PARDISO是一个商用的高速的稀疏矩阵线性方程组直接法求解器,其可以用于大型稀疏对称或非对称线性方程组的求解,商用版支持的并行方式包括共享内存,分布式内存和NVIDA的GPU并行。多次实践已经表明,PARDISO求解器具有优秀的性能,在各种稀疏矩阵直接求解器的比较中通常处于第一梯队,其他常见的直接法求解器包括SuiteSparse,MUMPS,SPOOLES等,商用有限元软件COMSOL的官方文档甚至直接指出在COMSOL中PARDISO求解速度快于MUMPS。。
对于线性方程组求解来说,一般常见的求解方法可分为直接法和迭代法。直接法常见的方法是高斯消去,LU分解法等,在求解过程中不需要迭代,但是在求解中可能需要较大内存,商业软件中经常使用的一种直接法叫做多波前法。迭代法又可分为两个大类,分裂迭代和Krylov迭代。前者通过将系数矩阵进行拆分形成迭代列式,常见的是雅克比迭代,高斯赛德尔迭代等,分裂迭代这一类型的迭代法在商业软件中很少直接使用,但是经常作为Krylov迭代的预处理基础;后者以Krylov子空间为基础上,常见的包括共轭梯度法,广义最小残差法等,Krylov迭代法在商业软件中较为常见。
在稀疏矩阵直接法求解中,一个常见的技术是矩阵重排,通过矩阵重排,可以使得矩阵的带宽减小(即非0元素位置相对更趋近于对角线),从而使得直接求解效率更高,一种常见的矩阵重排方法是Cuthill_McKee,常见的可以实现矩阵重排的库是METIS。以下是采用Cuthill_McKee重排前和重排后的非0元素分布。
重排前非0元素分布
重排后非0元素分布
商用版本的PARDISO个人一般较少直接使用(主要是要买)。实际中较为广泛使用的其中一个版本是配置在MKL库的PARDISO版本。PARDISO求解需要将矩阵采用CSR格式存储,并且在求解中需要正确的设置较多的参数才能正确运行。
以下提供MKL_PARDISO的一个官方例子:
Fortran版本:
INCLUDE 'mkl_PARDISO.f90' PROGRAM PARDISO_sym_f90 USE mkl_PARDISO IMPLICIT NONE INTEGER, PARAMETER :: dp = KIND(1.0D0) !.. Internal solver memory pointer TYPE(MKL_PARDISO_HANDLE), ALLOCATABLE :: pt(:) !.. All other variables INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl, nnz INTEGER error1 INTEGER, ALLOCATABLE :: iparm( : ) INTEGER, ALLOCATABLE :: ia( : ) INTEGER, ALLOCATABLE :: ja( : ) REAL(KIND=DP), ALLOCATABLE :: a( : ) REAL(KIND=DP), ALLOCATABLE :: b( : ) REAL(KIND=DP), ALLOCATABLE :: x( : ) INTEGER i, idum(1) REAL(KIND=DP) ddum(1) !.. Fill all arrays containing matrix data. n = 8 nnz = 18 nrhs = 1 maxfct = 1 mnum = 1 ALLOCATE(ia(n + 1)) ia = (/ 1, 5, 8, 10, 12, 15, 17, 18, 19 /) ALLOCATE(ja(nnz)) ja = (/ 1, 3, 6, 7, & 2, 3, 5, & 3, 8, & 4, 7, & 5, 6, 7, & 6, 8, & 7, & 8 /) ALLOCATE(a(nnz)) a = (/ 7.d0, 1.d0, 2.d0, 7.d0, & -4.d0, 8.d0, 2.d0, & 1.d0, 5.d0, & 7.d0, 9.d0, & 5.d0, 1.d0, 5.d0, & -1.d0, 5.d0, & 11.d0, & 5.d0 /) ALLOCATE(b(n)) ALLOCATE(x(n)) !.. !.. Set up PARDISO control parameter !.. ALLOCATE(iparm(64)) DO i = 1, 64 iparm(i) = 0 END DO iparm(1) = 1 ! no solver default iparm(2) = 2 ! fill-in reordering from METIS iparm(4) = 0 ! no iterative-direct algorithm iparm(5) = 0 ! no user fill-in reducing permutation iparm(6) = 0 ! =0 solution on the first n components of x iparm(8) = 2 ! numbers of iterative refinement steps iparm(10) = 13 ! perturb the pivot elements with 1E-13 iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy iparm(14) = 0 ! Output: number of perturbed pivots iparm(18) = -1 ! Output: number of nonzeros in the factor LU iparm(19) = -1 ! Output: Mflops for LU factorization iparm(20) = 0 ! Output: Numbers of CG Iterations error = 0 ! initialize error flag msglvl = 1 ! print statistical information mtype = -2 ! symmetric, indefinite !.. Initialize the internal solver memory pointer. This is only ! necessary for the FIRST call of the PARDISO solver. ALLOCATE (pt(64)) DO i = 1, 64 pt(i)%DUMMY = 0 END DO !.. Reordering and Symbolic Factorization, This step also allocates ! all memory that is necessary for the factorization phase = 11 ! only reordering and symbolic factorization CALL PARDISO (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, & idum, nrhs, iparm, msglvl, ddum, ddum, error) WRITE(*,*) 'Reordering completed ... ' IF (error /= 0) THEN WRITE(*,*) 'The following ERROR was detected: ', error GOTO 1000 END IF WRITE(*,*) 'Number of nonzeros in factors = ',iparm(18) WRITE(*,*) 'Number of factorization MFLOPS = ',iparm(19) !.. Factorization. phase = 22 ! only factorization CALL PARDISO (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, & idum, nrhs, iparm, msglvl, ddum, ddum, error) WRITE(*,*) 'Factorization completed ... ' IF (error /= 0) THEN WRITE(*,*) 'The following ERROR was detected: ', error GOTO 1000 ENDIF !.. Back substitution and iterative refinement iparm(8) = 2 ! max numbers of iterative refinement steps phase = 33 ! only solving DO i = 1, n b(i) = 1.d0 END DO CALL PARDISO (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, & idum, nrhs, iparm, msglvl, b, x, error) WRITE(*,*) 'Solve completed ... ' IF (error /= 0) THEN WRITE(*,*) 'The following ERROR was detected: ', error GOTO 1000 ENDIF WRITE(*,*) 'The solution of the system is ' DO i = 1, n WRITE(*,*) ' x(',i,') = ', x(i) END DO 1000 CONTINUE !.. Termination and release of memory phase = -1 ! release internal memory CALL PARDISO (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum, & idum, nrhs, iparm, msglvl, ddum, ddum, error1) IF (ALLOCATED(ia)) DEALLOCATE(ia) IF (ALLOCATED(ja)) DEALLOCATE(ja) IF (ALLOCATED(a)) DEALLOCATE(a) IF (ALLOCATED(b)) DEALLOCATE(b) IF (ALLOCATED(x)) DEALLOCATE(x) IF (ALLOCATED(iparm)) DEALLOCATE(iparm) IF (error1 /= 0) THEN WRITE(*,*) 'The following ERROR on release stage was detected: ', error1 STOP 1 ENDIF IF (error /= 0) STOP 1 END PROGRAM PARDISO_sym_f90
具体使用:安装VS和INTEL oneAPI后,在VS中新建Fortran工程,点项目-属性--配置属性-Fortran-Libraries,将Use Intel Math Kernal Library选项改为Parrallel即可运行。
c语言版本:
#include <stdio.h> #include <stdlib.h> #include <math.h> #include "mkl_PARDISO.h" #include "mkl_types.h" // Define the format to printf MKL_INT values #if !defined(MKL_ILP64) #define IFORMAT "%i" #else #define IFORMAT "%lli" #endif int main (void) { /* Matrix data. */ MKL_INT n = 8; MKL_INT ia[9] = { 1, 5, 8, 10, 12, 15, 17, 18, 19}; MKL_INT ja[18] = { 1, 3, 6, 7, 2, 3, 5, 3, 8, 4, 7, 5, 6, 7, 6, 8, 7, 8 }; double a[18] = { 7.0, 1.0, 2.0, 7.0, -4.0, 8.0, 2.0, 1.0, 5.0, 7.0, 9.0, 5.0, 1.0, 5.0, -1.0, 5.0, 11.0, 5.0 }; MKL_INT mtype = -2; /* Real symmetric matrix */ /* RHS and solution vectors. */ double b[8], x[8]; MKL_INT nrhs = 1; /* Number of right hand sides. */ /* Internal solver memory pointer pt, */ /* 32-bit: int pt[64]; 64-bit: long int pt[64] */ /* or void *pt[64] should be OK on both architectures */ void *pt[64]; /* PARDISO control parameters. */ MKL_INT iparm[64]; MKL_INT maxfct, mnum, phase, error, msglvl; /* Auxiliary variables. */ MKL_INT i; double ddum; /* Double dummy */ MKL_INT idum; /* Integer dummy. */ /* -------------------------------------------------------------------- */ /* .. Setup PARDISO control parameters. */ /* -------------------------------------------------------------------- */ for ( i = 0; i < 64; i++ ) { iparm[i] = 0; } iparm[0] = 1; /* No solver default */ iparm[1] = 2; /* Fill-in reordering from METIS */ iparm[3] = 0; /* No iterative-direct algorithm */ iparm[4] = 0; /* No user fill-in reducing permutation */ iparm[5] = 0; /* Write solution into x */ iparm[6] = 0; /* Not in use */ iparm[7] = 2; /* Max numbers of iterative refinement steps */ iparm[8] = 0; /* Not in use */ iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */ iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */ iparm[11] = 0; /* Not in use */ iparm[12] = 0; /* Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy */ iparm[13] = 0; /* Output: Number of perturbed pivots */ iparm[14] = 0; /* Not in use */ iparm[15] = 0; /* Not in use */ iparm[16] = 0; /* Not in use */ iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */ iparm[18] = -1; /* Output: Mflops for LU factorization */ iparm[19] = 0; /* Output: Numbers of CG Iterations */ maxfct = 1; /* Maximum number of numerical factorizations. */ mnum = 1; /* Which factorization to use. */ msglvl = 1; /* Print statistical information in file */ error = 0; /* Initialize error flag */ /* -------------------------------------------------------------------- */ /* .. Initialize the internal solver memory pointer. This is only */ /* necessary for the FIRST call of the PARDISO solver. */ /* -------------------------------------------------------------------- */ for ( i = 0; i < 64; i++ ) { pt[i] = 0; } /* -------------------------------------------------------------------- */ /* .. Reordering and Symbolic Factorization. This step also allocates */ /* all memory that is necessary for the factorization. */ /* -------------------------------------------------------------------- */ phase = 11; PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error); if ( error != 0 ) { printf ("\nERROR during symbolic factorization: " IFORMAT, error); exit (1); } printf ("\nReordering completed ... "); printf ("\nNumber of nonzeros in factors = " IFORMAT, iparm[17]); printf ("\nNumber of factorization MFLOPS = " IFORMAT, iparm[18]); /* -------------------------------------------------------------------- */ /* .. Numerical factorization. */ /* -------------------------------------------------------------------- */ phase = 22; PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error); if ( error != 0 ) { printf ("\nERROR during numerical factorization: " IFORMAT, error); exit (2); } printf ("\nFactorization completed ... "); /* -------------------------------------------------------------------- */ /* .. Back substitution and iterative refinement. */ /* -------------------------------------------------------------------- */ phase = 33; iparm[7] = 2; /* Max numbers of iterative refinement steps. */ /* Set right hand side to one. */ for ( i = 0; i < n; i++ ) { b[i] = 1; } PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &error); if ( error != 0 ) { printf ("\nERROR during solution: " IFORMAT, error); exit (3); } printf ("\nSolve completed ... "); printf ("\nThe solution of the system is: "); for ( i = 0; i < n; i++ ) { printf ("\n x [" IFORMAT "] = % f", i, x[i]); } printf ("\n"); /* -------------------------------------------------------------------- */ /* .. Termination and release of memory. */ /* -------------------------------------------------------------------- */ phase = -1; /* Release internal memory. */ PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error); return 0; }
以下是计算结果:
以上,即是MKL_PARDISO的具体使用例子。在实际中,MKL版本的PARDISO性能是非常高的,在某些稀疏矩阵的条件下,求解速度可以远超matlab。
以下,是某个条件数比较大但是方程组规模不大(61349个未知数)的matlab和PARDISO求解速度对比:
在该问题中,采用AMD3900X的处理器,MKL_PARDISO求解时间为43.6s,求解速度是matlab的两倍多。该稀疏矩阵的下载链接如下:
链接:https://pan.baidu.com/s/1lw-SgNlnSUAB4o9dwJa14Q
提取码:xmim
以上,即是高性能稀疏矩阵线性方程组求解器的简单介绍,感谢您的阅读,欢迎关注公众号 有限元术
查看更多评论 >