
BFSAI-IC OpenMP Implementation: User's guide Contents

1. BFSAI-IC OpenMP Implementation
User's guide

Carlo Janna, Massimiliano Ferronato, Nicola Castelletto
Dept. Mathematical Methods and Models for Scientific Applications
University of Padova, Padova, Italy
E-mail: {janna, ferronat, castel}@dmsa.unipd.it

January 2011

Contents
1 Theoretical background
2 The BFSAI-IC OpenMP Implementation code
2.1 Input data
2.2 Output results
3 Numerical examples
4 Copyright

1 Theoretical background

The objective of the BFSAI-IC OPENMP IMPLEMENTATION code is to solve a symmetric positive definite (SPD) linear system

$$A x = b \qquad (1)$$

where $A \in \mathbb{R}^{n \times n}$ and $x, b \in \mathbb{R}^{n}$, with the aid of the Preconditioned Conjugate Gradient (PCG) algorithm. The BFSAI-IC preconditioner

$$M^{-1} = F^T J_L^{-T} J_L^{-1} F \qquad (2)$$

consists of:
• a unitary lower triangular matrix $F \in \mathcal{W}_{S_{BL}}$ such that $\|D - F L\|_F$ is minimum for an arbitrary block diagonal matrix $D$, with $L$ the exact lower Cholesky factor of $A$ and $\mathcal{W}_{S_{BL}}$ the set of matrices with non-zero pattern $S_{BL}$;
• a lower triangular matrix $J_L$ containing the lower Incomplete Cholesky factors of the $n_b$ diagonal blocks of $F A F^T$.

The non-zero pattern $S_{BL}$ is diagonal for the $n_b$ diagonal blocks and uses the lower non-zero pattern of $A^\kappa$ for the off-block-diagonal part. The density of the BFSAI-IC preconditioner is controlled by a set of user-specified parameters:
1. $\rho_B$: maximum allowable number of non-zeroes for each row o
2. E-03
    10  0.8945910E-04
    11  0.1705649E-04
    12  0.2435383E-05
    13  0.3911664E-06
    14  0.4839887E-07
    15  0.7707951E-08
    16  0.4178030E-09
    17  0.7204189E-10

The parm_2 and parm_3 files prescribe $n_p = n_b = 2$ and 4, respectively, obtaining the following terminal logs:

    Reading matrix and rhs
    End reading
    Computation of the A^kappa pattern
    Computation of F
    Computation of JL
    Preconditioner computed  info 0
    System solved  Info 0
    iter     18
    bnorm    0.7937E+01
    resini   0.1002E+01
    resiter  0.3483E-10
    resreal  0.3483E-10
    T_prec   0.00
    T_sol    0.00
    T_tot    0.00
    mat_F   nnz density   78  0.225
    mat_JL  nnz density  170  0.490
    Prec    nnz density  248  0.715

and

    Reading matrix and rhs
    End reading
    Computation of the A^kappa pattern
    Computation of F
    Computation of JL
    Preconditioner computed  info 0
    System solved  Info 0
    iter     18
    bnorm    0.7937E+01
    resini   0.9541E+00
    resiter  0.3086E-10
    resreal  0.3087E-10
    T_prec   0.04
    T_sol    0.12
    T_tot    0.16
    mat_F   nnz density  105  0.303
    mat_JL  nnz density  165  0.476
    Prec    nnz density  270  0.778

4 Copyright

BFSAI-IC OPENMP IMPLEMENTATION is freely available for scientific non-commercial use. It was written by Carlo Janna with contributions from his co-authors Massimiliano Ferronato and Nicola Castelletto.

1. BFSAI-IC OPENMP IMPLEMENTATION can be used only for the purpose of internal research, excluding any commercial use of BFSAI-IC OPENMP IMPLEMENTATION as s
3. al): parameter $\delta$ controlling the dropping of the smallest $F$ entries ($\geq 0.00$);
• tau_l (double precision real): parameter $\tau_L$ controlling the dropping of the smallest $J_L$ entries ($\geq 0.00$);
• iout (integer): output unit for the PCG convergence profile (if $< 0$, no print);
• itmax (integer): maximum number of PCG iterations ($> 0$);
• tol_CG (double precision real): exit tolerance on the relative residual $\|b - Ax\|_2 / \|b\|_2$ ($> 0$).

The parameters above must be listed one per line.

2.2 Output results

The Output results consist of two files and a summary printed out on the screen. The Output files are:
• a vec_x file containing the converged solution $x$;
• a log file containing the possible execution and/or error flags.

The names of the vec_x and log files are user-specified and included in the Input file test_bfsai.fnames. The vec_x file lists the components of the converged solution $x$. The log file provides all the flags, warnings and/or errors reported during the code execution.

The summary printed out on the screen at the end of the execution includes:
• info (integer): error code (if 0, no errors encountered; else see the log file for details);
• n_iter (integer): number of PCG iterations to converge;
• bnorm (double precision real): 2-norm of the right-hand side $b$;
• resini (double precision real): initial PCG relative residual;
• resiter (double precision real): final PCG relative residual;
• resreal (do
4. broutine mk_OMPDTSTR);
6. estimate of the maximum number of non-zeroes stored in $F$ and $J_L$;
7. allocation of the BFSAI-IC data structure (subroutine mk_BFSAI);
8. BFSAI-IC computation (subroutine compute_BFSAI);
9. allocation of the PCG data structure (subroutine mk_BFSAI_CG);
10. solution of the linear system $Ax = b$ (subroutine PCG_solv);
11. printing of the Output results.

The BFSAI-IC OPENMP IMPLEMENTATION package is designed to be easily linked to other users' codes. The following classes are defined:
• class_OMPDTSTR: variables for the parallel handling of matrices and vectors;
• class_CSRMAT: variables for the CSR matrix storage;
• class_BFSAI: variables for the BFSAI-IC preconditioner;
• class_BFSAI_CG: variables for the PCG algorithm.

Dynamic allocation and deallocation of variables are handled by a constructor and a destructor subroutine, respectively. Each class also contains a member subroutine, characterized by the prefix errchk, providing detailed information on the encountered errors.

mk_OMPDTSTR is the constructor for the class_OMPDTSTR variables, requiring the following exchange parameters:
• nequ (integer): matrix size;
• nproc (integer): number of processors used in the code execution;
• nbloc (integer): number of diagonal blocks;
• DTSTR_var (OMPDTSTR): parallel data structure for matrix and vector handling;
• info (integer): error code;
• blksbdv (integer array, optional): if present, a user-specified block subdivision (e.g., arising fro
5. e, uses the pattern saved in prec_BFSAI%iat_F and prec_BFSAI%ja_F;
• parDTSTR_mat_A (OMPDTSTR): parallel data structure for matrix and vector handling;
• mat_A (CSRMAT): matrix $A$ in CSR format;
• prec_BFSAI (BFSAI): BFSAI-IC preconditioner;
• info (integer): error code;
• vinfo (integer array): detailed error information.

The subroutine PCG_solv solves a linear system with a BFSAI-IC PCG algorithm using the following exchange variables:
• mat_A (CSRMAT): system matrix in CSR format;
• parDTSTR_mat_A (OMPDTSTR): parallel data structure for matrix and vector handling;
• prec_BFSAI (BFSAI): BFSAI-IC preconditioner;
• PCG (BFSAI_CG): PCG parameters and work arrays;
• rhs (double precision real array): right-hand side vector;
• sol (double precision real array): solution vector;
• info (integer): error code.

The subroutine prt_BFSAI prints out $F$, $F^T$ and $J_L$.

2.1 Input data

The Input data must be provided in ASCII text files. The program requires four Input units to be set:
• test_bfsai.fnames, containing the names of the Input/Output units linked to the code;
• a mat_A file containing the matrix $A$;
• a vec_b file containing the right-hand side vector $b$;
• a parm file containing the user-specified parameters needed by BFSAI-IC and the PCG solver.

The names of the mat_A, vec_b and parm files are user-specified. The Input file test_bfsai.fnames lists the names of the following units:
• the mat_A file (Input unit);
• the v
6. ec_b file (Input unit);
• the parm file (Input unit);
• a vec_x file (Output unit) with the system solution $x$;
• a log file (Output unit) with the execution and error flags.

Each file name must be enclosed in quotes and is case sensitive in Unix-like operating systems.

The mat_A file needs a header with the variables:
• nn (integer): size $n$ of $A$ ($> 0$);
• nt (integer): number of non-zeroes $n_t$ of $A$ ($\geq n$);
separated by a comma or a blank space. The non-zeroes of $A$ follow in coordinate format, i.e., row index $i$, column index $j$, non-zero entry $A_{ij}$.

The vec_b file contains the $n$ components of $b$ separated by a comma or a blank space.

The parm file lists the values of the following variables:
• np (integer): number of processors $n_p$ used in the code execution ($1 \leq n_p \leq N_p$, with $N_p$ the maximum number of available processors);
• nb (integer): number of diagonal blocks $n_b$ ($\geq n_p$);
• kappa (integer): power $\kappa$ of $A$ used for the selection of the non-zero pattern of $F$ ($\geq 1$);
• rho_b (integer): parameter $\rho_B$ controlling the fill-in of the diagonal blocks of $F A F^T$ ($\geq 0$);
• rho_l (integer): parameter $\rho_L$ controlling the fill-in of the diagonal blocks of $J_L$ ($\geq 0$);
• nnz_F (integer): maximum number of non-zero entries allowed for in $F$ (n suggested value n 4047);
• nnz_JL (integer): maximum number of non-zero entries allowed for in $J_L$ (gt nz 2n suggested value n 2n r p8 pL);
• delta (double precision re
7. f any diagonal block of $F A F^T$ in excess of the corresponding row of $A$;
2. $\delta$: tolerance below which an entry in the $i$-th row of $F$ is dropped, relative to the 2-norm of the same row;
3. $\rho_L$: maximum allowable number of non-zeroes for each row of the lower triangular factor of any diagonal block of $F A F^T$ in excess of the corresponding row of $F A F^T$;
4. $\tau_L$: tolerance below which an entry in the $i$-th row of $J_L$ is dropped, relative to the 2-norm of the same row.

The details of the computation of $F$ and $J_L$, along with the overall theoretical background, are provided in the paper:

Carlo Janna, Massimiliano Ferronato and Giuseppe Gambolati. A Block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems. SIAM J. Sci. Comput., 32, 2468-2484, 2010.

2 The BFSAI-IC OpenMP Implementation code

BFSAI-IC OPENMP IMPLEMENTATION is coded in Fortran90 using integer, logical, character and double precision real variables. The main program source file is test_bfsai.f90. The matrices $A$ and $F$ are stored in CSR format, while each diagonal block of $J_L$ is stored in MSSR format. The code structure is the following:
1. link to the Input/Output units (subroutine openio);
2. reading and storage of matrix $A$ (subroutines mk_CSRMAT and readmat);
3. reading and storage of the right-hand side vector $b$;
4. reading of the Input parameters;
5. construction of the parallel data structure to handle $A$, $F$ and $J_L$ with $n_p$ processors (su
8. gonal blocks of $J_L$;
• delta (double precision real): parameter $\delta$ controlling the dropping of the smallest $F$ entries;
• tau_L (double precision real): parameter $\tau_L$ controlling the dropping of the smallest $J_L$ entries;
• prec (BFSAI): BFSAI-IC preconditioner;
• info (integer): error code.

dlt_BFSAI is the destructor for the class_BFSAI variables, requiring the following exchange parameters:
• mat (BFSAI): BFSAI-IC preconditioner to be deallocated;
• info (integer): error code.

mk_BFSAI_CG is the constructor for the class_BFSAI_CG variables, requiring the following exchange parameters:
• nequ (integer): matrix size;
• iout (integer): output unit for the PCG convergence profile;
• itmax (integer): maximum number of PCG iterations;
• isol (integer): if 0, sets the initial solution to $M^{-1} b$; otherwise uses the array already stored in the solution vector;
• tol_CG (double precision real): exit tolerance on the relative residual $\|b - Ax\|_2 / \|b\|_2$;
• PCG_var (BFSAI_CG): PCG parameters and work arrays;
• info (integer): error code.

dlt_BFSAI_CG is the destructor for the class_BFSAI_CG variables, requiring the following exchange parameters:
• PCG_var (BFSAI_CG): PCG parameters and work arrays to be deallocated;
• info (integer): error code.

The subroutine compute_BFSAI computes the BFSAI-IC preconditioner using the following exchange variables:
• cpt_PATT (logical): if true, the non-zero pattern of $F$ is that of $A^\kappa$; if fals
9. m a graph partitioning); if not present, the block subdivision is automatic.

dlt_OMPDTSTR is the destructor for the class_OMPDTSTR variables, requiring the following exchange parameters:
• DTSTR_var (OMPDTSTR): parallel data structure to be deallocated;
• info (integer): error code.

mk_CSRMAT is the constructor for the class_CSRMAT variables, requiring the following exchange parameters:
• nn (integer): matrix size;
• nt (integer): number of non-zero matrix entries;
• mat (CSRMAT): matrix in CSR format;
• info (integer): error code.

dlt_CSRMAT is the destructor for the class_CSRMAT variables, requiring the following exchange parameters:
• mat (CSRMAT): matrix to be deallocated;
• info (integer): error code.

mk_BFSAI is the constructor for the class_BFSAI variables, requiring the following exchange parameters:
• DEBUG (logical): if true, prints out $F$ and $J_L$;
• SPD_OPT (logical): if true, enforces positive definiteness of $J_L$;
• nn (integer): size of $A$;
• nproc (integer): number of processors used in the code execution;
• nbloc (integer): number of diagonal blocks;
• nnz_F (integer): maximum number of non-zero entries in $F$;
• nnz_JL (integer): maximum number of non-zero entries in $J_L$;
• kappa (integer): power of $A$ used for the selection of the non-zero pattern of $F$;
• rho_B (integer): parameter $\rho_B$ controlling the fill-in of the diagonal blocks of $F A F^T$;
• rho_L (integer): parameter $\rho_L$ controlling the fill-in of the dia
10. uble precision real): real PCG relative residual;
• T_prec (real): execution time in seconds for the BFSAI-IC computation;
• T_sol (real): execution time in seconds for the PCG to converge;
• T_tot (real): total execution time in seconds.

3 Numerical examples

A numerical test is included in the available package, using a matrix $A$ with $n = 63$ and $n_t = 347$ (file mat_63) and a unitary right-hand side $b$ (file rhs). The results are obtained on a machine equipped with an Intel(R) Core(TM) i7 CPU 920 at 2.67 GHz with 4 computing cores (HT disabled), 128 kbyte of L1 Cache, 1 Mbyte of L2 Cache, 8 Mbyte of L3 Cache and 6 Gbyte of core Memory.

The parm_1 parameter file:

    1        nproc
    1        nbloc
    2        kappa
    0        rho_B
    0        rho_L
    1000     nzmax_F
    1000     nzmax_JL
    0.00d0   delta
    0.0d0    tau_L
    101      iout
    990      itmax
    1.0d-10  tol_CG

provides the following terminal log and PCG convergence profile:

    Reading matrix and rhs
    End reading
    Computation of the A^kappa pattern
    Computation of F
    Computation of JL
    Preconditioner computed  info 0
    System solved  Info 0
    iter     17
    bnorm    0.7937E+01
    resini   0.9815E+00
    resiter  0.7204E-10
    resreal  0.7204E-10
    T_prec   0.00
    T_sol    0.00
    T_tot    0.00
    mat_F   nnz density   63  0.182
    mat_JL  nnz density  174  0.501
    Prec    nnz density  237  0.683

    iter  resiter
     1  0.2240872E+01
     2  0.1057432E+01
     3  0.3351808E+00
     4  0.7910917E-01
     5  0.2074677E-01
     6  0.2292885E-02
     7  0.7514988E-03
     8  0.8680529E-03
     9  0.2054852
11. uch or as a part of a software product. Users who want to integrate parts of BFSAI-IC OPENMP IMPLEMENTATION into commercial products need to have a license agreement.

2. BFSAI-IC OPENMP IMPLEMENTATION is provided on an "as is" basis and for the purpose described in paragraph 1 only. Under no circumstances can the authors or their institutions be held liable for any deficiency, fault or other mishap with regard to the use or performance of BFSAI-IC OPENMP IMPLEMENTATION.

3. All scientific publications for which BFSAI-IC OPENMP IMPLEMENTATION has been used shall mention usage of BFSAI-IC OPENMP IMPLEMENTATION and shall refer to the following publication:

Carlo Janna, Massimiliano Ferronato and Giuseppe Gambolati. A Block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems. SIAM J. Sci. Comput., 32, 2468-2484, 2010.

Concerning the citation of the software package itself (current version is 1.0), we recommend referring to it in the following way:

Carlo Janna, Massimiliano Ferronato and Nicola Castelletto. BFSAI-IC OpenMP Implementation. Available online at http://www.dmsa.unipd.it/~ferronat/software.html. Release V1.0, January 2011.

Please report possible bugs and failures of the BFSAI-IC OPENMP IMPLEMENTATION code to one of the authors.
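
As a practical aid to Section 2.1 (Input data), the four Input files can be generated by a short script. The following Python sketch is not part of the BFSAI-IC package and only illustrates the file layout described in this guide. The function names, the file names (mat_A.txt, vec_b.txt, parm.txt, vec_x.txt, run.log) and the small test matrix are hypothetical. It also assumes 1-based row and column indices, that all non-zeroes of A are listed rather than one triangle only, that test_bfsai.fnames holds one name per line, and that "apexes" means single quotes; none of these points is confirmed by the guide.

```python
from pathlib import Path


def fortran_double(x):
    """Format a float as a Fortran double-precision literal, e.g. 1.000000d-10."""
    return f"{x:.6e}".replace("e", "d")


def write_inputs(workdir, n, entries, b, parm):
    """Write the four Input files into workdir and return the fnames path.

    entries: list of (i, j, a_ij) triplets; 1-based indices are assumed.
    parm:    list of (value, label) string pairs, one per parameter, in the
             order np, nb, kappa, rho_b, rho_l, nnz_F, nnz_JL, delta, tau_l,
             iout, itmax, tol_CG.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # mat_A: header "nn nt", then one "i j A_ij" triplet per line.
    mat_lines = [f"{n} {len(entries)}"]
    mat_lines += [f"{i} {j} {fortran_double(a)}" for i, j, a in entries]
    (workdir / "mat_A.txt").write_text("\n".join(mat_lines) + "\n")

    # vec_b: the n components of b separated by blank spaces.
    (workdir / "vec_b.txt").write_text(
        " ".join(fortran_double(v) for v in b) + "\n"
    )

    # parm: one parameter per line; the trailing label mirrors parm_1.
    (workdir / "parm.txt").write_text(
        "\n".join(f"{value} {label}" for value, label in parm) + "\n"
    )

    # test_bfsai.fnames: three Input names, then the vec_x and log Output
    # names, each in single quotes (assumed meaning of "apexes").
    names = ["mat_A.txt", "vec_b.txt", "parm.txt", "vec_x.txt", "run.log"]
    fnames = workdir / "test_bfsai.fnames"
    fnames.write_text("\n".join(f"'{name}'" for name in names) + "\n")
    return fnames


# Hypothetical 3x3 SPD tridiagonal matrix, all seven non-zeroes listed.
ENTRIES = [(1, 1, 2.0), (1, 2, -1.0), (2, 1, -1.0), (2, 2, 2.0),
           (2, 3, -1.0), (3, 2, -1.0), (3, 3, 2.0)]

# Values taken from the parm_1 listing in Section 3.
PARM_1 = [("1", "nproc"), ("1", "nbloc"), ("2", "kappa"), ("0", "rho_B"),
          ("0", "rho_L"), ("1000", "nzmax_F"), ("1000", "nzmax_JL"),
          ("0.00d0", "delta"), ("0.0d0", "tau_L"), ("101", "iout"),
          ("990", "itmax"), ("1.0d-10", "tol_CG")]
```

Calling `write_inputs("run_dir", 3, ENTRIES, [1.0, 1.0, 1.0], PARM_1)` would create the files in a directory named run_dir. Keeping the parameter values as strings avoids any reformatting of Fortran literals such as 1.0d-10.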
