Home

User Guide for KLU and BTF

1. else printf This is an early version n endif KO OX OX H XX XX KF define KLU_DATE Jan 25 2011 33 define KLU_VERSION_CODE main sub main 1000 sub define KLU_MAIN_VERSION 1 define KLU_SUB_VERSION 1 define KLU_SUBSUB_VERSION 2 define KLU_VERSION KLU_VERSION_CODE KLU_MAIN_VERSION KLU_SUB_VERSION ifdef __cplusplus endif endif 34 8 The BTF routines The file BTF Include btf h listed below describes each user callable routine in the C version of BTF and gives details on their use BTF package BTF_MAXTRANS find a column permutation Q to give A Q a zero free diagonal BTF_STRONGCOMP find a symmetric permutation P to put P A P into block upper triangular form BTF_ORDER do both of the above btf_maxtrans then btf_strongcomp Copyright c 2004 2007 Tim Davis University of Florida with support from Sandia National Laboratories All Rights Reserved BTF_MAXTRANS finds a permutation of the columns of a matrix so that it has a zero free diagonal The input is an m by n sparse matrix in compressed column form The array Ap of size nti gives the starting and ending positions of the columns in the array Ai Ap 0 must be zero The array Ai contains the row indices of the nonzeros of the matrix A and is of size Ap n The row indices of column j are located in Ai Ap j Aplj 1 1 Row indices must be in the range 0 to m 1 Dupli
2. klu_symbolic Symbolic klu_common Common Symbolic klu_analyze n Ap Ai amp Common real or complex include klu h UF_long n Ap nti Ai nz klu_l_symbolic Symbolic klu_l_common Common Symbolic klu_l_analyze n Ap Ai amp Common real or complex 5 7 klu analyze given order and analyze a matrix In this routine the fill reducing ordering is provided by the user Common ordering is ignored Instead the row permutation P and column permutation Q are used These are integer arrays of size n If NULL a natural ordering is used so to provide just a column ordering pass Q as non NULL and P as NULL A NULL pointer is returned if an error occurs These functions may be used for both real and complex cases include klu h int n Ap nti Ai nz P n Q n klu_symbolic Symbolic klu_common Common Symbolic klu_analyze_given n Ap Ai P Q amp Common real or complex include klu h UF_long n Ap n 1l Ai nz P n Q In klu_l_symbolic Symbolic klu_l_common Common Symbolic klu_l_analyze_given n Ap Ai P Q amp Common real or complex 5 8 klu factor numerical factorization The klu_factor function factorizes a matrix using a sparse left looking method with threshold partial pivoting The inputs Ap and Ai must be unchanged from the previous call to klu_analyze that created the Symbolic object A NULL pointer is returned if an er
3. real ok klu_z_condest Ap Az Symbolic Numeric amp Common complex include klu h UF_long ok Ap n 1 double Ax nz Az 2 nz klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_condest Ap Ax Symbolic Numeric amp Common real ok klu_zl_condest Ap Az Symbolic Numeric amp Common complex 5 18 klu rcond cheap reciprocal condition number estimation This function returns the smallest diagonal entry of U divided by the largest which is a very crude estimate of the reciprocal of the condition number of the matrix A It is very cheap to compute however In MATLAB notation rcond min abs diag U max abs diag U If the matrix is singular rcond will be zero The result is returned in Common rcond include klu h int ok klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_rcond Symbolic Numeric amp Common real ok klu_z_rcond Symbolic Numeric amp Common complex include klu h UF_long ok klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_rcond Symbolic Numeric amp Common real ok klu_zl_rcond Symbolic Numeric amp Common complex 5 19 klu_scale scale and check a sparse matrix This function computes the row scaling factors of a matrix and checks to see if it is a valid sparse matrix It can perform two kinds
4. Availability http www cise ufl edu research sparse klu http www cise ufl edu research sparse btf Acknowledgments This work was supported by Sandia National Laboratories Mike Heroux and the National Science Foundation under grants 0203270 and 0620286 2 Overview KLU is a set of routines for solving sparse linear systems of equations It first permutes the matrix into upper block triangular form via the BTF package This is done by first finding a permutation for a zero free diagonal a maximum transversal 12 If there is no such permutation then the matrix is structurally rank deficient and is numerically singular Next Tarjan s method 13 23 is used to find the strongly connected components of the graph The block triangular form is essentially unique any method will lead to the same number and sizes of blocks although the ordering of the blocks may vary consider a diagonal matrix for example Assuming the matrix has full structural rank the permuted matrix has the following form Au e Aik Akk where each diagonal block is square with a zero free diagonal Next each diagonal block is factorized with a sparse left looking method 14 The kernel of this factorization method is an efficient method for solving Lx b when L x and b are all sparse This kernel is used to compute each column of L and U one column at a time The total work performed by this method is always proportional to the number of floating point
5. include klu h klu_l_symbolic Symbolic klu_l_common Common klu_l_free_symbolic amp Symbolic amp Common real or complex 5 13 klu free numeric destroy the Numeric object It is the user s responsibility to destroy the Numeric object when it is no longer needed or else a memory leak will occur It is safe to pass a NULL Numeric pointer include klu h klu_numeric Numeric klu_common Common klu_free_numeric amp Numeric amp Common real klu_z_free_numeric amp Numeric amp Common complex include klu h klu_l_numeric Numeric klu_l_common Common klu_l_free_numeric amp Numeric amp Common real klu_zl_free_numeric amp Numeric amp Common complex 5 14 Kklu sort sort the columns of L and U The klu_factor function creates a Numeric object with factors L and U stored in a compressed column form not the same data structure as A but similar The columns typically contain lists of row indices in unsorted order This function sorts these indices for two purposes 1 to return L and U to MATLAB which expects its sparse matrices to have sorted columns and 2 to slightly improve the performance of subsequent calls to klu_solve and klu_tsolve Except within a MATLAB mexFunction see KLU MATLAB klu mex c the use of this function is optional include klu h int ok klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_sort
6. return TRUE if successful FALSE otherwise 30 inputs not modified int scale lt 0 none no error check 0 none 1 sum 2 max int n int Ap size n 1 column pointers int Ai size nz row indices double Ax outputs not defined on input double Rs workspace not defined on input or output int WC size n can be NULL klu_common Common UF_long klu_l_scale UF_long UF_long UF_long UF_long double double UF_long klu_l_common UF_long klu_zl_scale UF_long UF_long UF_long UF_long double double UF_long klu_l_common porno nnn klu_extract pore on nee int klu_extract returns TRUE if successful FALSE otherwise inputs klu_numeric Numeric klu_symbolic Symbolic outputs either allocated on input or ignored otherwise L int Lp size n 1 int Li size Numeric gt lnz double Lx size Numeric gt lnz U int Up size n 1 int Ui size Numeric gt unz double Ux size Numeric gt unz F int Fp size n 1 int Fi size Numeric gt nzoff double Fx size Numeric gt nzoff P row permutation int P size n Q column permutation int Q size n Rs scale factors double Rs size n R block boundaries int R size Symbolic gt nblocks i nblocks is
7. 1 if not computed double flops actual factorization flop count from klu_flops double rcond crude reciprocal condition est from klu_rcond double condest accurate condition est from klu_condest double rgrowth reciprocal pivot rgrowth from klu_rgrowth double work actual work done in BTF in klu_analyze size_t memusage current memory usage in bytes size_t mempeak peak memory usage in bytes klu_common typedef struct klu_l_common_struct 64 bit version otherwise same as above double tol memgrow initmem_amd initmem maxwork UF_long btf ordering scale void malloc_memory size_t void realloc_memory void size_t void free_memory void void calloc_memory size_t size_t UF_long user_order UF_long UF_long UF_long UF_long struct klu_l_common_struct void user_data UF_long halt_if_singular UF_long status nrealloc structural_rank numerical_rank singular_col noffdiag double flops rcond condest rgrowth work size_t memusage mempeak 23 klu_l_common pron ene klu_defaults sets default control parameters int klu_defaults klu_common Common gt UF_long klu_l_defaults klu_l_common Common pron ene klu_analyze orders and analyzes a matrix pore o ene Order the matrix with BIF or not
8. SIAM J Matrix Anal Appl 21 1185 1201 2000 G Karypis and V Kumar A fast and high quality multilevel scheme for partitioning irregular graphs SIAM J Sci Comput 20 1998 40 18 19 20 21 22 23 K S Kundert Sparse matrix techniques and their applications to circuit simulation In A E Ruehli editor Circuit Analysis Simulation and Design New York North Holland 1986 K S Kundert and A Sangiovanni Vincentelli User s guide Sparsel 2 Technical report Dept of EE and CS UC Berkeley Oct 1985 L W Nagel and D O Pederson SPICE simulation program with integrated circuit emphasis Technical Report Memorandum No ERL M382 University of California Berkeley 1973 E Palamadai KLU a high performance sparse linear system solver for circuit simulation problems Technical report CISE Department Univ of Florida M S Thesis Thomas L Quarles Analysis of Performance and Convergence Issues for Circuit Simulation PhD thesis EECS Department University of California Berkeley 1989 R E Tarjan Depth first search and linear graph algorithms STAM J Comput 1 146 160 1972 41
9. Symbolic Numeric amp Common real ok klu_z_sort Symbolic Numeric amp Common complex include klu h UF_long ok klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_sort Symbolic Numeric amp Common real ok klu_zl_sort Symbolic Numeric amp Common complex 12 5 15 klu flops determine the flop count This function determines the number of floating point operations performed when the matrix was factorized by klu_factor or klu_refactor The result is returned in Common flops include klu h int ok klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_flops Symbolic Numeric amp Common real ok klu_z_flops Symbolic Numeric amp Common complex include klu h UF_long ok klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_flops Symbolic Numeric amp Common real ok klu_zl_flops Symbolic Numeric amp Common complex 5 16 klu rgrowth determine the pivot growth Computes the reciprocal pivot growth rgrowth min max cij max uz where cij is a scaled entry in a diagonal block of the block triangular form In MATLAB notation rgrowth min max abs R A p q F max abs U where the factorization is LxU F R A p q This function returns 0 if an error occurred 1 otherwise If rgrowth is very sma
10. UF_long ldim nrhs ok double B ldim nrhs Bz 2 ldim nrhs klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_l_solve Symbolic Numeric ldim nrhs B amp Common real ok klu_zl_solve Symbolic Numeric ldim nrhs Bz amp Common complex 10 5 10 klu tsolve solve a transposed linear system Solves the linear system ATx b or A x b The conj_solve input is 0 for AT x b or nonzero for AF x b Otherwise the function is identical to klu_solve include klu h int ldim nrhs ok double B ldim nrhs Bz 2 ldim nrhs klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_tsolve Symbolic Numeric ldim nrhs B amp Common real ok klu_z_tsolve Symbolic Numeric ldim nrhs Bz conj_solve amp Common complex include klu h UF_long ldim nrhs ok double B ldim nrhs Bz 2 ldim nrhs klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_l_tsolve Symbolic Numeric ldim nrhs B amp Common real ok klu_zl_tsolve Symbolic Numeric ldim nrhs Bz conj_solve amp Common complex 5 11 klu_refactor numerical refactorization The klu_refactor function takes as input the Numeric object created by klu_factor or as modi fied by a previous call to klu_refactor It factorizes a new matrix with the same nonzero pattern as that given to the call to klu_factor whi
11. X XX XX FH KF X M HF HF HF HF HF HF XX HF XX X x HF KH x q maxtrans A has entries in the range O n 39 KO OX OX OX X X X ZX XX X XX X KF X XX HH HF HF HFK XX HFK XX HF KF HF XX x q a column permutation only if sprank A n B A q permuted matrix only if sprank A n sum q gt 0 same as sprank A This behaviour differs from p dmperm A in MATLAB which returns the matching as p j i if row i and column j are matched and p j 0 if column j is unmatched p dmperm A has entries in the range O m p a row permutation only if sprank A m B A p permuted matrix only if sprank A m sum p gt 0 definition of sprank A This algorithm is based on the paper On Algorithms for obtaining a maximum transversal by Iain Duff ACM Trans Mathematical Software vol 7 no 1 pp 315 330 and Algorithm 575 Permutations for a zero free diagonal same issue pp 387 390 Algorithm 575 is MC21A in the Harwell Subroutine Library This code is not merely a translation of the Fortran code into C It is a completely new implementation of the basic underlying method depth first search over a subgraph with nodes corresponding to columns matched so far and cheap matching This code was written with minimal observation of the MC21A B code itself See comments below for a comparison between the maxtrans and MC21A B codes This routine operates on a column form matrix and produ
12. and the ANSI C99 Complex data type KLU does not rely on the ANSI C99 data type however for portability reasons The numerical values in column j of a real matrix are located in Ax Ap j Ap j 1 1 For a complex matrix they appear in Ax 2 Ap j 2 Ap j 1 1 as real imaginary pairs the real part appears first followed by the imaginary part 5 5 klu defaults set default parameters This function sets the default parameters for KLU and clears the statistics It may be used for either the real or complex cases A value of 0 is returned if an error occurs 1 otherwise This function must be called before any other KLU function can be called include klu h int ok klu_common Common ok klu_defaults amp Common real or complex include klu h UF_long ok klu_l_common Common ok klu_l_defaults amp Common real or complex 5 6 klu_analyze order and analyze a matrix The following usage returns a Symbolic object that contains the fill reducing ordering needed to factorize the matrix A A NULL pointer is returned if a failure occurs The error status for this function and all others is returned in Common status These functions may be used for both real and complex cases The AMD ordering is used if Common ordering 0 COLAMD is used if it is 1 the natural ordering is used if it is 2 and the user provided Common user_ordering is used if it is 3 include klu h int n Ap n 1 Ai nz
13. at most n 31 int klu_common klu_z_extract inputs Common returns TRUE if successful FALSE otherwise klu_numeric Numeric klu_symbolic Symbolic outputs L int Lp int Li double Lx double Lz U int Up int Ui double Ux double Uz F int Fp int Fi double Fx double Fz all of which must be allocated on input n i nnz L nnz L nnz L size for the complex case ignored if real size size size n i nnz U nnz U nnz U size for the complex case ignored if real size size size nt 1 nnz F nnz F nnz F size for the complex case ignored if real size size size P row permutation int P size n Q column permutation int Q size n Rs scale factors double Rs size n R block boundaries int R klu_common size Symbolic gt nblocks 1 nblocks is at most n Common UF_long klu_l_extract klu_l_numeric klu_l_symbolic UF_long UF_long UF_long UF_long UF_long UF_long UF_long UF_long UF_long klu_zl_ UF_long double UF_long double UF_long double UF_long double UF_long klu_l_common extract klu_l_numeric klu_l_symbolic UF_long double dou
14. by btf_order for this matrix is given below 2 0 0 1 3 20 310 PAQ 3 6 0 4 ee 1 This ordering is not modified by the AMD ordering because the 3 by 3 matrix A22 AJ happens to be a dense matrix No partial pivoting happens to occur during LU factorization all pivots 18 are selected along the diagonal of each block The matrix contains two singletons which are the original entries a34 2 and a43 1 and one 3 by 3 diagonal block in which a single fill in entry occurs during factorization the u23 entry of this 3 by 3 matrix For a more complete program that uses KLU see KLU Demo kludemo c for an int version and KLU Demo kluldemo c for a version that uses UF_long instead The top level main routine uses CHOLMOD to read in a compressed column sparse matrix from a Matrix Market file because KLU does not include such a function Otherwise no CHOLMOD functions are used Unlike klu simple c CHOLMOD is required to run the kludemo c and kluldemo c programs 6 Installation Installation of the C callable interface requires the make utility in Linux Unix Alternatively you can use the Cygwin make in Windows The MATLAB installation in any platform including Windows is simple just type klu_install to compile and install KLU BTF AMD and COLAMD For make system dependent configurations are in the UFconfig UFconfig mk file You can edit that file to customize the compilation The default settings will work on
15. diagonal blocks nblocks number of blocks maxblock size of largest block ordering ordering used AMD COLAMD or GIVEN do_btf whether or not BIF preordering was requested only computed if BTF preordering requested int structural_rank 0 to n 1 if the matrix is structurally rank deficient 1 if not computed n if the matrix has full structural rank klu_symbolic 20 typedef struct 64 bit version otherwise same as above double symmetry est_flops lnz unz double Lnz UF_long n nz P Q R nzoff nblocks maxblock ordering do_btf structural_rank klu_l_symbolic Es ene Numeric object contains the factors computed by klu_factor porno ne typedef struct LU factors of each block the pivot row permutation and the entries in the off diagonal blocks int n A is n by n int nblocks number of diagonal blocks int lnz actual nz in L including diagonal int unz actual nz in U including diagonal int max_lnz_block max actual nz in L in any one block incl diag int max_unz_block max actual nz in U in any one block incl diag int Pnum size n final pivot permutation int Pinv size n inverse of final pivot permutation LU factors of each block int Lip size n pointers into LUbx block for L int Uip size n pointers into LUbx block for U
16. growth is the smallest such value for all columns j Note that the off diagonal entries are not scaled since they do not take part in the LU factorization of the diagonal blocks In MATLAB notation KO OK OX OX X a aa X rgrowth min max abs R A p q F max abs U int klu_rgrowth int Ap int Ai double Ax klu_symbolic Symbolic klu_numeric Numeric klu_common Common Common gt rgrowth reciprocal pivot growth int klu_z_rgrowth int Ap int Ai double Ax klu_symbolic Symbolic klu_numeric Numeric klu_common Common Common gt rgrowth reciprocal pivot growth UF_long klu_l_rgrowth UF_long UF_long double klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_rgrowth UF_long UF_long double klu_l_symbolic klu_l_numeric klu_l_common porno n ne klu_condest pore nn ne Computes a reasonably accurate estimate of the 1 norm condition number using Hager s method as modified by Higham and Tisseur same method as used in MATLAB s condest int klu_condest int Ap size n 1 column pointers not modified double Ax size nz Aplnl numerical values not modified klu_symbolic Symbolic symbolic analysis not modified klu_numeric Numeric numeric factorization not modified klu_common Common result returned in Common gt condest
17. in the pivot column then the diagonal entry is chosen Default value 0 001 ordering which fill reducing ordering to use 0 for AMD 1 for COLAMD 2 for a user provided permutation P and Q or a natural ordering if P and Q are NULL or 3 for the user_order function Default 0 AMD scale whether or not the matrix should be scaled If scale lt 0 then no scaling is performed and the input matrix is not checked for errors If scale gt 0 the input matrix is check for errors If scale 0 then no scaling is performed If scale 1 then each row of A is divided by the sum of the absolute values in that row If scale 2 then each row of A is divided by the maximum absolute value in that row Default 2 btf if nonzero then BTF is used to permute the input matrix into block upper triangular form This step is skipped if Common btf is zero Default 1 maxwork sets an upper limit on the amount of work performed in btf_maxtrans to maxwork nnz A If the limit is reached a partial zero free diagonal might be found This has no effect on whether or not the matrix can be factorized since the matrix can be factorized with no BTF pre ordering at all This option provides a tradeoff between the effectiveness of the BTF ordering and the cost to compute it A partial result can result in fewer and larger blocks in the BTF form resulting to more work required to factorize the matrix No limit is enforced if maxwork lt 0 Default 0 user_o
18. int Llen size n Llen k of entries in kth column of L int Ulen size n Ulen k of entries in kth column of U void LUbx L and U indices and entries excl diagonal of U size_t LUsize size of each LUbx block in sizeof Unit void Udiag diagonal of U scale factors can be NULL if no scaling double Rs size n Rs i is scale factor for row i permanent workspace for factorization and solve size_t worksize size in bytes of Work void Work workspace void Xwork alias into Numeric gt Work int Iwork alias into Numeric gt Work off diagonal entries in a conventional compressed column sparse matrix int Offp size n 1 column pointers int Offi size nzoff row indices void Offx size nzoff numerical values int nzoff klu_numeric typedef struct 64 bit version otherwise same as above UF_long n nblocks lnz unz max_lnz_block max_unz_block Pnum Pinv Lip Uip Llen Ulen void LUbx 21 size_t LUsize void Udiag double Rs size_t worksize void Work Xwork UF_long Iwork UF_long Offp Offi void Offx UF_long nzoff klu_l_numeric pore one KLU control parameters and statistics Common gt status values define KLU_OK 0 define KLU_SINGULAR 1 status gt 0 is a warning
19. klu_zl_scale scale n Ap Ai Az Symbolic Numeric amp Common complex 5 20 klu extract extract the LU factorization This function extracts the LU factorization into a set of data structures suitable for passing back to MATLAB with matrices in conventional compressed column form The klu_sort function should be called first if the row indices should be returned sorted The factorization is returned in caller provided arrays if any of them are NULL that part of the factorization is not extracted this is not an error Returns 1 if successful 0 otherwise The sizes of Li Lx and Lz are Numeric gt lnz Ui Ux and Uz are of size Numeric gt unz and Fi Fx and Fz are of size Numeric gt nzoff Note that in the complex versions the real and imaginary parts are returned in separate arrays to be compatible with how MATLAB stores complex matrices This function is not required to solve a linear system with KLU KLU does not itself make use of the extracted LU factorization returned by this function It is only provided to simplify the MATLAB interface to KLU and it may be of use to the end user who wishes to examine the contents of the LU factors include klu h int ok Lp nti Li 1nz Up n 1 Ui unz Fp n 1 Fi nzoff P In Q n R n double Lx lnz Lz lnz Ux unz Uz unz Fx nzoff Fz nzoff Rs n klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_extract Numeri
20. not an error define KLU_OUT_OF_MEMORY 2 define KLU_INVALID 3 define KLU_TOO_LARGE 4 integer overflow has occured typedef struct klu_common_struct PR parameters double tol pivot tolerance for diagonal preference double memgrow realloc memory growth size for LU factors double initmem_amd init memory size with AMD c nnz L n double initmem init memory size c nnz A n double maxwork maxwork for BTF lt 0 if no limit int btf use BIF pre ordering or not int ordering O AMD 1 COLAMD 2 user P and Q 3 user function int scale row scaling 1 none and no error check 0 none 1 sum 2 max memory management routines void malloc_ memory size_t pointer to malloc void realloc_memory void size_t pointer to realloc void free_memory void pointer to free void calloc_memory size_t size_t pointer to calloc pointer to user ordering function int tuser_order int int int int struct klu_common_struct pointer to user data passed unchanged as the last parameter to the user ordering function optional the user function need not use this information void user_data int halt_if_singular how to handle a singular matrix FALSE keep going Return a Numeric object with a zero U k k A
21. operations something that is not true of any other sparse LU factorization method Prior to factorizing each diagonal block the blocks are ordered to reduce fill in By default the symmetric approximate minimum degree AMD ordering is used on 4 AT 1 2 Another ordering option is to find a column ordering via COLAMD 7 8 Alternatively a permutation can be provided by the user or a pointer to a user provided ordering function can be passed which is then used to order each block Only the diagonal blocks need to be factorized Consider a linear system where the matrix is permuted into three blocks for example Ay Ai A3 Ly by A22 A3 z2 b2 A33 T3 b3 The non singular system A3373 b3 can first be solved for x3 After a block back substitution the resulting system becomes Ai Aig z _ bi Ags _ A22 b2 A2373 b and the A22 2 bh system can be solved for x2 The primary advantage of this method is that no fill in occurs in the off diagonal blocks A12 A13 and 423 This is particular critical for sparse linear systems arising in SPICE like circuit simulation 18 19 20 22 Circuit matrices are typically permutable into block triangular form with many singletons 1 by 1 blocks They also often have a handful of rows and columns with many nonzero entries due to voltage and current sources These rows and columns are pushed into the upper block triangular form and related to the singleton blocks f
22. the largeArrayDims option from the MEX command prior to make mex or make in the MATLAB directory or just use klu_install m inside MATLAB which handles this case 19 7 The KLU routines The file KLU Include klu h listed below describes each user callable routine in the C version of KLU and gives details on their use klu include file ees Include file for user programs that call klu_ routines ifndef _KLU_H define _KLU_H make it easy for C programs to include KLU ifdef __cplusplus extern C endif include amd h include colamd h include btf h JR oro one Symbolic object contains the pre ordering computed by klu_analyze JR por rn typedef struct A P Q is in upper block triangular form The kth block goes from row col index R k to R k 1 1 The estimated number of nonzeros in the L factor of the kth block is Lnz k only computed if the AMD ordering is chosen double symmetry symmetry of largest block double est_flops est factorization flop count double lnz unz estimated nz in L and U including diagonals double Lnz size n but only Lnz 0 nblocks 1 is used computed for all orderings int n input matrix A is n by n nz entries in input matrix P size n Q size n R size n 1 but only R 0 nblocks is used nzoff nz in off
23. then order each block with AMD COLAMD a natural ordering or with a user provided ordering function klu_symbolic klu_analyze inputs not modified int n is n by n int Ap size n 1 column pointers int Ai size nz row indices klu_common Common 3 klu_l_symbolic klu_l_analyze UF_long UF_long UF_long klu_l_common Common JR porno o ne klu_analyze_given analyzes a matrix using given P and Q ER r roo ne Order the matrix with BTF or not then use natural or given ordering P and Q on the blocks P and Q are interpretted as identity if NULL klu_symbolic klu_analyze_given inputs not modified int n A is n by n int Ap size n 1 column pointers int Ai size nz row indices int P size n user s row permutation may be NULL int QC size n user s column permutation may be NULL klu_common Common klu_l_symbolic klu_l_analyze_given UF_long UF_long UF_long UF_long UF_long klu_l_common PR r ero ne klu_factor factors a matrix using the klu_analyze results 24 klu_numeric klu_factor returns KLU_OK if OK lt O if error inputs not modified int Ap size n 1 column pointers int Ai size nz row indices double Ax size nz numerical values klu_symbolic Symbolic klu_common Common klu_numeric k
24. Davis J R Gilbert S I Larimore and E G Ng Algorithm 836 COLAMD a column approximate minimum degree ordering algorithm ACM Trans Math Software 30 377 380 2004 T A Davis J R Gilbert S I Larimore and E G Ng A column approximate minimum degree ordering algorithm ACM Trans Math Software 30 353 376 2004 Timothy A Davis and Ekanathan Palamadai Natarajan Algorithm 907 KLU a direct sparse solver for circuit simulation problems ACM Trans Math Softw 37 36 1 36 17 September 2010 J W Demmel S C Eisenstat J R Gilbert X S Li and J W H Liu A supernodal approach to sparse partial pivoting SIAM J Matrix Anal Appl 20 720 755 1999 J J Dongarra J Du Croz I S Duff and S Hammarling A set of level 3 basic linear algebra subprograms ACM Trans Math Software 16 1 17 1990 I S Duff On algorithms for obtaining a maximum transversal ACM Trans Math Software 7 315 330 1981 I S Duff and J K Reid An implementation of Tarjan s algorithm for the block triangular ization of a matrix ACM Trans Math Software 4 137 147 1978 J R Gilbert and T Peierls Sparse partial pivoting in time proportional to arithmetic oper ations SIAM J Sci Statist Comput 9 862 874 1988 W W Hager Condition estimates SIAM J Sci Statist Comput 5 311 316 1984 N J Higham and F Tisseur A block algorithm for matrix 1 norm estimation with an appli cation to 1 norm pseudospectra
25. User Guide for KLU and BTF Timothy A Davis Eka Palamadai Natarajan VERSION 1 1 2 Jan 25 2011 Abstract KLU is a set of routines for solving sparse linear systems of equations It is particularly well suited to matrices arising in SPICE like circuit simulation applications It relies on a permutation to block triangular form BTF several methods for finding a fill reducing ordering variants of approximate minimum degree and nested dissection and a sparse left looking LU factorization method to factorize each block A MATLAB interface is included KLU appears as Collected Algorithm 907 of the ACM 9 Dept of Computer and Information Science and Engineering Univ of Florida Gainesville FL USA email davis cise ufl edu http www cise ufl edu davis This work was supported by Sandia National Labs and the National Science Foundation Portions of the work were done while on sabbatical at Stanford University and Lawrence Berkeley National Laboratory with funding from Stanford University and the SciDAC program Contents 1 6 7 8 License and Copyright Overview Availability Using KLU and BTF in MATLAB Using KLU and BTF in a C program 5 1 KLU Common object 5 2 KLU Symbolic object 5 3 KLU Numeric object 5 4 A sparse matrix in KLU 5 5 klu defaults set default parameters 5 6 klu_analyze order and analyze a MATIX 85 La ee AOR Be de hate Ee ee ey
26. ble UF_long double double UF_long double double UF_long double UF_long klu_1l_common 32 PR por ro ee KLU memory management routines pore ne void klu_malloc returns pointer to the newly malloc d block input size_t n number of items size_t size size of each item klu_common Common Ls void klu_free always returns NULL in out void p block of memory to free size_t n number of items size_t size size of each item _ klu_common Common P3 void klu_realloc returns pointer to reallocated block input size_t nnew requested of items in reallocated block size_t nold current size of block in of items size_t size size of each item in out void p block of memory to realloc _ klu_common Common Je void klu_l_malloc size_t size_t klu_l_common void klu_l_free void size_t size_t klu_l_common void klu_l_realloc size_t size_t size_t void klu_l_common All versions of KLU include these definitions As an example to test if the version you are using is 1 2 or later if KLU_VERSION gt KLU_VERSION_CODE 1 2 This also works during compile time if KLU gt KLU_VERSION_CODE 1 2 printf This is version 1 2 or later n
27. c Symbolic Lp Li Lx Up Ui Ux Fp Fi Fx P Q Rs R amp Common real ok klu_z_extract Numeric Symbolic Lp Li Lx Lz Up Ui Ux Uz Fp Fi Fx Fz P Q Rs R amp Common complex include klu h UF_long ok Lp nti Li 1nz Up n 1 Ui unz Fp nti Fi nzoff P n Q n R n double Lx lnz Lz lnz Ux unz Uz unz Fx nzoff Fz nzoff Rs n 15 klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_extract Numeric Symbolic Lp Li Lx Up Ui Ux Fp Fi Fx P Q Rs R amp Common real ok klu_zl_extract Numeric Symbolic Lp Li Lx Lz Up Ui Ux Uz Fp Fi Fx Fz P Q Rs R amp Common complex 5 21 klu malloc klu free klu_realloc memory management KLU uses a set of wrapper routines for malloc free and realloc By default these wrapper routines call the ANSI C versions of these functions However pointers to functions in Common can be modified after calling klu_defaults to allow the use of other memory management functions such as the MATLAB mxMalloc mxFree and mxRealloc These wrapper functions keep track of the current and peak memory usage of KLU They can be called by the user klu_malloc is essentially the same as p malloc n sizeof size klu_free is essen tially the same as free p except that klu free returns NULL which can then be assigned to p klu_realloc is similar to realloc exce
28. cate entries may be present in any given column The input matrix is not checked for validity row indices out of the range 0 to m 1 will lead to an undeterminate result possibly a core dump for example Row indices in any given column need not be in sorted order However if they are sorted and the matrix already has a zero free diagonal then the identity permutation is returned The output of btf_maxtrans is an array Match of size n If row i is matched with column j then A i j is nonzero and then Match i j If the matrix is structurally nonsingular all entries in the Match array are unique and Match can be viewed as a column permutation if A is square That is column k of the original matrix becomes column Match k of the permuted matrix In MATLAB this can be expressed as for non structurally singular matrices Match maxtrans A B A Match except of course here the A matrix and Match vector are all 0 based rows and columns in the range 0 to n 1 not 1 based rows cols in range 1 to n The MATLAB dmperm routine returns a row permutation See the maxtrans mexFunction for more details If row i is not matched to any column then Match i is 1 The btf_maxtrans routine returns the number of nonzeros on diagonal of the permuted matrix In the MATLAB mexFunction interface to btf_maxtrans 1 is added to the Match array to obtain a 1 based permutation Thus in MATLAB where A is m by n KO XX XX X
29. ces a column permutation MC21A uses a row form matrix and produces a row permutation The difference is merely one of convention in the comments and interpretation of the inputs and outputs If you want a row permutation simply pass a compressed row sparse matrix to this routine and you will get a row permutation just like MC21A Similarly you can pass a column oriented matrix to MC21A and it will happily return a column permutation ifndef _BTF_H define _BTF_H make it easy for C programs to include BTF ifdef cplusplus extern C endif include UFconfig h int btf_maxtrans returns of columns matched input not modified int nrow A is nrow by ncol in compressed column form int ncol int Ap size ncol 1 int Ai size nz Ap ncol double maxwork maximum amount of work to do is maxwork nnz A no limit if lt 0 output not defined on input double work work 1 if maxwork gt O and the total work performed reached the maximum of maxwork nnz A Otherwise work the total work performed int Match size nrow Match i j if column j matched to row i 36 see above for the singular matrix case workspace not defined on input or output int Work size 5xncol long integer version all int parameters become UF_long UF_long btf_l_maxtrans UF_long UF_long UF_long UF_lo
30. ch created it The same pivot order is used Since this can lead to numeric instability the use of klu_rcond klu_rgrowth or klu_condest is rec ommended to check the accuracy of the resulting factorization The inputs Ap and Ai must be unmodified since the call to klu_factor that first created the Numeric object This is function is much faster than klu_factor and requires no dynamic memory allocation include klu h int ok Ap n 1 Ai nz double Ax nz Az 2 nz klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_refactor Ap Ai Ax Symbolic Numeric amp Common real ok klu_z_refactor Ap Ai Az Symbolic Numeric amp Common complex include klu h UF_long ok Ap n i Ai nz double Ax nz Az 2 nz klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_refactor Ap Ai Ax Symbolic Numeric amp Common real ok k lu_zl_refactor Ap Ai Az Symbolic Numeric amp Common complex 5 12 klu free symbolic destroy the Symbolic object It is the user s responsibility to destroy the Symbolic object when it is no longer needed or else a memory leak will occur It is safe to pass a NULL Symbolic pointer These functions may be used 11 for both real and complex cases include klu h klu_symbolic Symbolic klu_common Common klu_free_symbolic amp Symbolic amp Common real or complex
31. d KLU is available under alternate licenses contact T Davis for details KLU License Your use or distribution of KLU or any modified version of KLU implies that you agree to this License This library is free software you can redistribute it and or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation either version 2 1 of the License or at your option any later version This library is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU Lesser General Public License for more details You should have received a copy of the GNU Lesser General Public License along with this library if not write to the Free Software Foundation Inc 51 Franklin St Fifth Floor Boston MA 02110 1301 USA Permission is hereby granted to use or copy this program under the terms of the GNU LGPL provided that the Copyright this License and the Availability of the original version is retained on all copies User documentation of any code that uses this code or any modified version of this code must cite the Copyright this License the Availability note and Used by permission Permission to modify the code and to distribute modified code is granted provided the Copyright this License and the Availability note are retained and a notice that the code was modified is included
32. divide by zero may occur when computing L k The Numeric object 22 can be passed to klu_solve a divide by zero will occur It can also be safely passed to klu_refactor TRUE stop quickly klu_factor will free the partially constructed Numeric object klu_refactor will not free it but will leave the numerical values only partially defined This is the default porno nee statistics _ int status KLU_OK if OK lt O if error int nrealloc of reallocations of L and U int structural_rank O to n 1 if the matrix is structurally rank deficient as determined by maxtrans 1 if not computed n if the matrix has full structural rank This is computed by klu_analyze if a BTF preordering is requested int numerical_rank First k for which a zero U k k was found if the matrix was singular in the range 0 to n 1 n if the matrix has full rank This is not a true rank estimation It just reports where the first zero pivot was found 1 if not computed Computed by klu_factor and klu_refactor int singular_col n if the matrix is not singular If in the range 0 to n 1 this is the column index of the original matrix A that corresponds to the column of U that contains a zero diagonal entry 1 if not computed Computed by klu_factor and klu_refactor int noffdiag of off diagonal pivots
33. e suite of sparse matrix libraries See http www cise ufl edu research sparse for each of these packages They are also all contained within the single SuiteSparse zip or SuiteSparse tar gz distribution 4 Using KLU and BTF in MATLAB KLU has a single MATLAB interface which provides several options for factorizing a matrix and or using the factors to solve a linear system The following is a synopsis of its use For more details type help klu in MATLAB LU klu A factorizes R A p q into L U F returning a struct x klu A b x A b x klu b A x b A x klu LU b x A b where LU klu A x klu b LU x b A where LU klu A With a single input klu A returns a MATLAB struct containing the LU factors The factor ization is in the form LxU F R A p q where LxU is the LU factorization of just the diagonal blocks of the block triangular form F is a sparse matrix containing the entries in the off diagonal blocks R is a diagonal matrix containing the row scale factors and p and q are permutation vectors The LU struct also contains a vector r which describes the block boundaries the same as the third output parameter of dmperm The kth block consists of rows and columns r k to r k 1 1 in the permuted matrix A p q and the factors L and VU An optional final input argument klu A opts for example provides a way of modifying KLU s user definable parameters including a partial piv
34. ee 5 7 klu_analyze_given order and analyze a matrix 5 8 klu factor numerical factorization 5 9 klu_solve solve a linear system 5 10 klu_tsolve solve a transposed linear system 5 11 klu_refactor numerical refactorization 5 12 klu free symbolic destroy the Symbolic object 5 13 klu_free_numeric destroy the Numeric object 5 14 klu sort sort the columns of LandU 5 15 klu_flops determine the flop count 5 16 klu_rgrowth determine the pivot POWE oe bass ia des Mantes Se 5 17 klu_condest accurate condition number estimation 5 18 klu_rcond cheap reciprocal condition number estimation 5 19 klu scale scale and check a sparse matrix 5 20 klu_extract extract the LU factorization 5 21 klu malloc klu free klu realloc memory management 5 22 btf maxtrans maximum transversal 5 23 btf_strongcomp strongly connected components 5 24 btf_order permutation to block triangular form 5 25 Sample C programs that use KLU Installation The KLU routines The BTF routines 19 20 35 1 License and Copyright KLU Copyright 2004 2011 University of Florida All Rights Reserve
35. er function The user function may use and optionally modify the contents of the klu_common Common ob ject In particular prior to calling KLU the user application can set both Common user_order and Common user_data The latter is a void pointer that KLU does not use except to pass to the user ordering function pointed to by Common user_ order This is a mechanism for passing additional arguments to the user function An example user function is provided in the KLU User directory which provides an interface to the ordering method in CHOLMOD 5 2 KLU Symbolic object KLU performs its sparse LU factorization in two steps The first is purely symbolic and does not depend on the numerical values This analysis returns a klu symbolic object klu 1 symbolic in the UF_long version The Symbolic object contains a pre ordering which combines the block triangular form with the fill reducing ordering and an estimate of the number of nonzeros in the factors of each block Its size is thus modest only proportional to n the dimension of A It can be reused multiple times for the factorization of a sequence of matrices with identical nonzero pattern Note a nonzero in this sense is an entry present in the data structure of A such entries may in fact be numerically zero 5 3 KLU Numeric object The Numeric object contains the numeric sparse LU factorization including the final pivot permu tations To solve a linear system both the Symbolic a
36. for the MATLAB interface int klu_sort inputs not modified klu_symbolic Symbolic input output klu_numeric Numeric klu_common Common int klu_z_sort inputs not modified klu_symbolic Symbolic input output klu_numeric Numeric klu_common Common Jag UF_long klu_l_sort klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_sort klu_l_symbolic klu_l_numeric klu_l_common por r on ne klu_flops determines of flops performed in numeric factorzation PR ene int klu_flops inputs not modified klu_symbolic Symbolic klu_numeric Numeric input output klu_common Common int klu_z_flops inputs not modified klu_symbolic Symbolic klu_numeric Numeric input output klu_common Common UF_long klu_l_flops klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_flops klu_l_symbolic klu_l_numeric klu_l_common PR poner ne klu_rgrowth compute the reciprocal pivot growth por rn nne N Pivot growth is computed after the input matrix is permuted scaled and off diagonal entries pruned This is because the LU factorization of each block takes as input the scaled diagonal blocks of the BTF form The reciprocal pivot growth in column j of an LU factorization of a matrix C is the largest entry in C divided by the largest entry in U then the overall reciprocal pivot
37. int klu_z_condest 29 int Ap double Ax size 2 nz klu_symbolic Symbolic klu_numeric Numeric klu_common Common result returned in Common gt condest UF_long klu_l_condest UF_long double klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_condest UF_long double klu_l_symbolic klu_l_numeric klu_l_common pore oe nnn klu_rcond compute min abs diag U max abs diag U Es ne int klu_rcond klu_symbolic Symbolic input not modified klu_numeric Numeric input not modified klu_common Common result in Common gt rcond int klu_z_rcond klu_symbolic Symbolic input not modified klu_numeric Numeric input not modified klu_common Common result in Common gt rcond JA UF_long klu_l_rcond klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_rcond klu_l_symbolic klu_l_numeric klu_l_common PR porno nnn klu_scale Es int klu_scale return TRUE if successful FALSE otherwise inputs not modified int scale lt 0 none no error check 0 none 1 sum 2 max int n int Ap size n 1 column pointers int Ai size nz row indices double Ax outputs not defined on input double Rs workspace not defined on input or output int WC size n can be NULL klu_common Common int klu_z_scale
38. ion and approximate minimum degree refer to 6 Concise versions of these methods can be found in the CSparse package KLU is also the topic of a Master s thesis by Palamadai Natarajan 21 a copy of the thesis can be found in the KLU Doc directory It includes a description of an earlier version of KLU some of the function names and parameter lists have changed in this version The descriptions of the methods used still applies to the current version of KLU however KLU appears as Algorithm 907 KLU a direct sparse solver for circuit simulation problems ACM Transactions on Mathematical Software vol 37 no 3 2010 3 Availability KLU and its required ordering packages BTF COLAMD AMD and UF config are available at http www cise ufl edu research sparse In addition KLU can make use of any user provided ordering function One such function is included which provides KLU with an interface to the ordering methods used in CHOLMOD 3 such as METIS a nested dissection method 17 After permutation to block triangular form circuit matrices have very good node separators and are thus excellent candidates for nested dissection The METIS ordering takes much more time to compute than the AMD ordering but if the ordering is reused many times typical in circuit simulation the better quality ordering can pay off in lower total simulation time To use KLU you must obtain the KLU BTF UF config AMD and COLAMD packages in the SuiteSpars
39. ll an inaccurate factorization may have been performed The inputs Ap Ai and Ax Az in the complex case must be unchanged since the last call to klu_factor or klu_refactor The result is returned in Common rgrowth include klu h int ok Ap n 1 Ai nz double Ax nz Az 2 nz klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_rgrowth Ap Ai Ax Symbolic Numeric amp Common real ok klu_z_rgrowth Ap Ai Az Symbolic Numeric amp Common complex include klu h UF_long ok Ap nti Ai nz double Ax nz Az 2 nz klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common ok klu_l_rgrowth Ap Ai Ax Symbolic Numeric amp Common real ok klu_zl_rgrowth Ap Ai Az Symbolic Numeric amp Common complex 5 17 klu condest accurate condition number estimation This function is essentially the same as the MATLAB condest function It computes an estimate of the 1 norm condition number using Hager s method 15 and the generalization by Higham and 13 Tisseur 16 The inputs Ap and Ax Az in the complex case must be unchanged since the last call to klu_ factor or klu refactor The result is returned in Common condest include klu h int ok Ap n 1 double Ax nz Az 2 nz klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_condest Ap Ax Symbolic Numeric amp Common
40. lu_numeric Numeric klu_common Common int klu_z_refactor return TRUE if successful FALSE otherwise inputs not modified int Ap size n 1 column pointers int Ai J size nz row indices double Ax size 2 nz numerical values klu_symbolic Symbolic input and numerical values modified on output klu_numeric Numeric klu_common Common UF_long klu_l_refactor UF_long UF_long double klu_l_symbolic klu_l_numeric klu_l_common UF_long klu_zl_refactor UF_long UF_long double klu_l_symbolic klu_l_numeric klu_l_common porno o ene klu_free_symbolic destroys the Symbolic object pron ne int klu_free_symbolic C klu_symbolic Symbolic klu_common Common UF_long klu_l_free_symbolic klu_l_symbolic klu_l_common Es ne klu_free_numeric destroys the Numeric object porno enn Note that klu_free_numeric and klu_z_free_numeric are identical each can free both kinds of Numeric objects real and complex int klu_free_numeric klu_numeric Numeric klu_common Common int klu_z_free_numeric klu_numeric Numeric klu_common Common 2S UF_long klu_l_free_numeric klu_l_numeric klu_l_common UF_long klu_zl_free_numeric klu_l_numeric klu_l_common 27 Es klu_sort sorts the columns of the LU factorization PR r nne this is not needed except
41. lu_z_factor returns KLU_OK if OK lt O if error inputs not modified int Ap size n 1 column pointers int Ai J size nz row indices double Ax size 2 nz numerical values real imag pairs klu_symbolic Symbolic klu_common Common long real version klu_l_numeric klu_l_factor UF_long UF_long double klu_l_symbolic klu_l_common long complex version klu_l_numeric klu_zl_factor UF_long UF_long double klu_l_symbolic klu_l_common porno nnn klu_solve solves Ax b using the Symbolic and Numeric objects por rn ene int klu_solve inputs not modified klu_symbolic Symbolic klu_numeric Numeric int ldim leading dimension of B int nrhs number of right hand sides right hand side on input overwritten with solution to Ax b on output double B size ldim nrhs klu_common Common int klu_z_solve inputs not modified klu_symbolic Symbolic klu_numeric Numeric int ldim leading dimension of B int nrhs number of right hand sides right hand side on input overwritten with solution to Ax b on output double B size 2 ldim nrhs klu_common Common 25 UF_long klu_l_solve klu_l_symbolic klu_l_numeric UF_long UF_long double klu_l_common UF_long klu_zl_solve klu_l_symbolic klu_l_numeric UF_long UF_long double kl
42. matrix int R size n 1 block b is in rows cols R b R b 1 1 workspace not defined on input or output int Work size 4n UF_long btf_l_strongcomp UF_long UF_long UF_long UF_long UF_long UF_long UF_long ees BTF_ORDER N BIF_ORDER permutes a square matrix into upper block triangular form It does this by first finding a maximum matching or perhaps a limited matching if the work is limited via the btf_maxtrans function Ifa complete matching is not found BTF_ORDER completes the permutation but flags the columns of P A Q to denote which columns are not matched If the matrix is structurally rank deficient some of the entries on the diagonal of the permuted matrix will be zero BTF_ORDER then calls btf_strongcomp to find the strongly connected components On output P and Q are the row and column permutations where i P k if row i of A is the kth row of P A Q and j BTF_UNFLIP Q k if column j of A is the kth column of P A Q If Q k lt 0 then the k k th entry in P A Q is structurally zero The vector R gives the block boundaries where block b is in rows columns R b to R b 1 1 of the permuted matrix and where b ranges from 1 to the number of strongly connected components found KO OX OX OX X OX X OX X XX HF HF KF int btf_order returns number of blocks found input not modified int n A is n by n in compre
43. most systems Sample configuration files are provided for Linux Sun Solaris SGI IRIX IBM AIX and the DEC Compaq Alpha To compile and install the C callable KLU BTF AMD and COLAMD libraries go to the KLU directory and type make The KLU and BTF libraries are placed in KLU Lib libklu a and BTF Lib libbtf a Two KLU demo programs will be compiled and tested in the KLU Demo direc tory You can compare the output of make with the results in the KLU distribution kludemo out Typing make clean will remove all but the final compiled libraries and demo programs Typing make purge or make distclean removes all files not in the original distribution If you compile KLU or BTF and then later change the UFconfig UFconfig mk file then you should type make purge and then make to recompile When you compile your program that uses the C callable KLU library you need to add the KLU Lib libklu a BTF Lib libbtf a AMD Lib libamd a and COLAMD Lib libamd a libraries a nd you need to tell your compiler to look in the KLU Include and BTF Include directory for include files If all you want to use is the KLU mexFunction in MATLAB you can skip the use of the make command entirely Simply type klu install in the MATLAB command window while in the KLU MATLAB directory This works on any system with MATLAB including Windows Alternately type make in the KLU MATLAB directory If you have MATLAB 7 2 or earlier you must first edit UFconfig UFconfig h to remove
44. mponents found the diagonal blocks in the block triangular form int n Ap nt1 Ai nz Q n P n R m 1 Work 4 n ncomp ncomp btf_strongcomp n Ap Ai Q P R Work UF_long n Ap nti Ai nz Q n P n R n 1 Work 4 n ncomp ncomp btf_l_strongcomp n Ap Ai Q P R Work 5 24 btf order permutation to block triangular form The btf_order function combines the above two functions first finding a maximum transversal and then permuting the resulting matrix into upper block triangular form Unlike dmperm in MATLAB it always reveals the maximum matching along the diagonal even if the matrix is structurally singular On output P and Q are the row and column permutations where i P k if row i of A is the kth row of PxAxQ and j BTF_UNFLIP Q k if column j of A is the kth column of P A Q If Q k lt 0 then the k k th entry in P A Q is structurally zero The vector R and the return value are the same as btf_strongcomp int n Ap nti Ai mz P n Q n R m 1 nfound Work 5 n ncomp nfound double maxwork work ncomp btf_order n Ap Ai maxwork amp work P Q R amp nfound Work UF_long n Ap nti Ai nz P n Q n R n 1 nfound Work 5 n ncomp nfound double maxwork work ncomp btf_l_order n Ap Ai maxwork amp work P Q R amp nfound Work 17 5 25 Sample C programs that use KLU Here is a simple main program klu_simple c that illust
45. n the SuiteSparse directory alongside the KLU and BTF direc tories After running the installation scripts type pathtool and save your path for future MATLAB sessions If you cannot save your path because of file permissions edit your startup m by adding addpath commands type doc startup and doc addpath for more information 5 Using KLU and BTF in a C program KLU and BTF include the following C callable functions Each function is available in two or four versions with int or long integers and for functions that deal with numerical values with double or complex double values The long integer is actually a UF_long which is typically a long defined with a define statement It becomes an __int64 on Microsoft Windows 64 however The usage of real and complex and int and UF_long must not be mixed except that some functions can be used for both real and complex cases 5 1 KLU Common object The klu_common object klu_1_common for the UF_long version contains user definable parameters and statistics returned from KLU functions This object appears in every KLU function as the last parameter Details are given in the klu h include file which also appears below in Section 7 Primary parameters and statistics are summarized below The defaults are chosen specifically for circuit simulation matrices tol partial pivoting tolerance If the diagonal entry has a magnitude greater than or equal to tol times the largest magnitude of entries
46. nd Numeric objects are required 5 4 A sparse matrix in KLU The input matrix provided to KLU is in sparse compressed column form which is the same data structure used internally in MATLAB except that the version used here allows for the row indices to appear in any ordering and this version also allows explicit zero entries to appear in the data structure The same data structure is used in CSparse which is fully documented in 6 If you wish to use a simpler input data structure consider creating a triplet matrix in CSparse or CXSparse if you use the long and or complex versions of KLU and then convert this data structure into a sparse compressed column data structure using the CSparse cs_compress and cs dupl functions Similar functions are available in CHOLMOD cholmod_triplet_to_sparse The matrix is described with four parameters e n an integer scalar The matrix is n by n Note that KLU only operates on square matrices e Ap an integer array of size n 1 The first entry is Ap 0 0 and the last entry nz Ap n is equal to the number of entries in the matrix e Ai an integer array of size nz Ap n The row indices of entries in column j of A are located in Ai Ap j Ap j 1 1 The matrix is zero based row and column indices are in the range 0 to n 1 e Ax a double array of size nz for the real case or 2 nz for the complex case For the complex case the real and imaginary parts are interleaved compatible with Fortran
47. ng double double UF_long UF_long BTF_STRONGCOMP ees BTF_STRONGCOMP finds the strongly connected components of a graph returning a symmetric permutation The matrix A must be square and is provided on input in compressed column form see BTF_MAXTRANS above The diagonal of the input matrix A or A Q if Q is provided on input is ignored If Q is not NULL on input then the strongly connected components of A Q are found Q may be flagged on input where Q k lt 0 denotes a flagged column k The permutation is j BTF_UNFLIP Q k On output Q is modified the flags are preserved so that P A Q is in block upper triangular form If Q is NULL then the permutation P is returned so that P A P is in upper block triangular form The vector R gives the block boundaries where block b is in rows columns R b to R b 1 1 of the permuted matrix and where b ranges from 1 to the number of strongly connected components found KO OX OX OX OX X OX X OX X X KF HF KF int btf_strongcomp return of strongly connected components C input not modified int n A is n by n in compressed column form int Ap size n 1 int Ai size nz Ap n optional input modified if present on output int Q size n input column permutation output not defined on input int P size n P k j if row and column j are kth row col in permuted
48. of scaling computing either the largest magnitude in each row or the sum of the magnitudes of the entries each row KLU calls this function itself depending upon the Common scale parameter where scale lt 0 means no scaling scale 1 means the sum and scale 2 means the maximum That is in MATLAB notation Rs sum abs A or Rs max abs A KLU then divides each row of A by its corresponding scale factor The function returns 0 if the matrix is invalid or 1 otherwise A valid sparse matrix must meet the following conditions 14 1 n gt 0 Note that KLU does not handle empty 0 by 0 matrices 2 Ap Ai and Ax Az for the complex case must not be NULL 3 Ap 0 0 and Ap j lt Ap j 1 for all j in the range 0 to n 1 4 The row indices in each column Ai Ap j Ap j 1 1 must be in the range 0 to n 1 and no duplicates can appear If the workspace W is NULL on input the check for duplicate entries is skipped include klu h int scale ok n Ap nti Ai nz W n double Ax nz Az 2 nz Rs n klu_common Common ok klu_scale scale n Ap Ai Ax Symbolic Numeric amp Common real ok klu_z_scale scale n Ap Ai Az Symbolic Numeric amp Common complex include klu h UF_long scale ok n Ap nti Ai nz W n double Ax nz Az 2 nz Rs n klu_l_common Common ok klu_l_scale scale n Ap Ai Ax Symbolic Numeric amp Common real ok
49. or example A33 in the above system is 1 by 1 and the column in A13 and Ag3 has many nonzero entries Since these nearly dense rows and columns do not appear in the LU factorization of the diagonal blocks they cause no fill in The structural rank of a matrix is based solely on the pattern of its entries not their numerical values With random entries the two ranks are equal with probability one The structural rank of any matrix is an upper bound on the numerical rank The maximum transversal algorithm in the BTF package is useful in determining if a matrix is structurally rank deficient and if so it reveals a non unique set of rows and columns that contribute to that rank deficiency This is useful in determining what parts of a circuit are poorly formulated such as dangling components When ordered and factorized with KLU very little fill in occurs in the resulting LU factors which means that there is little scope for use of the BLAS 11 Sparse LU factorization methods that use the BLAS such as SuperLU 10 amd UMFPACK 4 5 are slower than KLU when applied to sparse matrices arising in circuit simulation Both KLU and SuperLU are based on Gilbert and Peierl s left looking method 14 SuperLU uses supernodes but KLU does not thus the name KLU refers to a Clark Kent LU factorization algorithm what SuperLU was before it became Super For details of the permutation to block triangular form left looking sparse LU factorizat
50. oting tolerance and ordering options A second output argument LU info klu provides statistics on the factorization The BTF package includes three user callable MATLAB functions which replicate most of features of the MATLAB built in dmperm function and provide an additional option which can significantly limit the worst case time taken by dmperm For more details type help btf help maxtrans and help strongcomp in MATLAB Additional information about how these functions work can be found in 6 p q r btf A similar to p q r dmperm A q maxtrans A similar to q dmperm A p r strongcomp A similar to p q r dmperm A speye n Both btf and maxtrans include a second option input maxwork which limits the total work performed in the maximum transversal to maxwork nnz A The worst case time taken by the algorithm is O n nnz A where the matrix A is n by n but this worst case time is rarely reached in practice To use the KLU and BTF functions in MATLAB you must first compile and install them In the BTF MATLAB directory type btf_install and then type klu_install in the KLU MATLAB directory Alternatively if you have the entire SuiteSparse simply run the SuiteSparse_install function in the SuiteSparse directory To use METIS 4 0 1 with KLU and CHOLMOD another part of SuiteSparse you must first download it from http glaros dtc umn edu gkhome views metis and place the metis 4 0 directory i
51. pt that if the reallocation fails p is returned unchanged Failure conditions are returned in Common status include klu h size_t n nnew nold size void p klu_common Common p klu_malloc n size amp Common p klu_free p n size amp Common p klu_realloc nnew nold size p amp Common include klu h size_t n nnew nold size void p klu_l_common Common p klu_l_malloc n size amp Common P P klu_l_free p n size amp Common klu_l_realloc nnew nold size p amp Common 5 22 btf maxtrans maximum transversal The BTF package includes three user callable functions each with int and UF long versions They do not need to be called directly by an application that uses KLU KLU will call these functions to perform the permutation into upper block triangular form The btf maxtrans function finds a column permutation Q that gives A Q a zero free diagonal if one exists If row i is matched to column j then Match i j If the matrix is structurally singular there will be some unmatched rows If row i is unmatched then Match i 1 If the matrix is square and structurally non singular then Q Match is the column permutation The btf_maxtrans function can accept as input a rectangular matrix it operates on the bipartite graph of A It returns the number of columns matched Unlike the KLU user callable functions the BTF functions do not check its inputs at all a segmenta
52. r later if BTF_VERSION gt BTF_VERSION_CODE 1 2 This also works during compile time if BTF gt BTF_VERSION_CODE 1 2 printf This is version 1 2 or later n else printf This is an early version n endif define BTF_DATE Jan 25 2011 define BTF_VERSION_CODE main sub main 1000 sub define BTF_MAIN_VERSION 1 define BTF_SUB_VERSION 1 define BTF_SUBSUB_VERSION 2 define BTF_VERSION BTF_VERSION_CODE BTF_MAIN_VERSION BTF_SUB_VERSION ifdef __cplusplus endif endif 39 References 1 10 11 12 13 14 15 16 17 P R Amestoy T A Davis and I S Duff An approximate minimum degree ordering algo rithm SIAM J Matrix Anal Appl 17 886 905 1996 P R Amestoy T A Davis and I S Duff Algorithm 837 AMD an approximate minimum degree ordering algorithm ACM Trans Math Software 30 381 388 2004 Y Chen T A Davis W W Hager and S Rajamanickam Algorithm 887 CHOLMOD supernodal sparse Cholesky factorization and update downdate ACM Trans Math Software 35 3 2009 T A Davis Algorithm 832 UMFPACK V4 3 an unsymmetric pattern multifrontal method ACM Trans Math Software 30 196 199 2002 T A Davis A column pre ordering strategy for the unsymmetric pattern multifrontal method ACM Trans Math Software 30 165 195 2004 T A Davis Direct Methods for Sparse Linear Systems SIAM Philadelphia PA 2006 T A
53. rates the basic usage of KLU It uses KLU and indirectly makes use of BTF and AMD COLAMD is required to compile the demo but it is not called by this example It uses statically defined global variables for the sparse matrix A which would not be typical of a complete application It just makes for a simpler example klu_simple a simple KLU demo solution is x 1 2 3 4 5 include lt stdio h gt include klu h int n 5 int Ap 0 2 5 9 10 12 int AGO EL 0e As OR OD 4 ds 225 8 A5 95 Ts ARS double Ax 2 3 3 1 4 4 3 1 2 2 6 1 double b 8 45 3 3 19 int main void klu_symbolic Symbolic klu_numeric Numeric klu_common Common int i klu_defaults amp Common Symbolic klu_analyze n Ap Ai amp Common Numeric klu_factor Ap Ai Ax Symbolic amp Common klu_solve Symbolic Numeric 5 1 b amp Common klu_free_symbolic amp Symbolic amp Common klu_free_numeric amp Numeric amp Common for i 0 i lt n i printf x 4d g n i b i return 0 The Ap Ai and Ax arrays represent the matrix 2 3 0 00 3 0 4 06 A 0 1 3 2 0 0 0 1 00 0 4 2 01 The solution to Ax b is x 12345 7 The program uses default control settings no scaling permutation to block triangular form and the AMD ordering It ignores the error codes in the return values and Common status The block triangular form found
54. rder a pointer to a function that can be provided by the application that uses KLU to redefine the fill reducing ordering used by KLU for each diagonal block The int and UF_long prototypes must be as follows int my_ordering function int n int Ap int Ai int Perm klu_common Common UF_long my_long_ordering_function UF_long n UF_long Ap UF_long Ai UF_long Perm klu_l_common Common The function should return 0 if an error occurred and either 1 or a positive nonzero value if no error occurred If greater than zero then the return value is interpreted by KLU as an estimate of the number of nonzeros in L or U whichever is greater when the permuted matrix is factorized Only an estimate is possible since partial pivoting with row interchanges is performed during numerical factorization The input matrix is provided to the function in the parameters n Ap and Ai in compressed column form The matrix A is n by n The Ap array is of size n 1 the jth column of A has row indices Ai Ap j Aplj 1 1 The Ai array is of size Ap n The first column pointer Ap 0 is zero The row indices might not appear sorted in each column but no duplicates will appear The output permutation is to be passed back in the Perm array where Perm k j means that row and column j of A will appear as the kth row and column of the permuted matrix factorized by KLU The Perm array is already allocated when it is passed to the us
55. ror occurs include klu h int Ap n i Ai nz double Ax nz Az 2 nz klu_symbolic Symbolic klu_numeric Numeric klu_common Common Numeric klu_factor Ap Ai Ax Symbolic amp Common real Numeric klu_z_factor Ap Ai Az Symbolic amp Common complex include klu h UF_long Ap nti Ai nz double Ax nz Az 2 nz klu_l_symbolic Symbolic klu_l_numeric Numeric klu_l_common Common Numeric klu_l_factor Ap Ai Ax Symbolic amp Common real Numeric klu_zl_factor Ap Ai Az Symbolic amp Common complex 5 9 klu_solve solve a linear system Solves the linear system Ax b using the Symbolic and Numeric objects The right hand side B is overwritten with the solution on output The array B is stored in column major or der with a leading dimension of 1dim and nrhs columns Thus the real entry b is stored in B i j ldim where 1dim gt n must hold A complex entry b is stored in B 2 i j ldim and B 2 i j ldim 1 for the real and imaginary parts respectively Returns 1 if successful 0 if an error occurs include klu h int ldim nrhs ok double B ldim nrhs Bz 2 ldim nrhs klu_symbolic Symbolic klu_numeric Numeric klu_common Common ok klu_solve Symbolic Numeric ldim nrhs B amp Common real ok klu_z_solve Symbolic Numeric ldim nrhs Bz amp Common complex include klu h
56. ssed column form int Ap size n 1 int Ai size nz Ap n double maxwork do at most maxwork nnz A work in the maximum transversal no limit if lt 0 output not defined on input double work return value from btf_maxtrans int P size n row permutation int QC size n column permutation int R size n 1 block b is in rows cols R b R b 1 1 int nmatch nonzeros on diagonal of P A Q workspace not defined on input or output int Work size 5n UF_long btf_l_order UF_long UF_long UF_long double double UF_long UF_long UF_long UF_long UF_long BIF marking of singular columns BIF_FLIP is a negation about 1 and is used to mark an integer j that is normally non negative BTF_FLIP 1 is 1 BTF_FLIP of a number gt 1 is negative and BIF_FLIP of a number lt 1 is positive BIF_FLIP BIF_FLIP j j for all integers j UNFLIP j acts like an absolute value operation and is always gt 1 You can test whether or not an integer j is flipped with the BTF_ISFLIPPED j macro define BIF_FLIP j j 2 38 define BTF_ISFLIPPED j j lt 1 define BTF_UNFLIP j BTF_ISFLIPPED j BTF_FLIP j j BTF version All versions of BTF include these definitions As an example to test if the version you are using is 1 2 o
57. tion fault will occur if any input pointers are NULL for example The function can require up to O nxnnz A time excluding the cheap match phase which takes another O nnz A time If maxwork gt O on input the work is limited to O maxwork nnz A 16 excluding the cheap match but the maximum transversal might not be found if the limit is reached The Work array is workspace required by the methods its contents are undefined on input and output int nrow ncol Ap ncolt i Ai mz Match nrowl Work 5 ncol nmatch double maxwork work nmatch btf_maxtrans nrow ncol Ap Ai maxwork amp work Match Work UF_long nrow ncol Ap ncol 1 Ai nz Match nrow Work 5 ncol nmatch double maxwork work nmatch btf_l_maxtrans nrow ncol Ap Ai maxwork amp work Match Work 5 23 btf strongcomp strongly connected components The btf_strongcomp function finds the strongly connected components of a directed graph re turning a symmetric permutation P The matrix A must be square The diagonal of A or A Q if a column permutation is given on input is ignored If Q is NULL on input the matrix P A P is in upper block triangular form Otherwise Q is modified on output so that P A Q is in upper block triangular form The vector R gives the block boundaries where the kth block consists of rows and columns R k through R k 1 1 in the permuted matrix The function returns the number of strongly connected co
58. u_l_common porno ene klu_tsolve solves A x b using the Symbolic and Numeric objects pore nn ene int klu_tsolve inputs not modified klu_symbolic Symbolic klu_numeric Numeric int ldim leading dimension of B int nrhs number of right hand sides right hand side on input overwritten with solution to Ax b on output double B size ldim nrhs klu_common Common int klu_z_tsolve inputs not modified klu_symbolic Symbolic klu_numeric Numeric int ldim leading dimension of B int nrhs number of right hand sides right hand side on input overwritten with solution to Ax b on output double B size 2 ldim nrhs int conj_solve TRUE conjugate solve FALSE solve A x b klu_common Common UF_long klu_l_tsolve klu_l_symbolic klu_l_numeric UF_long UF_long double klu_l_common UF_long klu_zl_tsolve klu_l_symbolic klu_l_numeric UF_long UF_long double UF_long klu_l_common porno nn ene klu_refactor refactorizes matrix with same ordering as klu_factor porno nee int klu_refactor return TRUE if successful FALSE otherwise inputs not modified int Ap size n 1 column pointers int Ai J size nz row indices double Ax size nz numerical values klu_symbolic Symbolic 26 input and numerical values modified on output k

User Guide for KLU and BTF

Contents

Download Pdf Manuals

Related Search

Related Contents