Home

PETSc-FEM: A General Purpose, Parallel, Multi

image

Contents

1. row 3 starts row 2 starts row 1 starts Pos in dyn array 0 1 2 3 4 5 6 7 8 Contents of dyn array 1 3 2 4 1 6 2 5 0 1 3 7 3 8 0 1 0 1 Fl v Y 38 33 32 a 382 za Na w 2 Figure 25 Structure of darray 128 For all i the i th row starts in position 1 The first component of the pair shows the value of 7 and the position of the next connected j index The sequence ends with a terminator 0 1 In this way the storage needed is two integers per coefficient with an overhead of two integers per row for the terminator and there is only one dynamic array growing at the same time To insert a new coefficient say i j we traverse the i th row checking whether the j coefficient is already present or not and if the terminator is found j is inserted in its place pointinf to the new terminator which is placed at the end of the dynamic array 16 The PFMat class The PFMat class is a matrix class that acts either as a wrapper to the PETSc Mat or to other representations of matrix solvers Currently there is the basic PETSc class named PETScMat and a class called IISDMat for Interface Iterative Sub domain Di rect method that has a special solver that solves the linear system by solving iteratively over the interface nodes while solving with a direct method in the sub domain interiors this is commonly referred
2. 0 2 00000 e 9 3 Caching the adresses used in the operations UL DAMON ooe ou ba a Fook ee Be ee we ae Ea 9 3 2 Loops executed a non constant number of times 9 3 3 Masks can t traverse branches 9 34 E IGIEDEN 2 256455 eae ee Rea ae be a ee aS 9 4 Synopsis of Operations 9 4 1 One to one operations 2 2 0 000 ee ee Wao Tnsplise op rations lt lt Boe ee Re Be ee a ee Ps 9 4 3 Generic sum operations sum over indices 9 4 4 Sum operations over all indices 0 00000 5s O45 Exp rt Import operati s ses s sw AAA 9 46 Static cache operations s e 265 s aa a Ewe eS 10 Hooks 10 1 Launching hooks The book Net cce ok satima e a a 10 2 Dynamically loaded hooks o sos 64 aoe eee ee ee es 103 Shell Hook A 104 Shell hooks with make so a s e 68 a ee A a a ee a 11 Gatherers and embedded gatherers 11 1 Dimensioning the values vector p o ia or na neoe Ee ee ee a 11 2 Embedded gatherere o c g aa e k e ok e e eB eG A da 11 3 Automatic computation of layer connectivities ooo a a 11 4 Passing element contributions as per element properties 11 5 Parallel aspects c oo sac nae ads aca s on a a ee ee 11 6 Creating a gatherer o sacc bo ae be eee a Wion a e a ae a ee 12 Generic load elemsets 12 1 Linear generic load elemaet cocos pacea ee e es 12 2 Functional extensions of the elemset
3. 4 2 Internal preprocessing The FileStack class AQ QUIE cara be Ree DA pe ke ee PRR Ree 42 2 Class Mual occiso hae be Baw te ee eee ee eee 4 3 Preprocessing with ePerl ddan eee 43 1 Basics of ePerl 2 22245 4408544 fe ee be ee a Bo WARNER o oa ee tee a a AON EG es a es dida TOE ORGANS 2 lt a aoe be woe A e ee 4 3 4 Conditional processing 0 e dao File EAS a eee ee ee Rot we ek ee Ae Re ete Pe 4 3 6 Use of ePerl im makenles oso 2 45 2684 hk a ee aoa AS ePerlini library o oae aa 6 by See A e e a 4 3 8 Errors in ePerl processing 0 e e AA General OPTIONS 2 23 a ee ee aa 44 1 Read mesh options o lt s s s soa sadaca wa kaad daa dae es das Elentir ophion so oad oa a eR aa bad WS Pde eR A 443 Piet ISD Mai class options ceco carp ee eee tari 4 5 Emacs tools and tips for editing data files 4 5 1 Installing PETSc FEM mode 020 4 5 2 Copying and pasting PETSc FEM options Text hash tables 5 1 The elemset hash table of properties 0 5 2 Text hash table inclusion os eos a praos aie aam raaa hu R ER a i aa oe The global table occ opresores Re aa 5 4 Reading strings directly from the hash table 5 5 Reading with get_int and get_double 5 6 Per element properties table 2 0 5 7 Taking values transparen
4. XN xs o Mt h peer aquifer bottom Sc f Spe A aa datum free surface of freatic aquifer Figure 2 Aquifer stream system Figure 3 Aquifer stream system Transverse 2D view This module solves the problem of subsurface flow in a free aquifer coupled with a surface net of 1D streams To model such system three elemsets must be used an aquifer 42 aquifer node Figure 4 Aquifer stream system Discretization system representing the subsurface aquifer a stream elemset representing the 1D stream and a stream_loss elemset representing the losses from the stream to the aquifer or vice versa see figures 2 and 3 The aquifer elemset is a 2D elemset with triangle or quadrangle elements see figure 4 A per element property eta represents the height of the aquifer bottom to a given datum The corresponding unknown for each node is the piezometric height or the level of the freatic surface at that point On the other hand the stream elemset represents a 1D stream of water It has its own nodes separate from the aquifer nodes whose coordinates must coincide with some corresponding node in the aquifer For instance the triangular aquifer element in the figure is connected to nodes n1 n2 and n3 while the stream element is connected to nodes n4 and n5 nl and n5 have the same coordinates but different unknowns and also n2 and n4 A node constant field so called H fields represents the stream
5. 12 3 The flow reversal elemset u soi 2 000 ee ee ee a 13 Visualization with DX 13 1 Asynchronous synchronous communication 000 13 2 Building and loading the ExtProglmport module 13 3 Inputs outputs of the ExtProglmport module 13 4 DX hook SPORE o soc s a saa do a a AA 74 74 75 76 77 77 77 77 77 78 82 84 84 85 85 85 86 87 87 88 88 89 89 90 92 93 94 94 97 98 99 99 100 100 101 101 14 The 14 1 14 2 14 3 14 4 14 5 14 6 14 7 14 8 idmap class Permutation matrices Permutation matrices in the FEM context A small example Inversion of the map Design and efficiency restrictions Implementation Block matrices 14 7 1 Example Temporal dependent boundary conditions 14 8 1 Built in temporal functions 14 8 2 Implementation details 14 8 3 14 8 4 14 8 5 14 8 6 Use of prefixes Time like problems 15 The compute_prof package 15 1 15 2 16 The 16 1 16 2 16 3 16 4 17 The 17 17 2 17 3 17 4 is 17 6 MPI matrices in PETSc Profile determination PFMat class The PFMat abstract interface IISD solver 16 2 1 Interface preconditioning Implementation details of the IISD solver Efficiency considerations DistMap class Abstract interface Implementation details Mesh refinement 17 3 1 Symmetry group generator 17 3 2 Canonical ordering Permutation tree Canonical ordering Object hashing 18 Synchronized buffer 18 1
6. Arr Ar Azzy Az Xr by Art A by 160 Ax b We consider solving this system of equations by an iterative method such as GMRES for instance For such an iterative method we have only to specify how to compute the modified right hand side b and also how to compute the matrix vector product y Ax Computing the matrix product involves the following steps 1 Compute y Ay x 2 Compute w Ay x 3 Solve Az z w for z 4 Compute v Ay z 5 Addy y v involving three matrix products with matrices A Azz and Azz and to solve the system with AL As the matrix Azz has no elements connecting unknowns in different processors the solution system may be computed very efficiently in parallel 16 2 1 Interface preconditioning To improve convergence of the interface problem 160 some preconditioning can be in troduced As the interface matrix is never built true Jacobi preconditioning i e using the diagonal part of the Schur complement P diag A where P is the preconditioning matrix can not be used However we can use the diagonal part of the interface matrix P diag Azz matrix or even the whole interface matrix P Azz Using the diagonal preconditining helps to reduce bad conditioning due to refinement and inter equation bad scaling If the whole interface matrix is used P Ay7 then the linear system A w x has to be solved at each iteration This can be done with a direct solver or iteratively In the 2D case
7. 0 and 1 for j 1 m and 151 gives lim 6 p 0 _ sin A 152 90 2sin 0 A0 ee We introduce then a correction factor Mh so that we define 02 n 1 a EWS 1 bo 2sin A0 ey Analogously we define Pm 1 7 Pm 1 Pm 2sin A0 Te 121 14 8 2 Implementation details Temporal boundary conditions are stored in an object of class Amplitude This objects contains a key string identifying the function and a text hash table containing the pa rameters for that function for instance omega 3 5 amplitude 2 3 etc Recall that each node field pair may depend on some free degrees of freedom those in U in 123 and some others fixed those in U For those fixed there is an array containing both an amplitude i e a double and a pointer to an Amplitude object If the condition don t depends on time then the pointer is null 14 8 3 How to add a new temporal function If none of the built in time dependent functions fit your needs then you can add you own temporal function Suppose you want a function of the form f 2 Ue 155 ay 420 Follow these steps 1 Give a name to it ending in _function We will use my_own_function in the following 2 Declare it with a line like AmplitudeFunction my_own_function In the core PETSc FEM this is done in dofmap h 3 Write the definition You can find typical definitions in file tempfun cpp You can use the macro SGETOPTDEF lt type gt lt name
8. Application Code N 2 C ss Ss PETSc FEM library cs Q a Element routines program output Figure 1 Typical structure of a PETSc FEM application 3 2 The elemset concept PETSc FEM is written in the C language and sticks to the Object Oriented Program ming OOP philosophy In OOP data is stored in objects and the programmer access them via an abstract interface A first approach to writing a Finite Element program with 13 OOP philosophy is to define each element or node as an object However this is both time and memory consuming since accessing each element or each node is performed by passing through the whole code layer of the element or node class As one of the primary objectives of PETSc FEM is the efficiency we solved this by defining the basic objects as a whole set of elements of nodes that share almost all the properties aside element connectivities or node coordinates This is very common in CFD where for each problem almost all the elements share the same physical properties viscosity density etc and options number of integration Gauss points parameters for the FEM formulation etc Thus for each problem the user defines a nodedata object and one or several elemset objects Each elemset groups elements of the same type i e for which residuals and matrices are to be computed by the same routine Usually in CFD all the elements are processed by t
9. gt mu lt rho x velocity L Reynolds gt Nusselt lt Reynolds Grashof 0 25 gt which results in mu 0 0095226 Nusselt 88 0111736793393 4 3 3 Text expansion It is common to have several lines of text that have to be repeated several times For instance some options that hae to be applied to several elemsets The trick is to assign the text to a variable via the here in document lt lt EOT feature and then inserting in the appropriate places for instance lt common_options lt lt EOT option1 valuel option2 value 2 EOT gt elemset typel 4 props 18 lt common_options gt option3 value3 __END_ELEMSET__ elemset type2 3 props lt common_options gt option4 value4 __END_ELEMSET__ that expands to elemset typel 4 props option1 valuel option2 value 2 option3 value3 __END_ELEMSET__ elemset type2 3 props option1 valuel option2 value 2 option4 value4 __END_ELEMSET__ Note the use of the underscore just before the gt terminator in the first ePerl block This tells to ePerl not to include the semicolon terminator see the ePerl documentation for further details The terminator EOT stands for End Of Text and may be replaced by any similar string It must appear in a line by itself at the end of the text to be included 19 4 3 4 Conditional processing ePerl allows conditional processing as with the C preprocessor with if else endif co
10. A more specialized class 19 Authors 20 Grants received 21 Symbols and Acronyms 21 1 Acronyms How to add a new temporal function Dynamically loaded amplitude functions 106 106 107 111 114 114 114 115 115 116 117 122 122 123 125 127 127 127 128 129 129 129 131 132 134 135 136 136 138 139 141 141 142 142 144 146 147 148 150 22 Symbols 150 1 LICENSE The PETSc FEM package is a library and application suite oriented to the Finite El ement Method based on PETSc Copyright C 1999 2008 Mario Alberto Storti Nor berto Marcelo Nigro Rodrigo R Paz Lisandro Dalcin Ezequiel Lopez Centro Inter nacional de Metodos Numericos en Ingenieria CIMEC Argentina Universidad Nacional del Litoral UNL Argentina Consejo Nacional de Investigaciones Cientificas y Tecnicas UNL Argentina This program is free software you can redistribute it and or modify it under the terms of the GNU General Public License as published by the Free Software Foundation either version 2 of the License or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU General Public License for more details You should have received a copy of the GNU General Public License along with this program if not write to the Free Software Foundation I
11. Also it becomes difficult to comment out some hooks while keeping others The hooks are executed in the order as you entered them in the hook list so that in the previous case you will have at init time the init part of the compress hook to be executed before the init part of the my_dx_hook and finally the init part of the coupling_hook 10 2 Dynamically loaded hooks The easiest way to code a dynamically loadable hook is with a class You need to in clude the corresponding headers hook h and dlhook h and the class may define the hook functions for all some or none of the hook launching points Consider for instance the following Hello world hook that prints the message at the corresponding points in the program tinclude lt src hook h gt tinclude lt src dlhook h gt class hello_world_hook 4 public void init Mesh amp mesh_a Dofmap amp dofmap TextHashTableFilter options const char name printf Hello world 1I m in the init hook n void time_step_pre double time int step printf Hello world I m in the time_step_pre hook n void time_step_post double time int step const vector lt double gt amp gather_values 89 printf Hello world I m in the time_step_post hook n void close printf Hello world I m in the close hook n DL_GENERIC_HOOK hello_world_hook You can use almost any conceivable C C library within your hooks Take into account that the p
12. a may have or not the mask set J When the code reaches the prod method at line A it can have executed or not the block inside the if so that the mask set in line B may or may not be active at line A This is clearly an error and to avoid it the safest way is to always reset the masks at the outlet of a branched block like in line C as follows FastMat2 a b c resize and set a b c for int j 0 j lt N j FastMat2 branch if condition 4 FastMat2 choose 0 ate AAA inl x9 B operate on masked a a rs C FastMat2 leave c prod a b A OK a has not mask 9 3 4 Efficiency As we mentioned before When caching is enabled there is a gain in speed of ten to one hundreth and the library is very performant Of course the first execution of loop is not cached and represents and overhead that has to be amortized by executing the loop in 84 cached mode many times The average speed increases when the number of executions of the loop is increased The cut point i e the number of executions of the loop for which the excution speed falls to one half the speed obtained for very large number o execution is currently between 10 and 30 so that for loops larger than 200 the overhead time spent in building the caches is negligible Another issue is the memory required by the caches First there is some space required by the caches themselves and then there is a co
13. 1 Q 1 Q aaa aega ep oe h Co 4 O k 955 o A s t eel AG ve on Qs x 0 t 29 where A is the cross sectional area Q is the discharge Gs s t represents the gain or loss of the stream i e the lateral inflow per unit length of channel s is the arc length along the channel v Q A the average velocity in s direction v the velocity component in s direction of lateral flow from tributaries the Boussinesq coefficient P fu2dA u the flow velocity at a point and a the wind direction measured from a positive line tangent to s in flow direction The bottom shear stresses are approximated by using the Chezy or Manning equations P h f f x at Che zy model 30 h 48 S ny y le Manning model a AB h y where P is the wetted perimeter of the channel and a is a conversion factor a 1 for metric units 6 10 3 Kinematic Wave Model When friction and gravity effects dominate over inertia and pressure forces and if we neglect the stress due to wind blowing and the coriolis term the momentum equation becomes S Sf 31 47 and eq 29 0A h _ IQA Ot Os where A depends through the geometry of the channel on the channel water height h The flow rate Q under the KWM model is only a function of A through the friction law Q yA 33 where y Cp S1 2 P 1 and m 3 for the Ch zy friction model and y an7 s1 2 p 2 3 and m 5 3 for the Manning model S dhy ds is the slope
14. Finite Difference Method FEM Finite Element Method HISD Interface Iterative Sub domain Direct method KWM Kinematic Wave Model see 6 7 LES Large Eddy Simulation MPI Message Passing Interface OOP Object Oriented Programming Perl Practical Extraction and Report Language PETSc Portable Extensible Toolkit for Scientific computations SUPG Streamline Upwind Petrov Galerkin ANN Approximate Nearest Neighbor problem Also refers to the library developped by David Mount and Sunil Arya http www cs umd edu mount ANN 22 Symbols e u v Streamwise and normal components of velocity 150 References 1 L Rodr guez Investigation of Stream Aquifer Interactions Using a Coupled Surface Water and Ground Water Flow Model PhD thesis University of Arizona 1995 2 M Mallet T J R Hughes A new finite element method for cfd Iv a discontinuity capturing operator for multidimensional advective diffusive systems Comp Meth in Applied Mechanics and Engineering 58 1986 151
15. Objects with the same key are not overwritten i e if several elements with the same key are loaded on the same or differente processors then all of them are printed to the output As the sorting algorithm used is stable objects with the same key loaded in the same processor remain in the same order as were entered Typical usage is as follows include lt src syncbuff h gt class KeyedObject define methods as declared above Es SYNC_BUFFER_FUNCTIONS KeyedObject int main SyncBuffer lt KeyedObject gt sb Insert objects sb push_back obj1 EP Sa sb push_back obj1 flush the buffer sb flush The macro SYNC_BUFFER_FUNCTIONS takes as argument the name of the basic object class and generates a series of wrapper functions This should be done in the templates themself but due to current limitations in template specialization it has to be done through macros 145 18 1 A more specialized class A more simple class derived from SyncBuffer lt gt has been written This class is based on SyncBuffer lt gt using as KeyedObject the class KeyedLine which has simply an integer and a C string Typical usage is include lt src syncbuff h gt include lt src syncbuff2 h gt KeyedOutputBuffer kbuff AutoString s for E s clear Load string s with cat_sprintf functions TP as Load in buffer kbuff push k s kbuff flush Also you can directly use
16. __ _ECHO_OFF__ controls whether the lines read from input should be copied to the output Usually one may be interested in copying some part of the input to the output in order to remember the parameters of the run As implemented so far this feature is recursive so that if included files with the internal preprocessing i e with the __INCLUDE__ directive will be also copied to the output unless you enclose the __INCLUDE__ line itself with a __ECHO_OFF__ __ECHO_ON__ pair For instance __ECHO_ON__ global_options ett more options here nsaverot 50 viscosity 13 3333333333333 weak_form 0 re and here __END_HASH__ nodes 2 2 3 do not echo coordinates 16 __ECHO_OFF__ __INCLUDE__ stabi nod tmp __ECHO_ON__ __END_NODES__ 4 2 2 Class internals As its names suggests the class is based in a stack of files When a get_line is issued a new line is read comments are skipped and continuation lines are resolved if the line is an __INCLUDE__ directive then the current line is pushed in the stack and the new file is open and is the current file 4 3 Preprocessing with ePerl ePerl for embedded Perl is a package that allows inclusion of Perl commands within text files Perl is the Practical Extraction and Report Language a scripting lan guage optimized for scanning arbitrary text files extracting information from those text files and printing reports based on that information For more infor
17. can be rewritten as Tw g u u 80 where 7 fe ee 75 e or op a t Era 82 The traction on the wall element is assumed to be parallel to the wall and in opposite direction to the velocity vector that is tp g u Up 83 Replacing in 78 the residual term is Rip f g u upN dE 84 The Jacobian of the residual with respect to the state variables needed for the Newton Raphson algorithm is Ryp o l Jip jq Oujq Oujq g u up Ni dx OUp j u L so Dug I 4 tp m ka 60 but Up 5 up N 86 l so that up Oulp duja T Oj Y dy p Ni 87 l Ong Nj Similarly w Up Up UipNi UmpNm 88 and 89 Oujq bujq so that r u Up mM bujq u 90 Replacing in 85 Tigi 6 LY N N d 91 ip jq Z SS g u bpg u P tN 91 7 1 3 The van Driest damping factor Programming notes This is a non standard issue since the computation of one volume element requires in formation of other wall elements First we compute the wall element that is associated to each volume element assemble is called with jobinfo build_nneighbor_tree This jobinfo is acknowledged only by the wall elemsets which compute their geometrical center and put them in the data_pts STL array Then this is passed to the ANN package which computes the octree All this is cached in the constructor of a WallData class After this a call to assemble with jobinfo get_nearest_wall_element is acknow
18. erwise each processor loads in data_pts only the coordinates of the elements that be long to him A possible solution is after the loop to exchange the information among the processors but the simplest solution id to simply bypass the element selection with compute_this_elem with a call like for int k el_start k lt el_last k 4 if build_nneighbor_tree comp_shear_vel compute_this_elem k this myrank iter_mode continue That means that for jobinfo build_nneighbor_tree and comp_shear_vel the nor mal element selection is bypassed 7 2 Options General options e int A_van_Driest default 0 If A van_Driest 0 then the van Driest damping factor is not used found in file ns cpp e int activate_debug default 0 Activate debugging found in file ns cpp e int activate_debug_memory_usage default 0 Activate report of memory usage found in file ns cpp e int activate_debug_print default 0 Activate printing in debugging found in file ns cpp 62 double alpha default 1 Trapezoidal method parameter alpha 1 Backward Euler alpha 0 Forward Euler alpha 0 5 Crank Nicholson found in file ns cpp double displ_factor default 0 1 Scales displacement for ALE like mesh relocation found in file ns cpp double Dt default 0 The time step found in file ns cpp int fractional_step default 0 Use fractional step or TET algorithm found in file ns cpp string fractional_step_solver
19. found in file elemset cpp e int report_consumed_time_stat default 0 Print statistics about time spent in communication and residual evaluation found in file elemset cpp 23 4 4 3 PFMat IISDMat class options This options are used in the PFMat class int iisd_subpart default 1 Number of subpartitions inside each processor found in file iisdcr cpp int iisd_subpart_auto default 0 Choose automatically the number of subdomains so as to have approximately this number of unknowns per subdomain found in file iisdcr cpp int iisdmat_print_statistics default 0 Print dof statistics number of dofs local and interface in each processor found in file iisdcr cpp double interface_full_preco_fill default 1 The ILU fill to be used for the A_II problem if the ILU preconditioning is used found in file iisdcr cpp int interface_full_preco_maxits default 5 Number of iters in solving the preconditioning for the interface problem when using use_interface_full_preco found in file iisdcr cpp string interface_full_preco_pc default jacobi Defines the preconditioning to be used for the solution of the diagonal interface problem not the Schur problem found in file iisdcr cpp double interface_full_preco_relax_factor default 1 The problem on the interface is solved with Richardson method with few iterations normally 5 Richardon iteration may not converge in some cases and then we can help converg
20. hooks are useful for communicating between different instances of PETSc FEM For instance if you want to couple a inviscid external flow with an internal viscous flow then you can run a PETSc FEM instance for each region and perform the communication between the different regions with hooks The DX hook is explained in the DX section see 13 and we will explain here how to write and use dynamically loaded hooks and the shell hook The hook concept has been borrowed from GNU Emacs 88 10 1 Launching hooks The hook list In order to activate a hook you first have to add a for each hook a pair of strings to the hook_list namely the type pf hook and the name of the hook This last one is a unique identifier that makes that hook unique hook_list lt hook type 1 gt lt hook name 1 gt lt hook type 2 gt lt hook name 2 gt For instance hook_list shell_hook compress dx_hook my_dx_hook dl1_hook coupling_hook Here we added a shell hook that probably will compress some files during execution the DX hook in order to visualize and a dynamically linked hook that will couple the run with another program Each hook will after take their own options from special options in the table The hook_list entry must be unique so that you have to group all your hooks in a single hook_list entry This is not limiting because you can add as much hooks as you want but it is rather syntactically cumbersome because you end up with a long string
21. name of properties to be defined per element Elemset properties hash table geometry cartesian2d ndim 2 npg 4 physical properties Dt 1 time step viscosity 1 next lines flags end of the properties hash table __END_HASH__ element connectivities physical properties per element 1342 1 1 2 3 0 7 3564 1 2 2 1 0 8 5786 1 3 2 2 0 9 lt more element connectivities and physical props here gt 193 195 196 194 195 197 198 196 197 199 200 198 199 201 202 200 next lines flags end of elemset connectivities __END_ELEMSET__ PPP Pe 0 Y O 0 N NNN w Ne O oOoo0OoOoO aono Here we define that properties cond cp and emiss are to be defined per element We add to each element connectivity line the three properties There are two ways to access these properties The corresponding values are stored in an array elemprops of length nprops X Melem Where Nprops is the number of per element properties 3 in this example and nelem is the number of elements in the mesh Also there is a macro ELEMPROPS k iprop that allows treating it as a matrix So you can access this values with cond ELEMPROPS k 0 conductivity of element k cp ELEMPROPS k 1 specific heat of element k emiss ELEMPROPS k 2 emissivity of element k 5 7 Taking values transparently from hash table or per element table With the tools described so far you can access constant properties on one hand the same
22. one of them to fit in the other If we arrive to an internal node without possibility to follow then the geometrical objects are distinct If we reach a leave then the objects are the same 17 5 Canonical ordering Another possibility to determine whether two node oderings are congruent is to have a uniquely determined ordering that can be computed from the node sequence itself We we call this ordering the canonical ordering for the geometrical object If this is possible then given two orderings we can bring both of them to the canonical ordering and then compare them as plain sequences One possibility is to take the caonical order as that one that gives the lower node sequence in lexicographical order This has the advantage that one the canonical orderign is known for both objects the comparison is very cheap Also it can be used as a comparison operator for sorting geometrical objects i e the order between two objects is the lexicographical order between the two node sequences in its canonical form The canonical order can be computed efficientlyusing the permutation tree described above 17 6 Object hashing Another useful technique for comparing objects is by comparing first some scalar function of the node indices For instance we can compute a hash sum for the object go as n 1 H go S h nodego 162 j 0 where node go j is the j node of object go H is the hash sum for the geometrical object go whe
23. or not found in file double radius default lt required gt Radius of the channel found in file double Rf default 1 Resistivity including perimeter of the stream to loss to the aquifer found in file double roughness default 1 Roughness coefficient for the Manning formula a k a n found in file string shape default string undefined Choose channel section shape may be circular rectangular or triangular found in file double wall_angle default lt required gt Width and height of the channel found in file double wall_angle default lt required gt Aperture angle of channel found in file double width default lt required gt Width of the channel found in file double width_bottom default lt required gt Width of bottom of channel found in file 6 8 The Hydrological Model cont The implemented code solves the problem of subsurface flow in a free aquifer coupled with a surface net of 2D or 1D streams 2D Saint Venant Model 2DSVM 1D Saint Venant Model 1DSVM and Kinematic Wave model KWM To model such system three element sets must be used an aquifer system representing the subsurface aquifer a 2D or 1D depending on the chosen model stream element set representing the stream and a 2D or 1D stream loss element set representing the losses from the stream to the aquifer or vice versa 45 6 9 Subsurface Flow The aquifer
24. 1 3 j 1 2 114 k 1 2 1 1 3 These operation can be extended to any binary associative operation So far we have implemented the following e FastMat2 sum const FastMat2 amp A const int m 0 Sum over all selected indices e FastMat2 sum_square const FastMat2 A const int m 0 Sum of squares over all selected indices e FastMat2 sum_abs const FastMat2 amp A const int m 0 Sum of absolute values all selected indices e FastMat2 min const FastMat2 amp A const int m 0 Minimum over all selected indices e FastMat2 max const FastMat2 amp A const int m 0 Maximum over all selected indices e FastMat2 min_abs const FastMat2 amp A const int m 0 Min of absolute value over all selected indices e FastMat2 max_abs const FastMat2 amp A const int m 0 Max of absolute value over all selected indices 86 9 4 4 Sum operations over all indices When the sum is over all indices the resulting matrix has zero dimensions so that it is a scalar You can get this scalar by creating an auxiliar matrix with zero dimensions casting with operator double as in FastMat2 A 2 3 3 Z assign elements to A double a double Z sum A 1 1 or using the get function double a Z sum A 1 1 getQ without arguments which returns a double In addition there is for each of the previous mentioned generic sum function a companion function that sums over all indices The name o
25. 20 Grants received The development of this code has been developed under financement from the following grants 23 22 21 20 19 18 17 16 15 14 Key PICT 2006 VES Code PICT 1506 2006 Title C lculo distribuido en mec nica y multif sica computacional Director Mag Victorio Sonzogni Fi nancing agency FONCyT Begin 2008 End 2010 Key PICTO 2004 Code PICTO 23295 2004 Title Integraci n de procesos del complejo suelo agua planta para una mejor planificaci n h drica en la cuenca inferior del R o Salado Director Dr Leticia Rodr guez Financing agency FONCyT Begin 2006 End 2007 Key PIP 2005 Code PIP 5271 Title Mec nica Computacional en Prob lemas de Multif sica Director Dr M A Storti Financing agency CONICET Begin 2005 End 2008 Key CAI D 05 Code CAI D 2005 10 64 Title M todos Num ricos para Resoluci n de Problemas Multif sica Director Dr S R Idelsohn Financing agency Universidad Nacional del Litoral UN Litoral Begin 2005 End 2007 Key PROTIC Code PAV 127 Subproy 4 Title PROTIC Red para la Promoci n de las Tecnolog as de la Informaci n y las Comunicaciones Subproyecto 4 Centro Virtual de Computaci n de Alto Rendimiento Di rector Dres Alejandro Cecatto Guillermo Marshall Financing agency ANPCyT FONCyT Begin 2005 End 2006 Key LAMBDA Code PICT 12 14573 2003 Title LAMBDA Laboratorio virtual para el An lisis y simulaci
26. FastMat2 matrices may have any number of indices Actually the library is compiled for a maximum number of indices 10 by default This limit may be modified by redefining variable max_arg in script readlist eperl and recompile Also they can have zero dimensions which stands for scalars 9 2 1 Current Matrix view In lines 9 to 12 the coordinates of the nodes are loaded in matrix x The underlying philosophy in FastMat2 is that you can set views of the matrix without actually made any copies of the underlying values For instance the operation x ir 1 k for index restriction sets a view of x so that index 1 is restricted to take the value k reducing 76 in one the number of dimensions of the matrix As x has two indices the operation x ir 1 k gives a matrix of dimension one consisting in the k th row of x A call without arguments like in x ir cancels the restriction Also the function rs for reset cancels the actual view 9 2 2 Set operations The operation a set x ir 1 2 copies the contents of the argument x ir 1 2 in a Also we can use x set y with y a Newmat matrix Matrix y or an array of doubles double xy 9 2 3 Dimension matching The x set y operation requires that x and y have the same viewed dimensions As the ir 1 2 operation restricts index to the value of 2 x ir 1 2 is seen as a row vector of size 2 and then can be copied to a If the viewed dimensions don t fit then an
27. Jacobians 76 The modified expression for the projectors is I Cp SIS 77 6 12 8 Absorbing boundary conditions available At the moment of writing this we have three possible combinations of boundary conditions Using extrapolation from the interior These is the elemset lt system gt abso The number of nodes per element ne must be not lower than 4 The first ne 2 nodes are used for a second order extrapolation of the outgoing wave The ne 1 th node is the node with Lagrange multipliers and the ne th node is used to set the reference value For instance for an absorbing elment of ne 5 nodes we would have 3 internal nodes and the data would loke like this see figure 5 e n lagrange multipliers ome n reference state outgoing wave Figure 5 Absorbing element elemset gasflow_abso 5 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt lt n4 gt lt n5 gt __END_ELEMSET__ end_elemsets 57 fixa lt n4 gt 1 lt rho_ref gt lt n4 gt 2 lt u_ref gt lt n4 gt 3 lt v_ref gt lt n4 gt 4 lt p_ref gt __END_FIXA__ Each element has 5 nodes first three are real nodes i e not fictitious numbered from the boundary to the interior Fourth node is reserved for Lagrange multipliers and the fifth node is set to the reference value The normal option is used to define the normal to the boundary It can be set as a constant vector per elemset usua
28. O VE U1 ae U2 cos a Va Oy V U1 Va U2 p3 VA Or Vab U1 VA Us u4 Us Va Us pa U7 us Ug vs Uy 130 ps U10 ue cos a U5 sina Ug ve sin a Us cos a Ug pe U7 u7 U3 v7 U4 p7 Ui ug Us vg Us ps U12 ug U3 vg U4 Pg Ui As we can see the boundary conditions result in sparse Q and Q matrices Moreover in real large problems most of the rows correspond to interior nodes such as node 5 here so that Q is very close to a permutation matrix If we think at accessing the elements of Q by row then could store Q as a sparse matrix but in this case we need an average of two integers for pointers and a double for the value per unknown whereas a permutation matrix can be stored with only one integer per unknown One possibility is to think at these matrices as permutations followed by a sequence of operations depending on each kind of boundary conditions but it may happen also that several kind of b c s superposes as int the case of node 3 periodic absorbing and node 9 periodic Dirichlet 113 14 4 Inversion of the map It is necessary sometimes to invert relation 123 i e given Unp and U to find U for instance when initializing a temporal problem Of course this may not be possible for arbitrary Unp and U since Q is in general rectangular but we assume that QTQ is non singular and solve 123 in a least square sense After we may verify if t
29. Options controlling how the connection array is constructed can be consulted in 4 4 2 The resulting nn ne 1 arrays are grouped in a Group object and sent through the output_array_list output tab You can extract the individual components with the Select DX module and build field objects Also a set of field objects is created automati cally and sent through the output_field_list output tab Basically a field is constructed for each possible combination of positions connections and data objects This may seem a huge amount of fields but in fact as the arrays are passed internally by pointers in DX 103 the additional memory requirements are not large At the time of writing this this is a set of nn ne ne fields since nn 1 A name is generated automatically for each field Some information is send to the DX Message Window Also it s very useful to put Print modules downstream of the ExtProgImport module in order to see which arrays and fields have been created The communication between DX and PETSc FEM is done through a socket PETSc FEM acts as a server whereas DX acts as a client PETSc FEM opens a port option dx_port and DX connects to that port the port input tab in the ExtProgImport module Currently the standard port for the DX PETSc FEM communication is 5314 DX can communicate with PETSc FEM running in the background ad even on other machine the serverhost input tab At each time st
30. appro priated jobinfo in order to tell the element routine which matrix are you com puting This returns a Libretto dynamic array da containing the sparse structure of the given matrix e function compute_prof takes da as argument and fills d_nnz o_nnz 15 2 Profile determination Let us describe first how the sparse profile is determined in the sequential one processor only case We have to determine for each row index i all the column indices j that have a non null matrix entry A The amount of storage needed is a mattern oof concern here It is allmost sure that we will need at least one integer per each non null entry A first attempt is to have a dynamic array for each 7 index and store their all the connected j indices In typical applications we have O 10 to O 10 nodes per processor and the number of connected 7 indices ranges from 10 to several hundred In order to avoid this large number of small dynamic arrays growing we store all the indices in a big array behaving as a sinly linked list Each entry in the array is composed of two integers struct Node one pointing to the next entry for this row and other with the value of the row Consider for instance a matrix with the following sparse structure A x 156 Ro K Matrix coefficients i j are introduced in the following order 1 1 2 2 1 2 3 1 1 3 3 3 The dynamic array for this mesh results in
31. bottom height hp with reference to the datum So that normally we have for each node two coordinates and the stream bottom height ndim 2 nu 3 ndof 1 The unknown for these nodes is the height u of the stream free water surface with reference to the stream bottom The channel shape and friction model and coefficients are entered via properties described below If the stream level is above the freatic aquifer level hy u gt then the stream losses water to the aquifer and vice versa The equation for the aquifer integrated in the verical direction is 0 z S 6 m 6 V K o 0 V4 Y Ga 16 where S is the storativity and G is a source term due to rain losses from streams or other aquifers The equation for the stream is according to the Kinematic Wave Model KWM ap proach DA u QCA Ot Os 43 Where u is the unknown field that represents the height of the water in the channel with respect to the channel bottom as a function of time and a linear arc coordinate along the stream A is the transverse cross section of the stream and depends through the geometry of the channel on the channel water height u Q is the flow rate and under the KWM model is a function only of A through the friction law Q 7A 18 where y Cp S1 2 P 1 and m 34 for the Ch zy friction model and y an 91 2 p 2 3 and m 5 3 for the Manning model where S dhy ds is the slope of the stream bottom P is the wetted perimeter
32. conditions __END_HASH__ lt nodel gt lt dofl gt lt vall gt lt node2 gt lt dof2 gt lt val2 gt lt nodeN gt lt dofN gt lt valN gt __END_FIXA__ For instance for the example above it may look like this fixa_amplitude sine omega 5 amplitude 1 __END_HASH 0 00000 55556 88889 00000 88889 55556 00000 _END_FIXA__ oOoOOrRFO Oo NNNNNNNI 14 8 1 Built in temporal functions The following temporal functions are currently available in PETSc FEM ramp Double ramp function p t See 22 01 ft lt ti Q2 gt if t z ta 138 dits t t ift lt t lt t 117 where the slope s is _ 2 01 The parameters are start_time t default 0 The starting time of the ramp start_value default 0 The starting value of the ramp slope s default 0 The slope in the ramp region end_time t2 default start_time The end time of the ramp end_value 2 default start_value The end value of the ramp Only one of slope and end_value must be specified If the end values are not defined then they are assumed to be equal to the starting ones If the end time is equal to the starting time then the end time is taken as oo i e a single ramp starting at start_time A 9 Q w to ti Figure 22 Ramp function smooth_ramp Smooth double ramp function hyperbolic tangent See fig ure 23 HITO AS tt 5 5 tanh 7 140 o t
33. corresponding to incoming waves are null and vice versa for the outgoing waves we can add both blocks of equations to add a single set of Ndof equations mMm MEVo Ip Vt O ep VE 0 62 p 0 and coming back to the U basis Ug Up Wg Ug Y ep Up 0 63 p 0 The modified version of the FEM system of equations 59 that incoroporates the absorbing boundary conditions is then Ug Uo Wg UGt Y ep Ur 0 p 0 F Ugt U we ne 64 EU UA Un REM 54 6 12 6 Avoiding extrapolation For linear systems equation 59 is of the form n 1 n n 1 n a Ea O uel E ue n 65 k T YUk k 1 k l1 A 0 k gt 1 At 2h We have made a lot of simplifications here no source or upwind terms and a simple discretization based on centered finite differences Alternatively it can be thought as a pure Galerkin FEM discretization with mass lumping In the base of the characteristic variables V this could be written as VE VE VE Via A A 7 Q k 7 Vk k 1 k 1 A 0 k 1 At i h 7 T For the linear absorbing boundary conditions 49 we should impose IE Vier Vo Vref 0 67 while discarding the equations corresponding to the incoming waves in the first rows of 66 Here Urer Vrer is the state about which we make the linearization This can be done via Lagrange multipliers in the following way IE Vret Vo Vref IT Vref Vim 0 vat vg Vi vg F pN
34. determine which mechanism passes conductivity DEFPROP conductivity conductivity is found after calling load_props in propel conductivity_indx define COND propel conductivity_indx Other properties DEFPROP propa define PROPA propel propa_indx DEFPROP propb define PROPB propel propb_indx DEFPROP propc define PROPC propel propc_indx DEFPROP propp define PROPP propel propp_indx DEFPROP propq define PROPQ propel propq_indx Total number of properties loaded int nprops iprop Set error if maximum number of properties exceeded 33 assert nprops lt MAXPROP code loop over elements for int k el_start k lt el_last k 4 check if this element is to be processed if compute_this_elem k this myrank iter_mode continue Load properties either from properties hash table or from per element properties table load_props propel elprpsindx nprops amp ELEMPROPS k 0 more code use physical element property double veccontr wpgdet COND dshapex t dshapex Posse First we allocate for 100 entries in elprpsindx and propel arrays and set the counter prop to 0 Then we call DEFPROP for properties conductivity and propa thru propq After this we check that the maximum number of properties to be defined is not exceeded and enter the element loop After checki
35. documentation for an option put the cursor on the option and press C h C i that is lt Ctr1 h gt lt Ctrl i gt this is the key binding for info lookup symbol You will get in the minibuffer something like Describe symbol default print_internal_loop_conv If you press RET then you jump to the corresponding page in the info file From there you can browse the whole manual Pressing s Info search allows to search for text in the whole manual When done you can press q Info exit or x my Info bury and ki11 defined in too1s petscfem el If you don t know exactly what option you are looking for then you can search in the manual or launch info lookup symbol with the start of the command and then use completion to finish writing the option If you want to copy some option from the info manual to the data file then you can use the usual keyboard or mouse copy and paste methods of Emacs Also pressing c my Info lookup copy keyword in the info manual copies the name of the currently visited option to the kill ring Then in the data file buffer press as usual C y yank to paste the last killed option If you paste several options from the manual then you can navigate between them by pasting the last with C y and then going back and forth in the kill ring with M y yank forw usually M stands for pressing the lt A1t gt or lt Escape gt key For more information see the Emacs manual specially the documentation for the info
36. element set is 2D linear triangle or quadrangle elements A per node property n represents the height of the aquifer bottom to a given datum The corresponding unknown for each node is the piezometric height or the level of the freatic surface at that point The equation for the aquifer integrated in the vertical direction is o Ot where Qag is the aquifer domain S is the storativity K is the hydraulic conductivity and Ga is a source term due to rain losses from streams or other aquifers S 6 m0 V K d n V4 Ga on Rag x 0 4 22 6 10 Surface Flow 6 10 1 2D Saint Venant Model The stream element set represents a 2D or 1D stream of water It has its own nodes sep arated from the aquifer nodes whose coordinates must coincide with some corresponding node in the aquifer A constant per node field represents the stream bottom height hy with reference to the datum That is why normally we have two coordinates and the stream bottom height for each node The equations for the 2D Saint Venant open channel flow are the well known mass and momentum conservation equations integrated in the vertical direction If we write this equations in the conservation matrix form we have OU F U OE e On G U i 1 3 on Qs x 0 41 23 where Qst is the stream domain U h hw hu is the state vector and the advective flux functions in eq 23 are h2 F U hw hw g hwv 2 24 h F U hv hwv
37. equations that is added For instance when imposing conditions on primitive variables it is usual to discard the energy equation if pressure is imposed to discard the continuity equation if density is imposed and to discard the j th component of the momentum equation if the j th component of velocity is imposed On solid walls the energy equation is discarded if temperature is imposed Some of these pairings equation unknown are more clear than others 53 So that when imposing absorbing boundary conditions we have not only to specify the new conditions but also which equations are discarded Note that this is not necessary if all the variables are imposed at the boundary for instance in a supersonic outlet This suggests to generate appropriate values even for the outgoing waves for instance by extrapolation from the interior values 6 12 5 Extrapolation from interiors For a linear system in characteristic variables 45 we could replace all the first row of equations by 0 j 1 n oa A d 60 ee Cpp J N 1 Ndot which can be put in matricial form as yn tl _ IV 0 m dal 61 My V3 Vp 0 p 0 where the c s are appropriate coefficients that provide an extrapolation to the value vati from the values at time vat Note that these represents 2ngof equations but as Hy have rank n there are in total naof linearly independent equations As the nt 1 lt j lt naor rows in the first row of equations
38. file ns cpp 63 int nnwt default 1 Number of inner iterations for the global non linear Newton problem found in file ns cpp int nrec default 1000000 Sets the number of states saved in a given file in the rotary save mechanism see 7 2 found in file ns cpp int nsave default 10 Sets the save frequency in iterations found in file ns cpp int nsaverot default 100 Sets the frequency save for the rotary save mechanism Sometimes it is interesting to save the state vector with a certain frequency in a append manner i e appending the state vector at the end of the file However this posses the danger of storing too much amount of data if the user performs a very long run The rotary save mechanism allows writing only a certain amount of the recent states The mechanism basically saves the state vector each nsaverot steps appending to the a file The name of the file is contructed from a pattern set by the user via the save_file_pattern entry by replacing d by 0 a la printf For instance if save_file_pattern i set to file d out then the state vectors are appended to file0 out When the number of written states reach the nrec count the file is reset to 0 and the saving continues from the start of the file However if nfile is greater than one then the state vector are continued to be stored in file file1 out and so on When the number of files nfile is reached the saving continues
39. for the whole elemset and per element properties Now given a property you should decide whether it should be assumed to be constant for all the elemset or whether it can be taken per element The second is the more general case but to take all the possible 32 properties as per element may be too much core memory consuming There is then a mechanism to allow the application writer to get physical properties without bothering of whether the user has set them in the properties hash table or in the per element properties table First the application writer reserves an array of doubles large enough to contain all the needed properties This doesn t scale with mesh size so you can be gen erous here or either use dynamic memory allocation Before entering the element loop the macro DEFPROP prop_name determines whether the property has been passed by one or the other of the mechanisms This information is stored in an in teger vector elprpsindx MAXPROP Also it assigns a position in array propel so that propel prop_name_indx contains the given property Then once inside the element loop a call to the function load_props loads the appropriate values on propel MAXPROP independently how thay have been defined A typical call sequence is as follows Maximum number of properties to be loaded via load_prop define MAXPROP 100 int iprop 0 elprpsindx MAXPROP double propel MAXPROP
40. is the shell script mode In order to associate the epl or depl extensions to this mode add this to your emacs file setq auto mode alist cons ONAN INAMIdNW epl1 shell script mode auto mode alist 26 4 5 1 Installing PETSc FEM mode In order to use the mode you have to copy the file tools petscfem el to somewhere accessible for Emacs usr share emacs site lisp is a good candidate There is also a file tools petscfem init el that contains some basic configuration you can also copy it or insert directly its contents into your emacs Load basic PETSc FEM mode load file path to petscfem tools petscfem el Load petscfem init or insert directly its contents 35 below and configure load file path to petscfem tools petscfem init el 4 5 2 Copying and pasting PETSc FEM options PETSc FEM modules have a lot of options and is mandatory to have fast access to all of them and to their documentation The HTML documentation has a list of all of them at the end of the user s guide For easier access there is also an info file doc options info that has a page for each of the options You can browse it with the standalone GNU Info program or within Emacs with the info commands In the last case you have the additional advantage that you can very easily find the documentation for a given option with a couple of keystrokes and paste options from the manual into your PETSc FEM data file For jumping to the
41. is the step a C string the second the time step an integer and the last the simulation time a double In fact basically what PETSc FEM does is to build string with sprintf and then execute it with system like sprintf command your_command stage step time system command see the Glibc manual for more info about sprintf and system If you need for some reason to switch the order then use a parameter number like hello echo Hello world time 3 f step 2 d stage 1 s If you want to do some things depending on the stage then perhaps you can write something like this hook_list shell_hook hello hello my_script_hook and bin bash File my_script_hook if 1 init then echo in init Do more things in init stage Ho elif 1 pre then echo in pre Do more things in pre stage Ho elif 1 post then echo in post Do more things in post stage Ho elif 1 close then echo in close Do more things in close stage else Catch all Should not enter here echo Don t know how to handle stage 1 fi 91 At the init and close hook launching points the step number passed is 1 and 2 respec tively so that you can detect whether you are in a pre post stage or init close by checking this too The time passed is in both cases 0 10 4 Shell hooks with make If no command is given i e if you w
42. issue a make dx_step lt step gt dx_make_command found in file dxhook cpp string dx_node_coordinates default lt none gt Mesh where updated coordinates must be read found in file dxhook cpp int dx_port default 5314 TCP IP port for communicating with DX 5000 lt dx_port lt 65536 found in file dxhook cpp int dx_read_state_from_file default 0 Read states from file instead of computing them Normally this is done to analyze a previous run If 1 the file is ASCII if 2 then it is a binary file In both cases the order of the elements must be u 1 1 u 1 2 u 1 3 u 1 ndof u 2 1 u nnod ndof where u i j is the value of field j at node i found in file dxhook cpp string dx_split_state default Generates DX fields by combination of the input fields found in file dxhook cpp int dx_state_all_fields default dx_split_state_flag Generates a DX field with the whole state all ndof fields found in file dxhook cpp int dx_steps default 1 Initial value for the steps parameter found in file dxhook cpp int dx_stop_on_bad_file default 0 If dx_read_state_from_file is activated and can t read from the given file then stop Otherwise continue padding with zeros found in file dxhook cpp 14 The idmap class 14 1 Permutation matrices This class represents sparse matrices that are close to a permutation Assume that the n x n matrix Q is a permutation matrix so t
43. on STDERR preprocessing is stopped no ePerl output is given For instance if you include a directive like lt atanh 2 gt the output looks like ePerl Error Perl runtime error interpreter rc 255 Contents of STDERR channel atanh argument x must be x lt 1 In such a case you have to fix the ePerl commands prior to any further debugging of the PETSc FEM run 4 4 General options The following options apply to all the modules 4 4 1 Read mesh options This options are read in the read_mesh routine e int additional_iprops default 0 int additional properties used by the element routine found in file readmesh cpp e int additional_props default 0 Additional properties used by the element routine found in file readmesh cpp e int check_dofmap_id default 0 Checks that the idmap has been correctly generated found in file readmesh cpp e int debug_element_partitioning default 0 Prints element partitioning found in file readmesh cpp e int local_store default 0 Defines a locker for each element found in file readmesh cpp e int max_partgraph_vertices default INF Maximum number of vertices admissible while computing the partitioning graph found in file readmesh cpp 22 e string partitioning_method default metis Set partitioning method May be set to metis hitchhiking nearest_neighbor or random found in file readmesh cpp e int print
44. or centered one It is equivalent to approximate first derivatives by centered differences in the Finite Difference Method It is well known that the Galerkin formulation leads to oscillations for advective systems and this is solved by adding a stabilizing term to the discretized equations 6 3 SUPG stabilization In the SUPG for Streamline Upwind Petrov Galerkin formulation of Hughes et al The stabilized formulation is o OF MU F U A Psupa oe G 0 11 where the whole expression corresponds to the j th block of size ngo in the global equa tions Note that as the added term is a weighted residual form of the residual the term in parentheses then the continuum solution is solution of these discrete equations we say that this is a consistent formulation Psupa is a matrix of ngor X Ndor the SUPG perturbation function usually defined as ON P T A 12 Psupa ir 12 where T are the characteristic or intrinsic time of the element defined as he T 13 IIA where h is the size of the element and and A represents some norm of the vector of jacobians There is a variety of possibilities for computing both quantities For instance h may be computed as the largest side of the element or as the radius of the circle with the same area On the other hand A may be computed as the maximum eigenvalue of all the linear combinations of
45. permuted indices coincide with the other node sequence 17 4 Permutation tree In order to make it more efficient we can store all the permutations for a given shape in a tree like fashion Consider for instance the oriented tetrahedral shape Their permu tations are the following 0 1 2 3 0 2 3 1 0 3 1 2 1 0 3 2 1 2 0 3 1 3 2 0 2 0 1 3 2130 2 3 0 1 3 0 2 1 3 1 0 2 3 2 1 0 We can describe the generation of a new node numbering in the following way First we can take any of the nodes as the first node of the new numbering These is seen from the fact that all the indices 0 to 3 are present in the first column of the table above If we chose node 2 as the first index then we can chose any of the reamining as the second node 0 1 3 Once we choose the second index say for instance 1 there is only one possibility for the reamining two the numbering 3 1 0 2 The remaining possibility 3 1 2 0 would gnerate an inverted triangle Part of the tree is shown in 39 Every possible permutation is a path in the tree from the first node at level 0 to the last node which is a leave 141 root 6 1 e N Y a Cea OS 1 Q N O u N Figure 39 Tree representing all the permutations for the ordered tetra geometry Now given two possible node orderings for a given geometrical object and the tree that describes the permutations for its shape we simply have to follow the path that makes
46. the connectivity of the interface matrix is 1D or 1D like so that the di rect option is possible However in 3D the connectiviy is 2D like and the direct solver is much more expensive In addition the interface matrix is scattered among all processors The iterative solution is much more appealing since the fact that the matrix is scattered among processors is not a problem In addition the interface matrix is usually diagonally dominant even when the whole matrix is not For instance for the Laplace equation in 2D the stencil of the interface matrix on a planar interface in a homogeneous grid of step h with bilinear quad elements is 18 1 3h Such matrix has a condition number which is independent of h and is asymptotically k A71 5 3 In the case of using tri angular elements by the way this is equivalent to the finite difference case the stencil 131 is 14 1 2h whose condition number is Arr 3 This low condition number also favors the use of an iterative method However a disdvantage for the iterative solution is that non stationary iterative solvers i e those where the next iteration 2 doesn t depend only on the previous one a f x like CG or GMRES can not be nested inside other non stationary method unless the inner preconditioning loop is iterated to a very low error This is because the conjugate gradient directions will lose orthogonality But using a very low error bound for the preconditioning probl
47. the form n A with nj a unit vector i e the maximum propagation velocity possible in the fluid that is All max max X 14 1 Ndim k 1 Ndof where A is the k th eigenvalue of jacobian A For the Euler equations it turns out to be that this corresponds to pressure waves propagating in the direction of the fluid and is c u where c is the speed of sound and u the absolute value of velocity For the shallow water equations its value is u gh where g is gravity acceleration and h the local water elevation with respect to bottom 6 4 Shock capturing For problems with strong shocks shock waves in Euler or hydraulic jumps in shallow water the standard SUPG stabilizing term may not be sufficient Then an additional stabilizing term is added so that the stabilized equations are now of the form oU pon MU F U jt gt L Psupa ar Ta G N OU _ oe ae ae 36 Note that in contrast with the SUPG term the new so called shock capturing term is no more consistent dsc is a scalar the so called shock capturing parameter Often when shock capturing is added we diminish the amount of stabilization in the SUPG term in order to compensate and not to have an over diffusive scheme We will not enter in the details of this computations refer to 2 for further details 6 5 Creating a new advective system New advective systems may be added to PETSc FEM only by defining their flux function ja
48. the nodedata section several elemset sections the fixa section and constraint section Each section starts with the keyword identifying the section in a separate line followed by several parameters in the same line Follows several lines that makes the section ending with a terminator of the form __END_ lt some identifier gt __ for instance __END_HASH__ Note that these terminators start and end with double underscores __ while single underscores are used to separate words inside the keyword For instance an elemset section is of the form elemset volume_euler 4 geometry cartesian2d ndim 2 npg 4 chunk_size 1000 14 lumped_mass 1 shock_capturing 1 gamma 1 4 lt other element options go here gt __END_HASH__ al 2 81 80 2 3 82 81 3 4 83 82 4 5 84 83 5 6 85 84 6 7 86 85 7 8 87 86 8 9 88 87 lt more element connectivities follow here gt __END_ELEMSET__ In this example the keyword elemset is followed by the parameters volume_euler that is the elemset type and 4 that is the number of nodes connected to each element The line starting with props describes some per element quantities more on this later see 85 4 Follows the assignation of some values to parameters for the actual elemset for instance the value 4 is assigned to the npg parameter that is the number of Gauss points The assignation of parameters ends with the __END_HASH__ terminator Follows the element connectivities one per line endin
49. the particular element but it is rather a property of the terms being evaluated in the Jacobian For instance for the Navier Stokes equations the Galerkin term only has non null coef ficients for the velocity unknowns in the continuity equation while the pressure gradient term only has coefficients for the pressure unknowns in the momentum equations Both terms together have a mask as shown in figure 29 When the application writer codes such a term he defines the mask At the moment of uploading the elements if block_uploading is in effect then PETSc FEM computes the envelope of the mask i e the rectangular mask that contains the mask in order to make just one call to MatSetValues In this case the envelope is just a matrix filled with 1 s so that block uploading pays the benefit of us ing the faster MatSetValues routine with the cost of loading much more coefficients than the original mask In addition the PETSc matrix will be bigger with the corresponding increase in RAM demand and CPU time in computing factorizations IISD solver and matrix vector products PETSc solver In a future such a combination of terms will be loaded more efficiently with two calls to MatSetValues The conclusion is that if the terms to be loaded have a very sparse structure but a dense envelope then may be block uploading is slower The worst case is a diagonal like mask Note that also it s not sufficient to have a sparse structure of the el
50. this program if not write to the Free Software Foundation Inc 675 Mass Ave Cambridge MA 02139 USA Also add information on how to contact you by electronic and paper mail If the program is interactive make it output a short notice like this when it starts in an interactive mode 11 Gnomovision version 69 Copyright C 19yy name of author Gnomovision comes with ABSOLUTELY NO WARRANTY for details type show vw This is free software and you are welcome to redistribute it under certain conditions type show c for details The hypothetical commands show w and show c should show the appropriate parts of the General Public License Of course the commands you use may be called something other than show w and show c they could even be mouse clicks or menu items whatever suits your program You should also get your employer if you work as a programmer or your school if any to sign a copyright disclaimer for the program if necessary Here is a sample alter the names Yoyodyne Inc hereby disclaims all copyright interest in the program Gnomovision which makes passes at compilers written by James Hacker lt signature of Ty Coon gt 1 April 1989 Ty Coon President of Vice This General Public License does not permit incorporating your program into propri etary programs If your program is a subroutine library you may consider it more useful to permit linking proprietar
51. vr vn yr _ yr k X k de A k 1 F k 1 0 k gt 1 where Vj are the Lagrange multipliers for imposing the new conditions Note that if 7 is an incoming wave A gt 0 then the equation is of the form Ujo Vrefo 0 yntl yr yoti g jo JO Sl JO oe At Aj ra Vim 0 69 1 1 Uk ke y Viet Ue o b gt At E 2h 4 E Note that due to the vjim Lagrange multiplier we can solve for the vj values from the first last rows the value of the multiplier vj adjusts itself in order to relax the equations in the second row On the other hand for the outgoing waves A lt 0 we have Vj lm 9 1 1 vio Vo vi Vo 0 At aa 70 n 1 n n 1 n e Ms a ay pS At 2h 55 So that the solution coincides with the unmodified orginal FEM equation and vj im 0 Coming back to the U basis we have Ii User Uo _ Une EN IT Uret Uim 0 Ugh UG Ut Us U Ur Ue Us A 0 k gt 1 At de 2h And finally coming back to the FEM equations 59 TI Uer Uo _ Urner F IT Uet Uim 0 Fo Ugt UT 11 Uim RET FLUJO Up Ug Rp n 1 n 1 n ly _ n 1 F U U Uri Ry In conclusion in this setup we do not need to make extrapolations to the variables and then there is no need to have a structured line of nodes near the boundary It s only required to have an additional fictitious node at the boundary in order to hold the Lagrange multiplier unknowns Um and to a
52. 1 and amplitudes 1 lt j lt n entered by the user The interpolation is k tk 1 tk 0 if t lt to or t gt tn p t eer a t tk if tk lt t lt tray Pk 143 If final_time is defined then th function is extended periodically with period tn t1 The parameters are e npoints integer The number n of points to be entered e t n doubles The instants tj e f n doubles The function values e final_time double The function is extended from tn to this instant with period tn ti spline Spline interpolated function This is similar to piecewise but data is smoothly interpolated using splines e npoints integer The number n of points to be entered t n doubles The instants t f n doubles The function values period tn t spline periodic Spline interpolated periodic function final_time double The function is extended from t to this instant with Spline interpolation with piecewise may give poor results specially if you try to match smoothly the beginning and end of the period spline_periodic may give better results at the cost of being restricted to enter the data function in an equally spaced grid tj 1 t At cust j j e npoints integer The number n of points to be entered e period double The period T of the function e start_time double default 0 The first time instant t The remaining
53. 7 2 found in file adv cpp int nsave default 10 Save state vector frequency in steps found in file adv cpp int nsaverot default 100 Save state vector frequency with the rotary save mechanism see 7 2 found in file adv cpp int nstep default 10000 The number of time steps found in file adv cpp int nstep_cpu_stat default 10 Output CPU time statistics for frequency in time steps found in file adv cpp int print_internal_loop_conv default 0 Prints the convergence history when solving a consistent matrix found in file adv cpp int print_linear_system_and_stop default 0 After computing the linear system prints Jacobian and right hand side and stops found in file adv cpp 40 double rtol default 1e 3 Relative tolerance when solving a consistent matrix found in file adv cpp string save_file default outvector out Filename for saving the state vector found in file adv cpp string save_file_pattern default outvector d out The pattern to generate the file name to save in for the rotary save mechanism found in file adv cpp double start_time default 0 Counts time from here found in file adv cpp double tol_mass default 1e 3 Tolerance when solving with the mass matrix found in file adv cpp Generic elemset advecfm2 double beta_supg default 0 8 Parameter to control the amount of SUPG perturbation added to the mass matrix to be cons
54. ACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBIL ITY OF SUCH DAMAGES END OF TERMS AND CONDITIONS 2 3 Appendix How to Apply These Terms to Your New Programs If you develop a new program and you want it to be of the greatest possible use to the public the best way to achieve this is to make it free software which everyone can redistribute and change under these terms To do so attach the following notices to the program It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty and each file should have at least the copyright line and a pointer to where the full notice is found lt one line to give the program s name and a brief idea of what it does gt Copyright C 19yy lt name of author gt This program is free software you can redistribute it and or modify it under the terms of the GNU General Public License as published by the Free Software Foundation either version 2 of the License or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU General Public License for more details You should have received a copy of the GNU General Public License along with
55. FEM once you write the flux function for a particular advective diffusive problem you get absorbing boundary conditions with none or little extra work There are basically two types of absorbing boundary conditions e Linear based on the Jacobian of the flux function assuming small perturbations about a reference value e Based on Riemann invariants require the writer of the flux function to provide the Riemann invariants for the flux function Needs the user to write the Riemann invariants and 50 6 12 1 Linear absorbing boundary conditions Starting with the conservation form of an advective system 2 and assuming small per turbations about a mean fluid state U x t Up U x t and no source term then we obtain the linearized form ae uy dt A Shh 42 A 42 where we assume further that the flow only depends on x the direction normal to the boundary Let S be the matrix of right eigenvectors of Ag so that AoS SA 43 where A diag A1 Ana are the eigenvalues of Ap Assuming that the system is hyperbolic then such a diagonal decomposition is possible for any state Up with real eigenvalues and a non singular matrix S Multiplying 42 at left by S7 and defining V StU we obtain a decoupled system of equations OV OV BE ar 0 44 Now the equation for each characteristic component vj of V is a simple linear transport equation with constant transport velocity A Ov Ot so that the absor
56. FunData d MyGaussFunData fun_data lt use data in d and define amplitude gt CLEAR_FUN gaussian clear allocated memory MyGaussFunData d MyGaussFunData fun_data delete d fun_data NULL Finally yet another approach is to have a wrapper class with methods init and eval The macro DEFINE_EXTENDED_AMPLITUDE_FUNCTION class_name is in charge of creating and destroying the object For instance the following file fun3 cpp defines two functions linramp and tanh_ramp File fun3 cpp include lt src ampli h gt class linramp public double f0 f1 tO t1 slope void init TextHashTable thash double eval double F void linramp init TextHashTable thash int ierr TGETOPTDEF_ND thash double t0 0 TGETOPTDEF_ND thash double t1 1 TGETOPTDEF_ND thash double f0 0 TGETOPTDEF_ND thash double f1 1 slope f1 f 0 t1 t0 double linramp eval double t if t lt tO return f0 else if t gt t1 return f1 else return f0 slope t t0 DEFINE_EXTENDED_AMPLITUDE_FUNCTION linramp 126 SS NS A lt gt lt gt class tanh_ramp public double A t0 delta base void init TextHashTable thash double eval double Je void tanh_ramp init TextHashTable thash int ierr TGETOPTDEF_ND thash double base 0 TGETOPTDEF_ND thash double A 1 TGETOPTDEF_ND thash double t0 0 TGETOPTDEF_ND thash double delta 1 assert delt
57. PETSc FEM A General Purpose Parallel Multi Physics FEM Program User s Guide by Mario Storti Norberto Nigro Rodrigo Paz Lisandro Dalcin and Ezequiel L pez Centro Internacional de M todos Computacionales en Ingenier a CIMEC Santa Fe Argentina http www cimec org ar petscfem version mstorti vi5 root 5 g62b6839 clean date Sun Apr 27 18 22 33 2008 0300 processed date Sun Apr 27 18 22 36 2008 0300 April 27 2008 This is the documentation for PETSc FEM current version mstorti vl15 root 5 g62b6839 clean a general purpose parallel multi physics FEM program for CFD applications based on PETSc PETSc FEM comprises both a library that allows the user to develop FEM or FEM like ie non structured mesh oriented programs and a suite of application programs It is written in the C language with an OOP Object Oriented Programming philosophy but always keeping in mind the scope of efficiency Contents LICENSE GNU GENERAL PUBLIC LICENSE 2i Preamble o ce eaaa a BAS ERED RG he EER ERS Ee ES 2 2 GNU General Public License Terms and Conditions for Copying Distri bution and Modification oc s s sp o ee Ae ee 2 3 Appendix How to Apply These Terms to Your New Programs The PETSc FEM philosophy 3 1 The three levels of interaction with PETSc FEM 3 2 The elemset concept sc ocio General layout of the user input data file 4 1 Preprocessing the user input data file
58. This can be applied to any number of generators p q T S For instance the symmetries for the oriented triangle can be generated with the per mutation p 1 2 0 since 2 O 1 can be generated as p The symmetries for the unoriented triangle can be derived from generators 1 2 0 rotation and 0 2 1 in version The most relevant geometrical objects and their symmetries are 0 0 2 i 2 O Figure 34 Generators for triangle e Unoriented triangles are described by numbering their nodes in any way Their symmetries are all the permutations 6 2 en total and are generated by a rotation and an inversion e Oriented triangles are described by numbering their nodes in a specified direction Their symmetries are 3 en total generated by a rotation e Unoriented quadrangles are described by numbering their nodes in clockwise or counterclockwise rotation Symmetry generators are 1 2 3 0 rotation and 0 3 2 1 inversion there is a total of 8 permutations 139 2 2 O O Figure 35 Generators for quadrangle Oriented quadrangles are identical to unoriented quadrangles but do not include the inversion 4 rotations G lt b inv D Figure 36 Generators for tetras Unoriented tetrahedra are described by numbering one of the faces and then the oposite node Symmetries are generated by permutation of any two of the faces for instance 1 2 0 3 and 3 0 2 1 and inversion 1 0 2 3 24 permu
59. This function is somewhat analogous to ramp in the sense that it goes from a starting constant value to another final one but smoothly Beware that the start and end values are reached only for t oo switch_time 1s default 0 The time at which passes by the mean value o between o and 1 time_scale 7 default none This somewhat represents the duration of the change from the starting value to the end value During the interval from ts T to ts 7 ie a total duration of 27 goes from o 0 12A to do 0 88A start_value o default 0 The limit valu for t gt oo end_value default 1 The limit valu for t 00 118 Figure 23 Smooth ramp with the hyperbolic tangent function sin Trigonometric sine function olt do A sin wt p 141 O Soe ee 9 la u t oc sin wt o t Figure 24 Sine function The parameters are e mean_val p default 0 e amplitude A default 1 e omega w default none e frequency w 27 default none e period T 27 w default none e phase y default 0 Only one of omega frequency or period must be specified cos Trigonometric cosine function g t do A cos wt p 142 The parameters are the same as for sin see 14 8 1 119 piecewise Piecewise linear function This defines a piecewise linear interpolation for a given series of time instants t ordered such that tj lt tj
60. Typically a gatherer is created by deriving from the virtual class gatherer and imple menting the set_pg_values method which is in charge of computing the integrands class gatherer ae public perform several checks and initialization void init add Gauss point contributions void set_pg_values vector lt double gt amp pg_values FastMat2 amp u FastMat2 kuold FastMat2 amp xpg FastMat2 amp Jaco double wpgdet double time e The init function may be used for initialization The TGETOPTDEF thash macro can be used for extracting options of the elemset For instance TGETOPTDEF thash double Young_modulus 0 e The set_pg_values is the main method Here the user computes the values to be integrated Its signature is 99 void set_pg_values vector lt double gt amp pg_values FastMat2 amp u FastMat2 g uold FastMat2 amp xpg FastMat2 amp n double wpgdet double time The argument pg_values are the values to be computed pg stands for Gauss point since usually this function is called by the gatherer class at the Gauss points of integration u and uold are the states at times t and t xpg are the coordinates of the Gauss point n is the normal to the surface only relevant if the dimension of the element ndimel is equal to ndim 1 wpgdet is the area of the Gauss point i e the Gauss point weight times the Jacobian of the transformation to the master element e void element_hook int k is c
61. Up used for the Riemann based absorbing boundary conditions but on the internal state That means for in stance that if the internal computational scheme tends to produce some error due to non conservativity or rounding errors the internal state would drift constantly A good compromise may be to use Riemann based 54 or linear absorbing boundary conditions 49 at inlet and absorbing boundary conditions based on last state 58 at outlet As the error tends to propagate more intensely towards the outlet boundary it is preferably to use strongly absorbing boundary conditions there whereas the linearly absorbing or Riemann invariant boundary conditions upstream avoid the drift 6 12 4 Finite element setup Assume that the problem is 1D with a constant mesh size h and time step At so that nodes are located at positions 1 kh k 0 00 Let U be the state at node j time step n FEM discretisation of the system of equations with natural boundary conditions leads to a system of the form Fo Ug UA Rot F Uj U1 U3 RIH 0 59 n 1 n 1 n ly __ n 1 F U 1 Uy Uka R where the F are non linear functions and the Ry possibly depends on the previous values Uy Imposing boundary conditions means to replace some of the equations in the first row k 0 for other equations Recall that in order to balance the number of equations and unknowns we must specify which of the equations are discarded for each
62. _combo default iisd Solver combination for the fractional step method May be iisd lu global_gmres found in file ns cpp int fractional_step_use_petsc_symm default 1 Fractional step uses symmetric matrices only CG iterative KSP found in file ns cpp string gather_file default gather out Print values in this file found in file ns cpp int LES default 0 Use the LES Smagorinsky turbulence model found in file ns cpp int measure_performance default 0 Measure performance of the comp_mat_res jobinfo found in file ns cpp int ndim default 3 Dimension of the problem found in file ns cpp vector lt double gt newton_relaxation_factor default none Relaxation parameter for Newton iteration Several values may be entered in the form newton_relaxation_factor w1 ni w2 n2 wn that means Take relaxation factor w1 for the first n1 steps w2 for the following n2 steps and so on until w_ n 1 wn is taken for all subsequent steps Normally one takes a conservative said 0 5 relaxation factor for the first steps and then let full Newton i e w 1 for the rest For instance the line newton_relaxation_factor 0 5 3 1 means take w 0 5 for the first 3 steps and then use w 1 found in file ns cpp int nfile default 1 Sets the number of files in the rotary save mechanism see 7 2 found in file ns cpp int ngather default 0 Number of gathered quantities found in
63. _dofmap_id default 0 Prints the dofmap idmap object found in file readmesh cpp e int print_hostnames default 0 Print hostnames for nodes participating in this run found in file readmesh cpp e int print_partitioning_statistics default 0 Print graph statistics found in file readmesh cpp 4 4 2 Elemset options This options are used in the Elemset class e int chunk_size default ELEM_CHUNK_SIZE Chunk size for the elemset found in file elemset cpp e int debug_compute_prof default 0 Debug the process of building the matrix profile found in file elemset cpp e int element_weight default 1 Element weight for the processor found in file elemset cpp e double epsilon_fdj default EPSILON_FDJ The increment in the variables in order to compute the finite difference approxima tion to the Jacobian Should be order epsilon sqrt precision typical magnitude of the variable Normally precision 1e 15 so that epsilon 1e 7 typical magnitude of the variable found in file elemset cpp e int print_local_chunk_size default 0 Print the local chunk size used for each elemset in each processor for each chunk found in file elemset cpp e int report_assembly_time default 0 Debug the process of building the matrix profile found in file elemset cpp e int report_consumed_time default 0 Report consumed time for the elemset Useful for building the table of weights per processor
64. _load elemset are e int double_layer default 0 Whether there is a double or single layer of nodes found in file genload cpp e string geometry default cartesian1d Type of element geometry to define Gauss Point data found in file genload cpp e double var_len hfilm_coeff default no default Defines coeffcients for the film flux function May be var_len 0 no AT driven load or var_len ndof ndof a full matrix of relating the flux with AU found in file linhff cpp e double var_len hfilm_source default no default Defines constant source term for the generic load on surfaces May be of length 0 null load or ndof which represents a geven load per field found in file 1inhff cpp e int ndimel default ndim 1 The dimension of the element found in file genload cpp 12 2 Functional extensions of the elemset The generic elemset is GenLoad There is an instantiation for the Generic load elem sets can be extended by the user by deriving the base class GenLoad and the most im portant task is to implement the methos q u flux jac for the single layer case and q u_in u_out flux_in flux_out jac in the double layer case 12 3 The flow reversal elemset This instantiation of GenLoad is useful for avoiding instabilities caused by infersion of the flow at an outlet boundary Assume that the Navier Stokes module computes a velocity field v and some scalar is being advected with this velocity field On an outlet bo
65. a gt 0 double tanh_ramp eval double t if t lt t0 return base else return base A tanh t t0 delta DEFINE_EXTENDED_AMPLITUDE_FUNCTION tanh_ramp 14 8 6 Time like problems The mechanism of passing time to boundary conditions and elemsets may be used to pass any other kind of data since it is passed as a generic pointer void time data This may be used to treat other problems where an external parameter exists for instance Continuation problems For instance a NS solution at high Reynolds number may be obtained by continuation in Re and then we can use it as the external tome like parameter In general we can perform continuation in a set of parameters so that the time_data variable should be an array of scalars Multiple right hand sides Here the time like variable may be an integer the number of right hand side case 15 The compute_prof package 15 1 MPI matrices in PETSc When defining a matrix in PETSc with MatCreateMPIAIJ you tell PETSc see the PETSc documentation e How many lines of the matrix live in the actual processoor m 127 e For each line of this processor how many non null elements it has in the diagonal block d_nnz j e For each line of this processor how many non null elements it has in the off diagonal block o_nnz j In PETSc FEM this quantities are computed in two steps e Call assemble with ijob COMP_MAT_PROF or ijob COMP_FDJ_PROF and
66. a printf like semantics in the following way include lt src syncbuff h gt include lt src syncbuff2 h gt KeyedOutputBuffer kbuff int key for E vel Qed Load internal string with printf and cat_printf functions kbuff printf one line d d d fWn i j k a kbuff cat_printf other line d d dWn m n p Push in buffer with key q push also clears the internal string kbuff push q kbuff flush If you want to dump the buffer on another stream like a file for instance then you have to set the static FILE KeyedLine output field of the KeyedLine class Also the static int KeyedLine print_keys flag controls whether the key number is printed at the beginning of the line or not For instance the following code sends output to the output dat file without key numbers 146 FILE out fopen output dat w KeyedLine output out KeyedLine print_keys 0 kbuff flush fclose out Also the member int KeyedOutputBuffer sort_by_key default 1 controls whether to sort by keys prior to printing then you can set kbuff sort_by_key 0 if you want to disable sorting Some notes regarding usage of this class are e You can have several SyncBuffer lt gt s or Keyed0utputBuffer s at the same time and you can flush them independently e Memory usage All items sent to the buffer with push are kept in memory in a temporary buffer When flush O is cal
67. a report of the times a given option was accessed Useful for detecting if an option was used or not found in file ns cpp int reuse_mat default 0 Use fractional step or TET algorithm found in file ns cpp string save_file default outvector out The name of the file to save the state vector found in file ns cpp string save_file_pattern default outvector d out The pattern to generate the file name to save in for the rotary save mechanism found in file ns cpp string save_file_some default outvsome out Name of file where to save node values for the print some feature found in file ns cpp int save_file_some_append default 1 Access mode to the some file If 0 rewind file If 1 append to previous results found in file ns cpp int solve_system default 1 Solve system before print_linear_system_and_stop found in file ns cpp string solver default petsc Type of solver May be iisd or petsc found in file ns cpp string solver_mom default petsc Type of solver for the projection and momentum steps fractional step May be iisd or petsc found in file ns cpp double start_comp_time default 0 Time to start computations found in file ns cpp string stdout_file default none If set redirect output to this file found in file ns cpp 65 int steady default 0 Flag if steady solution or not uses Dt inf If steady is set to 1 then the computations are as i
68. al PETSc matrix in each processor for the local part 133 of the Azz block The Azz block must be defined as sequential because otherwise we couldn t factorize it with the LU solver of PETSc However this constrains that MatSetValues has to be called in each processor for the matrix that belongs to its block i e elements in a given processor shouldn t contribute to Azz elements in other processors Normally this is so but for some reasons this condition may be violated One is periodic boundary conditions and constraints in general they are not taken into account for the partitioning Another reasons is very bad partitioning that may arise in some not so common situations Consider for instance figure 28 Due to bad partitioning a rather isolated element e belongs to processor 0 while being surrounded by elements in processor 1 Now as nodes are assigned to the highest numbered processor of the elements connected to the node nodes p g and r are assigned to processor 1 But then nodes q and r will belong to the local subset of processor 1 but will receive contributions from element e in processor 0 However the solution is not to define this matrices as PETSc because so far PETSc doesn t support for a distributed LU factorization The solution we devised is to store those Ary contributions that belong to other processors in a temporary buffer and after to send those contributions to the correct processors directly with MPI messages Thi
69. alculate the characteristic size of the element e Matrix amp H input size 1 x ny In the shallow water the source term G depends on the gradient of the depth H and in the 1D Euler equations on the area of the tube section A z This is taken into account by PETSc FEM by assuming that the user enters in the nodedata section nu Ndim ny quantities per node where the first nqim quantities are the node coordinates and the rest are assumed that are node data that has to be passed to the flux function routine together with its gradient in order to compute the source term e Matrix amp grad_H input size Naim X ny the gradient of the quantities in H see previous entry e Matrix flux output size Ndof X Nadim Each column is the vector Fj of fluxes for each of the governing equations e vector lt Matrix gt A_jac output size a vector Of ngim pointers to matrices of Ndof X Naor Each matrix is the jacobian matrix as defined by 5 To access the jd jacobian matrix you may write A_jac jd 1 The macro AJAC jd defined in advective h expands to this 38 e Matrix amp A_grad_U output size Ngor X 1 compute if options amp COMP_UPWIND This is the accumulation term Az 9U 0x aa e Matrix amp grad_U input size Ndim X Naot The gradient of the state vector e Matrix amp tau_supg output size either 1 x 1 or ngof X Ndor compute if options amp COMP_UPWIND This is the 7 intrinsic time scale it may be eith
70. alled before the Gauss point hook and then can be used in order to pre compute some stuff for all the Gauss points k is the element number e void clean can be used after processing all the elements and doing some cleanup 12 Generic load elemsets Generic load elemsets account for surface contributions which represent constant terms in the governing equations or either a function of the state at the surface Typical terms that can be represented in this way are e External heat loads like a constant radiation load in thermal problems q q e A linear Newtonian cooling term q A T Tx e A nonlinear Newtonian cooling term q f T T 12 1 Linear generic load elemset Figure 14 Generic load element single layer In the simplest case the load is of the form q hU q single layer 117 q h Uout U q double layer un This is implemented by the lin_gen_load elemset This elemset may be single layer or double layer see figures 14 and 15 Double layer elements can represent a lumped thermal resistance for instance a very thin layer of air inside a material of higher con ductivity In the double layer case the number of nodes is twice than in the single layer 100 Figure 15 Generic load element double layer case Double layer is activated either by including the double_layer option or either if the numer of nodesis twice that one specified by the geometry The options for the lin_gen
71. ally an STL vector of pointers to cache objects When the body of the loop is executed the second time and the following the adresses are not needed to be recomputed but they are read from the cache instead The use of the cache is rather automatic and requires little intervention by the user but in some cases the position in the cache list can get out of sync with respect to the execution of the operations and severe errors may occur 77 The typical use can be seen in the example shown in section 89 2 First we have to create a FastMatCacheList object as shown in line 3 and activate it with as in line 4 The outer loop here is the loop over elements For the second and following executions of the body of the loop you must rewind the cache list this is done in line 8 with the FastMat2 reset_cache operation see figure 9 The deactivate_cache call at line 43 causes that subsequent operations after this line will not be cached This call is required if subsequent calls to FastMat2 cached operations will be executed Otherwise there will be an error since those posterior calls will not have a corresponding cache in the cache list The void_cache call deletes the cache claiming the space used by it It s not required but it is a good idea in order to save memory space It may be required if the loop will be called with other dimensions For instance a FEM loop may be called first for triangles and then for quadrangles and in this two
72. am into other free programs whose dis tribution conditions are different write to the author to ask for permission For software which is copyrighted by the Free Software Foundation write to the Free Software Foundation we sometimes make exceptions for this Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally 11 WARRANTY BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE THERE IS NO WARRANTY FOR THE PROGRAM TO THE EXTENT PER MITTED BY APPLICABLE LAW EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND OR OTHER PARTIES PRO VIDE THE PROGRAM AS IS WITHOUT WARRANTY OF ANY KIND EL THER EXPRESSED OR IMPLIED INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE THE ENTIRE RISK AS TO THE QUALITY AND 10 PERFORMANCE OF THE PROGRAM IS WITH YOU SHOULD THE PRO GRAM PROVE DEFECTIVE YOU ASSUME THE COST OF ALL NECESSARY SERVICING REPAIR OR CORRECTION 12 INNO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY MODIFY AND OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE BE LIABLE TO YOU FOR DAMAGES INCLUDING ANY GENERAL SPECIAL INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED IN
73. an be performed simultaneously so that each stage can be performed in 2T sending plus receiving This nproc 2 stages take then a total of 2 Nproc 2 T NprocT secs Now all communication between processors in S and S2 has been performed it only remains to perform the communication between processors in S and processors in S But then we can apply this idea recursively and divide the processors in S4 in two subsets S11 and S12 with nproc 4 each let s assume that the number of processors is a power of 2 Mproc 2 with a required time of Nproc 2 T Applying the idea recursively we arrive to an estimation of a total time of T n Matos Nproc 2 17 2Nnproc DT 161 an O 2NprocT Then it is significantly better than the previous algorithm Scheduling algorithm for exchanging data between processors void exchange j if myrank gt j 137 send data to processor j receive data from processor j else send data to processor j receive data from processor j 17 3 Mesh refinement Warning This is a work in progress FEM mesh Figure 32 FEM mesh and graph representation Figure 33 Geomtrical objets 5 nodes and an edge We conceive a mesh as a graph connecting nodes and other higher dimension entities based on these nodes Consider for example five nodes labeled from 0 to 40 and an edge connecting two of these nodes as in figure 33 In total there are 6 geome
74. and Ch a and n are model constants G represent the gain or loss of the stream and the main component is the loss to the aquifer Gs P Ry b h u 19 where Ry is the resistivity factor per unit arc length of the perimeter The corresponding gain to the aquifer is Ga Gs or 20 where I represents the planar curve of the stream and dp is a Dirac s delta distribution with a unit intensity per unit length i e L f x r dE f f x s ds 21 The stream_loss elemset represents this loss and a typical discretization is shown in figure 4 The stream loss element is connected to two nodes on the stream and two on the aquifer and must be entered in that order in the element connectivity table for instance elemset stream_loss 4 lt elemset properties gt __END__HASH__ lt n5 gt lt n4 gt lt nl gt lt n2 gt __END__ELEMSET__ 6 7 1 Related Options e double a_bar default 1 Unit conversion factor for Manning friction law found in file e double Bi default lt required gt Width of the channel found in file e double Ch default lt required gt Chezy roughness coefficient found in file 44 double diameter default lt required gt geometry of the channel found in file string friction_law default string undefined Choose friction law may be manning or chezy found in file int impermeable default 0 Flag whether the element is impermeable Rf 00
75. and info lookup modes 27 5 Text hash tables In many cases options are passed from the user data file to the program in the form of small databases which are called text hash tables This consist of several lines each one starting with a keyword and followed by one or several values When reading this data in the read_mesh routine the program doesn t know anything about neither the keywords nor the appropriate values for these keywords Just when the element is processed by the element module via the assemble function call for instance assembling residuals or matrices the element routine written by the application writer reads values from this database and apply it to the elemset The text hash table is stored internally as a correspondence between the keys and the values both in the form of strings The key is the first space delimited token and the value is the rest of the string from the end of the space separator to the end of the line Usually the values are strings like a C identifier chunk_size for instance As the values are stored in the form of strings almost any kind of values may be stored their for instance non_linear_method Newton string value max_iter 10 integer value viscosity 1 2e 4 double value or lists and combinations of them It s up to the application writer to decode this values at the element routine level Several routines in the getprop package help in getting this see 85 4 S
76. angential components and one prescribed normal component The normal component may be prescribed to some non null value if transpirations fluxes are to be imposed The corresponding equations may look like this Uj Ujtle UjnNe 125 Vj Ugtly Ujn My where ujt is the free tangential component and jn the prescribed normal compo nent Figure 18 Symmetrical arrangement of foils with specular symmetry Periodic boundary conditions These kind of b c s are useful when treating geometries with repetitive components as is very common for rotating machinery In some cases this can be done as slip boundary conditions see figure 18 Here the foils are symmetric about their centerline and then the whole geometry not only posses 109 Figure 19 Non symmetrical arrangement of foils symmetry of rotation about angles multiple of 27 Nfoiis but in addition possess re flection symmetry about the centerline of a foil like BC and the centerline between two consecutive foils like AB In consequence it suffices the domain ADEC with inlet outlet b c s in DE and AC slip boundary conditions on those parts of AD and EFGHC On the foil boundary FGH pressure should be free while on AD EF and HC we can impose Op 0n 0 For Navier Stokes we may impose solid wall b c s on the foil normal null velocity component on AD EF and HC null tangential shear stress 9u 0n 0 and Op On 0 However if the foils are non s
77. as the Domain Decomposition Method 16 1 The PFMat abstract interface The create member should create the matrix from the matrix profile da and the dofmap For the PETScMat matrix class it calls the old compute_prof routine calculating the d_nnz and o_nnz arrays and calling the MatCreate routine For the IISD matrix it has to determine which dofs are in the local blocks and create the appropriate numbering The set_value member is equivalent to the MatSet Values routine and allows to enter values in the matrix For the IISDMat class it sets the value in the appropriate block Arz Art Apr or Arr PETSc matrices In addition for the Azz local local block it has to buffer those values that are not in this processor this can happen when dealing with periodic boundary conditions or bad partitionings for instance an element that is connected to all nodes that belong to other processor This last case is not the most common but it can happen Once you have filled the entries in the matrix you have to call the assembly_begin and assembly_end members as in PETSc The solve member solves a linear ssytem associated to the given operator The zero_entries is also the counterpart of the corresponding PETSc routine The build_sles member creates internally the SLES needed by the solution included the preconditioner It takes as an argument a TextHashTable from where it takes a series of options The d
78. at the boundary The second node is the node with Lagrange multipliers and the third node is used to set the reference value The data would look like this see figure 6 elemset gasflow_abso2 3 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt __END_ELEMSET__ 58 absorbing element E lt n lagrange multipliers O n 1 10 n y reference state outgoing wave Figure 6 Absorbing element end_elemsets fixa lt n4 gt 1 lt rho_ref gt lt n4 gt 2 lt u_ref gt lt n4 gt 3 lt v_ref gt lt n4 gt 4 lt p_ref gt __END_FIXA__ e As before the normal property is used for computing the direction normal to the boundary e If the use_old_state_as_ref flag is set to true 1 then the reference state is taken as the state o the state of the boundary node at the previous time step In this case the state of the third node is ignored On the other hand if it is set to false 0 then the state of the third node is used as the reference state e This type of boundary condition doesn t need the implementation of the Riemman invariants methd but it needs the methods get_Cp and get_Ajac 7 The Navier Stokes module 7 1 LES implementation The Smagorisky LES model for the Navier Stokes module follows The implementation under PETSc FEM has presents the following particularities e Wall boundary conditions are implemented as mixed type e The van Driest damping factor int
79. atik Graz Austria Financing agency CONICET CEE Begin 1997 End 1999 Key TUCANO Title Proyecto Alpha de la Comisi n Europea TU CANO Transatlantic University Industry Cooperation Director S R Idelsohn y S Mac Neill Univ de Birmingham Financing agency CONICET CEE Begin 1997 End 1999 Key CONICET FNRS Title Convenio CONICET Fonds National de la Recherche Scientifique FNRS entre el Grupo de Tecnolog a Mec nica del INTEC y el Laboratoire de T chniques A ronautiques et Spatiales Universidad de Lieja B lgica Investigaci n en Mec nica Computacional Director A Cardona y M G radin Financing agency CONICET FNRS B lgica Begin 1996 End 1997 Key CAI D 94 Code CAI D 94 004 024 Title M todos Num ricos en Mec nica de S lidos y Fluidos Director S R Idelsohn y A Cardona Financing agency Universidad Nacional del Litoral UNLit Begin 1994 End 1995 Key GERMEN Code PICT 51 Title FONCyT PICT 51 GERMEN GEneraci n de Recursos b sicos para la aplicaci n de los MEtodos Num ricos Director Dr Sergio Idelsohn Financing agency FONCyT Begin 1998 End 2001 149 1 Key COLA Code PID 026 Title Simulaci n Num rica de Procesos de Colada Continua Director S R Idelsohn Financing agency SECYT BID Be gin 1996 End 1999 21 Symbols and Acronyms 21 1 Acronyms CFD Computational Fluid Dynamics DX IBM Data Explorer ePerl embedded Perl FDM
80. atrices block_after Then the cache list generated in the first execution of the loop will be block_before inner_block inner_block inner_block block_after and this is OK since the number of times inner_block is executed is always the same If the operations that are performed inside the loop are the the same for all executions of the loop but are executed an irregular number of times then we can use a sequence as follows 82 Case B Inner code executed a variable number of times FastMatCachePosition cpl for int k 0 k lt N k N very large Outer loop block_before FastMat2 get_cache_position cp1 int n irand 1 5 for int 11 0 11 lt n 11 FastMat2 jump_to cp1 inner_block Operations that act on the same matrices FastMat2 resync_was_cached block_after Here the number of times the inner block is executed may vary randomly from 1 to 5 irand m n returns an integer number randomly distributed between m and n The FastMatCachePosition class objects store the position of the actual computation in the cache list So that the call to jump_to at the start of the loop restarts the position in the cache to the desired one After leaving the loop we call to resync_was_cached in order to resync the cache list This is OK if the inner loop is executed at least once the in the first execution of the outer loop If it happens that in the first execution of the loop the inner loop is not ent
81. bing boundary condition is almost evident Assuming that we want to solve the equation on the semiplane x gt so that x 0 is a boundary then the corresponding absorbing boundary condition is A 0 45 fe 0 if A gt 0 ingoing boundary 46 vjextrapolated from interior otherwise outgoing boundary This can be summarized as TL Vo 0 47 where II diag 1 sign A 2 48 is the projection matrix onto the space of incoming waves As the II is a diagonal matrix with diagonal elements 1 or 0 it trivially satisfies the projection condition 1111 rr Coming back to the U basis we obtain the following first order linear absorbing bound ary condition TI Uo U 0 Up 0 49 where 1 su7s 50 This condition is perfectly absorbing for small amplitude waves around the state Up The main problem with it is that as the limit state at the boundary U is not known a priori we have to use some a priori chosen state Up 4 U and then the projection matrix used IT U will not be equal to I U and then not fully absorbing We call Up the reference state for the absorbing boundary condition It can even happen that the eigenvalues for the actual state at the boundary change sign with respect to the reference state 5l 6 12 2 Riemann based absorbing boundary conditions Let n n be the number of incoming outgoing waves i e the number of positive negative eigenvalues of Ap and assume that th
82. boundary conditions 51 6 12 2 Riemann based absorbing boundary conditions 52 6 12 3 Absorbing boundary conditions based on last state 53 6 12 4 Finite element setup e 53 6 12 5 Extrapolation from interiors lt c e ete saws e eaga teti a 54 6 12 6 Avoiding extrapolation ooo e e eee ee 55 6 12 7 Flux functions with enthalpy oaoa o 56 6 12 8 Absorbing boundary conditions available 57 7 The Navier Stokes module 59 7 1 LES implementation e ke a ee eee 59 GLI The wallelemeet s ok ca opari ao ee eee a 60 7 1 2 The mixed type boundary condition 60 7 1 3 The van Driest damping factor Programming notes 61 T2 Options lt ciar PS ea aea d ee ee eae eS 62 Ta Mesh WOOTEN o se s eaor RR A A 69 8 Tests and examples 71 8 1 Flow in the anular region between to cylinders 72 8 2 Flow in a square with periodic boundary conditions 72 8 3 The oscilating plate problem 0 020000 ee eee 72 8 4 Linear advection diffusion in a rectangle 4 74 9 The FastMat2 matrix class Gi o poe oa k a E eS ee ee ae ee 92 Example cc ee Re wR ee a eee a i ee es 921 Current Matrix VIEW cocos Oe ee a O22 SetoperationS lt a A AR A eae a O25 Dimensi n matching sa se s soars a Ms RA aana A a 9 2 4 Automatic dimensioning 00002 eee 9 2 5 Concatenation of operations
83. calls the dimensions of the involved matrices are different The general layout of a code section which uses caching is like this Initialization Not cached operations FastMatCacheList cache_list Cache list object FastMat2 activate_cache amp kcache_list activates the cache for j 0 j lt N j Outer loop N very large number FastMat2 reset_cache Rewinds the cache list op_1 op_2 op_3 Cached operations op_N End of outer loop FastMat2 void_cache free memory FastMat2 deactivate_cache deactivates cache 9 3 1 Branching Caching is straightforward if the sequence of operations are executed linearly with the same local variables without branching or looping Special operations have to be executed if there are branching conditions if sentences that alter the order of execution of the operations in the loop For instance an if sequence like this op_prev_1 operations previous to the branch 78 reset_cache Figure 9 Cache of operations for linear segment of code op_prev_2 op_prev_k if lt condition 0 gt op_0_1 op_0_2 conditional block for branch 0 op_0_N else if lt condition 1 gt 1 op_1_1 op_1_2 conditional block for branch 1 op_1_N else op_1_1 op_1_2 conditional block for the else branch op_1_N 79 op_pos_1 operations posterior to the bran
84. ch op_pos_2 op_pos_k If branch 0 is followed in the first execution of the block then the cache will look like that shown in figure 11 When in a subsequent execution of the loops another branch is branch point Ta op_0_1 op_1_1 op_e_1 Figure 10 Cache list produced when branch 0 is chosen chosen say branch 1 then when reading trying to execute operation op_1_1 the library will find in the cache list a cache corresponding to operation op_0_1 This is solved by creating branch points in the cache list and choosing the appropriate branch as shown in the following code see figure 10 op_prev_1 operations previous to the branch op_prev_2 op_prev_k FastMat2 branch if lt condition 0 gt 80 op_prev_1 y op_prev_2 op_pos_N Figure 11 Cache list produced when branch 0 is chosen FastMat2 choose 0 op_0_1 op_0_2 conditional block for branch 0 op_0_N else if lt condition 1 gt 4 FastMat2 choose 1 op_1_1 op_1_2 conditional block for branch 1 op_1_N else 4 FastMat2 choose lt N gt op_e_1 op_e_2 conditional block for the else branch op_e_N FastMat2 leave 81 op_prev_1 operations posterior to the b
85. characters and a value composed of the rest of the line In the previous example Key name Value My Mesh Key ndim Value 2 1 Key viscosity Value 1 and so on This hash table is read in an object of type TextHashTable without checking whether this properties apply to the particular elemset or not The values are stored strictly as string so that no check is performed on the syntax of entering double values or integers 5 2 Text hash table inclusion Often we have some physical or numerical parameter that is common to a set of elemsets for instance gravity in shallow water or viscosity in Navier Stokes In this case these common properties can be put in a separate table with a table header and included in the elemsets with _table_include directives The table sections not associated to a particular elemset have to be put in preference before the calling elemset for instance table steel_properties density 13 2 viscosity 0 001 lt more steel properties here gt __END_HASH__ 29 elemset nsi_tet 4 _table_include steel_properties npg 8 weak_form 1 lt more properties for this elemset here gt __END_HASH__ 1342 lt more element connectivities here gt Text hash tables may be recursively included to any depth The text hash table for an elemset may be included in other elemset referencing it by its name entry for instance elemset nsi_tet 4 name volume_elemset_1 geom
86. ci n de metales en estado L quido y sus efectos Termomec nicos Director A Cardona Financing agency ANPCyT FONCyT Begin 2000 End 2002 Key FLAGS Code PID 99 74 Title FLAGS Simulaci n num rica en gran escala de la interrelaci n entre el FLujo de Aguas Superficiales y el FLujo de AGuas Subterr neas Director S Idelsohn Financing agency ANPCyT FONCyT Begin 2001 End 2004 Key GERMEN CFD Code PIP 0198 98 Title Germen CFD GEneraci n de Recursos b sicos para la aplicaci n de los M todos Num ricos en din mica de fluidos computacional Director M Storti Financing agency CONICET Begin 1999 End 2001 Key PELCFD Code PEI 231 97 Title PEI 231 CONICET Dise o mec cnico asistido por CFD Director N Nigro Financing agency CONICET Begin 1998 End 1999 Key PEI NAVAL Code PEI 232 97 Title PEI Nro 232 CONICET M todos Num ricos en Hidrodin mica Naval y Costera Director M Storti Financing agency CONICET Begin 1998 End 1999 Key SINUS PIM B Title Proyecto Alpha de la Comisi n Europea Sinus Pim B Modelisation et Simulation Numeriques en Ingenierie Mecanique Director S R Idelsohn y V Ruas Univ Paris VI Laboratoire de modelisation en Mecanique Financing agency CONICET CEE Begin 1997 End 1999 Key CMES Title Proyecto Alpha de la Comisi n Europea CMES Computer Methods in Engineering Science Director S R Idelsohn y G Beer Institut fiir Baust
87. cobians and other quantities This means that you don t need to code details of the numerical discretization Follow these steps 1 Create the flux function in a file by itself in the applications advective directory The better is to start copying one of the existing advective sys tems for instance ffeuler cpp or ffshallw cpp The arguments to flux function routines is described in section 6 6 The name of the function has to be of the form flux_fun_ lt system gt where lt system gt identifies the new sys tem We assume that you write the flux function flux_fun_new_adv_sys in file applications advective ffnadvs cpp 2 Add the file in the MYOBJS variable in the Makefile for instance MYOBJS advective o adv o absorb o ffeuler o ffshallw o ffadvec o 3 Define the new derived classes volume_new_adv_sys and absorb_new_adv_sys by adding a line at the end of the file applications advective advective h as in the following example Add here declarations for further advective elemsets Euler equations for inviscid Gas dynamics eqs ADVECTIVE_ELEMSET euler Shallow water equations ADVECTIVE_ELEMSET shallow Advection of multiple scalar fields with a velocity field ADVECTIVE_ELEMSET advec My new advective system ADVECTIVE_ELEMSET new_adv_sys lt Add this line 4 Recompile 6 6 Flux function routine arguments Currently the interface is the following 37 typedef int FluxFunct
88. code via the per element table is activated with the pass_values_as_props option In the last two cases the number of values to be computed by the gatherer can be set via the store_values_length option The summary of relevant options is e pass_values_as_props activates the mechanism of passing per element computed values as per element properties e store_values_length is an integer indicating how many values are computed by the gatherer It defaults to gather_length However if the standard mechanism of the gather_values is deactivated then the number of computed values must be passed through this option e store_in_property_name is a string that identifies the per element property that must be filled with the computed values Of course the user must define such property with the appropriate size The values in the connectivity table are irrelevant for instance null values and will be overwritten by the gatherer e compute_densities Ifcompute_densities 0 then the value set in the per element property is the integral of the integrand over the element That is if the integrand is the computed value for the element is value for element e O 115 Qe Conversely if compute_densities 1 the value set is the density of the mean value of the integrand i e 7 1 e Rel Jo O 116 value for element e 0 For instance if the integrand is the heat flow through the surface then ifcompute_densities 0 then the value set in
89. compared with the analytical one in figure 8 4s moving plate r plate at rest AOS ss S u A Figure 7 Oscillating plates l pe A oS N A ee eee et eee a As Figure 8 Velocity profile for the oscilating plate problem 73 8 4 Linear advection diffusion in a rectangle File sine ep1 This is an example for testing the advdif module The governing equa tions are 0 oo 4 ule 0 in0 lt 0 lt Loly lt 00 1 gt 0 A cos wt cos ky at x 0 t gt 0 rl a 0 atxr TL t gt 0 On 6 0 10 lt w lt Loly lt t 0 As the problem is perdiodic in the y direction we can restrict the analysis to one quart wave length i e if Ay 27 k Ly 1 4 then the above problem is equivalent to o o L 4 ul DAG in 0 lt s lt La 0 lt y lt Lyt gt 0 o A cos wt cos ky at xz 0 t gt 0 do N e On 0 in0O lt r lt L 0 lt y lt Ly t gt 0 0 at y Ly op n 112 0 at y 0 We can find the solution proposing a solution of the form bx eft iwt Replacing in 112 we arrive to a characteristic equation in kg of the form iw pu k 9 113 This is a quadratic equation in 8 and has two roots say 12 The general solution is p x y t Re cie coe elo 9 The FastMat2 matrix class 9 1 Introduction Finite element codes usually have two levels of programming In the outer level a large vector describes the state of the p
90. cond derivatives of the distortion functional the residual and Jacobian This cost may be further reduced by noting that the eigenvalues are invariant under rotations and translations and simply scaled by a dilatation or contraction so that from the nen Ndim Ndim 1 displacements only Ndim Ndim 1 2 1 should be really computed but this is not implemented yet For tetras in 3D this implies a reduction from 12 distortion functional computation to only 5 We will show now how the derivatives of the eigenvalues are computed analytically It can be shown that OXg Gij Ugi Ugj 103 OLpk Uqi OL pk Ugj 103 Note that this is as differentiating 98 but only keeping in the right hand side the change in the matrix G and discarding the rate of change of the eigenvectors It can be shown that the conttribution of the other two terms is an antisymmetric matrix so that the contibution to the rate of change of the eigenvalue is null Now from 94 aC ON ON 7 Nu y Wu 104 OF yj Ju o T dj Og o so that as JN U a EM yas O2 pl va Jui 3E vw gt 105 which is the expression used 8 Tests and examples In this section we describe some examples that come with PETSc FEM They are inten tionally small and sometimes almost trivial but they are very light weight and can be run in a short time at most some minutes They are put in the test directory and can be run with make tests command The purpose of thi
91. d parties are not compelled to copy the source along with the object code 4 You may not copy modify sublicense or distribute the Program except as expressly provided under this License Any attempt otherwise to copy modify sublicense or distribute the Program is void and will automatically terminate your rights under this License However parties who have received copies or rights from you under this License will not have their licenses terminated so long as such parties remain in full compliance 5 You are not required to accept this License since you have not signed it However nothing else grants you permission to modify or distribute the Program or its deriva tive works These actions are prohibited by law if you do not accept this License Therefore by modifying or distributing the Program or any work based on the Pro gram you indicate your acceptance of this License to do so and all its terms and conditions for copying distributing or modifying the Program or works based on it 6 Each time you redistribute the Program or any work based on the Program the recipient automatically receives a license from the original licensor to copy distribute or modify the Program subject to these terms and conditions You may not impose any further restrictions on the recipients exercise of the rights granted herein You are not responsible for enforcing compliance by third parties to this License 7 If as a consequence of a cou
92. dd the absorbing boundary equation first row of 72 for these nodes 6 12 7 Flux functions with enthalpy When the flux function has an enthalpy term that is not the identity then the expressions for the change of basis are somewhat modified and also the projectors An advective diffusive system with a generalized enthalpy function H U is an extension of the form 2 and can be written as OF U _ ay AU es 73 The heat equation can be naturally put in this way Also the gas dynamics equations for compressible flow can be put in this form if we put the equations in conservative form but use the primitive variables U p u p as the main main variables for the code This has the advantage of using a conservative form of the equations and at the same time allows an easy imposition of Dirichlet boundary conditions that are normally set in terms of the primitive variables In this case U are the primitive variables and the generalized enthalpy H U is the vector of conservative variables We call the generalized heat content matrix Cp as OH U e 4 and 2 can be put in quasi linear form as OU OU iae oa 2 56 Note that this can be brought to the quasi linear form 42 i e without the Cp if we multiply the equation at left by C 1 and define new flux Jacobians as A C Az 76 So that basically the extension to systems with generalized enthalpy is to replace the Jacobians by the modified
93. default cartesian2d Type of element geometry to define Gauss Point data found in file bccnsfm2 cpp e int weak_form default 1 Use a weak form for the gradient of pressure term found in file bccnsfm2 cpp Elemset wall e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file wall cpp e double rho default 1 Density found in file wall cpp e double y_wall_plus default 25 The y coordinate of the computational boundary found in file wall cpp Elemset ns_sup_g This element imposes a linearized free surface boundary condition e int LES default 0 Add LES for this particular elemset found in file nssupg cpp e double free_surface_damp default 0 Cif free_surface_set_level_factor tries to keep the free surface level constant by adding a term 7 to the free surface level see doc for free_surface_damp The equation of the free surface is dn dt w 92 where 7 elevation and w is the velocity component normal to the free surface We modify this as follows d Ceg n T C C damp An wW 93 68 Where 7 is the average value of eta on the free surface and Caamp free_surface_damp smoothes the free surface adding a Laplacian filter Note that if only the temporal derivative and the Laplace term are present in 93 then the equation is a heat equation A null value which is the default means no filtering A high value
94. define the energy functional to be minimized as a function of the associated matrix tensor If Jj Ox O amp where x are the spatial coordinates and the master coordinates then the metric tensor is Oxy OX 1 DE OG The Jacobian of the transformation may be computed by differentiating the interpolated displacements 94 D aN 95 H with respect to the master coordinates so that Jig tu 96 9 DE 2 de We want to compute the distortion function to be minimized as a function of the eigen values of the metric tensor Gij The eigenvalues v4 of G are such that GijUgj Aqvaj 97 so that Ag Uqi Gij Ugj 98 no sum over q since the v are orthogonal Now if the functional F is a function of the element node coordinates through the eigenvalues A then we can compute the new coordinates 2 as F 7 0 99 OT pj A possible distortion functional is F Ag gq 8 Y y Ar 100 qr This functional has several nice features Is minimal whenever all the eigenvalues are equal the distortion is minimal It is non dimensional so that an isotropic dilatation or contraction doesn t produce a change in the functional The non linear problem 99 can be solved by a Newton strategy by computing the first and second derivatives of F with respect to the node displacements the residual and Jacobian of the system of equations by finite difference approximations However this turns to be too costly i
95. e previous one making the whole process Nproc T where T is the time needed to send a typical individual buffer from one process to other 136 Processor 0 1 2 3 4 5 6 7 O Qe PO Seay es oe gt Q x E 3 E E p E r ee ee le S Ee os 0 3 E EO lt q L E a A POP a as en time EQ lt 0 E Eos 0E a E9 E E 9 0E E9S 0F gt Fo lt 0 E i C Eo lt Fox p E slo E E exchange Figure 31 Improved scheduling algorithm for transferring data among processors An improved strategy consists in a multilevel scheduling algorithm Assume first that proc is even and divide processors in subsets S 0 1 Mproc 2 1 S2 Nproc 2 Nproc 2 1 proc 1 We first exchange all information between processors in S with those in S2 in Nproc 2 stages In stage O see figure 31 processor O exchanges data with n 2 1 with nproc 2 1 and n 2 1 with nproc i e processor O calls exchange n 2 and processor n 2 calls exchange 0 the code of procedure exchange is shown in the procedure below In the stage 1 processor 0 exchanges with Nproc 2 2 1 with Nproc 2 3 Mproc 2 2 with Nproc and Nproc 2 1 with Nproc 2 i e processor j exchanges with n 2 j 1 Nproc 2 In general in stage k processor j exchanges with processor n 2 j k Nproc 2 Note that in each stage all communications are paired and c
96. e U Up we can proved to be perfectly absorbing since as U gt Uso at the boundary we can expand each of the conditions around U and it would result in an equation similar to 52 but centered about Ugo LU Ue 0 j 1 n 56 The problem is that in general the differentials are not exact Riemann invariants are functions that satisfy 53 under some restrictions on the flow For instance Riemann in variants can be computed for compressible gas flow if we assume that the flow is isentropic They are wi s log p p A 4 entropy 2 w23 u T A23 uta acoustic waves 57 Y w45 u t12 A45 4 vorticity waves Boundary conditions based on Riemann invariants are then absorbing in some particular cases 52 6 12 3 Absorbing boundary conditions based on last state Another possibility is to linearize around the last state U i e TI U 0 t U O t U 0 t 0 58 This equation is always perfectly absorbing in the limit because we are always linearizing about a state that in the limit will tend to Us and doesn t need the computation of Riemann invariants which could be unknown for certain problems Also this boundary condition is fully absorbing even in the case of inversion of the sense of propagation of waves The drawback is that we have no external control on the internal states i e the limit internal state does not depend on some external value as the
97. e code assumes that a thermal NS problem is being run i e vel_index 1 and the penalization term is added only on the temperature field i e dofs ndim 2 Also the refval found in file flowrev cpp e int vel_indx default 1 Field index where the components of the velocity index start 1 base found in file flowrev cpp 13 Visualization with DX Data Explorer http www opendx org is a system of tools and user interfaces for visualizing scientific data Originally an IBM commercial product has been released now under the IBM Open Source License and maintained by a group of volunteers see the URL above Besides their impressive visualization capabilities DX has many features that make it an ideal visualization tool for PETSc FEM e DX is Open Source with a License very close to the GPL not completely compatible though e It has a Visual Program Editor which makes it very configurable for different modules 102 e It has been linked to PETSc FEM through sockets which makes it possible to visu alize a running job even in background e It has a scripting language If you want to visualize your results with DX you have first to download it from the URL above and install it Then you can pass your results to DX simply by editing the needed dx files or well by using the ExtProgImport DX module In order to use this last option you have to e Compile PETSc FEM with the USE_SSL flag enabled disabled by default Al
98. e eigenvalues are decreasingly ordered ie Aj gt Ag if j lt k So that the positive eigenvalues are the first n ones The boundary conditions for the incoming waves 46 can be written as 1 U Uo 0 j 1 n 51 where 1 is a row of S i e a left eigenvalue of Ay If U is close to Up we can write 51 as U dU 0 j 1 n 52 If this differential forms were exact differentials i e if 1 U dU dw U for all j 1 ndof 53 for some functions w U then we could impose as absorbing boundary conditions w U w Uo J lin 54 Let s call to these w functions invariants As there are naof invariants we can define as a new representation of the internal state the w variables Note that as Qw OU S7 and S is non singular due to the hyperbolicity of the system the correspondence between w and U is one to one Assume that the value at the boundary reaches a steady limit value of U i e U 0 t gt Ux for t 00 55 If all the waves were incoming nt ndor then the set of boundary conditions 54 would be a set of ngo non linear equations on the value Us As the correspondence U gt w is one to one the boundary conditions would mean w U w Ugp and then Us Up But if the number of incoming waves is nt lt naof then it could happen that Ux Uo In fact for a given Up the limit value Us would belong to a curved n dimensional curvilinear manifold Even if the limit stat
99. e represent as ya yo 0 is Q Pe Y10 e 134 si ya Q 219 y Q ij 5 6 pes So that Q decomposes in three blocks of 3 x 2 1x 1 and 2x3 two void rows 2 9 and two row columns 2 3 One basic operation of class idmap is to compute the row connected to a given column index 14 8 Temporal dependent boundary conditions Temporal dependent boundary conditions are treated in PETSc FEM assuming that the boundary condition on a set of nodes J j1 j2 j3 jN is of the form bj t a t pja t a26 t 135 bjy t an G t where the az are spatial amplitudes and the function t is the temporal part of the dependency For instance consider the solution of a problem of heat conduction coupledt with diffusion of a component in a rectangular region as depicted in figure 21 The boundary condition on side AD is of the form T x t 100 C 4a L x L sin 5t 136 where L is the distance AD The nodes on side AD are 1 7 13 19 25 31 37 and we assume that concentration and temperature are fields 1 and 3 respectively Here we can take aj 4xj L x L for j in J p t 100 C sin 5t 137 Boundary conditions depending on time as in this example are entered in PETSc FEM with a fixa_amplitude section The general form is fixa_amplitude lt function gt lt paraml gt lt vall gt lt param2 gt lt val2 gt 116 Figure 21 Example with time dependent boundary
100. em may be expensive so that non stationary iterative methods are discarded Then the use of Richardson iteration is suggested Sometimes this can diverge and some under relaxation must be used Options controlling iteration for the preconditioning problem can be found in section 4 4 3 16 3 Implementation details of the IISD solver e Currently unknowns and elements are partitioned in the same way as for the PETSc solver The best partitioning criteria could be different for this solver than for the PETSc iterative solver Element in processor 0 Element in processor 1 O dof in processor 0 S dof in processor 1 dof s in processor 0 connected to dof s in processor 1 Figure 27 ISD deccomposition by subdomains Actual decomposition e Selecting interface and local dof s One strategy could be to mark all dof s that are connected to a dof in other processor as interface However this could lead to 132 an interface dof set twice larger in the average than the minimum needed As the number of nodes in the interface set determines the size of the interface problem 160 it is clear that we should try to choose an interface set as small as possible In read_mesh partitioning is done on the dual graph i e on the elements Nodes are then pa
101. emental matrix but also the application writer has to compute and return the mask Finally note that you can always check whether block uploading is faster or slower by activating the time statistics for the elemset the report_consumed_time and run a large example with both kinds of uploading uy Uy Uz p A momentum eq pressure gradient ter o0 0 0 1 momentum A _ 0 0 0 1 momentum e mask 7 i 0 0 0 1 momentum 1___1___1 0 continuity gt cont equation Galerkin term Figure 29 Example of mask for the continuity equation Galerkin term in the Navier Stokes equations 17 The DistMap class This class is an STL map lt Key Val gt template container where each process can insert values and at a certain point a scatter call sends contributions to the corresponding processor 135 17 1 Abstract interface To insert or access values one uses the standard insert member or the operator of the basic class The processor iterator k member to be defined by the user of the template should return the process rank to which the pair key val pointed by k should belong After calling the scatter member by all processes all entries are sent to their processors In order to send the data across processors the user has to define the size_of_pack and pack routines The pack and unpack functions in the BufferPack namespace can help to do that for basic types i e those for which the sizeof op
102. ence using a relaxation factor 1 found in file iisdcr cpp string local_solver default PETSc Chooses the local solver may be PETSc or SuperLU found in file iisdcr cpp int max_partgraph_vertices_proc default INF The maximum number of vertices in the coarse mesh for sub partitioning the dof graph in the IISD matrix found in file iisdcr cpp double pc_lu_fill default 5 PETSc parameter related to the efficiency in growing the factored profile found in file iisdcr cpp int print_Schur_matrix default 0 Print the Schur matrix don t try this for big problems found in file iisdcr cpp 24 int print_interface_full_preco_conv default 0 Flags whether or not print the convergence when solving the preconditioning for the interface problem when using use_interface_full_preco found in file iisdcr cpp int use_interface_full_preco default 0 Chooses the preconditioning operator found in file iisdcr cpp int use_interface_full_preco_nlay default 1 Number of layers in the preconditioning band should be nlay gt 1 found in file iisdcr cpp int asm_lblocks default 1 Chooses the number of local blocks in ASM found in file int asm_overlap default 1 Chooses the overlap of blocks in ASM found in file string asm_sub_ksp_type default preonly Chooses the preconditioning for block problems in ASM method found in file string asm_sub_preco_type default ilu C
103. ep DX sends a request to PETSc FEM which answers sending back arrays and fields 13 1 Asynchronous synchronous communication e In synchronous mode steps gt 0 PETSc FEM waits each steps time steps a con nection from DX Once DX connects to the port PETSc FEM transmits the required data and resumes computation This is the appropriate way of communication when generating a sequence of frames for a video with a DX sequencer for instance Note that if you don t use a sequencer then you have to arrange in someway to make ExtProgImport awake and connect to PETSc FEM otherwise the update in the visualization is not performed and the PETSc FEM job is stopped waiting for the connection e In asynchronous mode steps 0 in contrast PETSc FEM monitors each port after computing a time step If a DX client is trying to connect it answers the request and resumes computing otherwise it resumes computing immediately This is ideal for monitor a job that is running in background for instance Note that in this case the interference with the PETSc FEM job is minimal since once PETSc FEM answers the request resumes processing automatically until a new connection is requested The steps state variable is internal to PETSc FEM It can be set initially with a dx_steps options line 1 by default After that it can be changed by changing the steps input tab However note that the change doesn t take effect until the next con
104. er a scalar or a matrix Beware that even in the case where it is a scalar it must be returned as a Newmat Matrix object of dimensions 1 x 1 e double amp delta_sc output double compute if options amp COMP_UPWIND This is the shock capturing parameter as described in 6 4 Set to 0 if no shock capturing is added e double amp lam_max output double compute if options amp COMP_UPWIND The maximum propagation speed The expression is 14 but normally it may be computed directly from the state vector This is used to compute upwind and also the automatic and local time step e TextHashTable thash input This is the TextHashTable of the elemset Physical and numerical parameters can be passed from the user input data file to the flux function routine through this specific heat specific heat ratio gravity for instance Beware that the flux function routine is called at each Gauss point and decoding of the table may be time expensive so that if the properties are constant over all the mesh you can decode them once and leave the decoded data in static variables e double propel input double array compute if options amp COMP_UPWIND This is the table of per element properties Physical and numerical parameters that are not constant for all the elemsets can be passed from the user input data file to the flux function routine through this e void user_data input Arbitrary information may be passed from the main rou
105. erator and the libc memcpy routine work Finally the combine member defines how new entries that have been sent from other processors have to be merged in the local object 17 2 Implementation details When calling the scatter member each entry in the basic container is scanned and if it doesn t belong to the processor it is buffered using the pack member to a buffer to be sent to the other processor Once all the data to be sent is buffered the entries are scanned again and the entries that have been buffered are deleted The buffers are sent to the other processors following a strategy preventing deadlock in Nproc 1 stages where Nproc is the number of processors In the k th stage data is sent from processor j to processor j k nproc Where is the remainder operator as in the C language In each stage all processors other than the server do first a MPI_RecvO and after a MPI_Send while the server does in the other sense i e first a MPI_Send and after a MPI_Recv initiating the process Processor 0 1 2 3 4 5 AR PE a gt gt a S ge ES S R D ke R S A i IR S l 4 A R oye oe tim E Ps ER d sy E 7R S eed E Te A k N q p 7 S da a a A MN I I l E S E ee ee a S send R receive Figure 30 Simple scheduling algorithm for transferring data among processors This is a rather inefficient strategy because at each stage all sending is tied to th
106. ered then the cache list will contain block_before block_after and when the inner block will be entered in subsequent executions of the loop and error will arise since there will be missimng caches To fix this we have to combine this with branching as here FastMatCachePosition cpl for int k 0 k lt N k N very large Outer loop he by Previous block FastMat2 branch Allows conditional execution FastMat2 choose 0 FastMat2 get_cache_position cp1 n irand 0 5 if k 0 n 0 This is the critical case n 0 the first execution of the loop for int 11 0 11 lt n 11 FastMat2 jump_to cp2 non inner_block FastMat2 resync_was_cached FastMat2 leave posterior block 83 Off course if the number of times the inner loop is executed is very large and the most time consuming part is the execution of this loop then it may be convenient to choose this loop as the outer one 9 3 3 Masks can t traverse branches Another restriction is that if branching is used the mask that is active at a certain FastMat2 cached operation must be the same independently of the path that the code have followed for instance consider the following code FastMat2 a b c resize and set a b c for int j 0 j lt N j FastMat2 branch if condition 4 FastMat2 choose 0 E AO E O E B operate on masked a FastMat2 leave c prod a b A Wrong
107. error is issued 9 2 4 Automatic dimensioning In the example a has been dimensioned at line 2 but most operations perform the di mensioning if the matrix has not been already dimensioned For instance if at line 2 we would declared FastMat2 a only without specifying dimensions then at line 14 the matrix is created and dimensioned taking the dimensions from the argument The same applies to set Matrix amp but not to set double since in this last case the argument double don t posses information about his dimensions Other operations that define dimensions are products and contraction operations 9 2 5 Concatenation of operations Many operations return a reference to the matrix return value FastMat2 amp so that operations may be concatenated as in A ir 1 k ir 2 j 9 3 Caching the adresses used in the operations If caching is not used the performance of the library is poor typically one to two orders of magnitude slower than the cached version and the cached version is very fast in the sense that almost all the CPU time us spent in performing multiplications and additions and negligible CPU time is spent in auxiliary operations The idea with caches is that they are objects class FastMatCache that store the adresses needed for the current operation In the first pass through the body of the loop a cache object is created for each of the operations and stored in a list of class FastMatCacheList This list is basic
108. es written by the nodes The idea is that one defines a class say KeyedObject that must support the following member functions class KeyedObject 4 public Default constructor KeyedObject Copy Ctor KeyedObject const KeyedObject amp ko Dtor KeyedObject Used for sorting the lines friend int operator lt const KeyedObject amp left const KeyedObject amp right Call back for the distributed container Return size of the buffer needed to store this element int size_of_pack const Effectively packs the object into the buffer upgrading the pointer void pack char amp buff const Extracts the object from the buffer upgading the buffer 144 void unpack const char amp buff Print the object void print Then the user can load objects into the buffer and finally call the flushO method to dump the actual content of the buffer to the output The flush is equivalent to e sending all objects to the server deleting them from the original node e sorting them according to the operator lt defined and e calling print on all of them on the server The underlying container is a list so that you can manipulate it with the standard list ac cessors but the ideal is to push elements with push_back KeyedObject obj or pushing a clean object with push_back and then access the elements with back Note that you must implement the copy constructor for the KeyedObject class
109. estroy_sles member has to be called afterwards in order to destroy it and free space The monitor member allows the user to redefine the convergence monitoring routine The view member prints operator information to the output 16 2 IISD solver Let s consider a mesh as in figure 26 partitioned such that a certain number of elements and nodes belong to processor 0 and others to processor 1 We assume that one unknown 129 Element in processor 0 Element in processor 1 Unknown in processor 0 S Unknown in processor 1 Figure 26 IISD deccomposition by subdomains is associated to each node and no Dirichlet boundary conditions are imposed so that each node corresponds to one unknown We split the nodes unknowns in three disjoint subsets Li 2 and I such that the nodes in L are not connected to those in La i e they not share an element and then the FEM matrix elements A and Aj with i L and j La are null The matrix is split in blocks as follows ieli A Art Ar Ao 0 A dl 0 An 157 Arr Aor Au _ Aro Ain An Now consider the system of equations Ax b 158 130 which is split as Ar XL ALrx1 bz 159 Ari xy Arr xy by Now consider eliminating xz from the first equation and replacing in the second so that we have an equation for x
110. etry cartesian2d npg 4 viscosity 0 23 lt more properties here gt __END_HASH__ 1342 lt more connectivities here gt __END__ELEMSET__ elemset nsi_tet 4 name volume_elemset_2 _table_include volume_elemset_1 includes properties from the previous elemset __END_HASH__ 5 4 76 45 lt more connectivities here gt __END__ELEMSET__ 5 3 The global table If one table is called global_options then all other hash tables inherit their properties without needing to explicitly include it For instance in this case table global_options viscosity 0 023 tau_fac 0 5 __END_HASH__ elemset nsi_tet 4 name elemset_1 30 __END_HASH__ 5 4 76 45 lt more connectivities here gt __END__ELEMSET__ the elemset elemset_1 gets a value for viscosity from the global hash table of 0 023 The global_options table may include other tables as well Note In previous versions of the code the table keyword may be omitted for the global_options table For instance the previous example can be entered as global_options viscosity 0 023 tau_fac 0 5 __END_HASH__ This usage is obsolete and will be deprecated 5 4 Reading strings directly from the hash table Once inside the element routine values can be retrieved with routines from the TextHashTable class typically get_entry for instance char geom thash gt get_entry geometry geom this returns a pointer to the internal value string geom this is documented w
111. extHashTable thash void amp fun_data extern C double lt prefix gt eval_fun double t void fun_data extern C void lt prefix gt clear_fun void fun_data this is optional Prefix may be an empty string or a non empty string ending in underscore Use of non empty prefix is explained below The macros INIT_FUN EVAL_FUN and CLEAR_FUN expand to these definitions The second function eval_fun is that one that defines the relation between the time and the corresponding amplitude The first init_fun one can be used to make some initialization normally it is called only once The last one clear_fun is called after all calls to eval_fun For simple functions you may not need the init and clear functions for instance if you want a linear ramp from 0 at t 0 to 1 at t 1 then it can be done with a simple file funO cpp this File fun0 cpp include lt math h gt include lt src ampli h gt INIT_FUN EVAL_FUN if t lt 0 return 0 else if t gt 1 return 1 else return t CLEAR_FUN You then have to do a 123 make fun0 efn that will create the fun0 efn shared object file Dynamically loaded functions can be used using the fixa_amplitude dl_generic clause and then giving the name of the file with the ext_filename option For instance lt previous lines here gt fixa_amplitude dl_generic ext_filename fun0 efn __END_HASH__ 1 1 1 lt more node dof val combinati
112. f At oo The value of Dt is used for printing etc If Dt is not set and steady is set then Dt is set to one found in file ns cpp e int stop_mom default 0 After computing the linear system for the predictor momentum step print right hand side and solution vector and stops found in file ns cpp e int stop_on_step default 1 After computing the linear system for the predictor momentum step print right hand side and solution vector and stops found in file ns cpp e int stop_poi default 0 After computing the linear system for the poisson step print right hand side and solution vector and stops found in file ns cpp e int stop_prj default 0 After computing the linear system for the projection step print right hand side and solution vector and stops found in file ns cpp e double tol_newton default 1e 8 Tolerance to solve the non linear system global Newton found in file ns cpp e int update_jacobian_iters default 1 Update jacobian only until n th Newton subiteration Don t update if null found in file ns cpp e int update_jacobian_start_iters default INF Update jacobian each n th Newton iteration found in file ns cpp e int update_jacobian_start_steps default INF Update jacobian each n th time step found in file ns cpp e int update_jacobian_steps default 0 Update jacobian each n th time step found in file ns cpp e int update_wall_data default 0 If 0 compute wall_data info o
113. f t If the subspace t 0 is bounded by a surface OT x then additional conditions have to be imposed on that surface at all values of t This defines an initial boundary value problem A solution for the system of the first order PDE s can be written as a superposition of wave like solutions of the type corresponding k to the n eigenvalues of the matrix A n 20 n k 1 dim being n the outward unit normal to the boundary edge n usy Ue e on R DO 40 a 1 where summation extends over all eigenvalues Aq As the problem is hyperbolic n initial conditions for the Cauchy problem have to be given to determine the solution That is equal number of conditions as unknowns must be imposed at t 0 For initial boundary value problem the n boundary conditions have to be distributed along the limits at all values of t according to the direction of the propagation of the corresponding waves If the wave phase velocity the a eigenvalue of AF n i e k wave projected in the interior normal direction n is positive the information is propagated inside the domain Hence the number of conditions to be imposed for the initial boundary value problem at a given point of OI is equal to the 49 number of positive eigenvalues of A nz at that point The total number of conditions remains equal to the total number of eigenvalues i e the order of the system For the treatment of the boundary conditions we will use the one dimensional pro
114. f this function is obtained by appending _all to the generic function double a A sum_square_all The list of these functions is e double sum_al1 const Sum over all indices e double sum_square_all const Sum of squares over all indices e double sum_abs_all const Sum of absolute values over all indices e double min_all const Minimum over all indices e double max_al1 const Maximum over all indices e double min_abs_al1 const Minimum absolute value over all indices e double max_abs_al1 const Maximum absolute value over all indices 9 4 5 Export Import operations These routines allow to convert matrices from or to arrays of doubles and Newmat matrices e FastMat2 amp set const Matrix amp A Copies to argument from Newmat matrix e FastMat2 amp set const double a Copy from array of doubles e const FastMat2 amp export double a const exports to a double vector e FastMat2 export double a exports to a double vector e const FastMat2 amp export Matrix amp A const Exports to a Newmat matrix e const FastMat2 amp export Matrix A const Exports to a Newmat matrix 87 9 4 6 Static cache operations These routines control the use of the cache list e static void activate_cache FastMatCacheList cache_list_ NULL Acti vates use of the cache e static void Deactivates use of the cache e static void reset_cache void Resets the cache e static void void_cache void Voids the cache e static void branch void Crea
115. found in file int print_fsm_transition_info default 0 Print Finite State Machine transitions for PFPETScMat matrices 1 print inmedi ately 2 gather events non immediate printing found in file int print_internal_loop_conv default 0 Prints convergence in the solution of the GMRES iteration found in file double rtol default 1e 3 Relative tolerance to solve the monolithic linear system Newton linear subiteration found in file int use_compact_profile default LINK_GRAPH Choice representation of the profile graph Possible values are 0 Adjacency graph classes based on STL map set demands too much memory CPU time OK 1 Based on dynamic vector of pair of indices with resorting demands too much CPU time RAM is OK 2 For each vertex wee keep a linked list of cells containing the adjacent nodes Each insertion is O m where m is the average number of adjacent vertices This seems to be optimal for FEM connectivities found in file 4 5 Emacs tools and tips for editing data files GNU Emacs http www gnu org software emacs is a powerful text editor that can colorize and indent text files according to the syntax of the language you are editing Emacs has modes for most languages C C Pascal Fortran Lisp Perl We have written a basic mode for PETSc FEM data files that is distributed with PETSc FEM in the tools petscfem el Emacs Lisp file Another mode that can serve for colorization
116. g avoiding the creation of large intermediate files In addition this allows the user to choose the external preprocessing package 15 4 2 Internal preprocessing The FileStack class This class allows reading from a set of linked files the PETSc FEM data file including node coordinates mesh connectivities etc with some preprocessing capabilities Supports file inclusion comments and continuation lines The preprocessing capabilities supported in this class are the minimal ones needed in order to treat very large files efficiently More advanced preprocessing capabilities will be added in a higher level layer written in Perl or similar may be ePerl 4 2 1 Syntax The syntax of comments and continuation lines is borrowed from Unix shells Comments From the first appearance of to the end of the lines is considered a comment Continuation lines If a line ends in i e if backslash is the last character in the line before newline J then the next line is appended to the previous one to form a logical single line File inclusion The directive __INCLUDE__ path to file inserts the contents of file in directory path to to this file in the actual point Directories may be absolute or relative we use fopen from stdio File inclusion may be recursive to the limit of the quantity of files that can be kept open simultaneously by the system Echo control The directives __ECHO_ON
117. g with the terminator __END_ELEMSET__ 4 1 Preprocessing the user input data file It s very handy to have some preprocessing capabilitites when reading input files for instance including files if else constructs macro definitions inline arithmetics etc Some degree of preprocessing is performed inside PETSc FEM this includes file inclusion skipping comments and continuation lines and is described in section 84 2 Off course this internal preprocessing may be combined with any previous preprocessing package such as m4 or ePerl In section 34 2 we describe internal preprocessing while in section84 3 we describe preprocessing with ePerl The reason to have this two mechanisms of preprocessing is the following Preprocessing with ePerl or m4 or any other packages is very powerful and well supported however the mechanism is to create an intermediate file and this file may be very large for large prob lems so that some internal preprocessing including at least the file inclusion is needed On the other hand including all the preprocessing capabilities of preprocessing as in ePerl is beyond the scope of PETSc FEM so that we preferred to keep with this two levels of preprocessing The idea is to have a small user data file where the strong capabilities of an external preprocessing package may be used while performing file inclusion of very large files containing node coordinates element connectivities and so on as the file is readin
118. gt lt default_value gt for retrieving values from the hash table defined by the user in the data file 4 Register it in the temporal function table with a call such as the following prefer ably after the call to Amplitude initialize_function_table in the main For instance Amplitude initialize_function_table Amplitude add_entry smooth_ramp amp smooth_ramp_function lt add this line In the core PETSc FEM this is done inside the Amplitude initialize_function_table 5 Use it in your data files as follows 122 amplitude_function my_own tau 3 0 __END_HASH__ 123 14 8 4 Dynamically loaded amplitude functions Another possibility to add new amplitude functions is through loading functions at run time This avoids re linkage and modification of the sources Note You need to have the module compiled with the appropriate flag either the compilation flag DUSE_DLEF or the Makefile variable USE_DYNAMICALLY_LOADED_EXTENDED_FUNCTIONS yes either in the Makefile base or the Makefile defs file These amplitude functions are written in C in source files say fun cpp The com piled files have extension efn from extension function for instance fun efn in this example The Makefile base file provides the appropriate rules to generate the compiled efn functions from the source For each amplitude you want to define you have to write three functions extern C void lt prefix gt init_fun T
119. hat if y x are vectors of dimension n and y Qx 119 Yj TPG ae 106 where P is the permutation associated with Q For instance if P 1 2 3 4 2 4 3 1 then T2 y Qx 4 121 Las This matrix can be efficiently stored as an array of integers of dimension ident n such that ident j 1 P j Also the inverse permutation can be stored in another integer array iident n so that both Q and Q7 can be stored at the cost of n integer values for each Also both multiplication and inversion operations as y Qx y QT x x Q y and x Qy can be done with n addressing operations 14 2 Permutation matrices in the FEM context Figure 16 Node field to dof map Example for NS or Euler in a duct Permutation matrices are very common in FEM applications for describing the relation between node field pairs and degrees of freedom abbrev dof s For instance consider a flow NS or Euler application in a mesh like that shown in figure 16 At each node we have a set of unknown fields both components of velocity and pressure u v and p In 107 a first description we may arrange the vector of unknowns as ui U1 Pl v2 Unf po UNnod UNnod PNnod The length of this vector is Nnoq X dor and this may be called the node field this accounts for the NF subindex description of the vector of unknowns However we can not take this vector as the vector of unknowns for actual computat
120. he corresponding row and if it is special then we look for the pointer in row_map and this returns the desired row Similar information is stored by columns but in this case it is not necessary to store the values for the special rows so that the columns are of the form typedef set lt int gt col_t and class idmap contains a private member col_map of the form map lt int col_t gt col_map 14 7 Block matrices When solving a linear 123 for U it is a key point to take into account that most part of Q are independent blocks so that the inversion may be done with a minimum computational effort We say that a given row and column indices 7 7 we say that they are directly connected if the element Qj is non null We can then define that two row indices i 7 are indirectly connected if they are connected by a path i gt j gt i1 gt j2 7 gt jn gt i where the arrow means directly connected The definition applies also to pairs of column indices and row column column row pairs The indirect connection defines an equivalence class that splits the rows in disjoint sets 14 7 1 Example Consider matrix led coll Lk DE 3 xo Ro 4 Q 5 1 k ok Kone 4 133 10 2 xo K 7 where the asterisks means a non null element Starting with row index 1 we see that its corresponding connecting set is rows 1 3 10 and columns 8 9 So that the linear transfor 115 mation y Qx can b
121. he given data Unr and U is compatible by evaluating the residual of the equation This operation should be performed in O N operations 14 5 Design and efficiency restrictions The class of matrices representing Q and Q should have the following characteristics e Be capable of storing arbitrary matrices e Should be efficient for permutation like matrices i e require as storage order of 2 integers per unknown and constant time access by row and column 14 6 Implementation An arbitrary permutation P of N objects can be stored as two integer vectors ident N and iident N such that if P j k then ident j 1 k 131 iident k 1 j Pan We will consider a slight modification to this A row is void If all its elements are null normal If one element is one and the rest null like in a permutation matrix special otherwise Then we set 0 if row j is void ident j 1 4 k if row j is normal and the non null element is in position k 1 if row j is special 132 We need now how to access the coefficients of the special rows Rows are stored as a map in the STL language from integer values to doubles i e typedef map lt int double gt row_t and we keep in class idmap a map from row indices to pointers to rows of the form map lt int row_t gt row_map 114 So that for a given j we can get the corresponding row by checking the value of inode j 1 If the row is void or normal we return t
122. he same routine so that one may ask for what it may serve to have several elem sets First some boundary conditions constant flux or mixed boundary conditions for heat conduction absorbing boundary conditions for advective systems may be imposed more conveniently through an elemset abstraction Also several elemsets may be used for reducing the space required for storing varying physical properties If some physical property is shared by all the elements for instance viscosity or specific heat then it is defined once for all the elemset If the quantity varies continuously in the region but it is known a priori then it can be passed as a per element property see 5 6 but that means storage of a double or the amount of memory needed for that property for each element If it is not the same on all the mesh but is constant over large regions then it may be convenient to divide the mesh in several elemsets where the given property has the same value over the elements of each elemset 4 General layout of the user input data file Input to application packages like ns advective or laplace is feed via input data files usually with extension dat This file contains global options node coordinates element connectivities physical properties fixations and boundary conditions etc Even if the precise format of the file may be changing it is worth to describe the general layout The file is divided in sections the table sections
123. hooses the preconditioning for block problems in ASM method found in file string asm_type default restrict Chooses the restriction extension type in ASM found in file double atol default 1e 6 Absolute tolerance to solve the monolithic linear system Newton linear subitera tion found in file int compact_profile_graph_chunk_size default 0 Size of chunk for the dynamic vector used in computing the mstrix profile found in file double dtol default 1e 3 Divergence tolerance to solve the monolithic linear system Newton linear subitera tion found in file string gmres_orthogonalization default modified_gram_schmidt Orthogonalization method used in conjunction with GMRES May be unmodified_gram_schmidt modified_gram_schmidt or ir_orthog default Iterative refinement See PETSc documentation found in file int Krylov_dim default 50 Krylov space dimension in solving the monolithic linear system Newton linear subit eration by GMRES found in file 25 string KSP_method default gmres Defines the KSP method found in file int maxits default Krylov_dim Maximum iteration number in solving the monolithic linear system Newton linear subiteration found in file string preco_side default lt ksp dependent gt Uses right or left preconditioning Default is right for GMRES found in file string preco_type default jacobi Chooses the preconditioning operator
124. hv I gt where h is the height of the water in the channel with respect to the channel bottom u w v is the velocity vector and g is the acceleration due to gravity As in eq 22 G represents the gain or loss of the river the source term is G U Gs gh Sox Sfx fehv Cyaz m gh Soy Spy fehw Cymyla 25 where So is the bottom slope and Sj is the slope friction 1 1 Ste Gn ola Si Gal Ch zy model m 2 2 Sfo ola S fy vgl Manning model where Cp and n the Manning roughness are model constants Generally the effect of coriolis force related to the coriolis factor fe must be taken in account in the case of great 46 lakes wide rivers and estuaries The coriolis factor is given by fe 2wsinw where w is the rotational rate of the earth and y is the latitude of the area under study The free surface stresses in eq 25 are expressed as the product between a friction coefficient and a quadratic form of the wind velocity w w wy and Ce ts Pair 27 p where Cg 1 25x10 014 if w lt 1 m s Cm 0 5x10 801 if 1 m s lt w lt 15 m s 28 Cy 2 6x10 if w gt 15 m s 6 10 2 1D Saint Venant Model When velocity variations on the channel cross section are neglected the flow can be treated as one dimensional The equations of mass and momentum conservation on a variable cross sectional stream in conservation form are OA s t 4 OQ A s t Ot Os G s 1
125. hysical system Usually this vector has as many entries as the number of nodes times the number of fields minus the number of fixations i e Dirichlet boundary conditions This vector can be computed at once by assembling the right hand side and the stiffness matrix in a linear problem iterated in a non linear problem or updated at each time step through solution of a linear or non linear system The point is that at this outer level you perform global assemble operations that build this vector and matrices At the inner level you perform a loop over all the elements 74 in the mesh compute the vector and matrix contributions of each element and assemble them in the global vector matrix From one application to another the strategy at the outer level linear non linear steady temporal dependent etc and the physics of the problem that defines the FEM matrices and vectors may vary The FastMat2 matrix class has been designed in order to perform matrix computations at the element level It is assumed that we have an outer loop usually the loop over elements that is executed many times and at each execution of the loop a series of operations are performed with a rather reduced set of local vectors and matrices There are many matrix libraries but often the performance is degraded for small dimensions Sometimes performance is good for operations on the whole matrices but it is degraded for operations on a subset of the elment of the matrice
126. ical act of transferring a copy and you may at your option offer warranty protection in exchange for a fee 2 You may modify your copy or copies of the Program or any portion of it thus form ing a work based on the Program and copy and distribute such modifications or work under the terms of Section 1 above provided that you also meet all of these conditions a You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change b You must cause any work that you distribute or publish that in whole or in part contains or is derived from the Program or any part thereof to be licensed as a whole at no charge to all third parties under the terms of this License c If the modified program normally reads commands interactively when run you must cause it when started running for such interactive use in the most ordinary way to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty or else saying that you provide a warranty and that users may redistribute the program under these conditions and telling the user how to view a copy of this License Exception if the Program itself is interactive but does not normally print such an announcement your work based on the Program is not required to print an announcement These requirements apply to the modified work as a whole If identifiable sections of that work are not de
127. in file 0 More precisely the saving mechanism is described by the following pseudo code Read state vector from initial_state file into x0 n 0 for i 0 i lt nstep i advance x n to x n 1 if n nsaverot 0 j lt n nsaverot k lt j nrec 1 lt j nrec if k 0 rewind file 1 append state vector to file 1 found in file ns cpp int nsome default 10000 Sets the save frequency in iterations for the print some mechanism The print some mechanism allows the user to store the variables of some set of nodes with some frequency The nodes are entered in a separate file whose name is given by a print_some_file entry in the general options one node per line The entry nsome indicates the frequency in steps at which the data 64 is saved and save_file_some the name of the file to save in found in file ns cpp int nstep default 10000 The number of time steps found in file ns cpp int print_linear_system_and_stop default 0 After computing the linear system solves it and prints Jacobian right hand side and solution vector and stops found in file ns cpp int print_residual default 0 Print the residual each nsave steps found in file ns cpp string print_some_file default lt none gt Name of file where to read the nodes for the print some feature found in file ns cpp int report_option_access default 1 Print after execution
128. ining the inclusion path with the INC list see Perl documentation 4 3 6 Use of ePerl in makefiles Usually user data files have extension dat When preprocessing with ePerl the convention is to use epl suffix for the file written by the user with ePerl commands i e the files to be preprocessed and suffix depl1 for the preprocessed file A line in the Makefile of the form depl epl eperl P lt gt assures the translation when needed 4 3 7 ePerlini library Some useful constants and functions are found in the file eperlini pl This may be included in the user data file with the following line lt require eperlini pl gt Initializes ePerl It includes a definition for PI r trigonometric and hyperbolic functions and others A common mistake when using preprocessing packages like ePerl is to edit manually the preprocessed file dep1 instead to edit the epl file In order to avoid this we write protect the dep1 file for instance the section in the Makefile is replaced by depl epl if e 0 then chmod w 0 rm 0 fi eperl P lt gt 0 chmod w 0 In addition inclusion of the eperlini pl library inserts the following comment in the included file 21 DON T EDIT MANUALLY THIS FILE This files automatically generated by ePerl from the corresponding epl file 4 3 8 Errors in ePerl processing If the preprocessing stage with ePerl gives some error
129. instants are then defined as t t 1 n 1 T e f n doubles The function values at times t Example The lines fixa_amplitude spline_periodic period 0 2 npoints 5 start_time 0 333 ampl_vals 1 0 A 120 Defines a function with period T 0 2 that takes the values 0 333 nT 1 0 333 n 14 T 0 144 0 333 n 14 T 1 0 0 333 n 34 T 0 Actually the resulting interpolated function is simply t 0 333 g t cos 2 145 T Implementation details If we define the phase 0 as t t 0 2r 146 then is periodic with period 27 We can decompose it in two even functions p 0 as b 0 0 F o 0 2 147 pas PO 0 0 Janey so that pCO 6 0 sin 0 6 0 148 As are even function they may be put in terms of x 1 cos 2 So that x and F are computed and by construction only one half of the values say j 1 to j m n 1 2 1 are relevant The values and f have to be computed specially since sin 0 for them If we take the limit 9 o 8 do a 2 sin 0 cy then if linear interpolation is assumed in each interval it can be shown that PO P0 _ 2 Gn 1 5 he 6 for 0 lt A 150 0 8 _ do J a 27 Yn 1 Cr nor en We could take then this value for however this introduces some noise For instance if 0 sin 0 then
130. ion const RowVector amp U int ndim const Matrix amp iJaco Matrix amp H Matrix amp grad_H Matrix amp flux vector lt Matrix gt A_jac Matrix amp A_grad_U Matrix amp grad_U Matrix amp G_source Matrix amp tau_supg double amp delta_sc double amp lam_max TextHashTable thash double propel void user_data int options This may eventually change in any case if you are interested in adding a new advective system then see the actual description in the advective h file in the distribution The meaning of these arguments are listed below When the size is specified it means that the argument is a Newmat or FastMat matrix In some situations the flux function routine must compute only some of the required values For instance when computing the contribution of the absorbing boundary ele ments there is no need to compute the parameters regarding stabilizing terms This is controlled with the parameter options which can take the values DEFAULT COMP_UPWIND and COMP_SOURCE In the list below it is indicated under which conditions the specific quantity must be computed e const RowVector amp U input size ngor This is the state vector you must return the flux jacobians and other quantities for this state vector e int ndim input The dimension of the space e const Matrix amp iJaco input size ndim X Ndim The jacobian of the master to element coordinates in the actual gauss points This may be used in order to c
131. ions due to a series of facts 122 e Not all of them are true unknowns since some of them may be imposed to a given value by a Dirichlet boundary condition e There may be some constraints between them for instance in structural mechanics two material points linked by a rigid bar or a node that is constrained to move on a surface or line In CFD these constraints arise when periodic boundary conditions are considered e Some reordering may be necessary either for reducing band width if a direct solver is used or either due to mesh partitioning if the problem is solved in parallel e Also non linear constraints may arise for instance when considering absorbing boundary conditions or non linear restrictions So that we assume that we have a relation Unr QU QU 123 where U is the reduced array of unknowns U representing the externally fixed values and Q Q appropriated matrices Follows some common cases Dirichlet boundary conditions If the velocity components of node 1 are fixed then we have ee 124 UV UL so that the corresponding file in Q is null and the file in Q has a 1 in the entry corresponding to the barred values 108 a Figure 17 Slip boundary condition Slip boundary conditions For the Euler equations if node j is on a slip wall and j is the corresponding normal then the normal component of velocity is null and the tangential component is free In 3D there are two free t
132. is License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License The Program below refers to any such program or work and a work based on the Program means either the Program or any derivative work under copyright law that is to say a work containing the Program or a portion of it either verbatim or with modifications and or translated into another language Hereinafter translation is included without limitation in the term modification Each licensee is addressed as you Activities other than copying distribution and modification are not covered by this License they are outside its scope The act of running the Program is not restricted and the output from the Program is covered only if its contents constitute a work based on the Program independent of having been made by running the Program Whether that is true depends on what the Program does 1 You may copy and distribute verbatim copies of the Program s source code as you receive it in any medium provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty keep intact all the notices that refer to this License and to the absence of any warranty and give any other recipients of the Program a copy of this License along with the Program You may charge a fee for the phys
133. istent SUPG beta_supg 0 implies consistent Galerkin and beta_supg 1 implies full consistent SUPG found in file advecfm2 cpp string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file advecfm2 cpp int lumped_mass default 1 Use lumped mass found in file advecfm2 cpp int weak_form default 1 Use the weak form for the Galerkin part of the advective term found in file advecfm2 cpp Flux function ffeulerfm2 Euler eqs double gamma default 1 4 The specific heat ratio found in file ffeulerfm2 cpp int shock_capturing default 0 Add shock capturing term found in file ffeulerfm2 cpp double shock_capturing_threshold default 0 1 Add shock capturing term if relative variation of variables inside the element exceeds this found in file ffeulerfm2 cpp double tau_fac default 1 Scale the SUPG upwind term found in file ffeulerfm2 cpp Flux function ffswfm2 Shallow water eqs double gravity default 1 Acceleration of gravity found in file ffswfm2 cpp Al e int shock_capturing default 0 Add shock capturing term found in file ffswfm2 cpp e double shock_capturing_threshold default 0 1 Add shock capturing term if relative variation of variables inside the element exceeds this found in file ffswfm2 cpp e double tau_fac default 1 Scale the SUPG upwind term found in file ffswfm2 cpp 6 7 The hydrology module
134. it is impossible to have such un hashing function If for two distinct sequences we have the same hash values then we said thar there is a collision in the hashing process We then try to have a hashing functions that minimizes the number of collisions Suppose we consider the set of number sequences 32 bit integers of length n We have N 2 sequences and M 2 hash values and in the best case we would have N M 2 2 sequences for each hash value If we generate m distinct random sequences and m lt N then if the hashing funtion is good there is very little probability of having collisions between them adn the probability of collision is almost by random i e m M Then the number of collisions is approximately m2 br of collisi M nbr of collisions Ni x 7 j 0 163 We consider the following sequence hasshing functions e Hasher SVID rand48 functions If we have some kind of pseudo random gener ator in the form y rand s where s is the seed then we can do a hashing function with the following seudocode int hash int x int n Y int v 0 for int j 0 j lt n j v rand f v x jl return V where f v x is some binary functions that combines the values of the current state v and the incoming sequence element x j The rand48 hasher is based on the random function based on the SVID library coming with the GNU C library version 5 3 12 at the moment of writing this e FastHasher This i
135. ith the TextHashTable class You can then read values from it with routines from stdio sscanf and friends You should not try to modify this strings for instance with the strtok function In that case copy the string in a new fresh string remember to delete it after to avoid memory leak In this way you can pass arbitrary information strings integer doubles to the element routine 5 5 Reading with get_int and get_double As most of the time element properties are either of integer or double type two specific routines are provided get_int and get_double for instance ierr get_int thash npg amp npg where the integer value is directly returned in npg You can specify also a default value and read several values at once 5 6 Per element properties table Many applications need a mechanism for storing values per element for instance when dealing with physical properties variable varying spatially in a continuous way If the properties is piecewise constant then you can define an elemset for each constant patch but if it varies continuously then you should need an elemset for each element which 31 conspires with efficiency We provide a mechanism to pass per element values to the element routine At the moment this is only possible for doubles These properties are specified in the same line of the connectivities for instance elemset nsi_tet 4 elemset header props cond cp emiss
136. jected system and consider the sign of the eigenvalues of A un c and un c We remark that if n is the outward unit normal to the boundary edge an inflow boundary corresponds to u n lt 0 and an outflow one to u n gt 0 Fluvial Boundary e inflow boundary u specified and the depth h is extrapolated from interior points or vice versa e outflow boundary depth h specified and velocity field extrapolated from interior points or vice versa Torrential Boundary e inflow boundary u and the depth h are specified e outflow boundary all variables are extrapolated from interior points Solid Wall Condition We prescribe the simple slip condition over IP slip C Ist u n 0 41 Kinematic Wave Model The applicability of the kinematic wave as an approximation to dynamic wave was discussed in Rodr guez 1995 and according to Lighthill and Whitham subcritical flow conditions favor the kinematic approach Since one characteristic is propagated inside the domain only we can specify the water head the channel section or the discharge at inflow boundaries see eq 33 6 12 Absorbing boundary conditions Absorbing boundary conditions are a very useful feature for the solution of advective diffusion problems They allow the user to put artificial boundaries closer to the interest region and also accelerate convergence to steady solutions since provide the highest rate of energy dissipation through the boundaries In PETSc
137. l elements can be seen as layers of surface elements parallel to the skin With the velocity values computed by the Navier Stokes solver at these nodes the gatherer elemset can compute high precision approximations to the normal derivatives of velocity at the surface e Note that this requires that the mesh must be somewhat structured near the surface However it is usual to add structured layer on the body skin in order to correctly capture the boundary layer e Also it requires the construction of the connectivities of these layers what may be cumbersome but we will see later that this can be done automatically see 11 3 e The size of the elements in the normal direction are not required to be equispaced i e the distances 1 6 and 6 11 are not required to be equal or similar This is important because normally the layers of nodes are refined towards the surface as shown in the figure e The lines of nodes are not required to be normal to the surface e However it is required that each row of nodes for instance the row 1 6 10 must lay on a smooth curve This is required since in the process of computing high normal derivatives a Taylor expansion is computed in terms of this curves so the error depends on the higher derivatives e g the curvature of the line 95 The typical invocation is as follows elemset nsi_tet_les_full 4 geometry cartesian2d __END_HASH__ 127 6 6 7 12 11 2387 __END__ELEMSET__ elemset visc_fo
138. late to certain responsibilities for you if you distribute copies of the software or if you modify it For example if you distribute copies of such a program whether gratis or for a fee you must give the recipients all the rights that you have You must make sure that they too receive or can get the source code And you must show them these terms so they know their rights We protect your rights with two steps e copyright the software and e offer you this license which gives you legal permission to copy distribute and or modify the software Also for each author s protection and ours we want to make certain that everyone understands that there is no warranty for this free software If the software is modified by someone else and passed on we want its recipients to know that what they have is not the original so that any problems introduced by others will not reflect on the original authors reputations Finally any free program is threatened constantly by software patents We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses in effect making the program proprietary To prevent this we have made it clear that any patent must be licensed for everyone s free use or not licensed at all The precise terms and conditions for copying distribution and modification follow 2 2 GNU General Public License Terms and Conditions for Copying Distribution and Modification 0 Th
139. led all objects are sent to the master printed and al buffers are cleared So that you must guarantee space enough in memory for all this operations e Implementation details Data is sent from the nodes to the master with point to point MPI operations which is far more efficient than writing all nodes to a file via NFS Sorting of the objects by key in the master is done using the sort algorithm of the list lt gt STL container which is O N log N operations 19 Authors Mario A Storti CIMEC PETSc FEM kernel NS and AdvDif modules Norberto M Nigro CIMEC PETSc FEM kernel NS and AdvDif modules multi phase flow Rodrigo R Paz CIMEC AdvDif module AdvDif module hydrology module compress ible flow fluid structure interaction Lisandro Dalc n CIMEC PETSc FEM Kernel Python extension language project scripting extension linear algebra preconditioners Ezequiel L pez CIMEC Mesh relocation algorithms mesh move Also many people from the following institutions has contributed directly or indirectly to the generation of this code CIMEC International Center for Computational Methods in Engineering Santa Fe Ar gentina http www cimec org ar INTEC Instituto de Desarrollo Tecnol gico para la Industria Qu mica http www intec unl edu ar 147 CONICET Consejo Nacional de Investigaciones Cientificas y T cnicas http www conicet gov ar UNL Universidad Nacional del Litoral http www unl edu ar
140. ledged by all the volume elemsets that compute for each volume element the neares wall element This is stored as an integer per element property in the volume elemsets In order to reduce memory requirements only an index in the data_pts array is stored As several wall elemsets may exists an array of pair lt int elemset gt is used to store pointers to the data_pts array in order to know to which wall elemset the given index in data_pts belongs vector lt double gt data_pts_ new vector lt double gt vector lt ElemToPtr gt elemset_pointer new vector lt ElemToPtr gt WallData wall_data if LES VOID_IT argl 61 argl arg_add data_pts_ USER_DATA argl arg_add elemset_pointer USER_DATA Elemset elemset NULL argl arg_add elemset USER_DATA ierr assemble mesh argl dofmap build_nneighbor_tree amp time CHKERRA ierr PetscPrintf PETSC_COMM_WORLD After nearest neighbor tree n wall_data new WallData data_pts_ elemset_pointer ndim lt gt lt gt lt gt lt gt lt gt lt gt Find nearest neighbor for each volume element VOID_IT arg1 argl arg_add wall_data USER_DATA ierr assemble mesh argl dofmap get_nearest_wall_element amp time CHKERRA ierr In the jobinfo build_nneighbor_tree call to assemble a loop over all the el ements in the elemset ignoring to what processor it belongs must be performed Oth
141. lly for plane boundaries as in the example above or as a per element value In this last case we would have something like this elemset gasflow_abso 5 props normal 2 normal lt nx gt lt ny gt __END_HASH__ lt nl gt lt n2 gt lt n3 gt lt n4 gt lt nd gt lt nx gt lt ny gt __END_ELEMSET__ The normal need not be entered with a high precision If the vector entered is not exactly normal to the boundary then the condition will be absorbing for waves whose group velocity vector is parallel with this vector Note the the fixa section for the values of the reference node In this case gas dynamics elemset gasflow we set the four fields ngim 2 to the reference values The reference values can be made time dependent in an explicit way by using a fixa_amplitude section instead of a fixa section Using this absorbing boundary condition requires the flow to have implemented the Riemman_Inv method If this is not the case then the program will stop with a message like For the gasflow elemset If the option linear_abso is set to false 0 then the Riemman invariants for gas dynamics are used and the state reference value is used for computing the reference Riemman invariants If linear_abso is set to true 1 then the linear absorbing boundary conditions are imposed Not using extrapolation from the interior These is the elemset lt system gt _abso2 The number of nodes per element ne is 3 The first node is the node
142. mation about ePerl see http www engelschall com sw eperl while for more information on Perl see http www perl com Describing the possibilities of preprocessing with ePerl are far beyond the scope of this manual We will describe some basic techniques of use in the context of writing PETSc FEM user data files 4 3 1 Basics of ePerl ePerl allow you to embed Perl commands within text enclosing them within lt and gt delimiters for instance lt pi 2 atan2 1 0 theta sin pi 4 gt some text here theta lt print theta gt beta lt print 4 theta gt results after being processed by ePerl in some text here theta 0 707106781186547 beta 2 82842712474619 17 The basic rules are e Variables start with following by a C like identifier alphanumeric plus underscore case sensitive for instance alpha or my_variable e Statements end in semicolon e The text inserted in place of the ePerl block is the output of the commands inside the block This output is done in Perl with the print O function but in ePerl there is a shortcut of the form lt expression gt e Mathematical functions sin cos tan exp atan2 y x are available as in C powers x are expressed as x y i e Fortran like 4 3 2 Variables You can define variables to use them after in different places and also in mathematical expressions lt Reynolds 1000 L 2 rho 1 345 velocity 3 54 Grashof 6 e4
143. means high filtering Warning A high value may result in unstability Caamp has dimensions of L T like a diffusivity One possibility is to scale with mesh parameters like h At other is to scale with h g05 Currently we are using Ciamp C4 weg with Caamp 2 found in file nssupg cpp am e double free_surface_set_level_factor default 0 This adds a Cif term in the free surface equation in order to have the total meniscus volume constant found in file nssupg cpp e double fs_eq_factor default 1 Cey 1s eq factor see doc for free_surface_damp option is a factor that scales the free surface rigidity Cog 1 which is the default means no scaling a zero value means infinitely rigid as for an infinite gravity found in file nssupg cpp e string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file nssupg cpp e int npg default none Number of Gauss points found in file nssupg cpp 7 3 Mesh movement Sometimes one has a mesh connectivity and node coordinates and wants a mesh for a slightly modified domain If u is the displacement of the boundaries then we can pose the problem of finding a mesh topologically equal to the original one but with a slight displacement of the nodes so that the boundaries are in the new position This is termed mesh relocation One way to do this is to solve an elasticity problem where the displacements at the b
144. metry The relation of the flux vector with the state vector is the heart of the advective system In fact the discretization of advective systems may be put in a completely abstract setting where the unique thing that varies from one system to another is the definition of the flux function itself The discretization of advective systems in PETSc FEM has been done in this way so that it is easy to add other advective systems by only adding the new flux function Applying the chain rule and noting that the fluxes only depend on position through their dependence on the state vector we arrive to OF OU A 4 Ox Ox 4 where OF are the jacobians of the advective fluxes 6 2 Discretization of advective systems Using the Finite Element Method with weight functions W x and interpolation functions N 5 x results in MU F U G 6 where Ur U2 U 7 Ths so that U has Ngo dor X Nnoq components M of dimension Ngo X Na o t is the mass matrix If we look at it as an Nnoa X Nnoa block matrix with blocks of size Nnaof X Naot then the 7 7 block is Mi f Ni x N x dx 8 Q the i block of the global flux and source vector F and G are OF F N x dx E x TA G f N G x dx If the flux vector term is integrated by parts then we have the weak form ON r i Q OL Fy dct mero dx 10 Tr 35 where T is the boundary of Q This formulation is the Galerkin
145. n Groundwater flow In the previous section the equation that governs subsurface flow was established In order to obtain a well posed PDE problem initial and boundary con ditions must be superimposed on the flow domain and on its limits The initial condition for the groundwater problem is a constant hydraulic head in the whole region that obeys levels observed in the basin history Now consider a simply connected region Q bounded by a closed curve OQ such that 48 Ng U OQ U Ngo ON If the stream is partially penetrating and connected in a Hydraulic sense to the aquifer we set o on Ng x 0 t K 9 no 00 on 00 x 0 t 37 Klo m C 9 h on Ngo X 0 t where is a given water head oo is a given flux normal to the flux boundary 00 and C the conductance at the river stream interface If a fully penetrating stream is considered o Klo n C d h on Pga x 0 4 38 Finally for a perched stream o Kl m Clhy h on Dgo x 0 1 39 Surface Flow Fluid Boundary We recall that the type of a flow in a stream or in an open channel depends on the value of the Froud number F u c where c gh is the wave celerity a flow is said e fluvial for u lt c e torrential for u gt c Saint Venant equations Considering a Cauchy problem for the time like variable m where the solution is given in the subspace 4M t 0 as U U x t 0 and is determined at subsequent values o
146. n CPU time for the second derivatives since we should compute for each second derivative three evaluations of the functional and there are Nen Nen 1 2 where nen Ndof Ne is the number of unknowns per element Nne is the number of nodes per element and ngo the number of unknowns per node here ngof Ndim second derivatives to compute Each evaluation of the functional amounts to the computation of the tensor metrics G and to solve the associated eigenvaue problem so that an analytical expression to at least the 70 first derivatives is desired The derivatives of the distortion functional can be computed as OF OF M Ox E OXg OL aj and then the derivatives with respect to the eigenvalues can be computed still numerical ly while we will show that the derivatives of the eigenvalues with respect to the node coor dinates which are the most expensive part can be computed analytically In this way we can compute the derivatives of the functional with the solution of only one eigenvalue problem The second derivatives can be computed similarly as 101 OF OF Ady 0 OF La f 102 OL piTvj OXgAr OF ui On gs OXg OL piTyj The first and second derivatives of F with respect to the eigenvalues are still computed numerically whilst the second derivatives of the eigenvalues can be computed by differen tiating numerically the first derivatives This amounts to O nen eigenvalue problems for computing the first and se
147. n computacional de problemas Mul tif sicos Basados en ecuaciones Diferenciales Acopladas Director Dr S Idelsohn Financing agency ANPCyT FONCyT Begin 2005 End 2007 Key PME CLUSTER Code PME 209 Title Cluster del Litoral Red de laboratorios para la resoluci n de problemas de la f sico matem tica aplicados a la ingenier a Director Dr S Idelsohn Financing agency ANPCyT FONCyT Begin 2004 End 2005 Key CLUSTER CHILE Code C 13680 4 Nro 23 Title C lculo paralelo en problemas de mec nica computacional a trav s del uso de una red de computadores personales Director Dres Marcela Cruchaga Norberto Nigro Financing agency Programa de Colaboracion Cientifico Acad mica entre Argentina Brasil y Chile 2000 2001 C 13680 4 Fundaci n Andes Begin 2000 End 2001 Key PIP PAR Code PIP 02552 2000 Title Generaci n de recursos de c lculo paralelo para mec nica computacional Director V E Sonzogni Fi nancing agency CONICET Begin 2000 End 2002 Key CAI D Code CAI D 2000 43 Title Desarrollo de algoritmos para c lculo paralelo Director Victorio Sonzogni Financing agency UNL Begin 148 13 12 11 10 2000 End 2002 Key PROA Code PICT 6973 99 Title Desarrollos en Mec nica Com putacional utilizando t cnicas de PRogramaci n Avanzada Director Sergio Idelsohn Financing agency ANPCyT Begin 2000 End 2003 Key MELT Code PID 99 76 Title MELT Modelado de Emulsifica
148. n the Makefile compress_init compress_pre compress_post for f in state tmp do echo gzipping f gzip f f done compress_close 11 Gatherers and embedded gatherers Typically gatherers are reduction operators over the element sets i e instead of assem bling a matrix or vector the gather operations produce some kind of global data as for instance e The volume surface length of an elemset e The integral of some quantity fields coordinates or functions of the them over the element set for instance total concentration total energy total linear or angular momentum or angular momentum e Same as above but depending also on gradients of the state variables for instance total elastic energy This involves some e Same as above but integrals of quantities on a manifold of a dimension lower than the embedding space In this case the integrand may involve also the normal to the surface for instance total flow of heat or volume rate through a surface Gatherers are implemented as elemsets the only difference is that they typically only process a special jobinfo named gather This jobinfo is processed after the time step i e only after the Newton loop on the converged values This jobinfo task does not assemble vectors or matrices but a series global values stored in a global C vector called vector lt int gt gather_values A typical call is as follows taken from ns cpp arglf clear arglf arg_add amp sta
149. n the inner layers of nodes are replaced by 1 s With this setting the code will inspect the connectivity of the viscous_fluid elemset and find the nodes corresponding to the inner layers and replace the 1 s by the correct n ode number 97 11 4 Passing element contributions as per element properties For some applications it is desirable to have the individual element contributions instead of having their sum For instance in a fluid structure application involving a fluid and a deformable solid it does not suffice only with the integral of the forces to compute the evolution of the solid but also it is needed the whole distribution of forces In this case what is needed is a list of surface elements and the total force for each element There are two flavors of this feature The simplest one is activated with dump_props_to_file and simply stores the data in a file The other possibility is ac tivated with the pass_values_as_props option and stores the computed per element values in the per element properties table Subsequent parts of the code can grab this information using the usual mechanism to query the per element properties table All three mechanism can be activated or deactivated independently of the others In summary e Computation of global values is activated with gather_length gt 0 e Storing the per element is activated with the dump_props_to_file option e Passing the per element computed values to other sections of the
150. nc 59 Temple Place Suite 330 Boston MA 02111 1307 USA 2 GNU GENERAL PUBLIC LICENSE Version 2 June 1991 Copyright C 1989 1991 Free Software Foundation Inc 675 Mass Ave Cambridge MA 02139 USA Everyone is permitted to copy and distribute verbatim copies of this license document but changing it is not allowed 2 1 Preamble The licenses for most software are designed to take away your freedom to share and change it By contrast the GNU General Public License is intended to guarantee your freedom to share and change free software to make sure the software is free for all its users This General Public License applies to most of the Free Software Foundation s software and to any other program whose authors commit to using it Some other Free Software Foundation software is covered by the GNU Library General Public License instead You can apply it to your programs too When we speak of free software we are referring to freedom not price Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software and charge for this service if you wish that you receive source code or can get it if you want it that you can change the software or use pieces of it in new free programs and that you know you can do these things To protect your rights we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights These restrictions trans
151. nection of DX to PETSc FEM If you don t want to change the internal state of the steps variable then you can set it to NULL or 1 13 2 Building and loading the ExtProglmport module This module allows PETSc FEM to exchange data with DX through a socket using a predefined protocol The module is in the PETSCFEM_DIR dx directory of the PETSc FEM distribution To built it you have to compile first the petscfem library and then cd to the dx directory and make make This should build the dx epimport file which is a dynamically loadable module for DX This one altogether with the dx epimport mdf which is a plain text file describing the inputs outputs of the module and other things are the files needed by DX in order to run the module 104 To load this module in DX you can do this either at the moment of launching DX with something like dx mdf path to epimport mdf or well from the dxexec window menu File Load Module Descriptions 13 3 Inputs outputs of the ExtProglmport module input steps type integer default 0 Description Visualize each steps time steps 0 means asynchronously input serverhost type string default localhost Description Name of host where the external program is running May be also set in dot notation e g 200 9 223 34 or myhost mydomain edu input port type integer default 5314 Description Port number input options type string default NULL Description Option
152. ng as usual if the element needs processing we call Load_props in order to effectively load element properties in propel and after this we can use them as propel conductivity_indx and so on Macro shortcuts COND PROPA are handy for this 6 The general advective elemset 6 1 Introduction to advective systems of equations Advective system of equations are of the form U OF U aie 2 Ot Ox G where U is the state vector of the fluid in conservative variables Examples of these are the inviscid fluid equations the Euler or gas dynamic equations the shallow water equations and scalar advective systems that represents the transport of a scalar property like temperature or concentration of a component by a moving fluid The conservative variables for the Euler equations are U pu 3 pe for the Euler equations In general U is a vector of ngof components F U is the flux vector t can be thought as a matrix of Ndof X Ndim components The row index 34 corresponds to a field value whereas the column index is a spatial dimension G is a source vector For the 2D or 3D Euler equations it is null but if we consider 1D flow in a tube of varying section then it has a source term in the momentum equation due to the reaction on the wall Also for the shallow water equations there is a reaction term in the momentum balance equations if there is a varying bathy
153. nly once Otherwise refresh each update_wall_data steps found in file ns cpp e int use_iisd default 0 Use IISD Interface Iterative Subdomain Direct or not found in file ns cpp e int verify_jacobian_with_numerical_one default 0 Computes jacobian of residuals and prints to a file May serve to debug computation of the analytic jacobians found in file ns cpp Elemset nsitetlesfm2 e int ALE_flag default 0 Flag to turn on ALE computation found in file nsitetlesfm2 cpp 66 double A_van_Driest default 0 van Driest constant for the damping law found in file nsitetlesfm2 cpp double C_smag default 0 18 Smagorinsky constant found in file nsitetlesfm2 cpp double ndim G_body default null vector Vector of gravity acceleration must be constant found in file nsitetlesfm2 cpp int LES default 0 Add LES for this particular elemset found in file nsitetlesfm2 cpp double additional_tau_pspg default 0 Add to the tau_pspg term so that you can stabilize with a term independently of h Mainly for debugging purposes found in file nsitetlesfm2 cpp string axisymmetric default none Add axisymmetric version for this particular elemset found in file nsitetlesfm2 cpp int cache_grad_div_u default 0 Cache grad_div_u matrix found in file nsitetlesfm2 cpp string geometry default cartesian2d Type of element geometry to define Gauss Point data found in file nsitetle
154. nstructs as in lt method iterative gt if method eq iterative maxits 100 tolerance le 2 else maxits 1 tolerance 1le 10 endif expands to maxits 100 tolerance 1le 2 Also lines starting with c are dicarded as comments Conditional preprocessing and comments are enabled with the P flag so that make sure you have this flag enabled when preprocessing the ep1 file probably in the Makefile file Note that PETSc FEM comments those starting with numeral may collide with the ePerl preprocessing directives so that when commenting out lines in PETSc FEM input files it is safer to leave a space between the character and the commented text commented text OK commented tex but dangerous 4 3 5 File inclusion In addition to the inclusion allowed in the internal preprocessor via the command ePerl has his own inclusion directive for instance some text include home mstorti PETSC petscfem doc options txt another text and provided file options txt contains 20 __INCLUDE__ File options txt opioni valuel option2 value2 then the previous block expands to some text File options txt opioni valuel option2 value2 another text Including with the internal preprocessing directive __INCLUDE__ has the advantage of not creating an intermediate file On the other hand including with the ePerl directive allows recursive ePerl preprocessing and more versatility in def
155. nvariant the object are the whole set of 6 permutations for 3 objects so that in this case we can say that unordered triangles can be represented as unordered sets of three nodes But for the unordered triangle edge a b c isthe same that b c a so that associating unordered triangles with unordered sequences does not take into account the rotational symmetry 17 3 1 Symmetry group generator The set of permutations perm SHAPE that leave the geometrical object SHAPE invariant are a group that means that if two permutations p q belong to perm SHAPE then the composition pq i e the permutation resulting of applying first q and then p also belong to the group as well as the product qp In general this is a finite group As a consequence if we have two elements p and q of a group G then pq qp pq pqp pq all belong to G Given a set of permutations S C G the set pS formed by the left product of all the elements of S with p belongs to G The same applies to Sp qS and Sq So that starting with the set So 1 where 1 is the identity permutation we can generate recursively a larger set of symmetries by forming S So pSo Sop qSo Soq As the total number of permutations in the group is finite at most n where n is the number of nodes in the object applying recursively the relation above will end with a set of permutations H that is itself a group included in G We will call it the group generated by p an q
156. o give any third party for a charge no more than your cost of physically performing source distribution a complete machine readable copy of the corresponding source code to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange or c Accompany it with the information you received as to the offer to distribute corresponding source code This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer in accord with Subsection b above The source code for a work means the preferred form of the work for making mod ifications to it For an executable work complete source code means all the source code for all modules it contains plus any associated interface definition files plus the scripts used to control compilation and installation of the executable However as a special exception the source code distributed need not include anything that is normally distributed in either source or binary form with the major components compiler kernel and so on of the operating system on which the executable runs unless that component itself accompanies the executable If distribution of executable or object code is made by offering access to copy from a designated place then offering equivalent access to copy the source code from the same place counts as distribution of the source code even though thir
157. of the stream bottom Gs on Qs x 07 32 6 11 Boundary Conditions 6 11 1 Boundary Conditions to simulate River Aquifer Interactions Coupling Term The stream aquifer interaction process occurs between a stream and its adjacent flood plain aquifer The coupling term is not explicitly included in eq 22 but it is treated as a boundary flux integral At a nodal point we can write the coupling Gs P Ri h h 34 where Gs represents the gain or loss of the stream and the main component is the loss to the aquifer and Ry is the resistivity factor per unit arc length of the perimeter The corresponding gain to the aquifer is Ga E T 35 where I represents the planar curve of the stream and r is a Dirac s delta distribution with a unit intensity per unit length i e L f OLIE f E 8 ds 36 The stream loss element set represents this loss and a typical discretization is shown in fig 4 The stream loss element is connected to two nodes on the stream and two on the aquifer If the stream level is over the freatic aquifer level hy h gt then the stream losses water to the aquifer and vice versa Contrary to standard approaches the coupling term is incorporated through a boundary flux integral that arises naturally in the weak form of the governing equations rather than through a source term 6 11 2 Initial Conditions First Second and Third Kind Boundary Conditions Absorbent Boundary Conditio
158. olve it in 1 th of the domain i e in the square x y gt 0 We could solve it also in 14 th of the region the triangle y gt 0 y lt x x lt 1 8 3 The oscilating plate problem File oscplate dat This is the flow produced between two infinite plates when one is at rest and the other oscillating with amplitude A and frequency w see figure 7 This serves to test a problem with temporal dependent boundary conditions The problem is one dimensional and the resulting field is u 0 p cnst and v v x We set parameters lenght between plates L 1 viscosity y 1 and we model the problem with a strip 0 lt x lt 1 0 lt yS lt 0 l and set periodic boundary conditions between y 0 1 and y 0 At the plate at rest x L we set u p 0 and at x 0 and at the moving plate x 0 we set u WA sin wt The analytic solution can be found easily The equation for v is Ov y with boundary conditions v 0 Aw sin wt and v L 0 The solution can be found by searching for solutions of the form v eivtt Ar 107 Replacing in 106 we arrive at the characteristic equation iw vr 108 whose solutions are A E yw v 4 109 72 Setting the boundary conditions the solution is ee ert aL eA a L v x Im4 e PES YE eed 110 In the example we set v 100 w 20007 6283 We choose to have 16 time steps in one period so that Dt 6 2x 1075 The resulting profile velocity is
159. ome of this options are used for internal use of the PETSc FEM code for instant chunk_size sets the quantity of elements that are processed by the elemset routine at each time In section 85 1 some specific properties of the elemset properties text hash table are described 5 1 The elemset hash table of properties A very common problem in FEM codes is how to pass element physical or numerical properties conductivities viscosities etc to the element routine In PETSc FEM you can pass arbitrary per elemset quantities via an elemset properties hash table This comes after the element header as in the following example elemset nsi_tet 4 fat 4 elemset header props per element properties to be explained later Elemset properties hash table name My Mesh geometry cartesian2d ndim 2 npg 4 couple_velocity 0 weak_form 1 28 physical properties Dt 1 time step viscosity 1 next lines flags end of the properties hash table __END_HASH__ element connectivities 1342 3564 5786 lt more element connectivities here gt 193 195 196 194 195 197 198 196 197 199 200 198 199 201 202 200 next lines flags end of elemset connectivities __END_ELEMSET__ In this example we define several properties containing doubles viscosity and Dt integers ndim npg etc or strings name This hash table is stored in the form of an associative array hash with a text key the first non blank
160. ons here gt __END_FIXA__ You can also have dynamically loaded functions that use parameters loaded via the table of options at run time For this you are passed the TextHashTable object entered by the user to the init function You then can read parameters from these but in order that the init function do anything useful it has to be able to pass data to the eval function Normally you define a struct or class to hold the data create it with new or malloc put the data in it and then pass the address via the fun_data pointer Later in subsequent calls to eval_fun it is passed the same pointer so that you can recover the information previous a static cast Of course in order to avoid memory leak you have to free the allocated memory somewhere this is done in the clear function For instance the following file defines a ramp function where youcan set the four values to fo t1 fi include lt math h gt include lt src fem h gt include lt src getprop h gt include lt src ampli h gt struct MyFunData double f0 f1 tO t1 slope F INIT_FUN int ierr Read values from option table TGETOPTDEF thash double t0 0 TGETOPTDEF thash double t1 1 TGETOPTDEF thash double f0 0 TGETOPTDEF thash double f1 1 create struct data and set values MyFunData d new MyFunData fun_data d d gt f0 0 d gt f1 f1 124 d gt t0 t0 d gt ti t1 d gt slo
161. or donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice This section is intended to make thoroughly clear what is believed to be a conse quence of the rest of this License 8 If the distribution and or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries so that distribution is permitted only in or among coun tries not thus excluded In such case this License incorporates the limitation as if written in the body of this License 9 The Free Software Foundation may publish revised and or new versions of the General Public License from time to time Such new versions will be similar in spirit to the present version but may differ in detail to address new problems or concerns Each version is given a distinguishing version number If the Program specifies a version number of this License which applies to it and any later version you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation If the Program does not specify a version number of this License you may choose any version ever published by the Free Software Foundation 10 If you wish to incorporate parts of the Progr
162. oundaries are the prescribed displacement of the boundary This problem has been included by convenience in the Navier Stokes module even if at this stage it has little to do with the Navier Stokes eqs Once the displacement is computed by a standard finite element computation we can compute the new mesh node coordinates by adding simply the computed displacement to the original node coordinate The elastic constants can be chosen arbitrarily If an isotropic material is considered then the unique relevant non dimensional parameter is the Poisson ratio controlling the incompressibility of the fictituos material However if the distortion is too large this linear simple strategy can break down when some element collapses and has a negative Jacobian A simple idea to fix this is to somewhat rigidize the elastic constants of the fictituos material in order to minimize the distortion of the elements Designing this nonlinear material behaviour should guarantee a unique solution and a relatively easy way to compute the Jacobian A possibility is to have an hyperelastic material i e to define an energy functional F as function of the strain tensor One 69 should guarantee that this functional is convex and one should have an easy way to compute its derivatives and second derivatives A related approach implemented in PETSc FEM is to compute for simplices the transformation from a regular master element to the actual element and to
163. pe f 1 f0 t1 t0 EVAL_FUN MyFunData d MyFunData fun_data define ramp function if t lt d gt t0 return d gt f0 else if t gt d gt t1 return d gt f1 else return d gt f0 d gt slope t d gt t0 CLEAR_FUN clear allocated memory MyFunData d MyFunData fun_data delete d fun_data NULL Another possibility perhaps more simple would be to use a global MyFunData object but if several fixa_amplitude entries that use the same function are used then the data created by the first entry would be overwritten by the second entry 14 8 5 Use of prefixes Several functions can be written in the same cpp file using a prefix ending in underscore for instance if you want to define a gaussian function then you define functions with lt prefix gt gaussian_ for instance extern C void gaussian_init_fun TextHashTable thash void amp fun_data extern C double gaussian_eval_fun double t void fun_data extern C void gaussian_clear_fun void fun_data this is optional The macros INIT_FUN1 name EVAL_FUN1 name and CLEAR_FUN1 name do it for you where name is the prefix with the underscore removed for instance lt headers gt struct MyGaussFunData lt your data here gt os INIT_FUN1 gaussian lt read data gt MyGaussFunData d new MyGaussFunData 125 fun_data d lt set data in d gt EVAL_FUN gaussian 1 MyGauss
164. py of the addresses of the elements involved For instance in a a set b operation with a and b of size n x m say we have to store 2mn addresses Usually this overhead in memory requirement is negligible since the amount of variables and operations needed in the element routines are very small as compared with the size of the problem itself However some care must be taken when caching large inner loops For instance in code A section 89 3 2 if the inner loop is executed a constant but very large number M of times then the amount of the cache required is proportional to M Then even if as discussed before no operations like those used in code B are required it may be adviceable to spend some time in insert these calls in order to reduce memory cache and overhead time Again in the limit of M very large it will be more convenient to choose this loop as the outer one 9 4 Synopsis of operations 9 4 1 One to one operations These are operations that take one FastMat2 argument as in FastMat2 amp add const FastMat2 amp A The operations are from one element of A to the corresponding element in this The one to one operations implemented so far are e FastMat2 set const FastMat2 A Copy matrix e FastMat2 amp add const FastMat2 amp A Add matrix e FastMat2 amp rest const FastMat2 amp A Substract a matrix e FastMat2 mult const FastMat2 y A Multiply element by element like Matlab e FastMat2 div const Fa
165. ranch op_prev_2 op_prev_k The branch call tells the library that several branchs will start from there and a branch point is created Then each conditional block code must start with choose lt j gt where lt j gt is a number that must be unique among all other branches Finally when leav ing all the branches we must call leave in order to tell the library that the mainstream of the cache list must be retaken Branches can be nested at any level The call to branching is not needed if the execution path is the same for all executions of the loop This usually happens when the condition refer to some global option that is uniform over all elements For instance if branch 0 corresponds to include turbulence model A and branch 0 to model B then the same branch is executed for all the elements and there is no need to call the static functions 9 3 2 Loops executed a non constant number of times Another special case is when there are loops inside the body of the outer loop Note that no special branching is needed in general if the loop is executed a fixed number of times since the sequence of operations is not altered from one execution to another For instance consider the following piece of code Case A Inner code executed a fixed number of times for int k 0 k lt N k N very large Outer loop block_before for int 11 0 11 lt 3 11 inner_block Operations that act on the same m
166. rce_integrator 6 geometry line2quad __END_HASH__ 126711 12 2 3 7 8 12 13 3489 13 14 __END__ELEMSET__ tri2prism quad2hexa Figure 13 Embedded gatherer elements in 3D Note that the geometry is special line2quad means that the surface geometry are lines and the corresponding volume elemset is composed of quads The other two possibilities implemented so far are tri2prism and quad2hexa see figure 13 Typical invocation is as follows The typical invocation is as follows elemset visc_force_integrator 9 geometry tri2prism __END_HASH__ 1234567809 96 __END__ELEMSET__ and elemset visc_force_integrator 12 geometry quad2hexa __END_HASH__ 123456789 10 11 12 __END__ELEMSET__ 11 3 Automatic computation of layer connectivities Sometimes it is somewhat cumbersome to compute the connectivities of the embed ded gatherer connectivities since it involves the finding of layers of nodes inside the adjacent volume element This can be done automatically by PETSc FEM if the identify_volume_elements is activated and the name of the adjacent volume element is passed through the option volume_elemset for instance elemset nsi_tet_les_full 4 geometry cartesian2d name viscous_fluid __END_HASH__ 1276 6 7 12 11 2387 __END__ELEMSET__ elemset visc_force_integrator 6 volume_elemset viscous_fluid identify_volume_elements geometry line2quad __END_HASH__ 121111 231111 341111 __END__ELEMSET__ Note that the nodes i
167. reas h is a scalar hashing function A very simple possibility is to take h as the identity h a x In that case the hash of the object is simply the sum of their node indices The idea is that even for this simple hashing function we can compare first the hash sums of two objects for determining if they are equal If there is a high probability of the objects being unequal then in most cases this simple check will suffice If the hash sum are equal then the full comparision procedure as described above must be performed Note that the hash sum is and must be independent of the ordering 142 In order to reduce the probability that unequal objects give the same hash sum we can device better hashing functions Two kind of hashing functions will be discussed functions that hashes a sequence of numbers hash sequence functions taking in account the ordering of the sequences and functions that hash the sequences in a way infependent of the ordering hash sum functions Consider first the hash sequence functions If we consider sequences of integers then the hash functions should return a scalar value for each sequence of integers in such a way that if two sequences are distinct i e they have distinct numbers or the same numbers in differente order they should have different hashing functions We restrict us to 32 bit integers unsigned integers are bounded by 2 and there are 2 distinct sequences of n integers so that
168. rite hook_list shell_hook hello but don t add the hello lt command gt line then PETSc FEM uses a standard command line like this make petscfem_step 2 d petscfem_time 3 f hello_ 1 s so that it will execute make commands with targets hello_init hello_pre hello_post and hello_close like make petscfem_step 1 petscfem_time 0 hello_init make petscfem_step 1 petscfem_time 0 1 hello_pre make petscfem_step 1 petscfem_time 0 1 hello_post make petscfem_step 2 petscfem_time 0 2 hello_pre make petscfem_step 2 petscfem_time 0 2 hello_post make petscfem_step 3 petscfem_time 0 3 hello_pre make petscfem_step 3 petscfem_time 0 3 hello_post AHR A A SH make petscfem_step 100 petscfem_time 10 hello_pre make petscfem_step 100 petscfem_time 10 hello_post make petscfem_step 2 petscfem_time 0 hello_close Inside the Makefile you can use the make variables petscfem_step and petscfem_time For instance you can do the Hello world trick by adding the targets In Makefile hello_init echo In init Do more things in init stage HH 1 hello_pre echo In pre Do more things in pre stage hello_post echo In post Do more things in post stage 92 HE tetas hello_close echo In close Do more things in close stage For instance I love to gzip my state files with a command like this In the PETSc FEM data file hook_list shell_hook compress I
169. rived from the Program and can be reasonably considered independent and separate works in themselves then this License and its terms do not apply to those sections when you distribute them as separate works But when you distribute the same sections as part of a whole which is a work based on the Program the distribution of the whole must be on the terms of this License whose permissions for other licensees extend to the entire whole and thus to each and every part regardless of who wrote it Thus it is not the intent of this section to claim rights or contest your rights to work written entirely by you rather the intent is to exercise the right to control the distribution of derivative or collective works based on the Program In addition mere aggregation of another work not based on the Program with the Program or with a work based on the Program on a volume of a storage or distri bution medium does not bring the other work under the scope of this License 3 You may copy and distribute the Program or a work based on it under Section 2 in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following a Accompany it with the complete corresponding machine readable source code which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange or b Accompany it with a written offer valid for at least three years t
170. roduces non localities in the sense that the tur bulent viscosity at a volume element depends on the state of the fluid at a wall 59 7 1 1 The wall elemset Wall boundary conditions have been implemented via a wall elemset This is a surface element that computes given the velocities at the nodes the tractions corresponding to this velocities for a given law of wall Also this shear velocities as stored internally in the element so that the volume elements can get them and compute the van Driest damping factor This requires to find for each volume element the nearest wall element This is done before the time loop with the ANN Approximate Nearest Neighbor library 7 1 2 The mixed type boundary condition The contribution to the momentum equations from the wall element is Hs f tyN aE 78 where Rip is the contribution to the residual of the p th momentum equation of the node i N is the shape function of node 7 and tp are the tractions on the surface of the element Xe The wall law is in general of the form u 2 f 79 where u is the tangent velocity u the shear velocity u Tw p and y yu v the non dimensional distance to the wall We have several possibilities regarding the positioning of the computational boundary We first discuss the simplest that is to set the computational boundary at a fixed y position Note that this means that the real position od the boundary yt changes during iteration In this case 79
171. rogram may be called in a parallel environment so for instance if you will compress a certain file then you should take care of doing that only at the master process by e g enclosing the code with aif MY_RANK 4 construct 10 3 Shell hook The shell hook allows the user to execute a certain action at the hook launching points by simply writing shell commands hook_list shell_hook lt name gt lt name gt lt shell command gt For instance hook_list shell_hook hello hello echo Hello world In this case PETSc FEM will issue the echo command at each of the launching points If you want to issue more complex commands then perhaps it s a better idea to bundle them in a script and then execute the script from the hook hook_list shell_hook hello hello my_script_hook where you have previously written a my_script_hook script with something like bin bash This is my_script_hook file echo Hello world inside Probably you want to perform some actions depending on which stage you are so that you can pass the stage name to the command by including a s output conversion token in the command For instance hello echo Hello world stage s Moreover you can have also the time step currently executing and the current simulation time by including a d and a f output conversions for instance 90 hello echo Hello world stage s step 4d time tf The order is important That is the first argument
172. rt judgment or allegation of patent infringement or for any other reason not limited to patent issues conditions are imposed on you whether by court order agreement or otherwise that contradict the conditions of this License they do not excuse you from the conditions of this License If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations then as a consequence you may not distribute the Program at all For example if a patent license would not permit royalty free redistribution of the Program by all those who receive copies directly or indirectly through you then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program If any portion of this section is held invalid or unenforceable under any particular circumstance the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices Many people have made generous con tributions to the wide range of software distributed through that system in reliance on consistent application of that system it is up to the auth
173. rtitioned in the following way A node that is connected to elements in different processors is assigned to the highest numbered processor Referring to the mesh in figure 26 and with the same element partitioning all nodes in the interface would belong to processor 1 as shown in figure 27 Now if a dof is connected to a dof j on other processor we mark as interface that dof that belongs to the highest numbered processor So in the mesh of figure 27 all dof s in the interface between element sub domains are marked to belong to processor 1 The nodes in the shadowed strip belong to processor 0 and are connected to nodes in processor 1 but they are not marked as interface since they belong to the lowest numbered processor Note that this strategy leads to an interface set of 4 nodes whereas the simpler strategy mentioned first would lead to an interface set of 4 i e including the nodes in the shadowed strip which is two times larger Element in processor 0 Element in processor 1 O dof in processor 0 S dof in processor 1 dof s in processor 0 connected to dof s in processor 1 Figure 28 Non local element contribution due to bad partitioning e The IISDMat matrix object contains three MPI PETSc matrices for the Azz Arz and Ar blocks and a sequenti
174. s like columns rows or individual elements This is due to the fact that acessing a given element in the matrix implies a certain number of arithmetic operations Otherwise we can copy the row or column in an intermediate object but then there is an overhead due to the copy operations The particularity of FastMat2 is that at the first execution of the loop the address of the elements used in the operation are cached in an internal object so that in the second ans subsequent executions of the loop the addresses are retrieved from the cache 9 2 Example Consider the following simple example We are given a 2D finite element composed of triangles i e an array xnod of 2 x Nyoq doubles with the node coordinates and an array icone with 3 x Nelem elements with node connectivities For each element 0 lt j lt Nelem its nodes are stored at icone 3 j k for 0 lt k lt 2 We are required to compute the maximum and minimum value of the area of the triangles This is a computation which is similar to those found in FEM analysis For each element we have to load the node coordinates in local vectors X1 xg and x3 compute the vectors along the sides of the elements a x2 xX and b x3 xy The area of the element is then the determinant of the 2 x 2 matrix J formed by putting a and b as rows The FastMat2 code for the computations is like this Chrono chrono FastMat2 x 2 3 2 a 1 2 b 1 2 J 2 2 2 FastMatCacheList cache_list FastMa
175. s based in a simple pseudo random function of the form int rand int v int x a v X int y v x 2 y y MAX y y m 143 int hash int w int n Y int v C for int j 0 j lt n j v rand v x where MAX 2 and c 0x238e1f29 m 0x6b8b4567 e MD5Hasher This is based on the MD5 routines from RSA This is an elaborated algorithm that creates a 16 byte hash value from a string of characters We take as hash value the first 4 bytes from this digest 18 Synchronized buffer One difficult task in parallel programming is printing from the slave nodes In MPI in general it is not guaranteed that printing from the nodes is possible and in the MPICH implementation output from the nodes get all mixed and scrambled PETSc provides a functions in order to facilitate this task The user can call PetscSynchronizedPrintf as many times as he wants en each nodes The output is concatenated in each node to a buffer and then a collective call to PetscSynchronizedFlush flushes all the buffers in order to the standard output There is a similar function for files PetscSynchronizedFPrintf but it turns out that the flushing of standard output and files is mixed In addition even in the case of writing only to standard output the output is not sorted properly The objective of the SyncBuffer lt T gt template class and the KeyedOutputBuffer class is to have a synchronized output device that sorts the lin
176. s is performed with the DistMatrix object A_LL_other 16 4 Efficiency considerations The uploading time of elements in PETSc matrices can be significantly reduced by using block uploading i e uploading an array of values corresponding to a rectangular sub matrix not necessarily being contiguous indices instead of uploading each element at a time The following code snippets show the both types of uploading ELEMENT BY ELEMENT UPLOADING The global matrix Mat A row and column indices both of length nen int row_indx col_indx Elementary matrix size nen nen double Ae I aa define row_indx col_indx and fill Ae for int j 0 j lt nen j for int k 0 k lt nen k ierr MatSetValue A row_int j col_indx k Ae j nen k ADD_VALUES BLOCK UPLOADING TP sx ierr same stuff as before MatSetValues A nen row_int j nen col_indx k Ae ADD_VALUES In PETSc FEM the computed elemental matrices can be uploaded in the global matri ces with both methods as selected with the block_uploading global option set to 1 by default i e use block uploading However in some cases block uploading can be actually 134 slower due to the use of masks A mask is a matrix of the same size as the elemental matrix with 0 s or 1 s indicating whether some coefficients are structurally i e not for a particular state null Moreover the mask do not depends on
177. s on standard output they are printed on the corresponding file The file is opened and closed at each time step 11 2 Embedded gatherers If the elemset has a dimension lower than the embedding space for instance a surface embedded in 3D space then computing the gradients of the variables on the surface can not be done with the information on the surface only This is not an uncommon situation for instance it happens when computing viscous forces of a Newtonian fluid on the skin of a solid body The gradient of the velocity field must be computed in order to compute 94 12 embedded gatherer element Figure 12 Embedded gatherer element the stress on the skin But if the velocity field is known only on the surface think for simplicity in the case a of plane surface the normal component of the gradient can not be determined In order to solve this a special class of gatherers have been developed namely the embedded_gatherer class Typically such an elemset is composed is a surface elemset associated with a volume one For instance see figure 12 the user can have a fluid problem with a volume elemset in 2D composed of cartesian2d and wants to compute the viscous traction on the solid surface AB For this she adds an elemset visc_force_integrator composed of six nodes elements composed of three layers of segments parallel to the surface as for instance the element 3 4 8 9 14 marked with a dashed line in the figure This specia
178. s tests is to check the corect behaviour of PETSc FEM but also they can serve people to understand the program 71 8 1 Flow in the anular region between to cylinders File sector dat This example tests periodic boundary conditions The mesh is a sector of circular strip 2 72 lt r lt 4 48 0 lt 0 lt 7 4 and we impose The governing equation is Au 0 where u is a velocity vector of two components On the internal radius we impose u 0 and u t where t is the tangent vector On the radii 9 0 and 0 7 4 we impose periodic boundary conditions so that it is close to flow in the region between two cylinders with the internal cylinder fixed and the external one rotating with velocity 1 But the operator is not the Stokes operator However in this case the flow for this operator is divergence free and the gradient of pressure has no component in the angular direction so that the solution do coincide with the Stokes solution In the output of the test sector sal we check that the solution is aligned with the angular direction uy uy at the outlet section 0 7 4 8 2 Flow in a square with periodic boundary conditions File lap_per dat This is similar to example sector dat but in a region that is the square 1 lt x y lt 1 u is imposed on all sides u 1 and in the tangential direction in the counter clockwise sense i e u 0 1 in z 1 u 41 0 in y 1 By symmetry we can s
179. s to be passed to the external program input step type integer default 1 Description Input for sequencer This value is passed to the dx_hook hook but currently is ignored by it It could be used in a future in order to synchronize the DX internal step number with the PETSc FEM one input state_file type string default NULL An ASCII file storing a state where to read the state to be visualized output output_array_list type field Description Group object of imported arrays output output_field_list type field Description Group object of imported fields 13 4 DX hook options int dx_auto_combine default 0 Auto generate states by combining elemsets with fields found in file dxhook cpp int dx_cache_coords default 0 Uses DX cache if possible in order to avoid sending the coordinates each time step or frame Use only if coordinates are not changing in your problem found in file dxhook cpp double dx_coords_scale_factor default 1 Coefficient affecting the new displacements read New coordinates are cO x0 c u where cO dx_coords_scale_factor0 cl dx_coords_scale_factor x0 are the initial coordinates and u are the coordinates read from the given file found in file dxhook cpp 105 double dx_coords_scale_factor0 default 1 Coefficient affecting the original coordinates See doc for dx_coords_scale_factor found in file dxhook cpp int dx_do_make_command default 0 If true then
180. sfm2 cpp int indx_ALE_xold default 1 Pointer to old coordinates in nodedata array excluding the first ndim values found in file nsitetlesfm2 cpp double jacobian_factor default 1 Scale the jacobian term found in file nsitetlesfm2 cpp int npg default none Number of Gauss points found in file nsitetlesfm2 cpp double pressure_control_coef default 0 Add pressure controlling term found in file nsitetlesfm2 cpp int print_van_Driest default 0 print Van Driest factor found in file nsitetlesfm2 cpp double residual_factor default 1 Scale the residual term found in file nsitetlesfm2 cpp double rho default 1 Density found in file nsitetlesfm2 cpp double shock_capturing_factor default 0 Add shock capturing term found in file nsitetlesfm2 cpp double tau_fac default 1 Scale the SUPG and PSPG stabilization term found in file nsitetlesfm2 cpp 67 e double tau_pspg_fac default 1 Scales the PSPG stabilization term found in file nsitetlesfm2 cpp e double temporal_stability_factor default 0 Adjust the stability parameters taking into account the time step If the steady option is in effect which is equivalent to At oo then temporal_stability_factor is set to 0 found in file nsitetlesfm2 cpp e int weak_form default 1 Use a weak form for the gradient of pressure term found in file nsitetlesfm2 cpp Elemset bcconv_ns_fm2 e string geometry
181. so if you want to DX be able to communicate asynchronously with PETSc FEM you have to compile with the USE_PTHREADS flag enabled disabled by default e Load the dx_hook hook in PETSc FEM and pass it some options e Build the dynamically loadable module file dx epimport and load it in DX DX basic visualization units are Field objects which are composed of three Array objects called positions node coordinates connections element connectivities and data computed field values At each time step ExtProgImport exports two Group objects e a Group of Arrays named output_array_list and e a Group of Fields objects named output_fields_list This objects are generated as follows e For each Nodedata PETSc FEM object a positions array is constructed this results in say nn array objects Currently PETSc FEM has only one Nodedata object member name nodes so that NN 1 e A data array is constructed in basis to the current state vector member name data e For each elemset a connections array is constructed You can disable construction of some particular elemsets through the dx elemset option and also which nodes and DX interpolation geometry is used This results in other ne connection arrays The member name of each array is based in the name option of the elemset If this has not been set the name is set to the type of the elemset If collision is produced a suffix of the form _0 _1 is appended in order to make it unique
182. stMat2 amp A Divide matrix element by element like Matlab e FastMat2 axpy const FastMat2 A const double alpha Axpy operation element by element this alpha A 9 4 2 In place operations These operations perform an action on all the elements of a matrix e FastMat2 set const double val 0 Sets all the element of a matrix to a constant value e FastMat2 scale const double val Scale by a constant value 85 e FastMat2 amp add const double val Adds constant val e FastMat2 amp fun scalar_fun_t function Apply a function to all elements e FastMat2 amp fun scalar_fun_with_args_t function void user_args Apply a function with optional arguments to all elements 9 4 3 Generic sum operations sum over indices These operations perform some operation an all the indices of a given dimension resulting in a matrix which has less number of indices It s a generalization of the sum max min operations in Matlab that returns the specified operation per columns resulting in a row vector result one element per column Here you specify a number of integer arguments in such a way that e if the j th integer argument is positive it represents the position of the index in the resulting matrix otherwise e if the j th argument is 1 then we perform the specified operation sum max min etc over all this index For instance if we declare FastMat2 A 4 2 2 3 3 then B sum A 1 2 1 1 means Bij y Arji for
183. t2 activate_cache amp cache_list Compute area of elements chrono start for int ie 0 ie lt nelem ie 4 FastMat2 reset_cache for int k 1 k lt 3 k 4 int node ICONE ie k 1 x ir 1 k set amp XNOD node 1 0 rsQ x rs a set x ir 1 2 a rest x ir 1 1 75 b set x ir 1 3 b rest x ir 1 1 J ir 1 1 set a J ir 1 2 set b double area J rs det 2 total_area area if ie 0 minarea area maxarea area if area gt maxarea maxarea area if area lt minarea minarea area printf total_area g min area g max area g ratio g n total_area minarea maxarea maxarea minarea printf Total area OK s n fabs total_area 1 lt 1e 8 YES NOT double cpu chrono elapsed FastMat2 print_count_statistics printf CPU g number of elements d n rate g sec Me g Mflops n cpu nelem cpu le6 nelen nelem FastMat2 operation_count cpu 1e6 FastMat2 void_cache FastMat2 deactivate_cache Calls to the static members those starting with FastMat2 are related with the caching manipulation and will be discussed later Matrix are dimensioned in line 2 the first argument is the number of dimensions and then follow the dimensions for instance FastMat2 x 2 3 2 defines a matrix which 2 indices ranging from 1 to 3 and 1 to 2 respectively The rows of this matrix will store the coordinates of the local nodes to the element
184. tations in total Oriented tetrahedra is identical to unoriented tetrahedra without inversion 24 per mutations in total Figure 37 Generators for hexas Unoriented hexas are described by numbering one of the faces as a quad and then the opposite face in correspondence with the first face Symmetries are generated by 90 rotations for any two of the faces not opposite for instance 1 2 3 0 5 6 7 0 and 1 5 6 2 0 4 7 3 and inversion 1 0 2 3 232 permutations in total Oriented hexahedra is identical to unoriented hexahedra without inversion 112 per mutations in total Unoriented prisms are described by numbering one of the triangular faces as a tri angle and then the opposite face in correspondence with the first face Symmetries are generated by 180 rotations of any two of the quad faces 4 35 1 0 2 for instance a 120 rotation of any of the triangular face 1 2 0 4 5 3 for instance and an inversion 1 0 2 4 3 5 for instance There are 12 permutations in total 140 Figure 38 Generators for prisms e Oriented hexahedra is identical to unoriented hexahedra without inversion 6 per mutations in total 17 3 2 Canonical ordering One of the most common operations when manipulating these geometrical objects is given two geometrical objects of the same type to determine whether they represent the same object The brute force solution is to apply all the permutations to one of them and check if one of the
185. te IN_VECTOR USE_TIME_DATA arglf arg_add amp state_old IN_VECTOR USE_TIME_DATA arglf arg_add amp gather_values VECTOR_ADD ierr assemble mesh arglf dofmap gather amp time_star CHKERRA ierr 93 Note that three arguments are passed to the gather assemble task the old and new states and the vector of gathered values Many gatherer elemsets have the suffix integrator appended to their names for in stance visc_force_integrator or volume_integrator 11 1 Dimensioning the values vector In order to use the gatherers the user must before dimension appropriately the ar ray of values with global option ngather Then for each gatherer elemset the user must set the options that select a continuous range in this vector namely gather_pos and gather_length The selected range is gather_pos gather_post gather_length Then for instance if the hypothetical gatherer elemset momentum_integrator is sup posed to compute the integral of the momentum a 3 vector in 3D then we could use as global_options ngather 3 __END_HASH__ elemset momentum_integrator 4 gather_pos 0 gather_length 3 data connectivity dat __END_HASH With this setup the program will compute for each time step the integral of the momen tum at will print these three values on standard output If a string is passed to the global option gather_file for instance ngather 3 gather_file momentum out then instead of reporting the gathered value
186. tes a branch point e static void choose const int j Follows a branch e static void leave void Leaves the current branch e static double operation_count void Computes the total number of operations in the cache list e static void print_count_statistics Print statistics about the number of operations of each type in the current cache list 10 Hooks Hooks are functions that you pass to the program and then are executed at particular points called hook launching points in the execution of the program The particular hook launching points may depend on the application but in order to fix ideas for the Navier Stokes module and the Advective Diffusive module the standard hooks are e init To be executed once at the start of the program e time step pre To be executed before the time step calculation e time_step_post To be executed after the time step calculation e close To be executed once at the end of the program There are built in hooks included in the modules for instance the DX hook that is in charge of communicating with the DX visualizations program or the shell hook that allows you to execute shell commands but you can also define your own hooks that are defined in a C piece of code compiled and dynamically loaded at run time You can do almost anything with your hooks for instance you can compress the result files perform file manipulation launch visualization with other software like GMV Also
187. the per element property is the total 98 heat flow through the element which has units of energy per unit time whereas ifcompute_densities 1 then the value set is the mean heat flow density which has units of energy per unit time and unit surface For the traction on a surface the passed value is the total force on the element in one case units of force and the mean skin friction in the other units of force per unit area Of course this option has no effect on the values passed via the global gather_values vector e dump_props_to_file activates the mechanism of storing per element computed val ues in a file e dump_props_file is the name of the file where the per element values are written If it contains a d format sequence then it is replaced by the time step number using printf and friends e dump_props_freq is the frequency at which the values are dumped to file 11 5 Parallel aspects Of course PETSc FEM is in charge of adding the contributions on different processors The resulting sum of contributions over all elements in all processors The sum is available not only in the master rank 0 but in all processors as with an MPI_Allreduce call This is important since this global sum can be used also in hooks in order to perform computations For instance a gatherer can be used to compute the force on a body and this force can be passed to a hook to compute the movement of the body 11 6 Creating a gatherer
188. tine to the flux function through this pointer e int options input integer e Matrix amp G_source output size N of X 1 compute if options amp COMP_SOURCE The source vector 6 6 1 Options General options e double Courant default 0 6 The Courant number found in file adv cpp e double Dt default 0 Time step found in file adv cpp 39 double atol default 1e 6 Absolute tolerance when solving a consistent matrix found in file adv cpp int auto_time_step default 1 Chooses automatically the time step from the selected Courant number found in file adv cpp int consistent_supg_matrix default 0 Uses consistent SUPG matrix for the temporal term or not found in file adv cpp double dtol default 1e 3 Divergence tolerance when solving a consistent matrix found in file adv cpp int local_time_step default 1 Chooses a time step that varies locally Only makes sense when looking for steady state solutions found in file adv cpp int maxits default 150 Maximum iterations when solving a consistent matrix found in file adv cpp int measure_performance default 0 Measure performance of the comp_mat_res jobinfo found in file adv cpp int nfile default 1 Sets the number of files in the rotary save mechanism see 7 2 found in file adv cpp int nrec default 1000000 Sets the number of states saved in a given file in the rotary save mechanism see
189. tive humidity is a complex non linear function of T and H through the sorption isotherms of the porous media and the saturation pressure of water it results in a non linear link of the form P T H Puatm By the moment we consider only a linear relation since the non linear case doesn t fit in the representation 123 The non linear case will be considered later on 14 3 A small example Figure 20 A small example showing boundary conditions Consider the region shown in figure 20 composed of 8 triangle elements and 9 nodes We have 9 x 3 27 node field values but periodic boundary conditions on side 1 4 7 to side 3 6 9 eliminates the 9 unknowns on these last three nodes In addition at the outlet boundary nodes 1 2 3 there is only one in going component so that the unknowns here are only w w3 w2 and w3 On the other hand on the inlet boundary u and v are 111 imposed so the vector of unknowns is while the prescribed values are U we Us wi U we U w3 Us u4 Ug va U7 p4 Us Us Ug v5 Uio ps Uii p7 U12 ps U wl Uy ws U3 ur U v7 Us ug Us v8 112 128 129 And the relation defining matrices Q and Q are uy Yi Ur Vo 01 Va Us v Va Ur VA U1 Vox Us p Voy Uy Va U1 Vox U2 uz V Oz V U V Ua va VA U2 VA U3 VA U4 po VA Da VZ U VA U4 uz cosa Vi Uy Vib U1 Vh U2 sin a Va Oy VA U1 VA U2 v3 sina Vi
190. tly from hash table or per element table 13 13 13 14 15 16 16 17 17 17 18 18 20 20 21 21 22 22 22 23 24 26 27 27 6 The general advective elemset 34 6 1 Introduction to advective systems of equations 34 6 2 Discretization of advective systems 2 2 a ee 35 63 SUPG stabilization lt o s soc ioios a ma a scsi a ee eee 36 GA Shock Care a e a a A eG RA a aa we a 36 6 5 Creating a new advective system e ee so 37 6 6 Flux function routine arguments a a 37 GOL S ok ee ee A ee ee ees 39 6 7 The hydrology module ca cacos rasaae dieta pa terae sa s 42 51 Related Options lt oe a ce ed rew esaw a ew ee ia 44 6 8 The Hydrological Model cont a 45 69 Subsurface Flow 2 54 28245 68 ta detu eaaa ee ai 46 6 10 Surface Flow o o s socre bbe he eR ee be eae ee ee 46 6 10 1 2D Saint Venant Model o e 00205200 46 6 10 2 1D Saint Venant Model o e e 47 6 10 3 Kinematic Wave Model 2 sve to s a e ee ee ee i ea 47 6 11 Boundary COnditions gt 2 0 5 0 a koi e 2b PR a TOE we pe ee 48 6 11 1 Boundary Conditions to simulate River Aquifer Interac tions Coupling Termi soos sdo escassa ROR BER eee BS 48 6 11 2 Initial Conditions First Second and Third Kind Boundary Condi tions Absorbent Boundary Condition 48 6 12 Absorbing boundary conditions e e 50 6 12 1 Linear absorbing
191. trical objects 5 nodes and the edge in the figure In order to identify higher order objects like the edge we could add a new index for it say index 5 but instead we can associate the edge with the connected pair of node indices edge 2 3 Note that if we consider that the edge has no orientation then the sequence must be considered as a set i e edge 2 3 edge 3 2 Lisp like S exp expressions will be used throughout in order to describe objects whereas if oritentation matters then edge 2 3 edge 3 2 We could then define geometrical objects of type edge as unordered sequences of two node indices while the type ordered edge is associated with ordered sequences of two nodes We say that the set of permutations that leave invariant edge objects is 0 1 1 0 whereas for the oriented edge is only the identity permutation 0 1 For larger objects the set of permutations that leave invariant the node sequence that defines the object is more complex than that it doesn t reduce to the special case of ordered and unordered 138 sequences as for edges Consider for instance the case of triangles For an oriented triangle the set of nodes that leave invariant the triangle is 0 1 2 1 2 0 2 0 1 i e shifts clockwise and counter clockwise of the node sequence whereas for unordered objects the sequences are the same 0 1 2 120 201 02 1 1 0 2 2 1 0 In the last case the permutations that leave i
192. undary one usually leaves the temperature free that is no Dirichlet condition is imposed on But if the flow is reversed that is v n lt 0 n is the external normal to the boundary at some instant then this boundary condition becomes ill posed and the simulation may diverge 101 One solution is to switch from Neumann to Dirichlet boundary conditions depending on the sign of v i e 0 ifv n gt 0 a 3 Rare ch 118 ho ifv n gt 0 where h is a film coefficient and phi is the value to be imposed at the boundary when the flow is reversed Note that this amounts to switch to a Dirichlet condition by penalization For h large enough the value of will converge to However if h is too large this can affect the conditioning of the linear system The options for this elemset are e vector lt double gt coefs default empty vector Penalization coefficients ar tificial film coefficients The length of coefs and dofs must be the same found in file flowrev cpp e vector lt int gt dofs default empty vector Field indexes base 1 for which the flow reversal will be applied found in file flowrev cpp e int ndim default 0 Dimension of the problem found in file flowrev cpp e vector lt double gt refvals default empty vector The reference values for the unknown The length of coefs and dofs must be the same found in file flowrev cpp e int thermal_convection default 0 If this is set then th
193. y applications with the library If this is what you want to do use the GNU Library General Public License instead of this License 12 3 The PETSc FEM philosophy 3 1 The three levels of interaction with PETSc FEM As stated in the PETSc FEM description it is both a library and an application suite That means that some applications as Navier Stokes Euler inviscid flow shallow water general advective linear systems and the Laplace Poisson equation come bundled with it whereas the library is an abstract interface that allows people to write other applications So that we distinguish between the user for which the interaction with PETSc FEM is limited to writing data files for the bundled applications from the application writers that is people that uses the library to develop new applications Usually application writers write a main routine that use routine calls to the PETSc FEM library in order to assemble PETSc vectors and matrices and perform algebraic operations among them via calls to PETSc routines In addition they also have to code element routines that compute vectors and matrices at the element level PETSc FEM is the code layer that is in charge of assembling the individual contributions in the global vectors or matrices taking into account fixations etc Finally there is the PETSc FEM programmers that is people that write code for the core library written by the user data_file
194. ymmetric or they are not disposed symmetrically about a line passing by the rotation axis see figure 19 then there are no predefined stream lines but given two corresponding point like j and j that are obtained through rotation of a 27 Nfoils then we can impose Uj COSA Uj sin av Vj Pj sinau cosav 126 Pj Absorbing boundary conditions The basic concept about these b c s is to impose the in going components from outside reference values while leaving the outgoing com ponents free If w u v p is the state vector of the fluid at some node j on the outlet boundary see figure 16 then Vw u are the eigen components where V is the change of basis matrix The absorbing b c may be written as 1 127 w N w u Viw w 110 Where w is the in going component taken from the reference value Wye and the other components w are free i e they go in U Non linear Dirichlet boundary conditions In some cases Dirichlet boundary conditions are not expressed in terms of the state variables used in the computations but on a non linear combination of them instead For instance consider the transport of moisture and heat through a porous media and choose temperature T and moisture content H as the state variables On an external boundary we impose that the partial pressure of water should be equal to its external value Py Py atm As the partial pressure of water which may be related to rela

Download Pdf Manuals

image

Related Search

Related Contents

Cypress NoBL CY7C1355C User's Manual    Système plasma manuel ou mécanisé pour le coupage et le    BATTENTE - DEA Polska  

Copyright © All rights reserved.
Failed to retrieve file