FARGO3D User Guide

1. Chapter 19, Developing a complex setup (FARGO3D User Guide, Release 1.1)

   of which the implicit solution is

       e^{n+1} = \frac{e^{n}}{1 + \alpha \Delta t},

   which is stable for all \Delta t. First, we will create a file called edamp.c in setups/explosion. Inside, you should have something similar to:

       //<FLAGS>
       //#define __GPU
       //#define __NOPROTO
       //<\FLAGS>

       //<INCLUDES>
       #include "fargo3d.h"
       //<\INCLUDES>

       void Edamp_cpu(real dt) {

       //<USER_DEFINED>
         INPUT(Energy);
         OUTPUT(Energy);
       //<\USER_DEFINED>

       //<INTERNAL>
         int i;
         int j;
         int k;
       //<\INTERNAL>

       //<EXTERNAL>
         real* e = Energy->field_cpu;
         real edamp = EDAMP;
         int pitch  = Pitch_cpu;
         int stride = Stride_cpu;
         int size_x = Nx;
         int size_y = Ny+2*NGHY;
         int size_z = Nz+2*NGHZ;
       //<\EXTERNAL>

       //<MAIN_LOOP>
         for (k=0; k<size_z; k++) {
           for (j=0; j<size_y; j++) {
             for (i=0; i<size_x; i++) {
       //<#>
               e[l] *= 1.0/(1.0+edamp*dt);
       //<\#>
             }
           }
         }
       //<\MAIN_LOOP>
       }

   Now you must add this new routine to the makefile. In order to do this without breaking the generality of the code, a file called setup.objects was created. This file must contain two make variables:

       MAINOBJ += routine_filename.o
       GPUOBJ += routine_filename_gpu.o

   where the second line is optional. In our case, it should be

   19.3 Time
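The stability claim for the implicit update can be checked numerically. The following is a minimal sketch in Python (not the C of the setup itself) that compares the implicit update with an explicit forward-Euler one for a deliberately large time step. The function names and the values of alpha and dt are illustrative assumptions, not part of FARGO3D.

```python
def implicit_step(e, alpha, dt):
    """Implicit (backward Euler) update of de/dt = -alpha*e."""
    return e / (1.0 + alpha * dt)

def explicit_step(e, alpha, dt):
    """Explicit (forward Euler) update, shown only for comparison."""
    return e * (1.0 - alpha * dt)

def evolve(step, e0, alpha, dt, nsteps):
    """Apply the given update rule nsteps times, starting from e0."""
    e = e0
    for _ in range(nsteps):
        e = step(e, alpha, dt)
    return e

# Illustrative values (assumptions): alpha*dt = 10, far beyond the
# explicit stability limit alpha*dt < 2.
alpha, dt = 1.0, 10.0
e_imp = evolve(implicit_step, 1.0, alpha, dt, 5)
e_exp = evolve(explicit_step, 1.0, alpha, dt, 5)
print("implicit:", e_imp)   # stays positive and decays
print("explicit:", e_exp)   # oscillates in sign and grows
```

With these values the implicit result decays monotonically towards zero, whereas the explicit one changes sign at every step and grows in magnitude, which is why the routine uses the implicit form.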
2. 14.2.4 Arguments

   Identifier: Implicit, only a string.
   Parsed as: Field *f => real *f->field_gpu; non-pointer argument => textually.
   Location: Normally after the INCLUDES block.

   Any built-in type is allowed, including FARGO3D's real type. The only structure allowed is the Field structure. There are two constraints on the arguments:

   • All must be on the same line.
   • The Field structures are the last arguments.

   14.2.5 USER DEFINED

   Identifier: <USER_DEFINED> ... <\USER_DEFINED>
   Parsed as: Textually.
   Location: After the function name.

   This block can be very general. You can do a lot of things here because there is no limitation on syntax: everything inside is parsed textually. In practice, we use this block mainly to do memory transfers between host and device when they are needed, by issuing INPUT and OUTPUT directives. This block is a kind of pre-kernel execution: its instructions are executed before the kernel launch, by the launcher (or wrapper) function. In a similar way, the post-kernel execution is the block called LAST_BLOCK.

   14.2.6 EXTERNAL

   Identifier: <EXTERNAL> ... <\EXTERNAL>
   Parsed as: All the external variables are parsed as arguments of the kernel:

       type variable = global_variable_cpu;
3. array of reals straight away, and subsequently only refer to it to manipulate cell values. In order to avoid confusion, it is a good idea to have an upper case for the initial of Fields and lower case for the corresponding real arrays.

   6.4 Fields on the GPU

   Similar techniques are used on the GPU, but we have made it totally transparent to the user, so unless you want to program your CUDA kernels directly, you should never have to worry about this.

   6.5 Useful variables

   For the handling of the mesh, a set of useful variables and macrocommands has been defined. An extensive list with a description is given below.

   Indices:

   • l: The index of the current cell. It is a function of (i, j, k, pitch & stride).
   • lxp: The index of the right neighbor in x of the current cell. It is a function of l.
   • lxm: The index of the left neighbor in x of the current cell. It is a function of l.
   • lyp: The index of the right neighbor in y of the current cell. It is a function of l.
   • lym: The index of the left neighbor in y of the current cell. It is a function of l.
   • lzp: The index of the right neighbor in z of the current cell. It is a function of l.
   • lzm: The index of the left neighbor in z of the current cell. It is a function of l.
   • l2D: The current index in a 2D field (e.g. vmean). It is a function of (j, k).
   • l2D_int: The current index in a 2D integer field (e.g. a field of s
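To make the role of pitch and stride concrete, here is a minimal Python sketch of a flattened index of this kind. It assumes the common layout l = i + j*pitch + k*stride, with pitch equal to the number of cells in x and stride equal to pitch times the number of cells in y (ghosts included), and it assumes that the x neighbors wrap around periodically. These layout details and all function names are assumptions made for illustration; the macros in the FARGO3D sources are the reference.

```python
def make_index(nx, ny, nghy=3):
    """Return (l, pitch, stride) for a mesh with nx cells in x and
    ny active cells plus 2*nghy ghost cells in y (assumed layout)."""
    pitch = nx                       # distance in memory between two rows in y
    stride = pitch * (ny + 2 * nghy) # distance in memory between two planes in z

    def l(i, j, k):
        return i + j * pitch + k * stride

    return l, pitch, stride

def lxp(l, i, nx):
    """Right neighbor in x, wrapping around at the last column (assumed periodic)."""
    return l + 1 if i < nx - 1 else l - (nx - 1)

def lxm(l, i, nx):
    """Left neighbor in x, wrapping around at the first column (assumed periodic)."""
    return l - 1 if i > 0 else l + (nx - 1)

def lyp(l, pitch):
    """Right neighbor in y: one row further in memory."""
    return l + pitch

def lzp(l, stride):
    """Right neighbor in z: one plane further in memory."""
    return l + stride

index, pitch, stride = make_index(nx=4, ny=2)
here = index(1, 2, 3)
print(here, lyp(here, pitch), lzp(here, stride))
```

The neighbor helpers only need l (plus i for the wrap in x), which matches the statement above that the neighbor indices are functions of l.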
4. [Figure: a Field (e.g. Vx) with its field_cpu array in CPU (normal) RAM and its field_gpu array in GPU (video) RAM]

   Before starting a (M)HD time step on the GPU, we need to upload the data to it. This is done by a host to device communication (H2D), which occurs through the PCI bus.

   [Figure: the same Field after the upload of field_cpu to field_gpu]

   The run can start on the GPU. Only the data on board the GPU is then up to date, while the data on the CPU is left untouched and corresponds to the initial condition.

   [Figure: the same Field during the run, with only field_gpu up to date]

   At some point we need to get the data back on the CPU, for instance to dump it to an output file, or because we have written a new routine that uses it and which we have not developed to run on the GPU. We need to perform a communication through the PCI bus from the device to the host (D2H) this time.

   A simple example could be: suppose we are in the first time step of a run and we want to compute the pressure field on the GPU (P = c_s^2 \rho), but we have set the initial conditions on the CPU. We need to upload the sound speed and density fields. We do not upload all fields: this would be extremely time consuming. We only upload what is needed, and nothing else. In order for that to be achieved semi-automatically, we have defined two macrocommands that are called INPUT and OUTPUT. What these macrocommands do is that they update the color (red or green) of a field on the GPU or CPU s
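The bookkeeping behind INPUT and OUTPUT can be pictured with a small model. The Python sketch below is not FARGO3D's implementation: the class, the function names and the boolean "fresh" flags (standing in for the two colors) are assumptions chosen to illustrate how declaring what a routine reads and writes is enough to trigger only the transfers that are needed.

```python
class Field:
    """Toy model of a field with one copy on the CPU and one on the GPU."""
    def __init__(self, name):
        self.name = name
        self.fresh_cpu = True    # initial conditions are set on the CPU
        self.fresh_gpu = False   # nothing has been uploaded yet
        self.transfers = []      # log of "H2D" / "D2H" copies

def declare_input(field, device):
    """A routine running on `device` is about to read `field`."""
    if device == "gpu" and not field.fresh_gpu:
        field.transfers.append("H2D")   # upload through the PCI bus
        field.fresh_gpu = True
    elif device == "cpu" and not field.fresh_cpu:
        field.transfers.append("D2H")   # download through the PCI bus
        field.fresh_cpu = True

def declare_output(field, device):
    """A routine running on `device` is about to overwrite `field`."""
    field.fresh_gpu = (device == "gpu")
    field.fresh_cpu = (device == "cpu")

density, sound_speed, pressure = Field("rho"), Field("cs"), Field("P")

# Pressure computed on the GPU: reads rho and cs, writes P.
for _ in range(2):                      # two successive GPU steps
    declare_input(density, "gpu")
    declare_input(sound_speed, "gpu")
    declare_output(pressure, "gpu")

# Later, an output routine on the CPU needs the pressure.
declare_input(pressure, "cpu")

print(density.transfers, sound_speed.transfers, pressure.transfers)
```

In this model the density and sound speed are uploaded once, the second GPU step causes no transfer at all, and the pressure is downloaded only when a CPU routine asks for it.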
5.     => function_name_kernel(type global_variable_gpu)
       type* variable = global_variable_cpu;
       => function_name_kernel(type* global_variable_gpu)

   Location: After the USER_DEFINED block.

   The CUDA kernels cannot see the global host variables. This block is meant to grant access to these variables to the kernels, so that all the variables dealt with in this block are global. The main rule when you draw up a list of global variables inside the EXTERNAL block is:

   • Avoid the use of all-capital variables. Instead, declare another variable with the same name but without capitals, and set it equal to the global variable (example: real omegaframe = OMEGAFRAME;).

   14.2.7 INTERNAL

   Identifier: <INTERNAL> ... <\INTERNAL>
   Parsed as: Textually.
   Location: Normally after the EXTERNAL block.

   All the internal variables are work variables with a very local scope. You find here the indices of the loops, but you could include any other variable that you need. Maybe one of the most interesting examples of how to use this block is found in compute_force.c.

   14.2.8 CONSTANT

   Identifier: <CONSTANT> ... <\CONSTANT>
   Parsed as: The variables inside are moved to the constant memory if BIGMEM is not defined. If BIGMEM is defined, the variables are moved instead to the device's global memory
6. • LEGACY

   which requests the output of two files dumped by the former FARGO code (dims.dat and used_rad.dat), which can be needed by certain reduction scripts. As can also be seen in the .opt file, it has the monitoring of:

   • MASS
   • MOM_X
   • TORQ

   We shall come back to the monitoring of quantities later on in this manual.

   7.1.2 Parameters

   The following parameters are essentially the same as those of the FARGO code. You can also browse the online help of that code to get the details of each of them.

   • Setup: This keyword specifies the name of the setup that should be used to build the code in order to run this parameter file. If it is left unspecified, this parameter file can be run using a build of FARGO3D with otvortex, mri, etc., with potentially surprising error messages and outcomes.
   • AspectRatio (real): Sets the disk aspect ratio h_0 = H/R_0 at r = R_0, where R_0 is a characteristic length defined in src/fondam.h. It is a natural choice to use R_0 = 1 in a scale-free setup. Physically, this parameter is related to the sound speed through H = c_s/\Omega_K. This parameter is a way to initialize a desired sound speed on the disk.
   • Sigma0 (real): Sets the numerical value of the surface density at r = R_0.
   • SigmaSlope (real): Sets the exponent of the density profile, assumed to be a power law of radius: \Sigma(r) = Sigma0 (r/R_0)^{-SigmaSlope}.
   • FlaringIndex (real): Sets th
7. 10.4 Plotting your new setup

   If you have ipython + pylab working, plotting your new setup is very easy (see the first run section):

   • ipython --pylab
   • In [1]: rho = fromfile("outputs/myblob/gasdens10.dat", dtype="float32").reshape(100,400)
   • In [2]: imshow(rho)

   Figure 10.1: myblob setup at output number 10, i.e. at date = OUTPUTNB * NINTERM * DT = 0.5 here.

   CHAPTER ELEVEN: RUN TIME VISUALIZATION

   FARGO3D has a visualization module that can be activated for a specific setup by doing:

       $ make SETUP=setup view

   or the equivalent form:

       $ make SETUP=setup FARGO_DISPLAY=MATPLOTLIB

   In order to use it, you need to have a python development package (if you are using some package manager) or simply a basic python installation (the file Python.h is needed at compilation time). Also, it is mandatory to have the matplotlib and numpy packages installed.

   11.1 How does it work?

   Run time visualization uses an embedded Python interpreter running at the same time as your simulation. All the related routines are in src/matplotlib.c. The scheme developed to visualize data allows you to make a visualization routine adapted to your needs. If you work with matplotlib, you will see that making an interactive plot from within FARGO3D is the same as working in an interactive session of python + matplotlib. By default there are three main routines, called plot1d, plot2d and plot3d. The functionality o
8. CHAPTER FIFTEEN: EXECUTION FLAGS

   FARGO3D has options that can be activated at run time, in the form:

       fargo3d -flag parfile

   where flag is a suitable flag. The extensive list is:

   -m (Merge)
   This flag is used for writing files in the form fieldn.dat instead of fieldn_m.dat (with n, m integers; n is the output number, m is the process number). This flag is relevant when you work with MPI. In practice, after NINTERM time steps you will have an output, and each processor will write its own piece of mesh on the disk. If the -m flag is activated, they will write a single file, with all the data inside in the right order, as if the data had been written by a single process run.

   -o (Redefine (overwrite) parameters on the command line)
   The following argument must be enclosed between quotes. The parameters are case insensitive. The syntax for the separators is relatively flexible. The following example shows a valid instruction:

       fargo3d -o "outputdir=/scratch/test, nx=200, Ny=34" my_parameterfile.par

   An error message is issued if the parameter does not exist or is redefined several times. If the parameter was defined in the parameter file (rather than being implicitly set to its default value), an information string is output at run start.

   -s (Restart from separate files)
   This flag must be used in the form:

       fargo3d -s n parfile

   where n is the output number at which you want
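The behavior described for the parameter-override flag (case-insensitive names, flexible separators, an error for unknown or repeated parameters) can be sketched as follows. This is a Python illustration, not the parser used by FARGO3D: the function name, the accepted separators ("=" or ":" between name and value; commas, semicolons or blanks between assignments) and the set of known parameters are all assumptions.

```python
import re

def parse_overrides(arg, known):
    """Parse a quoted override string into {UPPERCASE_NAME: value_string}.

    `known` is the set of valid parameter names in upper case.
    Raises ValueError for an unknown or a repeated parameter.
    """
    overrides = {}
    # name, optional blanks, '=' or ':', optional blanks, value up to the next separator
    for name, value in re.findall(r"([A-Za-z_]\w*)\s*[=:]\s*([^\s,;]+)", arg):
        key = name.upper()               # parameters are case insensitive
        if key not in known:
            raise ValueError("unknown parameter: " + name)
        if key in overrides:
            raise ValueError("parameter redefined: " + name)
        overrides[key] = value
    return overrides

known_parameters = {"OUTPUTDIR", "NX", "NY"}   # hypothetical subset
result = parse_overrides("outputdir=/scratch/test, nx=200 Ny:34", known_parameters)
print(result)
```

Values are kept as strings here; converting them to the type of the corresponding parameter is a separate step that depends on how each parameter is declared.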
9.     DumpAllFields (999);
       SaveStateSecondary ();
       RestoreState ();
       printf ("Executing %s_gpu%s\n", "Edamp", "(dt)");
       Edamp_gpu(dt);
       DumpAllFields (998);
       CompareAllFields ();
       prs_exit (0);
       }

   Note: Why would GPU and CPU routines give different results, if the GPU kernel is produced automatically from the CPU function? Apart from issues related to passing values to the kernel (in particular the <CONSTANT> block), it may happen if you have a race condition inherent to your kernel.

   What is a race condition? It happens whenever the outcome of your kernel depends on the order in which threads are executed. If the INPUT and OUTPUT fields of your kernel are different, it is impossible to have a race condition. If, however, a field appears both as INPUT and OUTPUT, the values of its cells are used in the kernel, and also modified by it. This is the case of the kernel of Edamp(). In this case, however, everything is local: the final value of one zone depends only on the value of that zone, so it does not matter in which order the CUDA threads process the mesh. If this value also depended on the neighbors, we would have race conditions, and we would need to split the kernel in two and use an intermediary array. For instance, this is what we have done with SubStep2().

   19.6 Summary

   We have developed a somewhat complex routine, interacting with the main parts of FARGO3D. This example shows how to write a kernel in five minutes o
10. +S naturally also works if the mesh is cylindrical or Cartesian. It can also be used to re-spawn a 1D run so as to make it 2D (radius-azimuth).

   -D (Specify manually the CUDA device)
   This flag must be used in the form:

       fargo3d -D n parameters.par

   where n is an integer. With this flag you can select manually the device where FARGO3D will run.

   +D (Specify the CUDA device file)

       fargo3d +D devfile parameters.par

   where devfile is a text file which contains several lines. Each line must contain a host name (such as compute-0-34), followed by one of the three following separators: colon, slash or equal, followed itself by a device number. Example of device file:

       compute-0-0:0
       compute-0-0:1
       compute-0-1:0
       compute-0-1:1

   This device file is intended for a run on two nodes (compute-0-0 and compute-0-1), each node having two GPUs, numbered 0 and 1 on each node. If the run is spawned on nodes not specified in the device file, the run will fail (technically: if the string returned by gethostname does not match any beginning of line in the device file). The device file can be used together with a job scheduler (such as torque or PBS) to obtain the list of free GPUs on the nodes of the job, even if the job scheduler is not GPU-aware. The public distribution comes with a directory called jobs, in which you will find an example of PBS job file called dev file that parses the output of qstat to find the GPUs
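As an illustration of the device file format described above, the Python sketch below looks up the device numbers listed for a given host. It is not the code used by FARGO3D: the function name is hypothetical, and requiring the host field of a line to equal the host name exactly is an assumption about how "matching the beginning of a line" is meant.

```python
import re

# host name, then one of ':' '/' '=', then a device number
DEVICE_LINE = re.compile(r"^\s*([^\s:/=]+)\s*[:/=]\s*(\d+)\s*$")

def devices_for_host(devfile_text, hostname):
    """Return the device numbers listed for `hostname`, in file order.

    Raises LookupError if the host does not appear in the file,
    mirroring the statement that the run fails in that case.
    """
    devices = []
    for line in devfile_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match and match.group(1) == hostname:
            devices.append(int(match.group(2)))
    if not devices:
        raise LookupError("host not found in device file: " + hostname)
    return devices

# The three separators may be mixed freely in one file.
example = "compute-0-0:0\ncompute-0-0:1\ncompute-0-1/0\ncompute-0-1=1\n"
print(devices_for_host(example, "compute-0-1"))
```

How the processes running on one node share the devices returned for that node is not covered by this sketch.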
11. a main level and an inner level. Also, comment lines are allowed. The general structure of the format is:

       # Some header
       #-------------

       LEVEL1_a:
           Level2_a: Some_data1_a    # A comment
           Level2_b: Some_data1_b
           Level2_c: Some_data1_c
           Level2_d: Some_data1_d
       # A comment line
       LEVEL1_b:
           Level2_a: Some_data2_a
           Level2_b: Some_data2_b
           Level2_c: Some_data2_c
           Level2_d: Some_data2_d

   5.4 centering.txt

   Special care must be taken with the centering of data. Not all data are defined at the same location in FARGO3D. In order to write boundary conditions automatically in a given direction (Y/Z), it is necessary to know which fields are centered or staggered in this direction. This information is taken by boundparser.py from centering.txt. This file uses the same format described above, with a particular set of instructions. The allowed values are:

   • LEVEL1 => The name of a Field structure inside the code (e.g. Density, Vx, Bz, etc.)
   • Level2 => the word Staggering
   • data => C, x, y, z or a combination of xyz (e.g. xy, yz, xyz).

   C is short for centered, meaning that the field is a cube-centered quantity. The meaning of x is staggered in x, that is to say the corresponding quantity is defined at an interface in x between two zones, rather than in the middle of two subsequent interfaces. By default the field value is assumed cente
12.    ->field_cpu;
         real* e = Energy->field_cpu;

       #define Q1 (xmed(i)-XBLOB)
       #define Q2 (ymed(j)-YBLOB)

         for (k=0; k<Nz+2*NGHZ; k++) {
           for (j=0; j<Ny+2*NGHY; j++) {
             for (i=0; i<Nx; i++) {
               rho[l] = 1.0;              // Constant value outside
               e[l] = 1.0/(GAMMA-1.0);    // The isothermal soundspeed is equal to 1.0
               vx[l] = sqrt(e[l]*GAMMA*(GAMMA-1.0))*MACH;
               vy[l] = 0.0;
               if (sqrt(Q1*Q1+Q2*Q2) < RBLOB*(YMAX-YMIN)) {
                 rho[l] = RHO21;
                 vx[l] = 0.0;
               }
             }
           }
         }
       }

   10.3 Making the executable

   We are now ready to build the code:

       make SETUP=myblob view

   You may skip the final rule view if the build process fails (you need to install Python's matplotlib for it to work). If everything goes fine, you should see a message similar to:

       All objects are OK. Linking stage

       FARGO3D SUMMARY:
       This built is SEQUENTIAL. Use make para to change that
       This built has a graphical output, which uses the python's matplotlib library.
       Use make noview to change that.
       SETUP: myblob (Use make SETUP=[valid_setup_string] to change set up)
       (Use make list to see the list of setups implemented)
       (Use make info to see the current sticky build options)

   And finally, we can execute the test:

   • fargo3d -m setups/myblob/myblob.par

   If you want to change the boundaries, you must modify myblob.bound and recompile the code (make again).
13. in any of the files provided in the public release.

   +/-# n (Provides a numerical seed to the code, that is used for two things)

   1. The numerical seed is zero padded up to six digits and appended to the name of the output directory.
   2. This seed, which is accessible from any part of the code as the integer variable ArrayNb, will be used in general as a random seed for the initial conditions (condinit.c in the setup directory, or the file postrestarthook.c), but its use may not be limited to that.

   In the case of a restart, the name of the output directory is changed prior to reading the data necessary for the restart if the flag -# is used. Otherwise, when the flag +# is used, the data is read in the directory specified by the string outputdir (of the parameter file or the command line), then the outputdir parameter is changed and the data is output to the new directory. This technique is very useful to spawn a large number of jobs, when used in conjunction with $PBS_ARRAYID. Example: we run a master simulation in output directory out:

       fargo3d -m -o "outputdir=out" parameters.par

   We then fork the results of this master simulation by restarting it with many different random seeds:

       fargo3d -m -S 100 +# $number -o "outputdir=out" parameters.par

   We use the output number 100 found in out for each of the runs (since we use the flag +#). $number is either provided by PBS_ARRAY or any shell loop index. Finally, each of the runs c
14.    read(100, rec=1) data

   gnuplot:

       plot filename binary format="%lf" array=(nx,ny) w image

   12.2 Domain files

   Another important piece of output is the domain files:

   • domain_x.dat
   • domain_y.dat
   • domain_z.dat

   These three files are created after any run, except if they existed in the output directory before running your simulation. The content of these files is the coordinates of the lower face of each cell ([xyz]min inside the code). They can also be considered as the coordinates of the interfaces between cells. It is important to note that domain_[yz].dat are written with the ghost cells. The format is ASCII, and the total number of lines is:

   • domain_x.dat: Nx lines
   • domain_y.dat: Ny+2NGHY lines
   • domain_z.dat: Nz+2NGHZ lines

   where NGHY = NGHZ = 3 by default. The active mesh starts at line 4 and has Ny+1 (Nz+1) lines (up to the upper boundary of the active mesh).

   If you want to use a logarithmic spacing of the domain, you could set the parameter Spacing to log (see the section Default parameters). If you include in the output directory files with the name domain_[xyz].dat, their content will be read, which enables you to handcraft any kind of non-constant zone size.

   12.3 Variables

   When you run the code, two files called variables.par and IDL.var are created inside the output directory. These files are ASCII files containing the same information in two different f
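The line counts given above translate directly into a small reader for the domain files in y and z. The following Python sketch takes the lines of such a file, drops the ghost interfaces and returns the interfaces of the active mesh, from which cell centers can be derived. The function names are hypothetical, and taking the arithmetic mean of two interfaces as the cell center is an assumption made for illustration.

```python
def active_interfaces(lines, nghost=3):
    """Interfaces of the active mesh from the lines of domain_y.dat or domain_z.dat.

    The file holds N + 2*nghost lower-face coordinates; the active mesh
    starts after the first `nghost` lines and spans N + 1 values.
    """
    values = [float(s) for s in lines if s.strip()]
    n_active = len(values) - 2 * nghost
    return values[nghost:nghost + n_active + 1]

def cell_centers(interfaces):
    """Midpoint of each pair of consecutive interfaces (assumed definition)."""
    return [0.5 * (a + b) for a, b in zip(interfaces, interfaces[1:])]

# Synthetic file content: Ny = 4 active cells of size 0.25 starting at y = 0,
# preceded and followed by 3 ghost cells.
lines = [str((j - 3) * 0.25) for j in range(4 + 2 * 3)]
faces = active_interfaces(lines)
print(faces)
print(cell_centers(faces))
```

For a real run, the lines would come from `open("domain_y.dat").readlines()` in the output directory. domain_x.dat has no ghost cells, so this slicing does not apply to it.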
15.    0    0
       1    1
       2    0
       3    1
       4    0
       5    1

   whereas in the second case the correspondence is as expected:

       Process    GPU
       0          0
       1          0
       2          0
       3          1
       4          1
       5          1

   [Figure 13.1: Three nodes interconnected by a fast network, with 4 CPU cores and 2 GPUs each.]

   Prior to writing the rule to select your GPUs on your cluster, you should determine how your MPI implementation distributes the process ranks among the nodes (case 1 or 2), by writing a test program such as:

       #include <stdio.h>
       #include <unistd.h>   /* gethostname */
       #include <mpi.h>

       int main (int argc, char *argv[]) {
         int rank;
         char hostname[1024];
         MPI_Init (&argc, &argv);
         MPI_Comm_rank (MPI_COMM_WORLD, &rank);
         gethostname (hostname, sizeof(hostname));
         printf ("I, process of rank %d, run on host %s\n", rank, hostname);
         MPI_Finalize ();
         return 0;
       }

   13.3.2 Implementation of the device selection rule

   How do we implement the device selection rule seen above? This should be done on a platform + MPI version basis (on the same platform, two different flavors of MPI may behave differently). This is done in the function
16. GPU=1 instead of the instruction above. seq stands for PARALLEL=0, but make seq gpu would fail.

   Note: Additional information. We assume here that you followed the whole sequence of examples of this page, so that your previous run was a parallel CPU run. Build options (such as PARALLEL=1) are sticky, so that they are remembered from build to build until their value is explicitly changed. Since we want here a sequential build for one GPU, we need to explicitly reset the value of PARALLEL to zero. Another option is to issue make mrproper, which resets all sticky build options to their default values, and after that issue make gpu.

   You will see at the end of the building process the message:

       FARGO3D SUMMARY:
       This built is SEQUENTIAL. Use make para to change that
       This built can be launched on a CPU with a GPU card (1 GPU only).
       SETUP: fargo (Use make SETUP=[valid_setup_string] to change set up)
       (Use make list to see the list of setups implemented)
       (Use make info to see the current sticky build options)

   telling you that the build is sequential and should be run on a GPU. To run it, simply type:

       $ fargo3d -m setups/fargo/fargo.par

   Before the initialization of the arrays, you will see a block similar to:

       PROCESS NUMBER 0
       RUNNING ON DEVICE No 0
       GEFORCE GT 520
       COMPUTE CAPABILITY: 2.1
       VIDEO RAM ME
17. In this case the code automatically performs a loop on the different planets, and the corresponding file name has a suffix which indicates in a self-explanatory manner the planet it corresponds to.

   12.7.5 How to register a monitoring function

   You may now stop reading if you are not interested in implementing your own monitoring functions and simply want to use the ones provided in the public distribution. However, if you want to design custom monitoring functions for your own needs, let us explain how you include such functions in the code.

   Let us recall that a monitoring function is a function that fills a dedicated 3D array with some value of interest, left to the user. This function has no argument and must return void. Have a look at the file mon_dens.c and the function void mon_dens_cpu() defined in it. Note that the temporary array dedicated to the storage of the monitoring variable is the Slope array. As we enter the monitoring stage after a (M)HD time step, the Slope array is no longer used and we may use it as temporary storage. Any custom monitoring function will have to use the Slope array to store the monitoring variable.

   The monitoring function is then registered in the function InitMonitoring() in the file monitor.c. There, we call the function InitFunctionMonitoring() a number of times to register successively all the monitoring functions defined in the code.

   • The first argument is the integer (power of two) that is
18. Note the pointers and the triple nested loop. For filling the fields, we will use the helper index l. In this case the outer loop is not necessary, but when Z is not defined, by default NGHZ=0 and Nz=1, so there is only one external loop cycle. Also, in this particular case, we need to define a circle, and the size of the circle should be resolution independent, so we will need to normalize it. We could add the following macrocommand lines above the initialization of the indices i, j, k:

       #define Q1 (xmed(i)-XBLOB)
       #define Q2 (ymed(j)-YBLOB)

   Remember: all the upper-case variables are taken from the .par file.

   Now, inside the innermost loop, we will fill the field. First, we need a condition about where the blob is:

       if (sqrt(Q1*Q1+Q2*Q2) < RBLOB*(YMAX-YMIN)) {
       }

   And inside these curly brackets, for example, the density must be RHO21 times denser than the density outside. The inner loop should be similar to:

       rho[l] = 1.0;              // Constant value outside
       e[l] = 1.0/(GAMMA-1.0);    // The isothermal soundspeed is equal to 1.0
       vx[l] = sqrt(e[l]*GAMMA*(GAMMA-1.0))*MACH;
       vy[l] = 0.0;
       if (sqrt(Q1*Q1+Q2*Q2) < RBLOB*(YMAX-YMIN)) {
         rho[l] = RHO21;
         vx[l] = 0.0;
       }

   A complete view of the file condinit.c is:

       #include "fargo3d.h"

       void CondInit() {
         int i,j,k;
         real* rho = Density->field_cpu;
         real* vx = Vx->field_cpu;
         real* vy = Vy
19. Python script will automagically create a var.c file (similar to that of the former FARGO code) out of this newly created parameter file, we must help the script to guess correctly the type of each variable. For instance, if we write:

       Xmin 2

   instead of:

       Xmin 2.0

   it will wrongly deduce that Xmin is an integer, not a floating point value, with highly unpleasant consequences at run time. Similarly, the figure 20 5 is correctly recognized by the script, but 5 would not be.

   Now we will define the parameters specific to our setup. They are:

       gamma    1.666667
       rho21    10.0
       mach     2.7
       rblob    0.15
       xblob    1.0
       yblob    0.0

   where gamma is the adiabatic index, rho21 is the quotient between the density in the circle (2) and outside (1), and the same for the temperature, rblob is the radius of the initial blob normalized by the vertical size of the box, and [xy]blob is the initial position of the blob.

   The observant reader will notice that gamma is already defined in std/stdpar.par, with the same value. Since both sets of parameters are used (those of std/stdpar.par and those of setups/myblob/myblob.par), the first line in the block above is actually redundant and could have been omitted.

   10.1.4 .opt file

   Our setup is 2D and we want to use the energy equation. In the code's jargon, we refer to this as an adiabatic situation. We work in Cartesian coordinates.

       emacs myblob.opt

   The minimal .opt file should be similar to: F
20. and coordinate transformations. If you have a run in dat format and you want to convert its output to vtk for displaying it with VisIt, FARGO3D has two execution flags devoted to this:

   -V (Dat2VTK) & -B (VTK2Dat)

   The flag -V is used when you have an output file in dat and you want to convert it into VTK. The opposite holds for the flag -B.

   VisIt also has a powerful python script for converting your data into vtk, but it is not included with FARGO3D.

   CHAPTER SEVENTEEN: IMPROVING CUDA PERFORMANCE

   One can regard the action of a CUDA kernel on a mesh as the distribution of elementary tasks (one mesh cell = one elementary task) to the CUDA cores of the GPU. The CUDA cores are distributed within streaming multiprocessors (SMP) on board the GPU. For instance, on a Kepler K20 GPU, there are 13 SMP with 192 cores each (for single precision data), hence there are in total 2496 CUDA cores. In a similar manner, splitting the whole task into threads that perform elementary tasks on the CUDA cores obeys a two-level hierarchy: the global mesh must be split in (logical) blocks, and the blocks are then split in threads.

   The user has to determine the size of the blocks (in X, Y and Z). A given block runs on a single SMP. If you choose blocks that are too small, the SMPs are underused and the performance is degraded. If you choose blocks that are too large, the small amount of
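The two-level hierarchy lends itself to a short back-of-the-envelope calculation. The Python sketch below reproduces the core count quoted for the K20 and shows how a choice of block size fixes the number of blocks needed to cover a mesh. The mesh size and the block size are illustrative assumptions, not recommended values, and the function name is hypothetical.

```python
import math

def grid_dimensions(mesh, block):
    """Number of blocks along each axis needed to cover the mesh
    (ceiling division, so partially filled blocks are counted)."""
    return tuple(math.ceil(n / b) for n, b in zip(mesh, block))

# Figures quoted in the text for a Kepler K20.
smp_count, cores_per_smp = 13, 192
total_cores = smp_count * cores_per_smp

mesh = (384, 128, 1)    # hypothetical (Nx, Ny, Nz)
block = (16, 8, 1)      # hypothetical block size in X, Y and Z
grid = grid_dimensions(mesh, block)
threads_per_block = block[0] * block[1] * block[2]
n_blocks = grid[0] * grid[1] * grid[2]

print("CUDA cores:", total_cores)
print("blocks per axis:", grid, "-> total", n_blocks)
print("threads per block:", threads_per_block)
```

Comparing the total number of blocks with the number of SMPs gives a first idea of how evenly the work can be spread, before any measurement of the actual performance.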
21. are mapped to the ghost zones. The last (third) ghost cell is filled with the first active one, the second ghost cell is filled with the second active one, and finally the first ghost cell is filled with the third active cell.

   Note: We do not consider the mesh periodicity along a given direction as a proper boundary condition, in the sense that no ad hoc prescription has to be used to fill the corresponding ghost zones. Rather, a mesh periodic along a given dimension has no boundary in this dimension, and this property is assigned to the mesh in the parameter file (and not in the boundary files of the setup that we present in detail below) by the use of the Boolean parameters PeriodicY and PeriodicZ, which default to NO. In the boundary files that we present hereafter, you will therefore see BC labels such as SYMMETRIC, OUTFLOW, etc., but never PERIODIC.

   Note: On the figure above, we have three rows of ghost cells. The number of rows depends on the problem, and on how frequently communications are performed within a time step. There is a trade-off between the number of rows (large buffers slow down the calculations and the communications) and the number of communications. The number of ghosts is defined in src/define.h, around line 40 et seq.

   Figure 5.1: Schematic view of how the b
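The mirror mapping described at the start of this excerpt (third ghost cell from the first active one, second from the second, first from the third) can be written in a few lines. This Python sketch acts on a 1D list for the lower boundary only; the function name is hypothetical and the code is an illustration of the mapping, not the boundary routine generated by FARGO3D.

```python
def fill_lower_ghosts_mirror(cells, nghost=3):
    """Fill the first `nghost` entries of `cells` by mirroring the first
    `nghost` active cells about the boundary. Returns a new list."""
    out = list(cells)
    for g in range(nghost):
        # ghost cell adjacent to the boundary (g = 0) takes the first active cell,
        # the next ghost cell outward takes the second active cell, and so on
        out[nghost - 1 - g] = out[nghost + g]
    return out

# Three ghost cells (unset) followed by four active cells.
row = [None, None, None, 10, 20, 30, 40]
print(fill_lower_ghosts_mirror(row))
```

With three ghost cells the result is [30, 20, 10, 10, 20, 30, 40]: the ghost region is the reflection of the first three active values about the boundary.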
22. associated to the function, and which we use to request monitoring at build time in the .opt file.

   • The second argument is the function name itself (the observant reader will notice that this is not exactly true: in mon_dens.c the function is mon_dens_cpu(), whereas in InitMonitoring() we have InitFunctionMonitoring (MASS, mon_dens, "mass", ...) instead of InitFunctionMonitoring (MASS, mon_dens_cpu, "mass", ...). The reason for that is that mon_dens is a function pointer itself, that points to mon_dens_cpu() or to mon_dens_gpu(), depending on whether the monitoring runs on the CPU or the GPU).
   • The third argument is a string which constitutes the radix of the corresponding output file.
   • The fourth argument is either TOTAL or AVERAGE (self-explanatory).
   • The fifth argument is a 4-character string which specifies the centering of the quantity in Y and Z. The first and third characters are always respectively Y and Z, and the second and fourth characters are either S (staggered) or C (centered). This string is used to provide the correct values of Y or Z in the formatted 1D profiles. For instance, the zone mass determined in mon_dens.c is obviously centered both in Y and Z.
   • The sixth and last argument indicates whether the monitoring function depends on the coordinates of the planet (DEP_PLANET) or not (INDEP_PLANET). In the distribution provided, only the torq fun
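Two of the arguments listed above follow simple encodings that are easy to illustrate: a power-of-two identifier, which lets several monitoring requests be combined in one integer, and the 4-character centering string. The Python sketch below shows both. The numerical values given to MASS, MOM_X and TORQ and the function names are assumptions for illustration; the actual values are set in the FARGO3D sources.

```python
# Hypothetical values: each monitoring quantity owns one bit.
MASS, MOM_X, TORQ = 1, 2, 4

def is_requested(mask, flag):
    """True if the bit of `flag` is set in the combined request `mask`."""
    return bool(mask & flag)

def parse_centering(code):
    """Decode a string such as 'YCZC' into the centering in Y and in Z."""
    valid = (len(code) == 4 and code[0] == "Y" and code[2] == "Z"
             and code[1] in ("S", "C") and code[3] in ("S", "C"))
    if not valid:
        raise ValueError("invalid centering string: " + repr(code))
    names = {"S": "staggered", "C": "centered"}
    return {"Y": names[code[1]], "Z": names[code[3]]}

request = MASS | TORQ                  # monitor the mass and the torque
print(is_requested(request, TORQ), is_requested(request, MOM_X))
print(parse_centering("YCZC"))
print(parse_centering("YSZC"))
```

Because each identifier is a distinct power of two, any combination of requests maps to a unique integer, and testing for one quantity is a single bitwise operation.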
23. be different for each process, which will allow a selection of the device on this basis, so that MPI_Init() can be called afterwards. If FARGO3D is compiled with the make flag MPICUDA, main() will invoke a function called EarlyDeviceSelection() just after reading the parameter file, and it will subsequently invoke SelectDevice() with the rank thus obtained.

   Note: Using OMPI_COMM_WORLD_LOCAL_RANK instead of OMPI_COMM_WORLD_RANK is simpler. The former returns the rank within a node (hence its name), so that it can directly select the device number (local_rank) without further arithmetic. This is the approach used in FARGO3D when the build flag MPICUDA is set.

   To sum up, if you want to build FARGO3D with a CUDA-aware MPI implementation, you must pass this special environment variable to the code at build time. This is achieved by defining the variable ENVRANK in the makefile. You should edit one of the platform-specific build options provided in the makefile and adapt it to your own needs.

   Note: How do you know if the code is really running with GPU Direct? At run time on a GPU build, if any communication of a data cube occurs between the CPU and GPU, a flag is raised and a ! is printed on the terminal instead of FARGO3D's classical dot. This helps to diagnose that something is wrong, for instance when a part of a time step is still running on the C
24. can browse the source files of mesh functions to see examples. You can get the list of the corresponding files by having a look at src/makefile. There, you see a block called GPU_OBJBLOCKS. All the mesh functions are found in the files that have the same prefix as those found in this list, but with a .c suffix instead of _gpu.o.

   14.2.1 FLAGS

   Identifier: <FLAGS> ... <\FLAGS>
   Parsed as: Textually, with heading comment sign removal.
   Location: Normally at the top of a C file.

   There are two flags that must always be included:

       #define __GPU
       #define __NOPROTO

   They are important for a proper header building. __GPU can be used inside a C mesh function to issue specific lines that should be run only on the GPU version, with the help of the macrocommand:

       #ifdef __GPU
       ...
       #endif

   14.2.2 INCLUDES

   Identifier: <INCLUDES> ... <\INCLUDES>
   Parsed as: Textually.
   Location: Normally after the FLAGS block.

   The INCLUDES block is the block where all the headers are. This block must contain at least:

       #include "fargo3d.h"

   You may include any other header file.

   14.2.3 Function name

   Identifier: Implicit, only a string.
   Parsed as: function_name_cpu => function_name_gpu for the wrapper (or launcher), => function_name_kernel for the associated CUDA kernel.
   Location: Normally after the INCLUDES block.
25. complex directories in FARGO3D. The complexity arises because all the information required for a given specific problem is stored inside. The extensive list of the files stored for each setup is: condinit.c: this is the file where the initial conditions are written. Thanks to the use of the VPATH variable in the makefile, this file supersedes the file src/condinit.c of the main source. SETUP.par: the parameters required for this setup. SETUP.opt: all the directives for the makefile; this is where you decide the number of dimensions, the equation of state, the geometry, whether you use orbital advection (aka FARGO algorithm), MHD, etc. SETUP.bound: sets the boundary conditions used in the setup (taken from boundaries.txt). SETUP.mandatories: a list of parameters that must always be given explicitly in your .par files. SETUP.units: the scaling rules for the parameters not covered in std/standard.units. SETUP.objects: additional objects you want to include (your own developments). Warning: Any file here has priority over the file with the same name in the src directory. So in theory, inside a SETUP directory you could have a complete copy of the src directory, and the make process would be done with these sources, but in practice only a few files are needed (for example, depending on your needs, resistivity.c, potential.c, etc.).
26. have a real value. If it finds it in the scaling rules (it has access to those of std/standard.units, plus, if any, those in setups/SETUP/SETUP.units, in case your setup defines new real variables), it copies that rule into an automatically generated file called rescale.c, which contains the rescaling routine called before entering the main loop (if you have made a build with the RESCALE option). If it does not find a scaling rule for a variable, it issues a warning asking you to check whether this variable is dimensionless. Since the output of this script is found at the very beginning of the make process and may go unnoticed, it can be a good idea to run the script separately. You have to do that from the src directory: cd src; python ../scripts/unitparser.py mri. Warning: Scaling rule not found for FLARINGINDEX. Is it dimensionless? Warning: Scaling rule not found for SIGMASLOPE. Is it dimensionless? Warning: Scaling rule not found for BETA. Is it dimensionless? Warning: Scaling rule not found for NOISE. Is it dimensionless? Warning: Scaling rule not found for ASPECTRATIO. Is it dimensionless? You can verify that each of the variables found by the script is indeed dimensionless. This list is naturally setup dependent, and the above example is for the setup mri. Warning: Upon completion of the manual run of the script as above, you MUST go to the
27. means that the data on the CPU is up to date (green state in the above explanation), and out of date otherwise (red state). Similar rules apply for fresh_gpu. One should never set, nor even see directly, these flags. All of this is taken care of by INPUT and OUTPUT. Take-away message: you should only care to properly state, at the beginning of each routine, which fields are INPUT and which fields are OUTPUT. That's all. This should be done rigorously as you start to write the routine. If you forget to do it, the code will yield wrong results when run on a GPU build. If you do it correctly, you will never have to worry about CPU/GPU communications, which will take place automatically for you behind the scenes. This is easy and intuitive, but it must be done rigorously. All the details can be found in the src/fresh.c file. We have actually developed several kinds of INPUT/OUTPUT macrocommands, one for each type of field encountered in the code: Field (volumic data), Field2D (X-averaged, i.e. azimuthally averaged, 2D real data) and FieldInt2D (2D fields of integers). The latter are used in particular for the shifts needed by the azimuthal advection. The macrocommands are: INPUT(Field), INPUT2D(Field2D), INPUT2DINT(FieldInt2D), OUTPUT(Field), OUTPUT2D(Field2D), OUTPUT2DINT(FieldInt2D). Under the hood, these methods are only wrappers of the cudaMemcpy function.
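The bookkeeping behind INPUT and OUTPUT in excerpt 27 can be pictured with the toy model below. It is a sketch under stated assumptions, not the content of src/fresh.c: the `FieldSketch` type and the `sketch_input` and `sketch_output` functions are hypothetical, and the `transfers` counter merely stands in for a host/device copy.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a field that has one copy on each side. */
typedef struct {
  bool fresh_cpu;  /* true when the host copy is up to date   */
  bool fresh_gpu;  /* true when the device copy is up to date */
  int transfers;   /* counts the simulated host/device copies */
} FieldSketch;

/* Declare the field as an input of a routine running on the device
   (on_gpu == true) or on the host (on_gpu == false). A copy is only
   triggered when the local side is out of date. */
void sketch_input(FieldSketch *f, bool on_gpu) {
  bool *here = on_gpu ? &f->fresh_gpu : &f->fresh_cpu;
  if (!*here) {
    f->transfers++;  /* stands in for a cudaMemcpy in this sketch */
    *here = true;
  }
}

/* Declare the field as an output: after the routine has run, only the side
   that executed it holds valid data. */
void sketch_output(FieldSketch *f, bool on_gpu) {
  f->fresh_gpu = on_gpu;
  f->fresh_cpu = !on_gpu;
}
```

In this model, a field that stays on the GPU across consecutive routines is copied once and never again, while a routine that runs on the host after a GPU routine forces one transfer back. That would be consistent with the manual's remark that communications happen automatically once inputs and outputs are declared.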
28. memory within the SMPs (48k) is saturated, and the extra data is stored within the device's global memory (the Video RAM), with a dramatic performance penalty. There are other considerations that matter in the choice of the CUDA blocks (memory alignment, for instance), but in short it is clear that there is an optimal block size that will maximize performance. This size depends: on the GPU; on the kernel itself (it is not the same for all kernels of FARGO3D). By default, the block sizes used in a kernel execution are the numbers provided in the .opt file, which are reasonable numbers, but they are the same for all kernels, hence they cannot be optimal. A makefile rule combined with Python scripting has been developed in order to perform a systematic test of the performance of each kernel individually, as a function of the size of the CUDA blocks. At compilation time, a file called setup.blocks (setup is the name of your setup) is looked for in the corresponding setup directory, in order to provide c2cuda.py with the best block size for each kernel. You could hand-write this file, but in practice it is automatically generated by the makefile when you execute the rule called blocks: make blocks setup=SETUP. It is necessary to use setup in lower case in order to avoid confusion with the SETUP variable. Example: make blocks setup=fargo. And you will see lines similar to: CompPresIso 64 8 1
29. resolution, the central vortex begins to drift in some direction, breaking the initial central symmetry of the setup. It can be desirable to check whether this break of symmetry arises as a consequence of amplification of noise, or because the scheme contains a bug that renders it non-symmetric. We have found that, at least on the CPU, asymmetries in the scheme arise from additions of more than two terms, which are not associative in floating-point arithmetic. As the MHD solver implies at several places arithmetic averages of four variables, we need to group them by two in order to enforce symmetry. If the initial conditions are strictly symmetric, the fields will then remain symmetric forever. The interested reader may grep STRICTSYM in the sources. This trick does not work on the GPUs on which we have tested it, however. The other make flags have already been discussed in the fargo setup. 7.2.2 Parameters. The parameter file is short and each of its variables is self-explanatory. 7.2.3 Suggested run. You may activate run-time visualization to see the vortex evolve (you must have installed matplotlib for that): make SETUP=otvortex GPU=0 PARALLEL=1 view; mpirun -np 4 ./fargo3d -m setups/otvortex/otvortex.par. 7.3 Sod shock tube 1D. This very simple setup is self-explanatory. You may obtain information about it by issuing at the command line: make describe SETUP=sod1d. If you build it with run-time visualization, a graph of a field is display
30. takes place or not. Our new routine therefore just needs to know the time interval at which it is called, which is DT. There is normally no need to pass this variable as an argument, since it is a global (upper case) variable that can be invoked anywhere in the C code. So the invocation of our new routine in src/main.c should be similar to:

    AlgoGas(&PhysicalTime);
    OurNewRoutine(); // Actually Explode(), see below
    MonitorGlobal(...);

Since we want to amend main.c and algogas.c, it is a good idea to copy these files to our setup directory, in order not to interfere with the main version of FARGO3D. This way, everything we develop for this complex kernel is self-contained within the setup directory. There is a drawback, however: if we improve the main file src/algogas.c at some later stage, this improvement will not be reflected in the file setups/explosions/algogas.c until we implement it manually in this file. The name of the new routine will be Explode, and it will be stored in the setups/explosions/condinit.c file:

    void Explode () {
      INPUT(Energy);
      OUTPUT(Energy);
      int i, j, k;
      real yr, zr;
      int yr0, zr0;
      real p;
      real* e = Energy->field_cpu;
      real Rate = 1.0; // Average number of explosions per unit time
      real sphere_radius = 0.05;
      sphere_radius *= sphere_radius; // we take the squar
      p = drand48();
      if (p < 1.0-exp(-DT*Rate)) {
        real sphere_radius = 0.1;
        yr = YMIN+drand48()*(YMAX-Y
31. they transit through the host (CPU) if no CUDA-aware version of MPI is used. 20.4 Where is the directory where my last run data has been output? The path to this directory is given by the parameter OUTPUTDIR in the parameter file that you specified in your run. If this parameter begins with a @, this character should be substituted with the content of the FARGO_OUT environment variable, if it exists. You can also go to your home directory. There, a directory named fargo3drc has been produced, which contains two files: history and lastout. The latter contains, for each run issued, a timestamp and the path to the output directory. The former contains a timestamp and the command issued to run the code. You may define an alias in your .bashrc file that brings you automatically to the last output directory: alias fo cd tail n 1 HOME fargo3drc lastout Y. 20.5 My build produces unexpected results. Some files should have been remade and they have not. How do I fix this? Although we have dedicated some care to the chain of rules of the relatively complex build process, we may have failed to respect some dependencies. If you believe that your make process is flawed, the simplest thing is to note your build options with make info, then issue a make mrproper, and finally rebuild by issuing a new make command followed by all your build options. 20.6 My code does not run much faster on the GPU than on the CPU. Why is this? There ar
32. time, temperature and electric intensity. These constants are: the gravitational constant $G$ (G), with units $M^{-1}L^{3}T^{-2}$; the central star mass $M_\star$ (MSTAR), with units $M^{1}L^{0}T^{0}$; a length $R_0$ (R0), with units $M^{0}L^{1}T^{0}$; the ratio of the ideal gas constant to the mean molecular weight $\mathcal{R}/\mu$ (R_MU), with units $M^{0}L^{2}T^{-2}\theta^{-1}$; and the value of the magnetic permeability of vacuum $\mu_0$ (MU0), with units $M^{1}L^{1}T^{-2}\theta^{0}I^{-2}$. Naturally, if you specify the CGS unit system, your parameter file must provide all real variables in this unit system: YMIN and YMAX must be in centimeters, and so must ZMIN and ZMAX in cylindrical or Cartesian coordinates, and XMIN and XMAX in Cartesian coordinates. Similarly, NU must be in cm^2/s, the planetary mass in the .cfg file must be in grams, and so on and so forth. There is however an exception to this when one uses the RESCALE directive, as we explain below. 9.3 Rescaling the input parameters. Previous users of FARGO are certainly used to scale-free input parameters, in which the central star mass is set to one, the planet's orbital radius is set to one, and the gravitational constant is set to one. The orbital period is therefore $2\pi$. You may require that FARGO3D be run in a unit system such as cgs or MKS without editing your scale-free parameter file. For this purpose, you must build FARGO3D with the rescale option: make UNITS=MKS RESCALE=1. Once the parameters are read from the parameter file, they are resc
33. variable, you have two options: define FARGO_ARCH before compiling the code; or define FARGO_ARCH in your personal .bashrc or .tcshrc file, depending on your shell. If you do not have a standard Linux distribution, do not forget to export the variable FARGO_ARCH in your .bashrc file: vi USER_DIR/.bashrc, and add the following line: export FARGO_ARCH=MYCLUSTER, where MYCLUSTER is only an example name. You should modify it to match the name that you defined in the src/makefile. This file is provided as is, with a few examples that you may adapt to your own needs. CHAPTER FIVE: BOUNDARIES. Boundary conditions in FARGO3D can be selected only for the Y and Z directions, since in X the mesh is always considered periodic. This is because FARGO3D was mainly designed for azimuthally periodic planetary disks with a cost-effective orbital advection (aka FARGO algorithm). Although this limitation may seem strong, in practice there are lots of situations where you can assume your 3D problem to be periodic along a given direction. If you cannot do that, unfortunately there is no way to avoid this limitation with the present version of the code. The boundary conditions (BCs) are handled by the script boundparser.py. We have developed a metalanguage to handle them. Although this may seem quite a futile investment, it soon a
34. we did not need external libraries to compile the code, but if we want to build a parallel version of the code, we must have a flavor of the MPI libraries on our system. Note: FARGO3D was successfully tested with OpenMPI 1.6/1.7, MPICH2 3 and MVAPICH2 2.0, with similar overall performance for the CPU version of the code. As we do not use any version-dependent features of MPI, we expect the code to work with any version of MPI. There are, however, some special features of MVAPICH 2.0 and OpenMPI 1.7 related to CUDA interoperability that are discussed later in this manual, and that are useful exclusively for GPU builds. If you are running on a standard Linux installation with a standard working MPI distribution, you have to issue: make PARALLEL=1, or the corresponding shortcut: make para. At the end of the process, you will see a message telling you that the compilation was performed in parallel. Now you can run the code in parallel: mpirun -np 4 ./fargo3d -m setups/fargo/fargo.par. If your computer has at least four physical cores, you should see a speed-up of a factor 4. Note: OpenMPI installation on Ubuntu systems. If you have Ubuntu, these lines install a functional version of OpenMPI: sudo apt-get install openmpi-bin; sudo apt-get install openmpi-common; sudo apt-get install libopenmpi-dev. You must accept all the installation requirements. A similar process could be done for MPICH. This process i
35. we give specific values for a given setup, such as the mesh size, parameters specific to the initial conditions, etc. A given setup can be run with many different parameter files without recompiling. Usually, one file only is required to run FARGO3D once it has been built for a given setup. This file is the parameter file. There is an exception with some setups, like the fargo setup, which in addition require a file in which the planetary system initial configuration is specified. The planet files, which by convention have the extension .cfg, have exactly the same syntax as in the former FARGO code, so planetary systems designed for a prior FARGO calculation can be used straight away. They are located in the sub-directory planets of the main directory. Their name and path must be passed to the code through the string parameter PLANETCONFIG. 7.1 fargo. This setup is a legacy setup. The public FARGO code, ancestor of FARGO3D, amounts to this particular setup of FARGO3D. This is the default setup, and the initial conditions are taken from one of the EU comparison setups. The setup is strictly comparable to the template.par parameter file of the FARGO code. We explain some special characteristics of the fargo setup. 7.1.1 Make options. This setup uses the following physical options, which are selected in the file setups/fargo/fargo.opt: X, Y, CYLINDRICAL, ISOTHERMAL, VISCOSITY, POTENTIAL, STOCKHOLM. It also activates the following flag
36. we have only a limited amount of acceleration factors to quote between CPU and GPU. 18.2 The FARGO_SPEEDUP macrocommand. We have developed a macrocommand named FARGO_SPEEDUP. You can see its source in the file src/define.h, near line 475. This macrocommand is meant to give the speed-up factor of a given CUDA kernel with respect to its CPU counterpart. Its use is very simple. Suppose that we want to know the speed-up ratio (GPU vs CPU) of the function SubStep1_x for the setup fargo. First we need to identify where this function is invoked. It is called near line 73 in the file src/algogas.c: #ifdef X FARGO_SAFE(SubStep1_x(dt)); #endif. Note that the invocation is wrapped in a FARGO_SAFE macrocommand, the definition of which is empty (see file src/define.h near line 409). All the sub-steps of FARGO3D are wrapped similarly into this macrocommand. In normal use, it does not do anything. However, it may be redefined (see the alternate definitions, commented out, near line 409 in src/define.h) so as to provide useful debugging diagnostics. What we need to do here to get an automatic evaluation of the speed-up factor is simply to change our wrapper from FARGO_SAFE to FARGO_SPEEDUP. Note that this new macrocommand will manipulate the function name a bit: it will subsequently invoke SubStep1_x_cpu, then SubStep1_x_gpu. Since the C preprocessor is unable to manipulate strings, we need to help it identify where the sub string
37. 023MB (end of the nvidia-smi output; the compute processes list for GPU 0 reads Not Supported). And the number of the GPU in this case is 0. So you could try to run FARGO3D forcing the execution: ./fargo3d -mD 0 setups/fargo/fargo.par. Note 2: We have used two sticky flags, PARALLEL and GPU, and after some time you may wish to know which ones are activated. In order to know the current status of the executable, issue: make info. Current sticky build options: PROFILING=0, RESCALE=0, SETUP=fargo, PARALLEL=0, FARGO_DISPLAY=NONE, FULLDEBUG=0, UNITS=0, GPU=1, DEBUG=0, MPICUDA=0, BIGMEM=0. You will see the meaning of each flag later on in this manual. All the information about how to open and visualize your fields is still valid. While the computation was done on the GPU, all necessary data transfers were run in a transparent manner from the GPU to the CPU before a write to the disk. For you, nothing changes, except the execution speed. 2.6 First Parallel GPU run. The same ideas as before can be used for running FARGO3D on multiple GPUs. But we cannot give a set of instructions, because they are cluster dependent. In the next sections, you will learn how to run FARGO3D on a large cluster and have each process adequately select its device according to your configuration. When you know how to work with the in
38. 13.2 MPI. When FARGO3D is running in parallel mode, the main computational mesh must be split into several submeshes, each one corresponding to a cluster core. All the computation is done independently inside this submesh (because all the HD/MHD equations are local), but at the borders of the submeshes some communications must be done with neighbors, in order to merge all problems into a big one. The mesh is split so as to minimize the surface of contact between the processors. Following this rule, the size of the communications is minimal. Note that, much like in the former FARGO code, the mesh is not split in the X (azimuthal) direction, because orbital advection is not local in x. This represents a penalty for communications, because the contact surface between processors is not as small as it could be if we split the mesh in x as well. The abscissa and ordinate of each processor in the 2D mesh (Y and Z) of processors are the global variables I and J. In practice, with these indices and with the variables CPU_Rank and CPU_Number, you have all the information needed to know where each process lies in the mesh of processes, and who the neighbors are. 13.3 MPI-CUDA. 13.3.1 General considerations. In a mixed CUDA-MPI run, we must have one processing element (processor) per GPU. Normally, when you run CUDA on one GPU only, the driver selects the device for you automatically, or you may specify manually which device you want to run on by sp
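How the pair (I, J) from excerpt 38 locates a process and its neighbors in the 2D mesh of processes can be sketched as below. The row-major numbering (rank = I + J * ncpu_y), the names `ProcPos`, `proc_position` and `neighbor_rank`, and the parameters `ncpu_y` and `ncpu_z` are all assumptions made for illustration; the manual does not state which ordering FARGO3D actually uses.

```c
/* Hypothetical position of a process in the 2D (Y, Z) mesh of processes. */
typedef struct {
  int I;  /* abscissa: index along Y */
  int J;  /* ordinate: index along Z */
} ProcPos;

/* Assumed row-major numbering: rank = I + J * ncpu_y. */
ProcPos proc_position(int rank, int ncpu_y) {
  ProcPos p;
  p.I = rank % ncpu_y;
  p.J = rank / ncpu_y;
  return p;
}

/* Rank of the neighbor shifted by (dI, dJ), or -1 when the shift leaves the
   mesh of processes (no neighbor on that side). */
int neighbor_rank(ProcPos p, int dI, int dJ, int ncpu_y, int ncpu_z) {
  int I = p.I + dI;
  int J = p.J + dJ;
  if (I < 0 || I >= ncpu_y || J < 0 || J >= ncpu_z) return -1;
  return I + J * ncpu_y;
}
```

With 3 processes along Y and 2 along Z, for instance, rank 5 would sit at (I, J) = (2, 1) under this numbering, with rank 4 as its lower neighbor in Y and rank 2 as its lower neighbor in Z.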
39. ARGO_OPT += -DX

    FARGO_OPT += -DY
    FARGO_OPT += -DADIABATIC
    FARGO_OPT += -DCARTESIAN
    ifeq (${GPU}, 1)
    FARGO_OPT += -DBLOCK_X=16
    FARGO_OPT += -DBLOCK_Y=16
    FARGO_OPT += -DBLOCK_Z=1
    endif

If you want to use single precision, you can set: FARGO_OPT += -DFLOAT. 10.2 Initial state. Now we must fill all the primitive fields with the initial conditions. The standard method is as follows: 1. Make a file called condinit.c inside your setup directory (setups/myblob/condinit.c). 2. At the top of this file, include the fargo3d.h header. 3. Define a function called CondInit that returns void. 4. Fill the Field_variable->field_cpu with the data for each field of the problem. Step by step: 1. Start by opening the new file for initial conditions: emacs condinit.c. 2. In the top line, include FARGO3D's header file: #include "fargo3d.h". 3. Subsequently, add lines similar to these: void CondInit() { }. 4. Write inside the function something similar to:

    int i, j, k;
    real* rho = Density->field_cpu;
    real* vx = Vx->field_cpu;
    real* vy = Vy->field_cpu;
    real* e = Energy->field_cpu;
    i = j = k = 0;
    for (k=0; k<Nz+2*NGHZ; k++) {
      for (j=0; j<Ny+2*NGHY; j++) {
        for (i=0; i<Nx; i++) {
        }
      }
    }

The fields are filled here with the help of the l index. This is the basic structure of a routine that works on fields
40. C functions. 14.1 FARGO3D Mesh Functions. In FARGO3D, all the time-consuming functions that are called during a hydro or MHD time step involve, without exception, a 3D nested loop over the computational mesh. We hereafter call these functions mesh functions. The C to CUDA translator has been developed to convert these mesh functions to CUDA kernels. The components of any mesh function are: a header; a function type and name; arguments; global variables; local variables; a loop over the mesh. In some cases the structure is a bit more complicated, including some additional lines at the end of the main loop, but this will be discussed below. The easiest example of a mesh function is the pressure computation:

    #include "fargo3d.h"
    void ComputePressureFieldIso_cpu () {
      real* dens = Density->field_cpu;
      real* cs = Energy->field_cpu;
      real* pres = Pressure->field_cpu;
      int pitch = Pitch_cpu;
      int stride = Stride_cpu;
      int size_x = Nx;
      int size_y = Ny+2*NGHY;
      int size_z = Nz+2*NGHZ;
      int i;
      int j;
      int k;
      for (k=0; k<size_z; k++) {
        for (j=0; j<size_y; j++) {
          for (i=0; i<size_x; i++) {
            pres[l] = dens[l]*cs[l]*cs[l];
          }
        }
      }
    }

As you see, the general structure is very simple. Here is the explanation of the different blocks. Header: #include "fargo3d.h". This block contains all the includes required to compile the code. A funct
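The pressure computation of excerpt 40 relies on FARGO3D globals and on the l index, so it cannot be compiled on its own. The stand-alone sketch below reproduces the same triple loop with everything passed as arguments. It assumes that l expands to `i + j*pitch + k*stride`, which the excerpt does not show, and the function name `pressure_iso_sketch` and the `real` typedef are hypothetical.

```c
#include <stddef.h>

typedef double real;  /* assumption: double precision build */

/* Isothermal pressure on a 3D mesh stored in a flat array: pres = dens * cs^2.
   pitch is the distance in memory between two consecutive j rows, and stride
   the distance between two consecutive k planes. */
void pressure_iso_sketch(const real *dens, const real *cs, real *pres,
                         int size_x, int size_y, int size_z,
                         int pitch, int stride) {
  int i, j, k;
  for (k = 0; k < size_z; k++) {
    for (j = 0; j < size_y; j++) {
      for (i = 0; i < size_x; i++) {
        /* Assumed definition of the flat index named l in the manual. */
        size_t l = (size_t)i + (size_t)j * pitch + (size_t)k * stride;
        pres[l] = dens[l] * cs[l] * cs[l];
      }
    }
  }
}
```

For a mesh without padding one would call it with `pitch = size_x` and `stride = size_x * size_y`. Keeping pitch and stride as separate arguments mirrors the original routine, where they are taken from Pitch_cpu and Stride_cpu.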
41. Example:

    //<CONSTANT>
    // real ymin(Ny+2*NGHY+1);
    //<\CONSTANT>

Is parsed as:

    #define ymin(i) ymin_s[(i)]
    CONSTANT(real, ymin_s, some_number);

Here some_number is a fraction of the constant memory size, calculated by the parser.

    #ifdef BIGMEM
    #define ymin_d &Ymin_d
    #endif
    CUDAMEMCPY(ymin_s, ymin_d, sizeof(real)*(Ny+2*NGHY+1), 0, cudaMemcpyDeviceToDevice);

Location: Normally after the INTERNAL block. This is one of the most complex blocks, because the underlying process is itself complex. We need a way to pass our light arrays to the constant memory of the GPU, for performance reasons. But there are some cases where the problem is too big and the constant memory is not enough. In this case, one can define the build flag BIGMEM. The outline of the process is: Declare as constant some global variables you want to use without passing them as external, or use this block for light arrays (similar to xmin, sxk, etc.). This block reserves a constant memory segment, then copies the data to this segment. If BIGMEM is activated, the constant memory is not used, but instead the global memory is used. The CUDAMEMCPY macrocommand is expanded in a manner that depends on whether BIGMEM is defined, and therefore performs the copy to the correct location (constant or global memory). You can see this in src/define.h. Note that there is a subtlety here: in case you use BIGMEM, the constant memory is st
42. FARGO3D User Guide, Release 1.1. Pablo Benitez Llambay, Frederic Masset. June 02, 2015. CONTENTS: 1 Introduction: 1.1 A foreword about the terminology used in this manual; 1.2 License. 2 First Steps: 2.1 Getting FARGO3D; 2.4 First parallel run; 2.5 First GPU run; 2.6 First Parallel GPU run. 3 Directory tree: 3.1 Directories; 3.2 setups/SETUP directory. 4 Make Process: 4.1 Makefile (Stage1); 4.2 scripts/make.py (Stage2); 4.3 src/makefile (Stage3); 4.4 FARGO_ARCH environment variable. 5 Boundaries: 5.1 How are boundary conditions applied?; 5.2 boundparser.py; 5.3 Boundaries files format; 5.4 centering.txt; 5.5 boundary templates; 5.6 boundaries.txt; 5.7 SETUP.bound
43. L=0/1; debug/nodebug -> DEBUG=1/0; fulldebug/nofulldebug -> FULLDEBUG=1/0; prof/noprof -> PROFILING=1/0; mpicuda/nompicuda -> MPICUDA=1/0; view/noview -> FARGO_DISPLAY=MATPLOTLIB/NONE; cgs/mks/scalefree -> UNITS=CGS/MKS/0; rescale/norescale -> RESCALE=1/0; testlist -> a list of the tests implemented; testname -> the test called name.py, found in test_suite/, will be executed; blocks -> special syntax: make blocks setup=SETUPNAME. Performs a detailed study of the performance of your graphics card with respect to the size of the CUDA blocks. This test will be done for each GPU function. The result is stored in setups/SETUPNAME/SETUPNAME.blocks (go to the section Increasing the GPU performance). A build is performed with a default block size if this file does not exist, so you do not have to worry about this feature at this stage. However, remember that it may increase the performance by up to 20%. clean -> cleans the bin directory. Recommended when you switch to another SETUP. mrproper -> removes all the data related to some specific make configuration. All the code is restored to its default. The outputs directory will not be touched. 4.2 scripts/make.py (Stage2). The second step in the building process is to call make.py. This file does not need a manual invocation and is launched from the main Makefile (stage1). In general, the com
44. Let us take a simple example to illustrate this: assume that somewhere in an azimuthal derivative calculation a developer has divided by the angular step rather than the linear one, or vice versa (i.e. the developer has forgotten to divide or multiply by the cylindrical radius). When working in scale-free units, where one usually takes radii comparable to one, this mistake may remain unnoticed for a long time, especially if it is hidden in a part which has a small impact on the evolution of the flow (such as the viscous stress tensor, for instance). If one switches to MKS or cgs, where the radii have values which are typically 10 to 14 orders of magnitude larger, such errors appear straight away. In fact, we have developed tests to check the dimensional homogeneity of all parts of the code. We run the code a first time in a given system of units, then we rerun it in another system of units, and we check that the ratio of the two outputs is flat to machine precision. You may have a look at the test named dimp3diso.py in the directory test_suite. You can run it from the main directory by issuing: make testdimp3diso. The other reason for implementing a variety of unit systems comes from the user feedback on the FARGO code. About half of the questions that we received about the code had to do with units. Besides, as the code gains in complexity by the inclusion of MHD or radiative transfer, it may become desirable to switch to a standard system of units where constan
45. MIN);

        zr = ZMIN+drand48()*(ZMAX-ZMIN);
        yr0 = yr;
        zr0 = zr;
        if (yr > YMIN+(YMAX-YMIN)/2.0)
          yr -= (YMAX-YMIN);
        else
          yr += (YMAX-YMIN);
        if (zr > ZMIN+(ZMAX-ZMIN)/2.0)
          zr -= (ZMAX-ZMIN);
        else
          zr += (ZMAX-ZMIN);
        sphere_radius *= sphere_radius; // we take the square
        for (k=0; k<Nz+2*NGHZ; k++) {
          for (j=0; j<Ny+2*NGHY; j++) {
            for (i=0; i<Nx; i++) {
              if ((Ymed(j)-yr)*(Ymed(j)-yr)+(Zmed(k)-zr)*(Zmed(k)-zr) < sphere_radius)
                e[l] = 5.0;
              if ((Ymed(j)-yr0)*(Ymed(j)-yr0)+(Zmed(k)-zr)*(Zmed(k)-zr) < sphere_radius)
                e[l] = 5.0;
              if ((Ymed(j)-yr)*(Ymed(j)-yr)+(Zmed(k)-zr0)*(Zmed(k)-zr0) < sphere_radius)
                e[l] = 5.0;
              if ((Ymed(j)-yr0)*(Ymed(j)-yr0)+(Zmed(k)-zr0)*(Zmed(k)-zr0) < sphere_radius)
                e[l] = 5.0;
            }
          }
        }
      }
    }

Note: The variable Rate selects the rate at which explosions take place. In the case in which Rate*DT << 1, we tend towards a Poissonian statistics of explosions. When this condition is not fulfilled, the statistics departs from Poisson's statistics, because we can only have one explosion per DT. Warning: We work on the physical coordinates rather than on the indices, so that the size of the spheres is independent of resolution, and so that the output is independent of the number of processors (this requires however that they all have the same sequence of random numbers, hence that they h
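The listing in excerpt 45 tests each cell against the explosion center and against shifted copies of it, working on physical coordinates. The stand-alone sketch below isolates that geometric test. The direction of the shift (one box length towards the far side of the domain) is an assumption of this sketch, since the operators are not legible in the excerpt, and the names `periodic_image`, `inside_sphere` and `cell_hit` are hypothetical.

```c
/* Copy of a center coordinate shifted by one box length, chosen on the side
   that brings it closest to the opposite edge of the domain (assumed rule). */
double periodic_image(double c, double cmin, double cmax) {
  double length = cmax - cmin;
  return (c > cmin + 0.5 * length) ? c - length : c + length;
}

/* True when the point (y, z) lies inside the circle of squared radius
   radius_squared centered on (yc, zc). */
int inside_sphere(double y, double z, double yc, double zc,
                  double radius_squared) {
  double dy = y - yc;
  double dz = z - zc;
  return dy * dy + dz * dz < radius_squared;
}

/* True when the cell center (y, z) is touched by the explosion centered on
   (yc, zc) or by any of its three shifted copies. */
int cell_hit(double y, double z, double yc, double zc,
             double ymin, double ymax, double zmin, double zmax,
             double radius_squared) {
  double yi = periodic_image(yc, ymin, ymax);
  double zi = periodic_image(zc, zmin, zmax);
  return inside_sphere(y, z, yc, zc, radius_squared) ||
         inside_sphere(y, z, yi, zc, radius_squared) ||
         inside_sphere(y, z, yc, zi, radius_squared) ||
         inside_sphere(y, z, yi, zi, radius_squared);
}
```

Since only coordinates enter the test, the set of heated cells depends on the mesh only through where the cell centers fall, which is the resolution independence that the warning in the text refers to.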
46. MORY 1 GB. This is the information about the graphics card used by FARGO3D. If you see some strange indications in these lines (weird symbols, an unreasonable amount of memory, etc.), it is likely that something went wrong. The most common error is a bad device auto-selection. Warning: In the jargon of GPU computing, the device is the name for a given GPU. We shall use these two terms interchangeably throughout this manual. If you need to know the index of your device, you can use the nvidia-smi monitoring software: nvidia-smi. You may then explicitly specify this device on the command line (here e.g. 0): ./fargo3d -m -D 0 setups/fargo/fargo.par. In general, expensive cards support detailed monitoring, but at least the memory consumption will be given, even for the cheapest ones. Also, you can check the temperature of your device. An increasing temperature is a good indication that FARGO3D is running on the desired device. Note: Useful comments. Note 1: If you cannot run on a GPU after reading the above instructions, you should try to check the index of your device (normally it is 0 if you have only one graphics card) with nvidia-smi: nvidia-smi. Output: NVIDIA-SMI 4.310.44, Driver Version: 310.44. GPU 0: GeForce GT 520, Bus Id 0000:01:00.0, Temp 37C, Memory Usage 182MB / 1
47. PU (expensive, volumic data transfers are therefore occurring at each time step). Similarly, if any communication of a data square (i.e. the boundary of a cube) occurs between the CPU and GPU, another flag is raised and a different character is printed on the terminal. This happens if MPI communications are done through the host instead of being achieved through GPU Direct (surfacic data transfers are necessary between the host and the device in this case). To sum up: if you see on the terminal a line made of this second character, then all routines are running on the GPU, but MPI communications are still done through the host, with a sizable performance penalty. Finally, when you see the customary line of dots, everything is running on the GPU and MPI communications are achieved through GPU Direct. 13.4 Spawning a job on a cluster of GPUs: a primer. If MPICUDA is not defined: you can use the flag -D to specify the device number on which each GPU job must be launched. This is fine if your cluster has only one GPU per node and you spawn one PE per node. A warning message is issued in any case, as specifying manually the device number should be reserved for sequential runs. Alternatively, you can define a rule (e.g. hostname based) for the device number, on the model of those already written in src/select_device.c in the function SelectDevice, which is the function called when the code is compiled without MPICUDA. If MPICUDA is de
48. TCONFIG. This number starts at 0. For the vast majority of runs, in which one planet only is considered, three files are therefore output: planet0.dat, bigplanet0.dat and orbit0.dat. The last two files correspond to fine-grain sampling (that is, they are updated every DT, see also Monitoring). In contrast, planet[i].dat is updated at each coarse-grain output (every time the 3D arrays are dumped), for restart purposes. This file is essentially a subset of bigplanet[i].dat. At each update, a new line is appended to each of these files. In the file bigplanet[i].dat, a line contains the 10 following columns: 1. An integer which corresponds to the current output number. 2. The x coordinate of the planet. 3. The y coordinate of the planet. 4. The z coordinate of the planet. 5. The x component of the planet velocity. 6. The y component of the planet velocity. 7. The z component of the planet velocity. 8. The mass of the planet. 9. The date. 10. The instantaneous rotation rate of the frame. In the file orbit[i].dat, a line contains the 10 following columns: 1. the date; 2. the eccentricity e; 3. the semi-major axis a; 4. the mean anomaly M (in radians); 5. the true anomaly V (in radians); 6. the argument of periastron (in radians, measured from the ascending node); 7. the angle y between the actual and initial position of the x axis (in radians; useful to keep track of how much a rotating frame, in particular with varying ro
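The ten columns of a bigplanet line listed in excerpt 48 can be read back with a few lines of C. This is a sketch for post-processing, not part of FARGO3D: the `PlanetRecord` type and the `parse_bigplanet_line` function are hypothetical, and the columns are assumed to be separated by whitespace (spaces or tabs), which the excerpt does not specify.

```c
#include <stdio.h>

/* One line of a bigplanet file, in the column order given in the manual. */
typedef struct {
  int output;          /* 1: current output number                  */
  double x, y, z;      /* 2-4: planet coordinates                   */
  double vx, vy, vz;   /* 5-7: planet velocity components           */
  double mass;         /* 8: planet mass                            */
  double date;         /* 9: date                                   */
  double omega_frame;  /* 10: instantaneous frame rotation rate     */
} PlanetRecord;

/* Returns 1 when all ten columns were read, 0 otherwise. */
int parse_bigplanet_line(const char *line, PlanetRecord *r) {
  int n = sscanf(line, "%d %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                 &r->output, &r->x, &r->y, &r->z,
                 &r->vx, &r->vy, &r->vz,
                 &r->mass, &r->date, &r->omega_frame);
  return n == 10;
}
```

A caller would typically loop over the file with `fgets` and skip any line for which the function returns 0. The same pattern, with ten `double` fields, would presumably apply to the orbit file.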
49. TER FOUR: MAKE PROCESS

Building the code is achieved simply by invoking the Makefile in the main directory. The Makefile contains the set of instructions that builds the executable (normally called fargo3d) according to a specific setup, with proper instructions for generating a CPU, GPU, sequential or parallel build. This Makefile is not standalone: a set of scripts was developed in order to simplify the building process. These scripts do all the hard work. Most users will not need to know the different stages of a build, but we give here the detail of what happens when you issue make [options] in the main directory.

[Figure: schematic view of the make process]

We can see that the make process involves three main steps. When we type make in the main directory, we spawn the first step, that is, we execute the Makefile. All the shortcut rules, cleaning rules and testing rules are defined within it. The second step is spawned by the Makefile itself and amounts to running the scripts/make.py file, which works as a link between the main Makefile and the src/makefile file. The third and last step corresponds to the execution of src/makefile. All the scripts are managed at this stage (automatic conversion of C to CUDA, analysis of boundary conditions and scaling rules, etc.).

4.1 Makefile (Stage 1)

As we have just seen, the first file related to the make process is the Makefile in the main directory. Inside, there is a set
50. _count
    SubStep1_x_gpu (dt);
    time_speedup_gpu = GiveSpecificTime (t_speedup_gpu);
    printf ("GPU/CPU speedup in %s: %g\n", "SubStep1_x", time_speedup_cpu/time_speedup_gpu*10.0);
    printf ("CPU time : %g ms\n", 1e3*time_speedup_cpu/200.0);
    printf ("GPU time : %g ms\n", 1e3*time_speedup_gpu/2000.0);

(a proper indentation has been added for legibility). We see that the function is first executed 200 times on the CPU, then 2000 times on the GPU. The respective single-call times on the CPU and GPU are inferred, and thus the speed-up ratio (the factor 10.0 accounts for the ten times larger number of GPU calls). Note that we have developed another useful macrocommand in the same spirit, called FARGO_DEBUG, which is meant to automatically compare the result of the CPU version of one routine with the result of its GPU counterpart. It is presented in section Using FARGO_DEBUG.

CHAPTER NINETEEN: DEVELOPING A COMPLEX SETUP

This is the advanced part of the manual. After this section, you will be able to develop kernels and complex routines inside FARGO3D. One of the most useful features of FARGO3D is the fact that you do not have to develop CUDA kernels yourself, nor even learn CUDA. All the CUDA code is written automatically with the help of scripts (c2cuda.py). But developing FARGO3D is not only a matter of kernels. We will not write an extensive documentation on each topic, but instead we will develop a routine in detail, and we will give a brief exp
51. a giant planet
20.11 A very incomplete TODO list
52. aled using rescaling rules. For instance, the value of YMIN (the mesh minimal radius, in cylindrical or spherical geometry) is multiplied by R0. Similarly, the value of SIGMA0 (the disk's surface density, if your setup uses one) is multiplied by M/R0^2, etc. This allows you to get an output in a standard unit system while keeping scale-free input files, the content of which is probably more intuitive.

9.4 Specifying the scaling rules

A scaling rule for a variable is a product of the five dimensionally independent variables (G, M, R0, R and μ0), each raised to a specific power that uniquely determines the dimension of the variable. A scaling rule for a given variable is unique. If it is cast incorrectly, the code will not pass the homogeneity test (if this variable is used in the setup tested). The scaling rules are required exclusively if you build the code with the RESCALE flag activated, so as to have a dimensional output with scale-free input parameters.

You can have a look at the file std/standard.units. You can see that each line looks like C code (no ";" is required at the end), and the right-hand side of the "=" symbol has the same unit as the left-hand side. The scaling rules for some variables are trivial (e.g. SIGMA0, which is a surface density, or PLANETMASS, which is a mass). During the make process, the Python script scripts/unitparser.py is run, which scans all real variables known to the code, that is, everything in the setup .par file found to be
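The idea that a scaling rule is a product of base quantities raised to fixed powers can be sketched as follows. This is an illustration only, not the code generated by the build scripts: the dictionary layout, the function names and the numerical values of the base quantities are all assumptions; only the two rules themselves (YMIN scales as R0, SIGMA0 as M/R0^2) come from the text.

```python
# Hypothetical values of some base quantities, in SI units.
BASE = {"G": 6.674e-11, "M": 1.989e30, "R0": 7.78e11}

# Each rule maps a base quantity to the power it is raised to.
RULES = {
    "YMIN": {"R0": 1},             # a length
    "SIGMA0": {"M": 1, "R0": -2},  # a surface density
}


def scale_factor(exponents, base=BASE):
    """Multiply together the base quantities raised to the given powers."""
    factor = 1.0
    for name, power in exponents.items():
        factor *= base[name] ** power
    return factor


def rescale(name, value):
    """Convert a scale-free parameter value to a dimensional one."""
    return value * scale_factor(RULES[name])


ymin_scaled = rescale("YMIN", 0.4)
sigma0_scaled = rescale("SIGMA0", 6.0e-4)
```

Because every rule is a pure product of powers, checking dimensional homogeneity amounts to comparing the exponent dictionaries of the two sides of an expression.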
53. an be subsequently restarted itself as follows: fargo3d m S 200 number o outputdir out parameters par We must now use the flag as it is important that run 23 seeks the data for restart in out000023 and not out.

CHAPTER SIXTEEN: VTK COMPATIBILITY

The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. This format is very powerful because a lot of routines have been developed for it. The format used in FARGO3D is the Legacy VTK format. We have tested the outputs with VisIt. The output rules in the section Outputs do not apply to VTK files. The main structure of a FARGO3D VTK file is:
• A header
• Coordinates
• Data

The VTK file is a mix between an ASCII file and a binary file. The header is written in ASCII format, while the domains and fields are written in binary format. This format is compatible with the -m (merge) and with the -S/-s (restart) execution flags. The fastest index is not i (x) but is now always j (y). On the other hand, the structure of a loop for reading a VTK file is coordinate dependent. In Cartesian and cylindrical coordinates, the order is:
    loop over k
        loop over i
            loop over j
while in spherical coordinates the order is:
    loop over i
        loop over k
            loop over j
This format is useful when you are working with VisIt.
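The two loop orders described for VTK files can be expressed as a flat-index computation. The sketch below assumes the field values are stored contiguously, one per cell, with no padding; the function names are hypothetical and are not part of FARGO3D.

```python
def vtk_flat_index(i, j, k, nx, ny, nz, spherical=False):
    """Position of cell (i, j, k) in the data block; j always varies fastest."""
    if spherical:
        # Spherical: loop over i, then k, then j.
        return (i * nz + k) * ny + j
    # Cartesian and cylindrical: loop over k, then i, then j.
    return (k * nx + i) * ny + j


def vtk_read_order(nx, ny, nz, spherical=False):
    """List the (i, j, k) triplets in the order in which they are read."""
    if spherical:
        return [(i, j, k)
                for i in range(nx) for k in range(nz) for j in range(ny)]
    return [(i, j, k)
            for k in range(nz) for i in range(nx) for j in range(ny)]
```

Both helpers describe the same ordering: the position of a triplet in the list returned by `vtk_read_order` equals the value returned by `vtk_flat_index` for that triplet.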
54. appended
    CompPresAd was skipped
    compute_slopes was skipped
    compute_star was skipped
    compute_emf was skipped
    update_magnetic was skipped
    substep1_x 16 8 1 appended
    substep1_y 32 4 1 appended
    substep1_z was skipped
    substep2_a 64 8 1 appended

and a file called fargo.blocks is created inside setups/fargo and filled with this information, which represents the best block size for each kernel. All the skipped functions were skipped because they are not used in this particular setup. The whole process generally takes a few minutes. At the end, you have a .blocks file similar to:

    CompPresIso 64 8 1
    substep1_x 16 8 1
    substep1_y 32 4 1
    substep2_a 64 8 1

Now, each time you compile the code, this file is read by the c2cuda.py script. In the best cases, you can increase the performance by 10-20%. The gain is largest in massive 3D MHD problems.

Note: The .blocks file can be saved for the future if you want to save time. In theory, the .blocks file is hardware dependent. Be careful if you share the same file on multiple platforms.

17.1 MPICUDA

The considerations about GPU Direct and the improvement of MPI communications between GPUs have been exposed in section CUDA-aware MPI implementations.

CHAPTER EIGHTEEN: GPU VS CPU BENCHMARKING

18.1 Some acceleration factors

At the present time, FARGO3D has run on a limited number of platforms, so
55. as developed from scratch, allowing a much more versatile code. Here is a summary of the main features of FARGO3D:
• Eulerian mesh code.
• Multidimensional (1D, 2D & 3D).
• Several geometries (Cartesian, cylindrical and spherical).
• Non-inertial reference frames (including shearing box for Cartesian setups).
• Adiabatic or isothermal Equation of State (EOS). It is easy to implement another EOS.
• Designed mainly for disks, but works well for general problems.
• Solves the equations of hydrodynamics (continuity, Navier-Stokes and energy) and magnetohydrodynamics (MHD).
• Includes ideal MHD (Method Of Characteristics & Constrained Transport).
• Includes magnetic field diffusion (resistivity module).
• Includes the full viscous stress tensor in the three geometries.
• Simple N-body integrator, for embedded planets.
• FARGO algorithm implemented in Cartesian, cylindrical and spherical coordinates.
• The FARGO (or orbital advection) scheme is also implemented for MHD.
• Possible run-time visualization.
• Multi-platform:
  - Sequential mode, one process on a CPU.
  - Parallel mode, for clusters of CPUs (distributed memory, with MPI).
  - One GPU (CUDA without MPI).
  - Parallel GPU mode, for clusters of GPUs (mixed MPI-CUDA version).

Another important feature of FARGO3D is to provide a coherent and simple framework to develop new routines. We have developed a coding style which allows one to develop exclusively in C (as in the previous FARGO code), so th
56. asdens10.dat" binary array=(nx,ny) format="%lf" with image notitle

and you should see an image similar to:

Figure 2.1: Gnuplot image of the first run (gas surface density, output number 10).

2.3.2 GDL/IDL

GDL (GNU Data Language, http://gnudatalanguage.sourceforge.net) is an open-source package similar to IDL, but it is free and has similar functions. The command line should be similar to:

    $ gdl
    GDL - GNU Data Language, Version 0.9.2
    GDL> openr, 10, 'gasdens10.dat'
    GDL> nx = 384
    GDL> ny = 128
    GDL> rho = dblarr(nx,ny)
    GDL> readu, 10, rho
    GDL> rho = rebin(rho, 2*nx, 2*ny)
    GDL> size = size(rho)
    GDL> window, xsize=size[1], ysize=size[2]
    GDL> tvscl, alog10(rho)

and you should see an image similar to:

Figure 2.2: GDL image of the first run (gas surface density, output number 10).

2.3.3 Python

Python (http://www.python.org) is one of the most promising tools for data analysis in the scientific community. The main advantages of Python are the simplicity of the language and the power of its many libraries. Data reduction of FARGO3D data is straightforward with the numpy package (http://www.numpy.org). Also, making plots is extremely easy with the help of the matplotlib package (http://matplotlib.org). We strongly recommend to use the
57. at the user does not have to learn CUDA (a kind of GPU programming language). Automatic conversion of the C code to CUDA code is performed at build time by a Python script. Memory transactions between the CPU and the GPU are dealt with automatically in the most efficient manner, so that the user never has to worry about these details. For this reason, the build process of FARGO3D is supported by a lot of scripting that does all the hard work. There are scripts for developing new GPU functions (kernels), new boundary conditions, and for adding new parameters within the code.

Two important warnings:
• FARGO3D is based on a finite difference scheme. It does enforce mass and momentum conservation to machine accuracy, but does not enforce the conservation of total energy.
• FARGO3D always assumes the x direction to be periodic. In cylindrical and spherical coordinates, x corresponds to the azimuthal angle. We might develop a more general solution in the future.

We are working on a paper about FARGO3D. Should you publish results obtained with the code, a citation to that forthcoming paper would be appreciated. It is the way to support our work. We will be happy if the code is useful to you. The reference should be: Benitez-Llambay et al. 2014, in prep.

If you have questions or comments, want to report a bug or suggest improvements, please send an email. We have created a discussion group: https://groups.google.com/d/fo
58. available

-V
The same as -S, but takes a .dat merged file and makes VTK files. Useful if you want to convert some .dat files into VTK files.

-B
The same as -S, but takes a VTK merged file and makes .dat files. Useful if you want to convert some .vtk files into .dat files.

-t
Timer. Very useful to follow the execution of a run and get an estimate of the time remaining to complete. If you want more detailed information, you can compile the code with the PROFILE=1 option (shortcut: make prof) and it will provide detailed information for each block of the time step. The -t flag is not required in this case.

-f
Executes one elementary time step and exits. Useful for some automatic benchmarking (such as optimal block size determination), and also indirectly useful to merge the output of several processes (see Tips, Tricks, Todos and Troubleshooting).

-0
Sets the initial arrays (either with condinit for a fresh start, or by reading an output if it is a restart), writes them to the disk and exits. May be useful to merge prior outputs if FARGO3D has been run without the flag -m.

-C
Forces the execution of all functions on the CPU, for a GPU build. Obviously, this flag does nothing if the executable was built for the CPU.

-p
Instructs the code to execute a post-restart hook upon loading the files from a previous output. This flag is not used
59. ave same initial seed. Now, if you type make, all the code will be rebuilt.

Note that we have added the INPUT/OUTPUT directives. INPUT and OUTPUT are useful macrocommands that synchronize the host and device memory if needed (see GPU/CPU communications). For example, if you run this setup on the CPU, all the data is on the CPU all the time. If instead you run this setup on the GPU, it is possible that, before the execution of Explode, the Field Energy is no longer up to date on the CPU, because all the calculation was done on the GPU.

We could develop this routine as a kernel routine, with some suitable structure for c2cuda.py, but in practice this routine only works with a few threads (all the threads inside a small circle), and the remaining threads would stay idle. So we can pay the cost of a communication and run the routine on the CPU without committing a big mistake. Besides, and more importantly, it does not run within the hydro/MHD loop, but once in a while. You can try a better implementation if you wish, but it is not very important here. Next, we develop a massively parallel function for the cooling (damping of internal energy).

19.3.1 Damping function

In this section we will develop a routine to cool the fluid with a simple-minded exponential damping law for the internal energy, in order to simulate some kind of energy extraction from the system. The law we want to apply is:

    $\frac{de(t)}{dt} = -\alpha\, e(t) \;\Rightarrow\; e(t) = e_0 \exp(-\alpha t)$
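Before turning the damping law into a FARGO3D routine, its behaviour can be checked with a few lines of Python. The sketch below assumes a backward-Euler (implicit) time discretisation, $e^{n+1} = e^n/(1+\alpha\Delta t)$, and compares it with the exponential solution; the function names and numerical values are illustrative only and do not belong to the FARGO3D sources.

```python
import math


def damp_implicit(energy, alpha, dt):
    """One backward-Euler step of de/dt = -alpha * e, applied to every cell."""
    return [e / (1.0 + alpha * dt) for e in energy]


def damp_exact(energy, alpha, t):
    """Analytic solution e(t) = e(0) * exp(-alpha * t)."""
    return [e * math.exp(-alpha * t) for e in energy]


# Illustrative values: damping rate, time step and number of steps.
alpha, dt, nsteps = 0.1, 0.01, 1000
initial = [1.0, 2.0, 4.0]

cells = list(initial)
for _ in range(nsteps):
    cells = damp_implicit(cells, alpha, dt)

reference = damp_exact(initial, alpha, nsteps * dt)
```

With these values the two results agree to better than one part in a thousand, and the implicit step keeps the energy positive and decreasing even for a very large time step, which is the reason for preferring it over an explicit update in this sketch.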
60. be closed after the close of the outermost loop. Remember that the content of the main loop block (defined by //<#> ... //<\#>, nested within the MAIN_LOOP block) is parsed textually, so you cannot use global variables inside it, or the generated CUDA code will not work.

14.2.10 LOOP CONTAIN

Identificator: //<#>, //<\#>
Parsed as: Textually.
Location: After the innermost loop, or where you want to have a textual parsing inside the main loop.

If you do not put the beginning of this block exactly after the innermost loop, you can do some interesting things. With this technique, you can skip some lines devoted to the CPU. You can also achieve this with the _GPU flag declared at the beginning, as shown earlier.

14.2.11 LAST BLOCK

Identificator: //<LAST_BLOCK>, //<\LAST_BLOCK>
Parsed as: Textually.
Location: The last block in the routine.

This block will be executed after the kernel execution. Useful for reductions, for instance.

14.3 Common errors

This section describes the most common errors at compilation time when using the parser.

• The name of a block has white spaces: all block declarations must be closed, and white spaces are not allowed after that.

    //<BLOCK>__ --> wrong
    //<BLOCK> --> ok

• The pointer type is not a type: for the parser, the type of a variable is a string without spa
61. bin directory and manually remove the file rescale.c left over by the script. Otherwise, for dependency reasons, the makefile will not remake it automatically at the next build.

CHAPTER TEN: DEFINING A NEW SETUP

This section is a small tutorial on how to define a setup from scratch. In this tutorial we will implement a hydrodynamics setup. The setup we will define is blob, which comes implemented in the public version of FARGO3D.

10.1 Blob test

This is a 2D test, characterized by a uniform fluid with a denser fluid disk embedded. The system is force balanced (i.e. in pressure equilibrium) if there is no velocity between the two fluids. The disk is moving at a supersonic speed. In order to compare our results, we will use the same parameters as http://www.astrosim.net/code/doku.php?id=home:codetest:hydrotest:wengen:wengen3.

Parameters of the test:
• Mach number: 2.7
• Density jump: 10
• Pressure equilibrium
• gamma = 5/3

We will do the test in the XY plane, and we will implement periodic boundaries in X and reflecting boundaries in Y. We will do a second test with free outflow boundaries in Y. In order to implement this setup, we need:
1. A setup name. In this case it will be myblob.
2. Therefore, a directory inside setups called myblob.
3. the bound file in that directory, called myblob.bound
4. the par file i
62. ces. A pointer variable must be declared with the pointer type:

    real *rho; --> wrong
    real* rho; --> ok (the type is real*)

• Some block was not properly closed: if you have not closed a block, undefined behavior may result.

    //<BLOCK> //<BLOCK> --> wrong
    //<BLOCK> //</BLOCK> --> wrong
    //<BLOCK> //<\BLOCK> --> ok

• The end of the function name is not _cpu: the parser cannot find the function name if it does not have a valid name, nor can it invent a rule to make the wrapper and kernel names.
• The order of the arguments in the function: remember that pointers to Field structures are at the end.
• Only one variable per line may appear in the INTERNAL and EXTERNAL fields.
• Files which have been edited on a Windows machine, and in which lines end with \r\n instead of ending with \n, will fail to be converted to CUDA. Use a conversion procedure such as tr -d '\r' < original_file.c > correct_line_ending.c prior to the build.
• Values relative to the mesh, such as zmin or xmed, etc., should be lower case and should be followed by parentheses, not square brackets, because they are considered as macrocommands. You can use substep1_x.c as a template for that.

This list may be completed as we receive users' feedback.
63. coordinates. Let us try and display one of these files with Python. We start ipython directly from the output directory:

    $ ipython --pylab
    In [1]: n = 10     # assume you have reached 10 outputs. Your mileage may vary.
    In [2]: ny = 320   # Radial resolution. Adapt to your needs if you altered the par file
    In [3]: m = fromfile("reynolds_1d_Y_raw.dat", dtype='f4')[:n*10*ny].reshape(n*10, ny)
    In [4]: imshow(m, aspect='auto', origin='lower')
    In [5]: colorbar()

Note: A few comments about these instructions. In the third line, we read the binary file reynolds_1d_Y_raw.dat and specify explicitly with the dtype keyword that we are reading single precision floating point data (fromfile otherwise expects to read double precision data). The trailing [:n*10*ny] truncates the long 1D array of floating point values thus read up to the row number n*10 (10 because this is the value of NINTERM). This 1D array is finally reshaped into a 2D array, plotted on the following line.

You should see a figure such as:

[Figure not reproduced]

On this plot, the x direction represents the radius, whereas in the y direction we pile up the radial profiles that have been dumped every DT. Therefore, the y direction represents the time. If we remember that the file name has radix reynolds, we are obviously looking at some quantity related to the Reynolds stress tensor, and we see how turbulence develops in the inner regions and progresses toward larger radii as time goes on. But w
64. ction

    //<FLAGS>
    // Your preprocessor variables here, with a heading // sign.
    // (__GPU and __NOPROTO must be defined, as in the example)
    //<\FLAGS>

    //<INCLUDES>
    your includes here (fargo3d.h must be in the list)
    //<\INCLUDES>

    function_name_cpu (arguments) {    <- the name MUST end in _cpu

    //<USER_DEFINED>
    Some general instructions
    //<\USER_DEFINED>

    //<EXTERNAL>
    type  internal_name    = external_variable;
    type* internal_pointer = external_pointer_cpu;    <- the name MUST end in _cpu
    //<\EXTERNAL>

    //<INTERNAL>
    type internal_name1 = initialization;
    type internal_name2;
    //<\INTERNAL>

    //<CONSTANT>
    // type internal_name1(1);      <- Note: these lines begin with a //
    // type array_name2(size);
    //<\CONSTANT>

    //<MAIN_LOOP>
    #ifdef Z
    for (k = 0; k < size_z; k++) {
    #endif
    #ifdef Y
    for (j = 0; j < size_y; j++) {
    #endif
    #ifdef X
    for (i = 0; i < size_x; i++) {
    #endif
    //<#>
    Anything you want here
    //<\#>
    #ifdef X
    }
    #endif
    #ifdef Y
    }
    #endif
    #ifdef Z
    }
    #endif
    //<\MAIN_LOOP>

    //<LAST_BLOCK>
    Some final instruction(s)
    //<\LAST_BLOCK>
    }

Below there is an explanation of each field and how to use it. You
65. ction depends on the planet. The call to InitFunctionMonitoring therefore associates a variable (such as MASS or MAXWELL) with a given function. It specifies the radix of the file name to be used, and gives further details about how to evaluate the monitored value (loop on the planets, integration versus averaging, etc.).

We may now check that requesting MOM_X in any of the six variables of the .opt file does indeed allow a monitoring of the angular momentum. We see in InitMonitoring() that the MOM_X variable is associated with mon_momx(). The latter is defined in the file mon_momx.c, where we can see that, in cylindrical or spherical coordinates, the quantity evaluated is the linear azimuthal velocity in a non-rotating frame, multiplied by the cylindrical radius and by the density.

12.7.6 Implementing custom monitoring: a primer

In the distribution, we adhere to the convention that monitoring functions are defined in files that begin with mon_. If you define your own monitoring function and you want it to run indistinctly on the CPU or on the GPU, you want to define in the mon_foo.c file the function:

    void mon_foo_cpu ()

then, in global.h, you define a function pointer:

    void (*mon_foo)();

In change_arch.c, this pointer points either to the CPU function:

    mon_foo = mon_foo_cpu;

or to the GPU function:

    mon_foo = mon_foo_gpu;

depending on whether you want the monitoring to run on the CPU or the GPU. Finally, the syntax o
66. d storing each one in an organized manner is achieved through a number of different subdirectories. In this section we give a brief explanation of each one.

3.1 Directories

3.1.1 src

The directory where the main sources are stored is the src directory. Inside it, you will find all the files related to the initialization, evolution and visualization of the data. All the sources are pure C files, plus some headers and a very few CUDA files. In this folder you can also find the main makefile (src/makefile). This makefile should never be called directly. All the fundamental changes to the code must be made here.

3.1.2 scripts

In the scripts directory you will find the fundamental Python scripts used at compilation time. There are scripts for:
• Defining variables
• Analyzing boundary conditions
• Analyzing and defining units
• Generating CUDA files automatically
• Improving CUDA blocks
• Accelerating the make process (executing the makefile in parallel)
• Making general tests (mandatory for new improvements)

3.1.3 bin

This directory is empty by default, but during compilation this is where all the object files and script residual data are stored. In practice, this directory is useful for keeping sources and objects apart, a very useful behavior when you are developing new routines.

3.1.4 outputs

The outputs directory is the default output directory
67. d things up during your first try. It is also a good idea to tune the CUDA block size prior to running the setup (you may skip this part if you wish). Execution will be 10-20% faster:

    make blocks setup=mri

Note that everything is lower case in the line above. It will take a few minutes to complete. Upon completion, issue:

    make clean
    make SETUP=mri GPU=1

and the code is built using the block size information previously determined (or using a default, architecture-independent block size if you skipped the action above). We now start the run:

    fargo3d -mt in/mri.par

The -t option above activates a timer that will give you an idea of the time it takes to complete a run. You can see that there are several files in the output directory (presumably outputs/mri, if you have not changed the value of the variable OUTPUTDIR in the parameter file), called respectively:
• reynolds_1d_Y_raw.dat
• maxwell_1d_Y_raw.dat
• mass_1d_Y_raw.dat
that grow in size progressively, every time a carriage return is issued after a line of dots. This kind of file is presented in detail in the section Monitoring later on in this manual. For the time being, it suffices to know that these are raw binary 2D files, to which a new row is added every DT (fine-grain monitoring). This row contains radial information, as indicated by the _Y_ component of the file name (Y is the radius in cylindrical
68. d with the formula:

    $h_c = 0$ if $r/r_{\mathrm{Hill}} < 0.5$
    $h_c = 1$ if $r/r_{\mathrm{Hill}} > 1.0$
    $h_c = \sin^2\left[\pi\left(r/r_{\mathrm{Hill}} - 1/2\right)\right]$ otherwise

and the force is cut off prior to the torque calculation (see src/compute_force.c):

    $F_{\mathrm{cut\,off}} = F \times h_c$

Note: This parameter needs the make option called HILLCUT to be activated in the .opt file (this is because the cut is somewhat expensive on the GPU). This is achieved by adding this line to the setups/fargo/fargo.opt file:

    FARGO_OPT += -DHILLCUT

• IndirectTerm (boolean): Selects whether the calculation of the potential indirect term, which arises from the acceleration of the primary due to the gravity of the planets and of the disk, is performed. In the fargo setup, the reference frame is always centered on the central star (you can see src/potential.c; it is not difficult to change this). For this reason, this parameter should normally be set to yes.
• Frame (string): Sets the reference frame behavior: F (Fixed), C (Corotating) or G (Guiding center). It is case insensitive. When it is set to F, the frame rotates at a constant angular speed, specified by OmegaFrame. When it is set to Corotating, the frame corotates with planet number 0. If this planet migrates or has an eccentric orbit, the frame angular speed is not constant in time. When it is set to Guiding Center, the frame corotates with the guiding center of planet 0. The frame angular speed therefore varies with time if planet 0 migrates, and it does so in a smoother manner than in the Corotating case.
• Ome
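The Hill cut-off described at the start of this item can be sketched as a small Python function. The two thresholds (0.5 and 1.0) come from the text; the exact shape of the transition is an assumption here, taken to be $\sin^2[\pi(r/r_{\mathrm{Hill}} - 1/2)]$ because it joins the two limiting values continuously. The function name is hypothetical.

```python
import math


def hill_cut(r, r_hill):
    """Taper factor in [0, 1] multiplying the force from material at distance r."""
    x = r / r_hill
    if x < 0.5:
        return 0.0  # deep inside the Hill sphere: contribution discarded
    if x > 1.0:
        return 1.0  # outside the Hill sphere: full contribution
    # Assumed transition: sin^2(pi * (x - 1/2)), equal to 0 at x = 0.5
    # and to 1 at x = 1.
    return math.sin(math.pi * (x - 0.5)) ** 2
```

For instance, material at three quarters of the Hill radius contributes half of its force under this assumed transition.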
69. MAINOBJ += edamp.o
    GPUOBJ += edamp_gpu.o

Before the compilation, do not forget to add the variable called EDAMP to the parameter file (setups/explosions/explosions.par). Its first value could be 0.1.

Finally, we must add the execution line. In algogas.c, before the invocation of substep3(), we can add an invocation of our newly created function:

    Edamp_cpu (dt);

The code should compile on both CPU (sequential & MPI) and GPU platforms (including a cluster of GPUs). If you compile and run the code in GPU mode, you will see some lines similar to:

    OUTPUTS 0 at Physical Time t = 0.000000 OK
    TotalMass = 2.0000000000

followed by a line of symbols printed in place of the customary dots. These symbols mean that, at every time step, you have volumic communications host <-> device (see sections GPU/CPU communications and CUDA-aware MPI implementations). This is very expensive. It happens because we force the invocation of the CPU version of our new function (we call Edamp_cpu(dt)), which triggers device-to-host and host-to-device communications by means of the INPUT/OUTPUT directives. To avoid this, we must call the automatically generated CUDA kernel. Until now, we have been running the CPU version of our kernel.

19.4 Incorporating our kernel

Now we will incorporate our new kernel into all the GPU machinery inside FARGO3D. The set of rules here is very general, and in theory, if you follow them, yo
70. dt
• It now executes Edamp_gpu(dt) (note that automatic GPU/CPU communication is dealt with thanks to the INPUT/OUTPUT directives, as explained in GPU/CPU communications).
• It dumps all arrays, this time in output number 998. These arrays should be the same as those dumped in 999 if the CPU and GPU calculations yield indistinguishable results.
• It performs a comparison of the arrays with the secondary checkpoint created previously. If any difference is found, a message is printed on the terminal. Some arrays are skipped from the comparison because they are used as temporary work arrays and may be different on the CPU and GPU without any impact on the calculation. Here we see that all arrays are the same: GPU and CPU yield indistinguishable results.
• In some other cases, we may have differences to the machine accuracy. The example below shows the output when wrapping Substep1_x in FARGO_DEBUG:

    Fields Vx_temp differ
    Minimum on GPU 5.2721583979362551e-22
    Minimum on CPU 0
    Maximum on GPU 0
    Maximum on CPU 0
    Minimum of GPU CPU 1 1
    Maximum of GPU CPU 1 1
    Minimum of GPU CPU 0
    Maximum of GPU CPU 5.27216e-22
    Minimum of GPU CPU max abs CPU 0
    Maximum of GPU CPU max abs CPU 1
    **********

We show hereafter how the macrocommand is expanded at build time:

    SaveState ();
    printf ("Executing %s_cpu%s\n", "Edamp", "(dt)");
    Edamp_cpu (dt);
71. e flaring of the disk. If it is null, the aspect ratio of the disk is constant (i.e. the disk height scales linearly with r). The dependence of the aspect ratio on the flaring index is:

    $h(r) = \frac{H}{r} = \mathrm{AspectRatio} \times \left(\frac{r}{R_0}\right)^{\mathrm{FlaringIndex}}$

• PlanetConfig (string): The name of the planetary file to be used. The path is relative to the location at which you launch the code.
• ThicknessSmoothing (real): Potential smoothing length for all the planets. The use of this parameter is mutually exclusive with the use of RocheSmoothing. The smoothing length $s$ of the potential is:

    $s = \mathrm{ThicknessSmoothing} \times H = \mathrm{ThicknessSmoothing} \times \mathrm{AspectRatio} \times \left(\frac{r_p}{R_0}\right)^{\mathrm{FlaringIndex}} \times r_p$

• RocheSmoothing (real): Potential smoothing length for all the planets. The use of this parameter is mutually exclusive with the use of ThicknessSmoothing. The smoothing length of the potential over the mesh is RocheSmoothing $\times R_H$, where $R_H$ is the Hill radius of the current planet:

    $s = \mathrm{RocheSmoothing} \times r_p \times \left(\frac{m_p}{3 M_\star}\right)^{1/3}$

The potential of a planet of mass $m_p$ has the form:

    $\phi = -\frac{G m_p}{\sqrt{r^2 + s^2}}$

where $r$ is here the distance to the planet.
• Eccentricity (real): The initial eccentricity of all the planets.
• ExcludeHill (boolean): When this parameter is set to YES, a cut-off is introduced when the force is computed. The cut off is calculate
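The two mutually exclusive smoothing prescriptions can be compared numerically with a short sketch. It assumes the usual forms: a disk thickness H = h(r_p) r_p with a power-law aspect ratio, a Hill radius r_p (m_p / 3 M_star)^(1/3), and a potential smoothed as -G m_p / sqrt(r^2 + s^2). All function and argument names are hypothetical and the numbers are illustrative, in a scale-free unit system with G = M_star = R0 = 1.

```python
import math


def aspect_ratio(r, h0, flaring_index, r0=1.0):
    """Power-law aspect ratio h(r) = h0 * (r / r0) ** flaring_index."""
    return h0 * (r / r0) ** flaring_index


def thickness_smoothing_length(r_p, h0, flaring_index, thickness_smoothing, r0=1.0):
    """Smoothing length as a fraction of the disk thickness at the planet."""
    disk_thickness = aspect_ratio(r_p, h0, flaring_index, r0) * r_p
    return thickness_smoothing * disk_thickness


def roche_smoothing_length(r_p, m_p, roche_smoothing, m_star=1.0):
    """Smoothing length as a fraction of the planet's Hill radius."""
    r_hill = r_p * (m_p / (3.0 * m_star)) ** (1.0 / 3.0)
    return roche_smoothing * r_hill


def smoothed_potential(r, m_p, s, G=1.0):
    """Planet potential at distance r from the planet, softened over length s."""
    return -G * m_p / math.sqrt(r * r + s * s)


s_thick = thickness_smoothing_length(1.0, 0.05, 0.0, 0.6)
s_roche = roche_smoothing_length(1.0, 3.0e-3, 0.4)
phi_center = smoothed_potential(0.0, 3.0e-3, s_roche)
```

The softening keeps the potential finite at the planet's position, where it reaches -G m_p / s.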
72. e several reasons for this:
• Your setup is very small and your GPU(s) is (are) underused.
• You have default block sizes in your .opt file that are far from optimal, yielding a degradation of performance.
• You have a high-end CPU and a low-end GPU.
• You may not have sufficiently tuned the GPU build options in src/makefile. Make sure to set them to your target GPU architecture.
• You have a 2D YZ setup (see the note below about reductions).

Warning: Reductions are operations on a whole mesh that amount to obtaining a single number as a function of all cells' content (hence the name reduction). It may be the sum of the mass or momentum content of all cells, or the minimum of all time steps allowed on all cells as a result of the CFL criterion (the reduction operation can be a sum, a min, etc.). We see that there is at least one unavoidable reduction operation per time step: the search for the maximum time step allowed. Reduction operations are performed in a two-stage process:
• A reduction in the X direction, at the end of which we obtain a 2D array in YZ. This reduction is performed on board the GPU using the algorithm described in this pdf document. This corresponds to one of the few kernels that are written explicitly in FARGO3D, instead of being produced by the Python script. The user should never have to interfere with it.
• A furthe
73. eEnergy, WriteVx, WriteVy, WriteVz, WriteBx, WriteBy, WriteBz. These allow the user either to oversample one of these fields (for an animation, for instance) or to dump only some variables to the disk, by setting NINTERM to a very large value and using NSNAP instead. Files created through the NSNAP mechanism obey the same numbering convention as those normally written. In order to avoid conflicts in filenames, the files created through NSNAP are written in the subdirectory snaps in the output directory. Besides, the runtime graphical representation with matplotlib is performed using the files created in the snaps directory.

12.7 Monitoring

12.7.1 Introduction

FARGO3D, much like its predecessor FARGO, has two kinds of outputs: coarse-grain outputs, in which the data cubes of primitive variables are dumped to the disk, and fine-grain outputs, in which a variety of other (usually lightweight) data is written to the disk. As their names indicate, fine-grain outputs are more frequent than coarse-grain outputs. Note that a coarse-grain output is required to restart a run. In this manual, we refer to the fine-grain output as monitoring.

The time interval between two fine-grain outputs is given by the real parameter DT. This time interval is sliced into as many smaller intervals as required to fulfill the Courant (or CFL) condition. Note that the last sub-interval may be smaller than what is allowed by the CFL condition, so that the time difference betwe
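The way the interval DT is sliced into CFL-limited sub-steps, with a shortened last one, can be sketched as follows. This is a schematic loop rather than FARGO3D's actual time-stepping code: `cfl_limit` is a hypothetical callback standing in for the time step allowed by the CFL condition at a given moment, and it is assumed to return a strictly positive value.

```python
def split_interval(DT, cfl_limit):
    """Return the list of sub-steps that exactly cover one interval DT."""
    steps = []
    elapsed = 0.0
    while True:
        remaining = DT - elapsed
        dt = cfl_limit(elapsed)  # largest step the CFL condition allows now
        if dt >= remaining:
            # Last sub-step: shortened so that the interval ends exactly at DT.
            steps.append(remaining)
            return steps
        steps.append(dt)
        elapsed += dt


# Illustration with a constant CFL limit of 0.3 and DT = 1.
steps = split_interval(1.0, lambda t: 0.3)
```

In this illustration the interval is covered by three full steps of 0.3 followed by a shorter final step, so that fine-grain outputs remain evenly spaced in time.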
74. • ymax: Max value for the colorbar. Only if Autocolor is No.
• PlotLine: Allows you to make an arbitrary plot when your simulation is 3D. The field is stored in a 3D numpy array called field. For example, if you want to plot the 2D Z-sum projection, you should do something similar to:
    PlotLine np.sum(field,axis=0)
Also, you can make a z-slice, doing something similar to:
    PlotLine field[k,:,:]
where k is an integer with 0 < k < NZ.

11.2 Backends

Matplotlib uses the concept of backend to refer to some specific set of widgets used for rendering the plot (e.g. qt, tkinter, wx, etc.). This is very important for us, because not all the backends (at least for now) are compatible with interactive non-blocking plots. If the main widget that appears after the execution of FARGO3D does not work for you, you can try another backend. More details can be found in the official matplotlib documentation. There is a file in the main directory of FARGO3D called matplotlibrc that contains all the useful configurations related to matplotlib that will be used at run time. This file is matplotlib standard and is version dependent. If you want to modify the aspect of the widget or change the backend, it is a good idea to start by modifying this file. We work on a daily basis with the backend TKAgg. If you want to use the default matplotlibrc configured in your environment instead of the one we provide, it should be enough to rename the matplotlibrc
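The two PlotLine expressions mentioned in this entry (a Z-sum projection and a z-slice) can be tried outside FARGO3D on a small synthetic array. This is a sketch only: the array sizes are arbitrary, and the (z, y, x) axis order is an assumption taken from the axis=0 sum being described as a Z projection.

```python
import numpy as np

# Small synthetic 3D field, assumed to be indexed as field[k, j, i] = (z, y, x).
nz, ny, nx = 4, 3, 5
field = np.arange(nz * ny * nx, dtype=float).reshape(nz, ny, nx)

# 2D projection: sum along the first axis (Z).
z_sum = np.sum(field, axis=0)

# 2D slice at a fixed z index k.
k = 2
z_slice = field[k, :, :]
```

Both results are 2D arrays of shape (ny, nx), which is what a 2D image plot expects.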
75. ecifying the -D flag on the command line. This is obviously not possible here, as all processes within the same node would run on the same device. Instead, each process will have to select at run time, in an automatic manner, the correct GPU, through a directive of the kind:
    cudaSetDevice(int device_number);
where device_number must be evaluated depending on the process rank. Assume that your cluster has a topology similar to this one:
[Figure: cluster topology]
Despite the fact that four processes could fit on each node for non-GPU runs, here we must limit ourselves to two processes only per node, otherwise several processes will use the same GPU, leading to a degradation of performance. Depending on your MPI implementation, the rank ordering of processes could then be as follows:
[Figure: first rank ordering]
or the processes could be distributed in a different manner, as shown below:
[Figure: second rank ordering]
The strategy to calculate the device number would be different in these two cases. In the first case, we should have a rule like this one:
    device_number = CPU_Rank % number_of_processes_per_node;
where the operator % represents the modulo operation in C. On the contrary, in the second case, we would need a rule like this one:
    device_number = CPU_Rank / number_of_nodes;
where the division is an integer (Euclidean) division. Naturally, in the first case the number_of_processes_per_node is also the number of GPUs per node. We can check that it yields the desired correspondence: Process GPU
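The two device selection rules above can be checked with a short sketch. The function names are hypothetical, and the rank orderings are assumptions about what the missing figures show: in the first case consecutive ranks fill one node before moving to the next, in the second case consecutive ranks are dealt to the nodes in turn.

```python
def device_consecutive_ranks(cpu_rank, processes_per_node):
    # First case: ranks 0, 1 sit on node 0, ranks 2, 3 on node 1, and so on.
    return cpu_rank % processes_per_node

def device_interleaved_ranks(cpu_rank, number_of_nodes):
    # Second case: rank r sits on node r % number_of_nodes.
    # The integer division counts how many earlier ranks share that node.
    return cpu_rank // number_of_nodes

# Two nodes, two GPUs (and two processes) per node.
first_case = [device_consecutive_ranks(r, 2) for r in range(4)]
second_case = [device_interleaved_ranks(r, 2) for r in range(4)]
```

In both cases, the two processes that share a node end up on different devices, which is the property the rules are meant to guarantee.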
76. ect to the centered case. Our boundary condition is therefore coded as:
    ANTISYMMETRIC:
        Staggered: |-value|0|value|
If the pipe symbol could be thought of as representing the zones' interface for the centered case, it is no longer the case in the staggered case. In any case, we will always have two cells in the BC code for a centered quantity, and three cells in the BC code for a staggered quantity, regardless of the number of rows in the ghosts. The matching between an active cell and its corresponding ghost zone is as shown on the figure at the beginning of this section. We sum up this information: the file boundaries.txt contains a two-level description that obeys the following general rule:
• Level1 -> the name of the boundary. It is an arbitrary name, defined by you.
• Level2 -> the word Centered or Staggered, followed by some data.
• data -> |some_text_1|some_text_2| or |some_text_1|some_text2|some_text3|
The rules in the ghost zone are as follows:
1. Any string used in the active zone that matches some part of the ghost zone string will be replaced by the active cell value.
2. Any string inside quotes ('...') will be textually parsed and converted to upper case. Such strings are useful to match the parameters of the parameter files, which are upper case C variables.
3. Any string that does not match rule 1 nor rule 2 will be textually parsed and converted to lower case.
Additionally there
77. ed in a matplotlib window. This field is selected by the variable Field of the parameter file.

7.4 MRI

The setup mri (lower case) corresponds to an MHD turbulent unstratified disk on a cylindrical mesh (periodic in Z), much like the setup of Papaloizou and Nelson (2003, MNRAS, 339, 983). The data provided in this public release have the same coverage and resolution as the data by Baruteau et al. (2011, A&A, 533, 84). We present hereafter in some detail the make options and the parameter file, and we provide a hands-on tutorial on reducing data from this setup.

7.4.1 Make options

The file setups/mri/mri.opt shows that the following options are defined at build time:
• FLOAT
• X
• Y
• Z
• MHD
• ISOTHERMAL
• CYLINDRICAL
• POTENTIAL
• VANLEER
The FLOAT option runs everything that is related to the gas in single precision (should we have planets, their data would remain in double precision). This speeds up the simulation by a factor 2, both on CPUs and GPUs. The other flags have already been explained in the previous setups. We note that here, contrary to what was set in otvortex, we do not request the STANDARD flag for orbital advection. Therefore, by default, the scheme will use the fast orbital advection (aka FARGO) described for hydrodynamics by Masset (2000, A&AS, 141, 165) and for the EMFs by Stone & Gardiner (2010, ApJS, 189, 142).

7.4.2 Parameter file

Th
78. ed(k)-zr) < sphere_radius) e[l] = 5.0;
    if ((Ymed(j)-yr0)*(Ymed(j)-yr0)+(Zmed(k)-zr)*(Zmed(k)-zr) < sphere_radius) e[l] = 5.0;
    if ((Ymed(j)-yr)*(Ymed(j)-yr)+(Zmed(k)-zr0)*(Zmed(k)-zr0) < sphere_radius) e[l] = 5.0;
    if ((Ymed(j)-yr0)*(Ymed(j)-yr0)+(Zmed(k)-zr0)*(Zmed(k)-zr0) < sphere_radius) e[l] = 5.0;
Note: the four tests account for the periodicity of the mesh, as they take into account the main explosion and its replicas in the neighboring domains.

19.3 Time dependent explosions

Now we want to add explosions at random times in our simulation. This should be done at the end of each DT. Remember that DT is a parameter chosen by the user in the parameter file, which sets the fine grain temporal resolution of the code. It is not the time step imposed by the CFL condition, which is usually (or at least should be) much shorter than DT. The loop of time steps of total length DT is performed in AlgoGas(), in the file src/algogas.c. AlgoGas() is invoked in src/main.c, so we need to call our explosions just there, after the execution of AlgoGas(). You can see that AlgoGas() is invoked with the pointer to PhysicalTime, so that the value of this variable can be modified from within the routine. Here we want an explosion rate that is constant in time, so that the date does not matter to determine whether an explosion
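A constant explosion rate, tested once at the end of each DT, can be illustrated as follows. This is a sketch of the idea only, not the routine of the guide: the names explosion_happens, rate and dt are hypothetical, and the probability of an explosion during one interval is assumed to be rate*dt, which requires rate*dt <= 1.

```python
import random

def explosion_happens(rate, dt, rng=random):
    """Return True with probability rate*dt (hypothetical helper).

    The current date is not an argument: with a constant rate,
    only the length of the interval matters.
    """
    return rng.random() < rate * dt

# Example: count the explosions triggered over 1000 intervals of length DT.
rng = random.Random(42)
DT = 0.1     # fine grain interval, as set in the parameter file
RATE = 2.0   # explosions per unit time (illustrative value)
n_explosions = sum(explosion_happens(RATE, DT, rng) for _ in range(1000))
```

With these illustrative values, about one interval out of five triggers an explosion on average.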
79. en two fine grain outputs is exactly DT. As for its predecessor FARGO, NINTERM fine grain outputs are performed for each coarse grain output, where NINTERM is an integer parameter. Fine grain outputs, or monitoring, may be used to get the torque onto a planet with a high temporal resolution, to get the evolution of Maxwell's or Reynolds' stress tensor, or to monitor the total mass, momentum or energy of the system as a function of time, etc. The design of the monitoring functions in FARGO3D is such that a lot of flexibility is offered, and the user can quickly write new functions to monitor the data of their choice. The monitoring functions provided with the distribution can run on the GPU, and it is easy to implement a new monitoring function that will run straightforwardly on the GPU, using the functions already provided as templates.

12.7.2 Flavors of monitoring

The monitoring of a quantity can be done in several flavors:
• scalar monitoring, in which the sum or average of the quantity over the whole computational domain is performed. The corresponding output is a unique, two-column file, the left column being the date and the right column being the integrated or averaged scalar. A new line is appended to this file at each fine grain output.
• 1D monitoring, either in Y (i.e. radius in cylindrical or spherical coordinates) or Z (i.e. colatitude in spherical coordinates). In this case the integral or a
80. erent resolutions corroborates this statement. See also: Monitoring

CHAPTER EIGHT
.OPT FILES

FARGO3D is a very versatile code. It can solve problems ranging from a very simple Sod shock tube test to a three-dimensional problem with MHD. In order to keep this versatility without a performance penalty, we have adopted an extensive use of macrocommand variables. These variables allow us to activate or deactivate many sections of the code, depending on what we want to solve. For example, if we want to solve a 2D planetary disk without MHD, the code does not need to know anything about MHD. In this case, a run-time if statement checking whether we want to use MHD or not would be expensive. With the help of these compiler variables, through #ifdef statements, all the job is done at compilation time. Most of these variables are defined in the .opt file, but there are other ones (for example FARGO_DISPLAY) that are defined during the make process. The variables must be defined inside a container variable called FARGO_OPT, as follows:
    FARGO_OPT += -DVARIABLE
In this version, the list of options (modules) that can be activated from the .opt file is:

Performance:
• FLOAT: Uses single precision floating point data. On GPUs, the code runs 2x faster. If FLOAT is not defined, the code will run in double precision.

Dimensions:
• X: Activates the X directio
81. et the ghost value to twice the active zone's value:
    |2.0*foo|foo|
Or if we wish to set it to some predefined value (say that you have a supersonic flow of uniform, predefined density 0.12 g cm^-3 entering the mesh, and you work in cgs):
    |0.12|whatever|
in which case the value whatever is never used. Naturally, it is much better to use a parameter (say RHO0) that you define in your parameter file:
    |'RHO0'|whatever|
We shall come back to the use of the quotes later on in this section. This, in a nutshell, is how we define boundary conditions in the code. Now, assume that we want to have an anti-symmetric boundary condition on the velocity perpendicular to the boundary. The situation is a bit different from the one described above, because this field is staggered along the dimension perpendicular to the mesh. That is, we have:
[Diagram: the value is located on a zone interface, next to the edge of the active mesh]
As shown on this diagram, the value is defined at the interface, and not at the center as previously. The triple vertical line delineates the edge of the active mesh. What we intend is the following:
[Diagram: -value in the ghost zone and value in the active zone, on either side of the edge of the active mesh]
but this leaves yet unspecified the value at the edge of the active mesh, which should be set here to zero (it has to be its own negative):
[Diagram: -value, then 0 at the edge of the active mesh, then value]
Eventually, we have to specify one extra value with resp
82. [Table of contents, continued. The first two entries and entries 6.2, 7.1 and 10.1 are illegible in the source.]
6 Mesh and Fields
    6.1 Mesh
    6.2 [illegible]
    6.3 Working with fields
    6.4 Fields on the gpu
    6.5 Useful variables
7 Default SETUPS
    7.1 [illegible]
    7.2 Orszag-Tang Vortex
    7.3 Sod shock tubes
    7.4 MRI
8 .opt files
9 Units
    9.1 Introduction
    9.2 Specifying the unit system
    9.3 Rescaling the input parameters
    9.4 Specifying the scaling rules
10 Defining a new SETUP
    10.1 [illegible]
    10.2 Initial state
    10.3 Making the executable
    10.4 Plotting your new setup
11 Run time visualization
    11.1 How does it work?
    11.2 Backends
83. f CUDA can deal with direct device to device (GPU to GPU) MPI communications (so-called GPU Direct). We shall not consider the details here, but the interested reader could consult the following page. Compiling the code with a CUDA-aware version of MPI is relatively straightforward, but there is one subtlety with which we must deal. The problem is the following. In order to set up the CUDA-aware MPI machinery behind the scenes, each process must already have chosen its GPU when the code executes MPI_Init(). However, as we have seen at length above, choosing the GPU is done on the basis of the rank. But how can a process know its rank even before entering MPI_Init()? In order to avoid this vicious circle, implementations of MPI provide a mechanism that allows the rank of the process to be known even before MPI_Init() has been invoked. This mechanism cannot be an MPI_Something() directive, as no MPI directive can be called before MPI_Init(). Rather, it simply consists in reading an environment variable that has a specific name. There are two such flavors of variables in each MPI implementation. For instance, these variables are named OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_LOCAL_RANK in OpenMPI. Each process can therefore get its rank in this manner:
    rank = atoi(getenv("OMPI_COMM_WORLD_RANK"));
or its local rank (i.e. within a given node) as follows:
    local_rank = atoi(getenv("OMPI_COMM_WORLD_LOCAL_RANK"));
The value returned will
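The environment variable mechanism can be illustrated outside of C with a short sketch. The variable name OMPI_COMM_WORLD_LOCAL_RANK is the OpenMPI one quoted above; the function name and the fallback value used when the variable is absent (for instance in a sequential run) are assumptions made for this example.

```python
import os

def local_rank_from_env(default=0):
    """Read the node-local rank exported by the OpenMPI launcher.

    Hypothetical helper: returns `default` when the variable is not set,
    which is what happens when the program is not started through mpirun.
    """
    value = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK")
    return int(value) if value is not None else default

# No MPI call is needed here, so this can run before any MPI initialization.
local_rank = local_rank_from_env()
```

Unlike a bare atoi(getenv(...)), the sketch guards against a missing variable, since getenv returns a null pointer in that case.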
84. f each one is obvious from the name. The most important line for the run-time visualization is:
    Py_InitializeEx(1);
Upon execution of this function, you are able to execute any python command inside FARGO3D. A helper function was developed for passing values from FARGO3D to the python interpreter:
    void pyrun(const char *, ...);
The pyrun function works identically to printf (man 3 printf), but pyrun returns void. The main difference between printf and pyrun is that pyrun prints to the python interpreter, and the text printed is interpreted as a python command. In the background, pyrun is only a wrapper to the function PyRun_SimpleString(). In the basic (public) implementation of the run-time visualization, a set of helper parameters has been implemented. These parameters should be included in your .par files. You can see the standard value of each one in std/stdpar.par.
• Field: Name of the field you want to plot. E.g.: gasdens
• Cmap: A matplotlib palette. E.g.: cubehelix (related to the cmap keyword argument of matplotlib)
• Log: If you want to use a logarithmic scale for your color map. Values: Yes/No
• Colorbar: If you want to see the colorbar. Values: Yes/No
• Autocolor: If you want a dynamic colorbar, between the min and max of the field. Values: Yes/No
• Aspect: The same as the aspect keyword argument of the matplotlib imshow() method. Values: Auto/None
• ymin: Min value for the colorbar. Only if Autocolor is No.
85. f the mon_foo.c file must obey the syntax described elsewhere in this manual, so that its content can be properly parsed into a CUDA kernel and its associated wrapper, and the object files mon_foo.o and mon_foo_gpu.o must be added respectively to the variables MAINOBJ and GPU_OBJBLOCKS of the makefile. A good starting point to implement your new mon_foo_cpu() function is to use mon_dens.c as a template. Both the _cpu and _gpu functions need to be declared in prototypes.h:
    ex void mon_foo_cpu(void);
in the section dedicated to the declaration of CPU prototypes, and:
    ex void mon_foo_gpu(void);
in the section dedicated to the declaration of GPU prototypes. Be sure that the two declarations are not at the same place in the file. The second one must be after the #ifndef __NOPROTO statement. In order to be used, your new monitoring function needs to be registered inside the function InitMonitoring() of the file monitor.c, using a syntax as follows:
    InitFunctionMonitoring (FOO, mon_foo, "foo", TOTAL, "YCZC", INDEP_PLANET);
or similar, as described above. In this statement, FOO is an integer power of 2 that must be defined in define.h. Be sure that it is unique, so that it does not interfere with any other predefined monitoring variable. You are now able to request a fine grain output of your foo variable, using in the .opt file expressions such as:
    MONITOR_SCALAR = MASS | FOO | MOM_Z
    MONITOR_2D = BXFLUX | FOO
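The requirement that FOO be a unique integer power of 2 suggests that monitoring requests behave as bit flags. The sketch below illustrates this under that assumption: the numerical values are hypothetical (the real ones live in define.h), and it is assumed that requests are combined with a bitwise OR and tested with a bitwise AND.

```python
# Hypothetical flag values: each one is a distinct power of 2.
MASS = 1 << 0
MOM_Z = 1 << 1
BXFLUX = 1 << 2
FOO = 1 << 3   # the new quantity must not reuse an existing bit

def is_requested(mask, flag):
    """True if `flag` is part of the combined request `mask`."""
    return (mask & flag) != 0

# Combined requests, one mask per monitoring flavor.
monitor_scalar = MASS | FOO | MOM_Z
monitor_2d = BXFLUX | FOO
```

If two quantities shared the same power of 2, requesting one would silently request the other as well, which is why uniqueness matters.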
86. file provided with FARGO3D with any arbitrary name.

CHAPTER TWELVE
OUTPUTS

FARGO3D has many different kinds of outputs, each one with different information. They are:
• Scalar fields
• Domain files
• Variables file
• Grid files
• Legacy file
• Planetary files
• Monitoring files
This section is devoted to a brief explanation of each kind of file.

12.1 Scalar fields

These files have a .dat extension. They are unformatted binary files. The structure of each file is a sequence of doubles (8 bytes), or floats (4 bytes) if the FLOAT option was activated at build time (see the section .opt files). The number of bytes stored in a field file is:
• 8 × Nx × Ny × Nz
• 4 × Nx × Ny × Nz if the option FLOAT was activated.
Remember that Nx, Ny, Nz are the global variables NX, NY, NZ defined in the .par file (the size of the mesh). For a correct reading of the file, you must be careful with the order of the data. The figure below shows how the data is stored in each file (for 2D simulations, but the concept is the same in 3D). The fast (innermost) index is always the x-index (index i inside the code). The next index is the y-index (j), and the last one is the z-index (k). If one direction is not used (e.g. 2D YZ simulation), the indices used follow the same rule. Note that scalar field files do not contain information about coordinates. It is only a cube of data, without any additional info
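The storage order described above (x fastest, then y, then z) maps directly onto a NumPy array of shape (nz, ny, nx). The following sketch reads a field file under that assumption; the function name read_field and its arguments are hypothetical, and the mesh sizes must be supplied by the caller since the file itself carries no metadata.

```python
import numpy as np

def read_field(path, nx, ny, nz, single_precision=False):
    """Read a raw scalar field file into an array indexed as [k, j, i].

    Hypothetical helper: set single_precision=True for a FLOAT build (4-byte reals).
    """
    dtype = np.float32 if single_precision else np.float64
    data = np.fromfile(path, dtype=dtype)
    # The file holds exactly nx*ny*nz values and nothing else.
    if data.size != nx * ny * nz:
        raise ValueError("file size does not match nx*ny*nz")
    # x is the fastest index, so it must be the last axis of the array.
    return data.reshape(nz, ny, nx)
```

The size check catches the two most common mistakes: a wrong mesh size, or reading a single precision file as double precision (and vice versa).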
87. fined. You can use the flag -D to specify the device number on which each GPU job must be launched. This is fine if your cluster has only one GPU per node, and you spawn one PE per node. A warning message is issued in any case, as specifying the device number manually should be reserved for sequential runs. You can use the flag +D to specify a list of devices on each host. This is meant to be used, in general, with a job scheduler such as PBS (see Execution flags). Finally, when neither -D nor +D is used, the devices are selected on the basis of the local rank. All GPUs on the nodes used by the run should be free when the run is launched, otherwise they may get oversubscribed.
Note: The +D flag does not work for a build without MPICUDA (non CUDA-aware build).

CHAPTER FOURTEEN
CUDA TRANSLATOR: C2CUDA.PY

One of the most interesting features of FARGO3D is that it can run on GPUs. If you look at the src/ directory, there are only very few routines related to CUDA. The philosophy of the development of FARGO3D was to avoid the kernel writing process. Instead, following a set of very simple rules, we were able to develop a python script that translates the time-consuming C routines into CUDA kernels. This section is a brief description of the rules and the process of building CUDA kernels from
88. for all the FARGO3D standard setups. As you could see in the first run, the data were stored in outputs/fargo. By default, all the data is stored in outputs/setup_name, where setup_name is one of the setups in the setups directory.

3.1.5 setups

This directory is where all the different setups are. By default, the make process looks here to check whether the setup is defined, e.g.:
    $ make SETUP=otvortex
After that, the makefile will search inside setups/ for a directory called setups/otvortex. Inside the otvortex directory, we find all the files necessary to set up the problem and build the code.

3.1.6 planets

Inside this directory are the default planetary system files, used to run a simulation with one or several planets (considered as point-like masses that interact between them and with the disk, onto which they act as an external potential). The syntax of planetary files is the same as for the former FARGO code. It is not mandatory to store your planetary data here, but it is recommended.

3.1.7 std

The name std comes from "standard". This directory stores all the standard configuration files. In this version, these are:
• boundaries.txt: where the boundary conditions are defined.
• boundary_template.c: A build helper for the boundaries scripting.
• centering.txt: A file describing the centering of the different fields with respect to the mesh (helper for boundary conditions scripting).
• defaultflags: A file with all the default flags fo
89. gaFrame (real): It is the angular velocity of the reference frame. It makes sense only if the parameter Frame is equal to F (Fixed).

7.1.3 boundaries

Because this problem is 2D in XY, only boundary conditions in Y are applied. The boundary conditions are an extrapolation of the Keplerian profile for the azimuthal velocity; the density is also extrapolated using its initial power law profile, and an antisymmetric boundary condition on the radial velocity is applied. If STOCKHOLM is activated in the .opt file, the wave-killing recipe of De Val Borro (2006) is used to damp disturbances near the radial boundaries of the mesh.

7.2 Orszag-Tang Vortex

This setup corresponds to the well known 2D periodic MHD setup of Orszag and Tang, widely used to assess the properties of MHD solvers. We briefly go through the make options of the .opt file and through the parameter file.

7.2.1 Make options

Here are the options activated in the .opt file:
    X
    Y
    Z
    MHD
    STRICTSYM
    ADIABATIC
    CARTESIAN
    STANDARD
    VANLEER
The first four lines are self-explanatory.
Note: Even though the Orszag-Tang setup is a 2D setup, every time the MHD is included, all three dimensions should be defined. This is why we define here X, Y and Z.
The flag STRICTSYM on the fifth line is meant to enforce a strict central symmetry of the scheme. Usually, after some time (the amount of time depends on the
90. h the -m flag. Is there a way to merge the outputs?
    20.5 My build produces unexpected results. Some files should have been remade and they have not. How do I fix this?
    20.6 My code does not run much faster on the GPU than on the CPU. Why is this?
    20.7 What is this @ sign at the beginning of the outputs' path?
    20.8 I see that there are .par files in each setup directory, and the same .par files are found in the in/ subdirectory. What is this for?
    20.9 I have noticed a directory named .fargo3drc in my home directory. What is it here for?
    20.10 How can I see the output of python scripts (in particular the CUDA files)?
    20.11 A very incomplete TODO list

CHAPTER ONE
INTRODUCTION

FARGO3D is a hydrodynamics and magnetohydrodynamics parallel code. It is the successor of the public FARGO code (http://fargo.in2p3.fr). It conserves the main features of FARGO, but includes a lot of new concepts. Although FARGO3D was initially inspired by FARGO, it w
91. hat is exactly the quantity that we plot. It is:
    $R(r)\,\Delta r = \iint r\,d\phi\,dz\;\rho v_r\,(v_\phi-\bar{v}_\phi)\,\Delta r$
That is to say, it is the sum in z and in $\phi$ of the quantity $\rho v_r (v_\phi-\bar{v}_\phi)$ × cell volume. This quantity is evaluated in src/mon_reynolds.c, and it is subsequently passed to the systematic machinery of Monitoring.
(Footnote: Each dot stands for an elementary HD or MHD time step. The number of dots on a line, which has length DT, is given by the CFL condition.)
Figure 7.1: Figure obtained with the above Python instructions (here with n=150, i.e. upon run completion).
In the same vein, we can plot the quantities found respectively in maxwell_1d_Y_raw.dat and in mass_1d_Y_raw.dat. These are the vertical and azimuthal sums, over all cells, of the following quantities: $B_r B_\phi/\mu_0$ × cell volume and $\rho$ × cell volume. We see that the value of $\alpha$ can therefore be obtained as follows:
    r = (arange(ny)+.5)/ny*7+1
    cs2 = 0.01/r
    cs2array = cs2.repeat(10*n).reshape(ny,10*n).transpose()
    reyn = fromfile("reynolds_1d_Y_raw.dat", dtype='f4', count=n*10*ny).reshape(n*10,ny)
    maxw = fromfile("maxwell_1d_Y_raw.dat", dtype='f4', count=n*10*ny).reshape(n*10,ny)
    mass = fromfile("mass_1d_Y_raw.dat", dtype='f4', count=n*10*ny).reshape(n*10,ny)
    alpha_maxwell = -maxw/(mass*cs2array)
    alpha_reynolds = reyn/(mass*cs2array)
    alpha = a
92. [Table of contents, continued. The titles of entries 13.3 and 17.1 are partly illegible in the source.]
12 Outputs
    12.1 Scalar fields
    12.2 Domain files
    12.3 Variables
    12.4 Grid files
    12.5 Planet files
    12.6 Datacubes
    12.7 Monitoring
13 Communications
    13.3 MPI-CUDA
    13.4 Spawning a job on a cluster of GPUs: a primer
14 Cuda translator: C2CUDA.py
    14.1 FARGO3D Mesh Functions
    14.2 How the script works
    14.3 Common errors
15 Execution flags
16 VTK Compatibility
17 Improving CUDA Performance
    17.1 [partly illegible] MPICUDA
18 GPU vs CPU: Benchmarking
    18.1 Some acceleration factors
    18.2 The FARGO_SPEEDUP macrocommand
19 Developing a complex setup
    19.1 Setup folder
    19.2 Initial Condition
    19.3 Time dependent explosions
    19.4 Incorporating our kernel
    19.5 Using FARGO_DEBUG
    19.6 Summary
20 Tips, Tricks, Todos and Troubleshooting
    20.1 How do I add a new parameter to a setup?
    20.2 I forgot to run the code wit
93. hifts. It is a function of (i,j,k).
• ixm: i-index of the left neighbor in x of the current cell, taking periodicity into account.
• ixp: i-index of the right neighbor in x of the current cell, taking periodicity into account.

Coordinates:
• XC: center of the current cell in X. It is a function of the indices (must be used inside a loop).
• YC: center of the current cell in Y. It is a function of the indices (must be used inside a loop).
• ZC: center of the current cell in Z. It is a function of the indices (must be used inside a loop).
• xmin(i): The lower x-bound of a cell.
• xmed(i): The x-center of a cell (same as XC, but can be used outside a loop).
• ymin(j): The lower y-bound of a cell.
• ymed(j): The y-center of a cell (same as YC, but can be used outside a loop).
• zmin(k): The lower z-bound of a cell.
• zmed(k): The z-center of a cell (same as ZC, but can be used outside a loop).

Length:
    zone_size_x          Face to face distance in the x direction.
    zone_size_y          Face to face distance in the y direction.
    zone_size_z          Face to face distance in the z direction.
    edge_size_x          The same as zone_size_x, but measured on the lower x-border.
    edge_size_y          The same as zone_size_y, but measured on the lower y-border.
    edge_size_x_middl
    edge_size_x_middl
    edge_size_z          The same as zone_size_z, but measured
94. ial Condition

Because we want explosions at random times, this kind of setup cannot be entirely dealt with in condinit.c. We will develop in the next section an additional routine, called from the main loop, to handle explosions. In the initial conditions, we define a first explosion at a random position. The routine that we will create in the next section will resemble this one, which we write in the file setups/explosions/condinit.c. It should be similar to:
    #include "fargo3d.h"

    void CondInit() {
      real yr, zr, yr0, zr0;
      int i,j,k;
      real* vx = Vx->field_cpu;
      real* vy = Vy->field_cpu;
      real* vz = Vz->field_cpu;
      real* bx = Bx->field_cpu;
      real* by = By->field_cpu;
      real* bz = Bz->field_cpu;
      real* rho = Density->field_cpu;
      real* e = Energy->field_cpu;
      real sphere_radius = 0.05;
      yr = YMIN+drand48()*(YMAX-YMIN);
      zr = ZMIN+drand48()*(ZMAX-ZMIN);
      yr0 = yr;
      zr0 = zr;
      if (yr > YMIN+(YMAX-YMIN)/2.0) yr -= (YMAX-YMIN);
      else yr += (YMAX-YMIN);
      if (zr > ZMIN+(ZMAX-ZMIN)/2.0) zr -= (ZMAX-ZMIN);
      else zr += (ZMAX-ZMIN);
      sphere_radius *= sphere_radius; // we take the square
      for (k=0; k<Nz+2*NGHZ; k++) {
        for (j=0; j<Ny+2*NGHY; j++) {
          for (i=0; i<Nx; i++) {
            rho[l] = e[l] = 1.0;
            vx[l] = vy[l] = vz[l] = 0.0;
            bx[l] = by[l] = 0.0;
            bz[l] = 0.1;
            if ((Ymed(j)-yr)*(Ymed(j)-yr)+(Zmed(k)-zr)*(Zm
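The replica logic of this routine (an explosion center near one edge of a periodic domain also affects cells near the opposite edge) can be checked in isolation. The sketch below is an illustration of that logic, not the guide's code: the function names are hypothetical, and the sign convention (shift the center by one domain length towards the far side) is an assumption consistent with the periodicity argument.

```python
def replica_coordinate(c, cmin, cmax):
    """Coordinate of the periodic replica closest to the opposite edge."""
    length = cmax - cmin
    # A center in the upper half is replicated one period below, and vice versa.
    return c - length if c > cmin + length / 2.0 else c + length

def in_explosion(y, z, yr0, zr0, ymin, ymax, zmin, zmax, radius):
    """True if the point (y, z) lies inside the explosion or one of its replicas."""
    yr = replica_coordinate(yr0, ymin, ymax)
    zr = replica_coordinate(zr0, zmin, zmax)
    r2 = radius * radius  # compare squared distances, as in the C routine
    return any((y - yc) ** 2 + (z - zc) ** 2 < r2
               for yc in (yr0, yr) for zc in (zr0, zr))
```

For a unit domain and an explosion centered at y = 0.98 with radius 0.05, a cell at y = 0.01 is only 0.03 away from the replica at y = -0.02, so it must be heated as well.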
95. ile. Therefore, these setup (or master) .par files must include all variables that can ever be used by the setup, and they must specify default (fiducial) values for all parameters. Although the user may run the code by using these master parameter files (e.g. fargo3d -m setups/fargo/fargo.par), this is not customary, and any other parameter file, with parameter values different from those specified in the master parfile, may be used (fargo3d -m in/fargo.par). You may have a large set of parameter files with different values, and you can run the code on them without any rebuild.
Note: If a parameter is not defined in a parameter file, the code will take its default value from the master parameter file, unless this parameter is specified as mandatory, in which case the user must specify its value in any parameter file, otherwise an error message is issued and the code stops.
Warning: It is recommended to edit the parameter files of the in/ directory, rather than interfering with the master parameter files found in the setups' directories.
See also: Defining the parameter file

20.9 I have noticed a directory named .fargo3drc in my home directory. What is it here for?

This directory contains two text files (history and lastout) which are updated at each new run. Every time a new run is spawned, two lines are appended to the history file: a line indicating the date and time at which the ru
96. ill used to store the pointer to the array. In the other case, it stores directly the array itself. The variables for this block are created in src/light_global_dev.c. The scalar variables are passed as:
    type variable(1)
while the array variables are passed as:
    type variable(size_of_the_array)
Note that all this block is commented out in the C file, with // at the beginning of each line.

14.2.9 MAIN LOOP

Identificator:
    <MAIN_LOOP> ... <\MAIN_LOOP>
Parsed as: The size of the loops is read and parsed. Also, the indices are initialized. Example:
    #ifdef Z
    for (k=NGHZ; k<size_z; k++) {
    #endif
    #ifdef Y
    for (j=NGHY; j<size_y; j++) {
    #endif
    #ifdef X
    for (i=0; i<size_x; i++) {
    #endif
is parsed to:
    #ifdef X
    i = threadIdx.x + blockIdx.x * blockDim.x;
    #else
    i = 0;
    #endif
    #ifdef Y
    j = threadIdx.y + blockIdx.y * blockDim.y;
    #else
    j = 0;
    #endif
    #ifdef Z
    k = threadIdx.z + blockIdx.z * blockDim.z;
    #else
    k = 0;
    #endif
    #ifdef Z
    if (k>=NGHZ && k<size_z)
    #endif
    #ifdef Y
    if (j>=NGHY && j<size_y)
    #endif
    #ifdef X
    if (i<size_x)
    #endif
The content of the loop is parsed textually (formally, the content is part of another block: <#> ... <\#>).
Location: Before the initialization of the indices i, j, k.
This block is very particular because it must to
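The translation of a loop into a thread index plus a bounds test can be emulated in plain Python, to check that the guarded threads visit exactly the cells of the original loop. This is a one-dimensional sketch under stated assumptions: the function name is hypothetical, and block_idx, thread_idx and block_dim stand in for CUDA's blockIdx, threadIdx and blockDim.

```python
def covered_indices(n_blocks, block_dim, lower, upper):
    """Indices visited by a 1D grid of threads guarded by lower <= j < upper."""
    covered = []
    for block_idx in range(n_blocks):
        for thread_idx in range(block_dim):
            # Same index formula as in the generated kernel.
            j = thread_idx + block_idx * block_dim
            # The if statement replaces the bounds of the original for loop.
            if lower <= j < upper:
                covered.append(j)
    return covered

# Loop "for (j=NGHY; j<size_y; j++)" with NGHY=2 and size_y=10,
# mapped onto 3 blocks of 4 threads (12 threads for 8 active cells).
visited = covered_indices(n_blocks=3, block_dim=4, lower=2, upper=10)
```

The grid launches more threads than there are cells to update; the guard makes the extra threads idle instead of writing out of bounds.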
97. inary (i.e. without MPI), meant to run on one CPU core only. By default, this sequential version cannot run on a GPU. After the building process, you will see a message similar to:
    FARGO3D SUMMARY:
    This built is SEQUENTIAL. Use "make para" to change that
    SETUP: fargo
    (Use "make SETUP=[valid_setup_string]" to change set up)
    (Use "make list" to see the list of setups implemented)
    (Use "make info" to see the current sticky build options)
We will go through the details of this note in the next sections, but for now, the important thing is that the code was compiled in SEQUENTIAL mode and the SETUP is fargo. This means that the code was compiled in a mode that is very similar to the FARGO code. Actually, the first run of FARGO3D is the same as the first run of FARGO: a Jupiter-like planet embedded in a 2D cylindrical gas disk.
Warning: A note to former FARGO users: the setup must not be confused with the parameter file. The setup is chosen at build time, and contains fundamental information about the executable produced (MHD on or off, mesh geometry, equation of state used, number of dimensions, etc.). An executable built for a given setup can then be run on as many parameter files as required, without any rebuild.
If you have a look at the content of the main directory, you will see that after the compilation a new file has been created, called fargo3d. This file is the binary file. We can now perfor
98. interactive python shell called IPython (http://ipython.org). Here is an example of how to work in an interactive IPython shell:

    $ ipython --pylab
    IPython 1.0.0 -- An enhanced Interactive Python.
    In [1]: rho = fromfile("gasdens10.dat").reshape(128,384)
    In [2]: imshow(log10(rho),origin='lower',cmap=cm.Oranges_r,aspect='auto')
    In [3]: colorbar()

You should see, inside a widget, an image similar to:

Figure 2.3: Matplotlib image of the first run (output number 10).

2.3.4 More tools

There is a lot of software for reading and plotting data, but in general you need to have an ASCII file of the data. In the utils directory you will find some examples, written in different languages, of how to transform the data into a human-readable format. If you are working with a large data set, this option is not recommended. It is always a good choice to work with binary files: your outputs are lighter and the reading process is much faster.

Warning: Using ASCII format is very slow and should never be used for high resolution simulations or a 3D run.

Note that FARGO3D can also produce data in the VTK format, which can be inspected with software such as VISIT. This feature of FARGO3D is covered later in this manual.

2.4 First parallel run

Until now
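For readers without NumPy at hand, the reading step of the IPython session can be reproduced with the standard library. This is a minimal sketch under stated assumptions: the file holds raw double-precision values in native byte order (which is what the default call to fromfile implies), with X as the fastest-varying index. The helper name read_field and the tiny demonstration file are hypothetical.

```python
import array
import os
import tempfile


def read_field(path, ny, nx):
    """Read ny*nx raw doubles and return them as a list of ny rows of nx values."""
    values = array.array("d")
    with open(path, "rb") as handle:
        values.fromfile(handle, ny * nx)   # raises EOFError if the file is too short
    return [values[j * nx:(j + 1) * nx].tolist() for j in range(ny)]


# Demonstration on a tiny synthetic file (3 rows of 4 cells), not on a real output.
ny, nx = 3, 4
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as handle:
    array.array("d", [float(n) for n in range(ny * nx)]).tofile(handle)

rho = read_field(path, ny, nx)
```

For the first run discussed here, the equivalent call would be read_field("gasdens10.dat", 128, 384), matching the reshape(128,384) of the IPython example.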
99. ion type:

    void ComputePressureFieldIso_cpu () {

All the mesh functions must return void. This is very important because CUDA kernels can only return void (this is one of the limitations of CUDA kernels). At this stage the name of the function is not important, but note that its suffix must be _cpu.

Arguments

In this simple case the function has no arguments, but in general you can have a wide range of arguments. The most common are integers, reals and Field variables.

Global variables

    real* dens = Density->field_cpu;
    real* cs = Energy->field_cpu;
    real* pres = Pressure->field_cpu;
    int pitch = Pitch_cpu;
    int stride = Stride_cpu;
    int size_x = Nx;
    int size_y = Ny+2*NGHY;
    int size_z = Nz+2*NGHZ;

This block is related to all the global variables that are not passed as arguments to the function. In practice, here you have all the fields required to achieve the calculation, the size of each loop, and some useful variables related to indices.

Local variables

    int i;
    int j;
    int k;

This block is devoted to variables that are neither passed as arguments nor global. Indices of the loop are always here, but it is possible to add any variable you want.

Main loop

    for (k=0; k<size_z; k++) {
      for (j=0; j<size_y; j++) {
        for (i=0; i<size_x; i++) {
          pres[l] = dens[l]*cs[l]*cs[l];
        }
      }
    }

This block is where all the expensive computation is done, and all the parsing process was deve
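The structure of this mesh function can be mirrored in a few lines of Python, which may help to see how the flat index l ties the three loops to the one-dimensional field arrays. This is only a sketch: the expression l = i + j*pitch + k*stride is an assumption about how the index is built from i, j, k, pitch and stride, and the function name is hypothetical.

```python
def compute_pressure_iso(dens, cs, size_x, size_y, size_z, pitch, stride):
    """Isothermal pressure on a flat array: pres[l] = dens[l] * cs[l]**2."""
    pres = [0.0] * len(dens)
    for k in range(size_z):
        for j in range(size_y):
            for i in range(size_x):
                l = i + j * pitch + k * stride   # assumed definition of the flat index
                pres[l] = dens[l] * cs[l] * cs[l]
    return pres


# 2 x 2 x 1 mesh stored contiguously: pitch = size_x, stride = size_x * size_y.
dens = [1.0, 2.0, 3.0, 4.0]
cs = [0.5, 0.5, 0.5, 0.5]
pres = compute_pressure_iso(dens, cs, 2, 2, 1, pitch=2, stride=4)
```

Each (i, j, k) triplet touches exactly one element of each array, which is the property that lets the loop body be handed to independent GPU threads.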
100. is a set of index helpers for working with boundaries:

    jgh/kgh:   The index of the y/z ghost cells.
    jact/kact: The index of the y/z active cells.

These helpers prove very useful to perform complex extrapolations at the boundaries. You can see examples in the file std/boundaries.txt.

5.6.1 Examples

Zero-gradient boundary

Suppose we want to define a zero-gradient boundary. That means we want to copy all the active zone into the ghost zone, for both centered and staggered meshes. The syntax is as follows. It groups together the definitions that we have worked out above for centered and staggered fields:

    SYMMETRIC:
        Centered:    |active|active|
        Staggered:   |active|active|active|

where the right active value will be copied without any modification to the ghost zone. Note that this boundary definition is direction- and side-independent, i.e. it can apply to ymin, ymax, zmin and zmax.

Keplerian extrapolation example

We define here a more complex boundary condition:

    KEPLERIAN2DDENS:
        Centered:    |surfdens*pow(Ymed(jact)/Ymed(jgh),SIGMASLOPE)|surfdens|

This line is equivalent to:

    KEPLERIAN2DDENS:
        Centered:    |active*pow(ymed(jact)/ymed(jgh),sigmaslope)|active|

as per the rules listed above. What kind of action does this instruction correspond to? It sets the value of the ghost zone to a ghost value Dact
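The KEPLERIAN2DDENS rule can be tried on a one-dimensional radial profile. The sketch below is a Python illustration, not the code generated by FARGO3D: the helper name, the restriction to the ymin side and the mirror pairing between ghost and active cells (jact = 2*NGHY - 1 - jgh) are assumptions. Only the formula applied to each ghost cell is taken from the boundary definition above.

```python
def fill_ghost_keplerian(surfdens, ymed, nghy, sigmaslope):
    """Fill the ymin ghost cells from the active cells with a power-law extrapolation."""
    out = list(surfdens)
    for jgh in range(nghy):
        jact = 2 * nghy - 1 - jgh   # assumed pairing of ghost and active cells
        # ghost = active * pow(Ymed(jact) / Ymed(jgh), SIGMASLOPE)
        out[jgh] = out[jact] * (ymed[jact] / ymed[jgh]) ** sigmaslope
    return out


# Active cells follow surfdens = 12 / r; the two ghost cells start out unset.
ymed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
surfdens = [0.0, 0.0, 4.0, 3.0, 2.4, 2.0]
filled = fill_ghost_keplerian(surfdens, ymed, nghy=2, sigmaslope=1.0)
```

With sigmaslope set to 0 the same routine reduces to a plain copy of the active value, i.e. the zero-gradient (SYMMETRIC) case for a centered field.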
101. is parameter file, as said earlier, corresponds to the disks contemplated in Baruteau et al. (2011), with a radial range from 1 to 8 and from -0.3 to 0.3 in z, half a disk in azimuth, and an aspect ratio of 0.1 (the term "aspect ratio" is misleading here: it merely implies that cs = 0.1 v everywhere in the disk). Besides, the mesh is rotating so as to have its corotation at r = 3. The initial β of the gas is 50, and the initial magnetic field is toroidal (see setups/mri/condinit.c). Some shot noise is introduced on the vertical and radial components of the velocity, with an amplitude of NOISE percent of the local sound speed (therefore, here, 5%). Some resistivity is introduced. As there is a file called resistivity.c in the setup directory, it supersedes the file of the same name in the src directory. This file implements a linear ramp of resistivity near each radial boundary, of radial width 1/7th of the radial extent (hence, here, of radial width 1).

7.4.3 Hands-on test

We hereafter run the setup for 300 orbits at the disk's inner edge and examine some statistical properties of the turbulence that arises. To start with, we forget any prior build option of the code:

    $ make mrproper

We then build the mri setup. Owing to the computational cost, it is a good idea to run it on one or several GPUs. In what follows, we take the example of a run on one GPU. The run takes about 10 hours to complete on one Tesla C2050. You can degrade the resolution to spee
102. ith other setups, no error nor warning message is issued.

Now we need to be able to select which function is called (_gpu or _cpu) by means of a function pointer. In the file src/global.h, add the following line at the end:

    void (*Edamp)(real);

We now have all the variables required to edit the ChangeArch function. This function allows you to switch between a CPU and a GPU execution of your new kernel without recompiling the code. In the file change_arch.c, add the following lines. Before the line:

    while (fgets(s, MAXLINELENGTH-1, func_arch) != NULL) {

add:

    Edamp = Edamp_cpu;

This line sets the default value of the function pointer Edamp: it calls the _cpu function. If we want to use the GPU version of our function, we need to point to Edamp_gpu. In practice this is done with the func_arch.cfg file. To activate this possibility, add the following lines at the end, before the last #endif:

    if (strcmp(name, "edamp") == 0) {
      if (strval[0] == 'g') {
        Edamp = Edamp_gpu;
        printf("Edamp runs on the GPU\n");
      }
    }

We eventually add the following line to the func_arch.cfg file:

    Edamp GPU

Finally, we change the invocation of the energy damping into a more general invocation (before substep3, inside algogas.c):

    Edamp(dt); // Calls either _cpu or _gpu, depending on its value.

If you run the code again you will see that nothing changed are issued indicating expensive comm
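The selection mechanism described in this snippet (a default pointing to the _cpu routine, overridden by a line of func_arch.cfg) can be modelled compactly. The following Python sketch only imitates the logic; the function names, the dictionary used in place of a C function pointer and the lowercasing of the configuration entries are assumptions made for illustration.

```python
def edamp_cpu(dt):
    """Stand-in for the C routine Edamp_cpu."""
    return ("cpu", dt)


def edamp_gpu(dt):
    """Stand-in for the CUDA launcher Edamp_gpu."""
    return ("gpu", dt)


def change_arch(cfg_lines):
    """Return a dispatch table: every routine defaults to its CPU version."""
    table = {"edamp": edamp_cpu}                 # Edamp = Edamp_cpu;
    for line in cfg_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        name, strval = fields[0].lower(), fields[1].lower()   # assumed case folding
        if name == "edamp" and strval[0] == "g":
            table["edamp"] = edamp_gpu           # Edamp = Edamp_gpu;
    return table


default_table = change_arch([])
gpu_table = change_arch(["Edamp GPU"])
```

The caller never needs to know which version is active: it always invokes table["edamp"](dt), just as algogas.c always invokes Edamp(dt).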
103. itoring a quantity

Thus far in this section we have vaguely used the expression "the quantity". What is the quantity, and how is it evaluated? The quantity is any scalar value which is stored in a dedicated 3D array. It is the user's responsibility to determine an adequate expression for the quantity, and to write a routine that fills the array, for each zone, with the corresponding quantity. For instance, if one is interested in monitoring the mass of the system, the quantity of interest is the product of the density in a zone by the volume of the zone. The reader may have a look at the C file mon_dens.c. Toward the end of that file, note how the interm[] array is precisely filled with this value. This array will further be integrated in X, and, depending on what has been requested by the user, possibly in Y and/or Z, as explained above. Note that the same function is used for the three flavors of monitoring (scalar, 1D profiles and 2D maps).

12.7.4 Monitoring in practice

We now know the principles of monitoring: it simply consists in having a C function that evaluates some quantity of interest for each cell. No manual averaging or integration is required if you program your custom function. But how do we request the monitoring of given quantities, what are the names of the corresponding files, and how do we include new monitoring functions in the code? We start by answering the first question. The monitoring quantities and flavors is re
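The monitoring principle (fill a 3D array cell by cell, then let the code integrate it) can be illustrated for the mass example. The sketch below is a Python model of the idea and not the content of mon_dens.c: the function name, the nested-list layout indexed as [k][j][i] and the choice of returning a 2D map, a profile in Z and a scalar are assumptions.

```python
def monitor_mass(density, volume):
    """Build the per-cell quantity, then integrate it in X, in X and Y, and in X, Y and Z."""
    # Per-cell quantity: mass = density * volume (the role played by the interm array).
    interm = [[[d * v for d, v in zip(drow, vrow)]
               for drow, vrow in zip(dplane, vplane)]
              for dplane, vplane in zip(density, volume)]
    map_2d = [[sum(row) for row in plane] for plane in interm]   # integrated in X
    profile_z = [sum(rows) for rows in map_2d]                   # integrated in X and Y
    scalar = sum(profile_z)                                      # integrated in X, Y and Z
    return map_2d, profile_z, scalar


# One plane in z, two rows in y, three cells in x; every cell has volume 0.5.
density = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
volume = [[[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]]
map_2d, profile_z, total_mass = monitor_mass(density, volume)
```

The user-written part is only the first step; the successive sums correspond to the three flavors of monitoring mentioned above.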
104. ive E since Ymed stands for the radius of the center of a zone in a cylindrical setup. The reader familiar with the jargon and notation of protoplanetary disks will easily recognize the radial extrapolation of the surface density of the disk into the ghost zone, with a power law of exponent -α (α is dubbed SIGMASLOPE in the parameters of FARGO3D), hence the string chosen to represent the value of the active cell (surfdens), although at this stage nothing specifies yet to which variable this boundary condition should be applied. This is the role of the file SETUP.bound.

5.7 SETUP.bound

This file must be located in the sub-directory of a given setup, and must have the same prefix as the setup to which it refers. For instance, in the setup fargo, the file that specifies the boundaries is called fargo.bound. The files that we described in the previous paragraphs are used to specify which transformation rule is used for an arbitrary field (in boundaries.txt), depending on its centering, while the centering of all fields is specified by std/centering.txt. The SETUP.bound file now specifies the transformation rule to use for each physical variable. It obeys the following general syntax:

    LEVEL1 -> The name of the field.
    Level2 -> The side of the boundary (ymin, ymax, zmin, zmax) and some data.
    data   -> Boundary label (the label defined in boundaries.txt).

We show hereafter a few examples.

2D XY reflecting problem

We want to have a 2D iso
105. lanation at the same time. We will develop routines for simulating an exploding 2D periodic medium, where the explosions are governed by a random generator, and we will develop a cooling function for the gas. This cooling function will be a kernel. The explosions will be modeled by energy spheres appearing at random positions and at random times. We will also include a magnetic field. The name of our setup will be explosions.

19.1 Setup folder

The very first step is to make the directory where the setup will be. We will store all the files inside setups/explosions. This setup is very similar to the otvortex setup, so we can copy all the files of that setup and modify them. We need to keep the explosions.bound, explosions.opt, explosions.par and condinit.c files:

    $ cd setups
    $ mkdir explosions
    $ cp otvortex/* explosions/
    $ cd explosions
    $ mv otvortex.par explosions.par
    $ mv otvortex.opt explosions.opt
    ... etc. for all files that begin with otvortex

Do not forget to change the output directory and the SetUp parameter of the par file:

    OutputDir    outputs/explosions
    SetUp        explosions

We will also add some viscosity and resistivity to our setup: add the option VISCOSITY to the opt file, and set nu = 0.001 and eta = 0.001 in the par file. We can check that the setup runs:

    $ make SETUP=explosions view
    $ ./fargo3d -m setups/explosions/explosions.par

We see the otvortex setup, but with the name explosions.

19.2 Init
106. lls buffers.

    UNITS=[0,MKS,CGS]: How the units are interpreted by the code. See the unit section for further details.
    GPU=[0,1]: Activates a GPU compilation.
    PARALLEL=[0,1]: Activates the use of the MPI libraries.
    BIGMEM=[0,1]: Activates the use of the global GPU memory to store light (1D) arrays, which are otherwise stored in the so-called constant memory on board the GPU. Normally it is needed when your simulation exceeds 750 cells in some direction. This feature is device dependent. Typically, a 3D run with mesh size 500*500*500 does not require BIGMEM=1, but a 2D run with mesh size 50*1000 does. However, even when BIGMEM=1 is not required, you may still activate it. Perform some benchmarking to check which choice yields faster results: this is very problem and platform dependent.
    jobs/JOBS=N: where N is the number of processes that you want to spawn for building the code. By default N=8. This option is much needed when working with CUDA: building the CUDA object files is by far the most expensive stage of building a GPU instance of FARGO3D, as you will soon realize.

All these variables must be defined with:

    $ make VARIABLE=VALUE

The shortcut rules are invoked with the command:

    $ make option

where option can be:

    cuda, nocuda, gpu, nogpu -> (GPU=1,0)
    bigmem, nobigmem -> (BIGMEM=1,0)
    seq, para -> (PARALLE
107. loped to pass this job to the GPU. Remember that the index l is a function of (i, j, k, pitch & stride), defined in src/define.h. There is no need to define nor calculate it here: it is done automatically at build time.

14.2 How the script works

In order to develop a general GPU function, there are a few problems that must be solved:

- Develop a proper CUDA header.
- Develop the kernel function, which is the core of the calculation.
- Develop a launcher (or wrapper) function, which is called from the C code and constitutes the interface between the main stream of FARGO3D and the CUDA kernel.
- Perform communications between host and device (between CPU and GPU).
- Split the main loop into a lot of threads that are given to the CUDA cores.
- Develop a method for passing global variables to the kernel.
- Develop a method for passing complex structures (e.g. Field) to a kernel.
- Develop a method for switching between the C function and the CUDA function at run time, so that we can, among other things, compare the results of execution on the CPU to those on the GPU, in order to validate the GPU build.

You can see that the structure of a mesh function is really very simple, and that throughout the code the structure of mesh functions is essentially the same. This allows us to develop an automatic process to generate CUDA code. Here A series of special line
108. lpha_maxwell+alpha_reynolds

    imshow(alpha, aspect='auto', origin='lower')
    colorbar()

which gives the following picture.

We plot the different time-averaged values of α once the turbulence has reached a saturated state:

    plot(r, alpha_maxwell[500:,:].mean(axis=0))
    plot(r, alpha_reynolds[500:,:].mean(axis=0))
    plot(r, alpha[500:,:].mean(axis=0))

which gives the following plot.

We finally plot the radially averaged value of α between r = 2 and r = 6 (corresponding to bins 46 to 228 in Y) as a function of time:

    plot(arange(1500)*1.256, alpha[:,46:228].mean(axis=1))

which gives the following plot.

We see that we obtain a relatively substantial value for α in this fiducial run, much larger than the one obtained with the same parameters in the run with NIRVANA in Baruteau et al. (2011), at Fig. 6. One reason for that is the use of orbital advection; another one is the systematic use of the van Leer slopes in the upwind evaluation of all quantities involved in the MHD algorithm.

Comparison with other code of the Orszag Tang vortex at diff
109. m the first run:

    $ ./fargo3d -m setups/fargo/fargo.par

And you will see the following lines:

    x = 1.0000000000 y = 0.0000000000 z = 0.0000000000
    vx = 0.0000000000 vy = 1.0004998751 vz = 0.0000000000
    Non-accreting.
    Doesn't feel the disk potential
    Doesn't feel the other planets potential
    Found 0 communicators
    OUTPUTS 0 at date t = 0.000000 OK
    TotalMass = 0.0121800000

All right, everything works fine. These lines should look familiar to former FARGO users. All the outputs are written to outputs/fargo. You can now open them with your favourite data reduction software. We include in the following some examples of how to visualize this first data. We will assume that you ran the test at least until output 10.

Note: For all the instructions, it is assumed that you are in the outputs/fargo directory. You may go to this directory by issuing cd outputs/fargo from the USER_PATH.

2.3.1 Gnuplot

Gnuplot (http://www.gnuplot.info) is a portable command-line driven graphing utility, and it is a useful tool for showing the outputs of FARGO3D. Here is an example of how to load the outputs of FARGO3D for our two-dimensional first run. The command line should be similar to:

    $ gnuplot
    Version 4.6 patchlevel 1 last modified 2012-09-26
    gnuplot> set palette rgbformulae 34,35,36
    gnuplot> set logscale cb
    gnuplot> nx = 384; ny = 128
    gnuplot> plot [0:nx-1][0:ny-1] "g
110. metric, but vy should be reflected in Y. This set is the reflective boundary condition in Y. The free outflow is the same except for vy:

    Ymin: SYMMETRIC
    Ymax: SYMMETRIC

10.1.3 Defining the parameter file

The parameter file is very useful when we want to change a value inside the code without recompiling it. It is used in much the same way as with the former FARGO code. Yet, in that code, parameters were defined in a rather manual way in a file called var.c. With FARGO3D, we do not have to edit this file. Rather, we provide in the SETUP sub-directory a template parameter file that has the same name as the setup, plus the .par extension. From this file, a Python script will automagically draw a list of all global variables and guess their type, and will make a var.c accordingly, in a manner transparent to the user. At run time, the user is free to run the code either with this .par file or with any other .par file in another directory, without rebuilding.

There is a set of minimal requirements for a .par file, related to the mesh size and output parameters. We will start with the basic parameters. We edit the new file myblob.par:

    $ touch myblob.par

and write inside something similar to:

    Setup        myblob
    Nx           400
    Ny           100
    Xmin         -2.0
    Xmax         2.0
    Ymin         -0.5
    Ymax         0.5
    Ntot         1000
    Ninterm      1
    DT           0.05
    OutputDir    outputs/myblob

Warning: Because a
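The idea of drawing a list of variables from the .par file and guessing their type can be sketched as follows. This is a hypothetical Python illustration and not the script shipped with FARGO3D: the function names, the treatment of "#" as a comment marker, the yes/no rule for booleans and the upper-casing of names are all assumptions.

```python
def guess_type(text):
    """Guess a parameter type from its textual value (assumed rules)."""
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "real"
    except ValueError:
        pass
    if text.lower() in ("yes", "no"):
        return "boolean"
    return "string"


def parse_par(text):
    """Map each upper-cased parameter name to a (type, raw value) pair."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # assumed comment syntax
        fields = line.split()
        if len(fields) < 2:
            continue
        name, value = fields[0], fields[1]
        params[name.upper()] = (guess_type(value), value)
    return params


example = """
Setup        myblob
Nx           400
Xmin         -2.0
DT           0.05
OutputDir    outputs/myblob
"""
params = parse_par(example)
```

A generator of C declarations could then walk through params and emit one global variable per entry, which is the kind of bookkeeping that no longer has to be done by hand in var.c.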
111. n We present in some detail all these kinds of communications hereafter.

13.1 GPU/CPU communications

First of all, assume we run the code in sequential mode, on one GPU. Naturally, routines that run on the GPU must have at their disposal data in the Video RAM (the device's global memory, in GPU jargon), whereas routines that run on the CPU must have at their disposal data in the normal RAM (host memory). When the data is updated on, say, the GPU, it is not updated automatically on the CPU, nor vice versa. Managing data transfers between CPU and GPU (host and device) manually is a programmer's nightmare. It is extremely error prone, and proves to be impractical for a code of the complexity of FARGO3D. The GFARGO code, which has a simpler structure, was developed using manual data transfers from GPU to CPU and vice versa, and it took a very long time to get the data transfers right.

In order to understand how data transfer is dealt with in a semi-automatic way in FARGO3D, let us examine a real-life example. In what follows, a green rectangle means that the data is correct and up to date, whereas a red rectangle corresponds to random or out-of-date data. We start by initializing a data field (say Vx) on the CPU. There is absolutely no reason to try and initialize fields directly on the GPU: this would add a lot of complexity and it would be pointless. We have therefore the following situation:
112. n

    Y: Activates the Y direction.
    Z: Activates the Z direction.

Note: Some fields are not available until one specific direction is activated.

Equation of state

    ADIABATIC: The equation of state P = (γ-1)e will be used. The field Energy is the volumic internal energy e.
    ISOTHERMAL: The equation of state P = cs²ρ will be used. The ill-named field Energy is then the sound speed of the fluid.

Additional physics

    MHD: Activates the MHD solver. It is necessary to have X, Y & Z all activated.
    STRICTSYM: Only makes sense if MHD is activated. It enforces strict symmetry of the MHD solver.
    VISCOSITY: Activates the viscosity module.
    POTENTIAL: Activates the gravity module.
    RESISTIVITY: Activates the magnetic diffusion module.
    STOCKHOLM: Activates wave-killing boundary conditions in Y & Z. Very useful for local studies where reflections on the edges must be avoided.
    HILLCUT: Activates a cut for the force computation. Must be defined in order to accept the ExcludeHill parameter file variable set to yes.

Coordinates

    CARTESIAN: x, y, z are Cartesian.
    CYLINDRICAL: x -> azimuthal angle, y -> cylindrical radius, z -> z.
    SPHERICAL: x -> azimuthal angle, y -> spherical radius, z -> colatitude.

Transport

    STANDARD: Forces the standard advection algorithm in x. By default the x advection is done with the orbital advecti
113. n that directory, called myblob.par.
5. the opt file in that directory, called myblob.opt.
6. the file called condinit.c in that directory. This is where the fields are initialized.

Optionally:

- the units file, called myblob.units.
- the mandatory file, called myblob.mandatory.
- the objects file, called myblob.objects.
- a boundaries.txt file (if it is not present, it will be taken from the std directory).

10.1.1 Making the setup directory

We start by creating the setup directory myblob:

    $ cd setups
    $ mkdir myblob

10.1.2 Defining boundaries

We define from scratch all the boundaries in the setup. In order to do that, we create a file called boundaries.txt inside setups/myblob:

    $ emacs boundaries.txt

Naturally, you may use your favorite editor instead of emacs. Now we write these lines:

    SYMMETRIC:
        Centered:    |a|a|
        Staggered:   |a|a|a|

    ANTISYMMETRIC:
        Staggered:   |-a|0|a|

We will use the SYMMETRIC boundary for both reflective and free outflow boundaries, and ANTISYMMETRIC only for the normal velocity (Vy) in the reflective case. We must create the file myblob.bound:

    $ emacs myblob.bound

and write these lines:

    Density:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
    Energy:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
    Vx:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
    Vy:
        Ymin: ANTISYMMETRIC
        Ymax: ANTISYMMETRIC

We say that all our fields are sym
114. n was launched, and the command line issued to launch the run. In addition, two new lines are appended to the lastout file: a time stamp as previously, and the absolute path of the output directory. It can be useful to parse the last line of this file with a script to go directly where the output is written:

    alias fo='cd `tail -n 1 $HOME/.fargo3drc/lastout`'

The above line in the .bashrc file will define a command fo (like "Fargo3d Output") which changes the directory to the output directory of the last run.

Known issue: When the -o flag is used on the command line, the subsequent quotation marks are not written to the file history. If one wishes to cut and paste some line of this file to repeat a given run, one must manually edit the line to restore the quotation marks when the -o flag is used.

20.10 How can I see the output of python scripts (in particular the CUDA files)?

You may have noticed a long list of files being removed at the end of the build process, especially on GPU builds. These files are intermediate files (such as CUDA files automatically produced by the c2cuda script, etc.) and the make process must remove them upon completion in order to preserve dependencies. Failing to do so would result, for instance, in the GPU version of a routine not being rebuilt if its C source was edited. We do not issue this rm command explicitly in the makefile. This is done automatically, out of our control, because make knows tha
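The same lookup can be done from a script rather than a shell alias. The following Python sketch reads the last non-empty line of a lastout-style file; the function name and the content of the demonstration file are hypothetical, and the only property relied upon is the one stated above, namely that the last line holds the path of the output directory.

```python
import os
import tempfile


def last_output_dir(lastout_path):
    """Return the last non-empty line of the file (the most recent output directory)."""
    last = ""
    with open(lastout_path) as handle:
        for line in handle:
            if line.strip():
                last = line.strip()
    return last


# Demonstration on a synthetic file: a time stamp followed by a path, twice.
demo = os.path.join(tempfile.mkdtemp(), "lastout")
with open(demo, "w") as handle:
    handle.write("stamp-1\n/home/user/outputs/fargo\n")
    handle.write("stamp-2\n/home/user/outputs/mri\n\n")

latest = last_output_dir(demo)
```

A plotting script could call this helper first and then change to the returned directory, which has the same effect as typing fo before starting the analysis.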
115. nly taking care of prototypes and function pointers. Here is a brief summary of this process:

1. Make a directory for the setup.
2. Copy the important files you need into the new directory.
3. If you will include routines, add the setup.objects file.
4. Add the prototypes to the header file (.h) in the src directory.
5. Add the global function pointer to global.h.
6. Modify change_arch.c.
7. Add the function to func_arch.cfg, and point correctly to this file in your parameter file (with FuncArchFile).
8. Validate your new kernel using FARGO_DEBUG.

CHAPTER TWENTY

TIPS, TRICKS, TODOS AND TROUBLESHOOTING

In no particular order, we draw hereafter a list of questions that may come to the user's mind. This list will be updated according to the users' feedback.

20.1 How do I add a new parameter to a setup?

Imagine you want to add a real variable RHOR (density on the right side) to the setup sod1d (thus far the right value of the density in the Riemann problem is hard coded in condinit.c). Just add a line such as the following in setups/sod1d/sod1d.par:

    RhoR 001

and this is all. A python script, which parses this file, makes sure that you have access to a global real variable named RHOR throughout all the functions of the C code. You can now use this variable to replace the hard coded value in setups/sod1d/condinit.c.

Note: The variable name in the .par file is case insensitive, but C i
116. nment variable called FARGO_ARCH (the same as for the FARGO code). A lot of useful variables are also defined inside this makefile. Here is where the structure of the code is defined, and where the variable VPATH is specified. This variable is extremely powerful, and if you want to extend the FARGO3D directory structure, you should learn about its use (see the GNU Make Reference Guide). Another important set of variables is:

- MAINOBJ: The name of all the CPU objects that will be linked into the final executable. All new source files in the code must be included in this variable (with the .o extension instead of .c).
- GPU_OBJ: The name of the static kernels used in the code. In practice, you will never need to touch it. By static, we mean that these few kernels are not generated automatically from the C code by a Python script.
- GPU_OBJBLOCKS: The name of the objects that will be generated by the script c2cuda.py. Note that all of them must have the suffix _gpu.o, with a prefix that is the one of the corresponding C file. This is very important because the rule that generates CUDA files from C files uses the suffix of the object name. This is presented in more detail in the tutorial on how to develop a GPU routine. All the functions that must be generated automatically from C code at build time must appear as a list in this variable.

4.4 FARGO_ARCH environment variable

FARGO3D is a multi-platform code and can run on a m
117. nstance Z in the 2D polar fargo setup), the corresponding value of NGHY/Z is set to 0. The variables NX, NY and NZ are defined in the parameter file. They default to 1, so there is no need to define, for instance, NZ in a 2D setup such as fargo, or NX in the 2D setup otvortex, which corresponds to the Orszag-Tang vortex problem in Y and Z. In practice, this mesh is split among processors, and locally, within the scope of a given process, the submesh considered has size Nx, Ny+2*NGHY and Nz+2*NGHZ.

The information about cell coordinates is stored in 1D arrays:

- [xyz]min(index)
- [xyz]med(index)

where min refers to the inner edge of a zone in x, y or z, whereas med refers to the center of a zone in x, y or z. This notation should look familiar to former FARGO users.

Warning: [xyz]min/max(index) are not vectors, they are macrocommands. They must be invoked with (), not with [].

NGHY and NGHZ are preprocessor variables, defined in the file src/define.h. Because we have a multi-geometry code, another set of secondary geometrical variables is defined (surfaces, volumes). See the end of this section for details.

6.2 Fields

Fields are structures, and they can be seen as cubes of cells of size equal to the mesh size. The location at which a given variable is defined is [xyz]med if the field is [xyz]-centered, or [xyz]min if the field is [xyz]-staggered. You can find a comprehensive list of the fields in src/global.h. The place whe
118. o that we know whether it is up to date or not, and they transfer the data to the place where the calculation is going to proceed if it turns out to be out of date at the beginning of a routine. For instance, consider the following piece of code:

[Figure: state of a field in the GPU Video RAM (field_gpu) and in the CPU normal RAM (field_cpu).]

    ComputePressure () {
      INPUT (SoundSpeed);
      INPUT (Density);
      OUTPUT (Pressure);
      main_loop {
        pressure[l] = density[l]*soundspeed[l]*soundspeed[l];
      }
    }

The macrocommands INPUT and OUTPUT expand differently on the CPU and on the GPU (to be more accurate: in C functions and in CUDA kernels). On the CPU, INPUT checks the color (red or green) of its argument field on the CPU, and if it is red, it requires a device-to-host communication of this specific field. This automatically ensures that the field we process on the CPU is up to date when we enter the routine's main loop. Similarly, on the CPU, OUTPUT sets to green the state of its argument field on the CPU, and to red its state on the GPU. On the GPU, the macrocommands expand in the opposite way. We leave it as an exercise to the reader to check that one can exchange, in the above paragraph, the words CPU and GPU, and device and host.

Note: Implementation-wise, we do not truly define a color for CPU and GPU. Rather, each Field has two boolean flags named fresh_cpu and fresh_gpu. If fresh_cpu is YES it
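The bookkeeping performed by INPUT and OUTPUT can be modelled with two boolean flags per field. The sketch below is a Python illustration of the rules stated in this snippet, not the actual macros: the class, the four helper functions and the transfer log are hypothetical, while the flag names fresh_cpu and fresh_gpu follow the note above.

```python
class Field:
    """A field that has just been initialized on the CPU."""

    def __init__(self, name):
        self.name = name
        self.fresh_cpu = True    # "green" on the CPU
        self.fresh_gpu = False   # "red" on the GPU
        self.transfers = []      # log of the communications that were required


def input_cpu(field):
    """INPUT in a C function: fetch the data from the device if the host copy is stale."""
    if not field.fresh_cpu:
        field.transfers.append("device->host")
        field.fresh_cpu = True


def output_cpu(field):
    """OUTPUT in a C function: the host copy becomes the only valid one."""
    field.fresh_cpu = True
    field.fresh_gpu = False


def input_gpu(field):
    """INPUT in a CUDA kernel: send the data to the device if the device copy is stale."""
    if not field.fresh_gpu:
        field.transfers.append("host->device")
        field.fresh_gpu = True


def output_gpu(field):
    """OUTPUT in a CUDA kernel: the device copy becomes the only valid one."""
    field.fresh_gpu = True
    field.fresh_cpu = False


vx = Field("Vx")
input_gpu(vx)    # first GPU routine reading Vx: one upload
output_gpu(vx)   # a GPU routine updates Vx: the host copy is now stale
input_gpu(vx)    # another GPU routine reads Vx: nothing to transfer
input_cpu(vx)    # a CPU routine reads Vx: one download
input_cpu(vx)    # reading again on the CPU: nothing to transfer
```

Transfers only occur when a routine reads a field on the side where it is stale, so a sequence of routines running on the same side costs no communication at all.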
119. odern cluster of GPUs, but also on your personal computer, even without a GPU. For ease of use, we adopt a computer-dependent makefile scheme, managed by the environment variable FARGO_ARCH. You can see in src/makefile a group of lines similar to:

    #LINUX PLATFORM (GENERIC)
    #FARGO_ARCH must be set to LINUX
    CC_LINUX      = gcc
    SEQOPT_LINUX  = -O3 -ffast-math
    PARAOPT_LINUX = ${SEQOPT_LINUX}
    PARACC_LINUX  = mpicc
    LIBS_LINUX    = -lm
    INC_LINUX     =
    NVCC_LINUX    = nvcc
    PARAINC_LINUX =
    PARALIB_LINUX =

These lines tell the makefile where the libraries are and which compilers will be used. In the LINUX case (the default case), we do not add any parallel library to PARALIB nor any header to PARAINC, because we assume that they are in your LD_LIBRARY_PATH or that they are installed in the default places. In general, on your cluster you should have something similar to:

    #FARGO_ARCH must be set to MYCLUSTER
    CC_MYCLUSTER      = /bin/gcc
    SEQOPT_MYCLUSTER  = -O3 -ffast-math
    PARAOPT_MYCLUSTER = ${SEQOPT_MYCLUSTER}
    PARACC_MYCLUSTER  = /bin/mpicc
    LIBS_MYCLUSTER    = -lm
    INC_MYCLUSTER     =
    NVCC_MYCLUSTER    = ${CUDA}/bin/nvcc
    PARAINC_MYCLUSTER = -I${MPIDIR}/include
    PARALIB_MYCLUSTER = -L${MPIDIR}/lib64

where MPIDIR and CUDA are variables pointing to the place where MPI and CUDA are installed. To use the FARGO_ARCH
120. of arguments begins by inserting a comma:

    #ifdef X
      FARGO_SPEEDUP (SubStep1_x, (dt)); // <= Note the comma before the (
    #endif

We now build the code for the target setup, with the PROFILING and GPU options enabled:

    $ make SETUP=mri GPU=1 PROFILING=1

and we run it:

    $ ./fargo3d -m in/mri.par

You should see an output such as:

    Wall clock time elapsed during MPI Communications : 0.030 s
    OUTPUTS 0 at Physical Time t = 0.000000 OK
    TotalMass = 0.0271300282
    ******* Check point created ******
    ******* Check point restored ******
    GPU/CPU speedup in SubStep1_x: 22.775
    CPU time : 91.1 ms
    GPU time : 4 ms

We see that the function is timed both in its CPU and GPU versions (this test was obtained on an Intel(R) Core(TM) i7 950 at 3.07 GHz and on a Tesla C2050 card). We also note that execution continues after the evaluation, so that an evaluation of the speed-up of our target function is provided periodically. It is interesting to see how the macrocommand is expanded by the preprocessor:

    SynchronizeHD ();
    SaveState ();
    InitSpecificTime (&t_speedup_cpu);
    for (t_speedup_count = 0; t_speedup_count < 200; t_speedup_count++) {
      SubStep1_x_cpu (dt);
    }
    time_speedup_cpu = GiveSpecificTime (t_speedup_cpu);
    SynchronizeHD ();
    RestoreState ();
    InitSpecificTime (&t_speedup_gpu);
    for (t_speedup_count = 0; t_speedup_count < 2000; t_speedup_count
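The figures in the sample output are mutually consistent: 91.1 ms divided by 4 ms gives the reported speed-up of 22.775. The short Python sketch below shows the arithmetic, together with a generic timing helper in the spirit of the expanded macro (repeat a call many times and keep the average). The helper names are hypothetical, and reading the printed times as per-call averages is an assumption.

```python
import time


def per_call_ms(func, repeats):
    """Average wall-clock time of one call, in milliseconds, over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) * 1e3 / repeats


def speedup(cpu_ms, gpu_ms):
    """Ratio of the CPU time per call to the GPU time per call."""
    return cpu_ms / gpu_ms


# Values taken from the sample output above.
reported = speedup(91.1, 4.0)

# Generic use of the timing helper on a dummy workload (no GPU involved).
dummy_ms = per_call_ms(lambda: sum(range(1000)), repeats=200)
```

Averaging over many repetitions, as the macro does with its loops, reduces the influence of timer resolution, which matters most for the fast GPU version.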
121. of instructions that manages the build variables, such as GPU, PARALLEL, etc. This makefile works as a wrapper that further invokes scripts/param.py. There are two ways to define a build variable from the command line: using shortcut rules, or defining the value of the variables manually. The set of variables allowed is:

    PROFILING=[0,1]: The profiler flags and a set of timers will be activated. Useful for benchmarks and development.
    RESCALE=[0,1]: Activates a rescaling of the real parameters of the parameter file to certain units. The numerical values used for this rescaling depend on the value of the build variable UNITS. All the outputs are done in the new units.
    SETUP: Selects the proper directory where your setup is. A list will be shown if you type make list.
    MPICUDA=[0,1]: Activates the peer2peer/uva MPI communications between different devices. Only compatible with MVAPICH2 2.0 and OpenMPI 1.7. Strongly recommended.
    FARGO_DISPLAY=[NONE,MATPLOTLIB]: Run-time visualization of your fields using the python-numpy-matplotlib packages.
    DEBUG=[0,1]: The code will be compiled in debug mode. The main change is the flag -g at compilation time. No optimization is performed in this mode.
    FULLDEBUG=[0,1]: The code runs in full debug mode. Similar to debug mode, but the merge flag (-m) is not allowed. All the fields are dumped with their ghosts ce
122. on FARGO algorithm.

Slopes:
• DONOR: Activates the donor cell flux limiter for the transport. In practice, it deactivates the default van Leer second-order upwind interpolation.

Artificial Viscosity:
• NOSUBSTEP2: If it is not defined, the artificial viscosity module, called Substep2, is invoked.
• STRONG_SHOCK: If strong shocks make the code crash, you may try using this variable. It is never used in the tests. It uses a linear, rather than quadratic, artificial pressure.

Cuda blocks: The cuda blocks must be defined in the form:

    ifeq (${GPU}, 1)
    FARGO_OPT += -DBLOCK_X=16
    FARGO_OPT += -DBLOCK_Y=8
    FARGO_OPT += -DBLOCK_Z=4
    endif

This is needed to define a default block size for GPU kernels. Alternatively, for a given platform, you may determine individually, for each CUDA kernel (routine), which block size gives the best results. See also Improving CUDA Performance.

There is a special set of variables, not contained in the FARGO_OPT variable: MONITOR_2D, MONITOR_Y, MONITOR_Y_RAW, MONITOR_Z, MONITOR_Z_RAW, MONITOR_SCALAR. Those are used at build time to request systematic, fine-grain monitoring. The meaning of these variables is explained in Monitoring.

CHAPTER NINE: UNITS

9.1 Introduction

Unlike its ancestor FARGO, FARGO3D comes with a variety of unit systems. The reason for this is twofold: working in a different unit system may help reveal bugs
123. on the lower z border.
• ez_lowy(j,k): The same as edge_size_x, but measured half a cell above in z.
• ey_lowz(j,k): The same as edge_size_x, but measured half a cell above in y.

Surfaces:
• SurfX(j,k): The lower surface of a cell at x = cte.
• SurfY(j,k): The lower surface of a cell at y = cte.
• SurfZ(j,k): The lower surface of a cell at z = cte.

Volumes:
• Vol(j,k): The volume of the current cell.
• InvVol(j,k): The inverse of the current cell's volume.

You can see examples of how to use these variables in src/. They are widely used in many routines.

CHAPTER SEVEN: DEFAULT SETUPS

FARGO3D was developed with simulations of protoplanetary disks in mind, but it is a sufficiently general code to tackle many different problems. As a result, its ancestor, the public code FARGO, is simply a particular case of the wide range of possible setups that can be designed. This section contains a brief summary of the setups that come with the public version of FARGO3D. We develop in more detail the setup called fargo. We emphasize that a setup must not be confused with a set of parameters, which are provided in a so-called parameter file (with extension .par, by convention). A setup corresponds to a given physical problem and geometry: in a setup we specify the grid geometry, the equation of state to be used, whether we use the MHD module, etc. In the parameter file
124. oncept of SETUPS, and with the help of the VPATH Makefile variable. If you know the RAMSES code (http://www.ics.uzh.ch/~teyssier/ramses/RAMSES.html), you will see that we use the same "patch" concept. Warning: In practice, when using FARGO3D, you do not need to know about the VPATH variable, but if you want to develop new features, it is a good idea to keep in mind that it is used under the hood. Another problem is related to the different modules of the code. For example, in some situations we need the MHD module, but for another set of problems MHD is irrelevant. To avoid a lot of logical run-time tests inside the code (ifs), we prefer to use MACROCOMMAND variables. These variables are interpreted before compilation by the so-called preprocessor, and activate or deactivate certain features (lines) of the code, allowing a tailor-made executable to be built for a specific problem. All these variables are activated from the Makefile (they are actually defined in the .opt files, which we shall see below). The most important file needed to build a FARGO3D executable is obviously the Makefile. For this reason, there is a section devoted to this particular file. But because FARGO3D uses a lot of scripting at compilation time, building the code does not simply reduce to the Makefile. We refer to the FARGO3D building process as the make process. A given instance of FARGO3D has many different properties an
125. one for any of the files that have the same radix as those of the list GPU_OBJBLOCKS in the makefile. We take here the example of the file compute_emf.c. In the src/ directory, issue:

    python ../scripts/c2cuda.py -i compute_emf.c -o foo.cu

In this command line, -i stands for the input and -o for the output. Note that we do not follow here the automatic rule, which would create ../bin/compute_emf_gpu.o. As a result, there is no risk of breaking dependencies if we forget to remove the file created manually. You can examine the file foo.cu and compare it to the input C file. You may also try to invoke the script with the -p flag, which implements a loop in the wrapper function to determine the best block size.

20.11 A very incomplete TODO list

We have several projects or improvements in mind for FARGO3D. Some of them are cosmetic time savers, others are more substantial. Among them:
• Since we can parse the C code to produce CUDA code, we can also, in principle, automatically produce OpenCL code. This would enable FARGO3D to run on non-NVIDIA GPUs and on multi-core platforms.
• We would like to merge FARGO3D and the nested mesh structure of the code JUPITER (developed by one of us, but never publicly released). This would require normal ghost zones in the X direction (as for Y and Z), a 3D mesh splitting of processing elements, and an adaptation of the ghost filling procedure. In this page you can see JUPITER at work with a number of nested meshes onto
126. onetheless it is customary to distribute the tasks on clusters so that one PE runs on one core. We frequently commit the abuse of language that consists in saying processor instead of PE (e.g. processor of rank 0).

1.2 Licence

FARGO3D is released under the terms of the GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007. Copyright (C) 2007 Free Software Foundation, Inc. <http://www.fsf.org/>. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. (Full text of the GPL.)

CHAPTER TWO: FIRST STEPS

2.1 Getting FARGO3D

FARGO3D can be downloaded from the same webpage as the public FARGO code (http://fargo.in2p3.fr). On that page, you will find a .tar.gz file with all the sources inside. You can check the README file, which contains some basic installation tips. A set of detailed instructions follows. If you have trouble at some stage, read the troubleshooting section first, and if your problem is not solved, you may send an email to the discussion group asking for help.

2.2 Installing

Warning: This document was written for UNIX-type systems. We have not tested the behavior of FARGO3D on a Windows system.

Suppose you have downloaded the code to your download directory USER_PATH/downloads, where USER_PATH is the path to your user directory (e.g. /home/pablo/downloads). You need to decide where you want to install FARGO3D. We assume in the foll
127. Figure 13.2: The process rank increases within a node, then from node to node. A node is filled with processes until it is full, then MPI continues with the following node.

Figure 13.3: The processes are distributed in a round-robin fashion (process 0 to node 0, process 1 to node 1, etc.), and MPI returns to node 0 once the available number of nodes has been reached.

SelectDevice(int myrank), in the file src/select_device.c. You can see that in this function we have a series of tests on the hostname, for which we have implemented some selection rules. For instance, we have developed FARGO3D, among others, on a workstation with two Tesla C2050 cards (hostname tesla), and for this host we have the selection rule:

    device = 1 - (myrank % 2);

which selects device 1 for rank 0 and device 0 for rank 1 (the reason for swapping the GPUs with respect to the normal order is that a run with only one process will then run on GPU 1, for which the temperature levels off at a smaller value than on GPU 0 during a long run). A
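The two process placements described in the figure captions, and the two-GPU selection rule quoted for the workstation, can be sketched in a few lines. This is a hedged Python illustration rather than the C code of src/select_device.c: the helper names are hypothetical, and the modulo form of the rule is an assumption that reproduces the stated behavior (device 1 for rank 0, device 0 for rank 1).

```python
def node_fill_up(rank, procs_per_node):
    """Node index when each node is filled before moving to the next one."""
    return rank // procs_per_node


def node_round_robin(rank, n_nodes):
    """Node index when consecutive ranks are dealt to consecutive nodes."""
    return rank % n_nodes


def select_device_two_gpus(myrank):
    """Swapped rule for a two-GPU host: rank 0 -> device 1, rank 1 -> device 0."""
    return 1 - (myrank % 2)


ranks = range(6)
print([node_fill_up(r, 2) for r in ranks])        # placement by filling nodes
print([node_round_robin(r, 3) for r in ranks])    # round-robin placement
print([select_device_two_gpus(r) for r in ranks])
```

Which placement your MPI launcher uses determines whether consecutive ranks share a node, and therefore whether a rule based on the rank parity actually spreads the processes of one node over its GPUs.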
128. ormats the name of all parameters and their corresponding values. IDL.var is properly formatted to simplify the reading process in an IDL/GDL script:

    IDL> @IDL.var
    IDL> print, input_par.nx
    384
    IDL> print, input_par.xmax
    3.14159

The standard .par format is used in variables.par. It may be used again as the input parameter file of FARGO3D, should you have erased the original parameter file.

12.4 Grid files

One grid file is created per processor. Each file stores information about the submesh relative to that processor. The current format has 7 columns, with the following data:
• CPU_Rank: Index of the cpu.
• Y0: Initial Y index for the submesh.
• YN: Final Y index for the submesh.
• Z0: Initial Z index for the submesh.
• ZN: Final Z index for the submesh.
• IndexY: The Y index of the processor in a 2D mesh of processors.
• IndexZ: The Z index of the processor in a 2D mesh of processors.
For an explanation of the last two items, go to the section about MPI.

12.5 Planet files

These files are output whenever a given setup includes a planetary system (which may consist of one or several planets). This, among others, is the case of the fargo and p3diso setups. There are three such files per planet, named planet[i].dat, bigplanet[i].dat and orbit[i].dat, where i is the planet number in the planetary system file, specified by the parameter PLANE
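As an illustration of the seven-column grid file layout listed above, here is a minimal Python sketch that turns one data row into a named record. It assumes whitespace-separated integer columns in the order given in the list, which the guide does not state explicitly; the sample row is made up, and real files may carry extra header lines that would have to be skipped.

```python
from collections import namedtuple

# Field names follow the seven columns listed in the text, in the same order.
GridRow = namedtuple("GridRow", "cpu_rank y0 yn z0 zn index_y index_z")


def parse_grid_line(line):
    """Parse one whitespace-separated data row of a grid file."""
    values = [int(token) for token in line.split()]
    if len(values) != 7:
        raise ValueError(f"expected 7 columns, got {len(values)}")
    return GridRow(*values)


# Made-up row, for illustration only.
row = parse_grid_line("3 64 128 0 32 1 0")
print(row.cpu_rank, row.yn - row.y0, row.zn - row.z0)
```

With one such record per processor, a manual merge can place each piece of the mesh at its Y and Z offsets in the global array.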
129. oundaries are applied. We note that the buffer, or ghost, zones are three cells wide.

5.2 boundparser.py

In order to work with boundaries, we have developed a script called boundparser.py, which reads a general boundary text file and converts it into a C file, properly commented so that it can subsequently be converted into a CUDA file. boundparser.py works with four files, called:
• setup.bound
• boundaries.txt
• centering.txt
• boundary_template.c
Note: these last three files can be in your setup directory; by default, the script takes those of the std/ directory (as prescribed by the VPATH variable).

boundparser.py reads the information inside the setup.bound file and compares it with the information in boundaries.txt and centering.txt. Finally, it builds a set of C files that apply the boundary conditions, called [y/z][min/max]_boundary.c, with a proper format given by boundary_template.c. We will not give a detailed description of how boundparser.py works, but the script is not difficult to understand if you are interested in its details. The key to developing boundary conditions is to understand the structure of the setup.bound and boundaries.txt files. Both files are in the same format, described below.

5.3 Boundaries files format

The boundaries.txt, centering.txt and setup.bound files have the same format. All are case insensitive, and have two levels of information
130. owing that you install it in the directory USER_PATH/fargo3d. In order to do that, the steps are:

    $: cd USER_PATH     (or simply "cd")
    $: cp USER_PATH/downloads/fargo3d.tar.gz USER_PATH
    $: tar -xvf fargo3d.tar.gz

If all went fine, you will see directories called bin, src, outputs, setups, etc.:

    USER_PATH/fargo3d$: ls
    bin doc Makefile outputs planets README scripts setups src std test_suite utils

You are now ready to build the code and do your first run.

2.3 First run

The first run of FARGO3D is the same as the first run of its ancestor FARGO (see the documentation of FARGO for details). In order to build the code, go to the USER_PATH/fargo3d directory (not the src/ directory) and issue:

    make

Warning: You must use the GNU make utility. The build process relies upon several important features of GNU make. We have tested the compilation process with GNU Make 3.81. We have thoroughly tested the build process with python version 2.7. It may work with older versions of Python too. However, should you experience unexpected results during the build process with an older version of python, we recommend that you update it (2.7.x < python version < 3). Issue python -V at the command line to know your version.

No additional libraries are required for this simple first build. With this instruction, you will have a sequential (or serial) version of the b
131. pilation process is expensive when you are working with CUDA files. For this reason, a parallel make process is highly desirable. Normally, a general Makefile can handle parallel compilation, but in the case of FARGO3D, which uses a lot of scripting, some race conditions could appear in parallel Makefiles. Since the GNU make utility does not have a proper way to avoid such problems, we developed an interface between Makefile and src/makefile that carries out the whole building process in the right order: first invoking the scripts to build all the headers and variable declarations, and then running src/makefile in parallel. In addition, make.py keeps track of the flags used in the last build (sticky options). All this information is stored in the hidden file std/.lastflags.

4.3 src/makefile (Stage 3)

The third and last step in the building process is to call src/makefile, which is done by scripts/make.py. The role of this makefile is to build the executable, normally called fargo3d, in the main directory. All the calls to scripts are done here. Inside, there is a set of rules for making the executable in the correct sequence. You may have a look through these different rules, which are self-documented by their names. There is a set of system configuration blocks that allows FARGO3D to be built on different platforms with the same makefile. These configuration blocks are selected by using the enviro
132. ppeared to be much needed as we were developing the code. We had initially left the treatment of BCs on the CPU, even for GPU builds. We thought that the filling of the ghost zones by the CPU would have a negligible impact on the overall speed of the code. This, however, turned out to be untrue: the CPU-GPU communication overhead and the slowness of CPU calculations lowered the code performance significantly. We therefore decided to deal with BCs in exactly the same way as with other time-consuming routines, which can be translated automatically to CUDA so as to run on the GPU. This implied some syntax constraints on the associated C code, which would have made the development of a variety of boundary conditions on the four edges of the mesh a time-consuming and error-prone process (yes, four, because they are specified only for Y and Z). For this reason, we have chosen to write a script that produces the C code for a given set of boundary conditions, with all the syntactic comments needed to subsequently translate it to CUDA.

5.1 How are boundary conditions applied?

The boundary conditions are applied just after the initialization of a specific SETUP, before the first output, and subsequently twice per time step. Boundary conditions are applied to all primitive variables and to the electromotive forces (EMFs). All boundary conditions are managed by the routine FillGhosts(), inside src/algogas.c. Below we depict schematically how cells of the active mesh
133. CHAPTER THIRTEEN: COMMUNICATIONS

Mono-GPU runs (with CUDA), multi-CPU runs (with MPI) and, obviously, multi-GPU runs (with MPI+CUDA) all need some kind of communication. When we run a SETUP on a GPU, the GPU sometimes needs to exchange information with the host (CPU). It is therefore important to give some information about this kind of transaction. On the other hand, when we run a CPU-parallel version of the code, each processor must communicate information about its contour cells to neighboring processors, which involves MPI communications. Finally, when we run a mixed version (i.e. GPU-parallel, on a cluster of GPUs), both processes are combined, and device <-> host (CPU) communications happen, as well as CPU <-(through MPI)-> CPU communications. Traditionally, GPU to GPU communication would imply a three-part trip: the information is downloaded from the GPU to its corresponding CPU process, then sent to a neighbor, and finally uploaded to the GPU of that neighbor. Recent CUDA implementations, however, permit MPI communication statements to be issued directly with pointers on board the GPUs. The information then travels by the fastest route, in a manner totally transparent to the user. FARGO3D handles this case, which is activated with the make option mpicuda (make mpicuda or make MPICUDA=1). If you want to get the best performance out of FARGO3D on your GPU cluster, it is mandatory to use the mpicuda optio
134. quested at build time, through the .opt file. There you can define up to 6 variables, which are respectively:
• MONITOR_SCALAR
• MONITOR_Y
• MONITOR_Y_RAW
• MONITOR_Z
• MONITOR_Z_RAW
• MONITOR_2D
Each of these variables is a bitwise OR of the different quantities of interest that are defined in define.h around line 100. These quantities are labeled with a short, self-explanatory, upper-case preprocessor variable. For instance, assume that you want to monitor the total mass (scalar monitoring) and the total angular momentum (also scalar monitoring), and that you want a formatted output of the radial torque density, plus a 2D map of the azimuthally averaged angular momentum. You would have to write the following lines in your .opt file:

    MONITOR_SCALAR = MASS | MOM_X
    MONITOR_Y = TORQ
    MONITOR_2D = MOM_X

Note the pipe symbol | on the first line, which stands for the bitwise OR. It can be thought of as simultaneously switching on several bits in the binary representation of MONITOR_SCALAR, which triggers the corresponding request for each bit set to one. The condition for that, naturally, is that the different variables defined around line 100 in define.h are in geometric progression with a factor of 2 (each of them corresponds to a given, specific bit set to one). In our example, we therefore activate the scalar monitoring of the mass and of the angular momentum (we will check in a minute that MOM_X corresponds to the angular momen
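The bitwise OR mechanism described above can be sketched as follows. This is a Python illustration only: the numerical values below are hypothetical powers of two chosen for the example (the actual values are those defined in define.h), and OTHER is a made-up third quantity that is not requested.

```python
# Hypothetical bit values in geometric progression with a factor of 2.
MASS = 1 << 0
MOM_X = 1 << 1
OTHER = 1 << 2   # made-up quantity, not requested in this example

# Equivalent of "MONITOR_SCALAR = MASS | MOM_X" in the .opt file.
MONITOR_SCALAR = MASS | MOM_X


def is_requested(mask, quantity):
    """True when the bit of `quantity` is switched on in `mask`."""
    return bool(mask & quantity)


for name, bit in (("MASS", MASS), ("MOM_X", MOM_X), ("OTHER", OTHER)):
    print(name, is_requested(MONITOR_SCALAR, bit))
```

Because each quantity owns exactly one bit, any combination of requests fits in a single integer and can be tested with one AND per quantity.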
135. r reduction in the Y and Z dimensions of the previous 2D array. This is done on the host, as it has a low computational cost. This operation involves a Device -> Host communication of only a one-cell-thick, single 2D array, which is not taken into account when evaluating the 2D communications that yield the diagnostic on the terminal (see CUDA-aware MPI implementations). The two-stage process detailed above is generally not a problem, as we expect most setups to have a number of zones in X (or azimuth) much larger than the GPU vs CPU speed-up factor, hence it is the first stage (the reduction on board the GPU) that constitutes the time-consuming part. If, however, you have a YZ setup, the number of zones in X is just one, and in this case it is the second stage of the reduction process (on board the CPU) that constitutes the bottleneck: it is as if the reduction were entirely performed on the CPU. If you want to assess how much of your setup's slowness can be attributed to this restriction, you may try to hard-code the time step in src/algogas.c to see if this yields a significant improvement. Also, during this test, you have to deactivate all monitoring in the .opt file, as monitoring requires reductions, which are performed as described above.

20.7 What is this @ sign at the beginning of the outputs path?

Unless you have defined the environment variable FARGO_OUT, this sign is not used, and you may remove it if you wish. When
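The two-stage reduction discussed above can be illustrated with a small sketch. It is plain Python, not the CUDA implementation: it assumes a minimum reduction (as would be needed for a time step) and a cube stored as cube[k][j][i], and both function names are hypothetical.

```python
def reduce_along_x(cube, op=min):
    """Stage 1 (on the GPU in FARGO3D): collapse X, leaving a 2D array [k][j]."""
    return [[op(row_x) for row_x in plane] for plane in cube]


def reduce_yz(array2d, op=min):
    """Stage 2 (on the host): collapse the remaining Y and Z dimensions."""
    return op(op(row) for row in array2d)


# Tiny cube with nz = 2, ny = 2, nx = 3, indexed as cube[k][j][i].
cube = [
    [[0.9, 0.4, 0.7], [0.8, 0.6, 0.5]],
    [[0.3, 0.95, 0.85], [0.75, 0.65, 0.2]],
]
stage1 = reduce_along_x(cube)
result = reduce_yz(stage1)
print(stage1, result)
```

With a single zone in X, the first stage has nothing to collapse and all the work falls on the second stage, which is the situation described for YZ setups.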
136. r the make process.
• stdpar.par: Default parameters. It is used when you have not defined the value of a certain parameter in your parameter file. It is used for compatibility with some specific problems related to disk simulations, and for some plot-related parameters.
• standard.units: The scaling rules for all standard real variables of the code.
• func_arch.cfg: The standard architecture file. This file selects where each function will run (CPU or GPU). It only makes sense if the compilation was done with GPU compatibility (GPU=1). By default, all the functions run on the GPU, but this file is a great tool for debugging the code (one of the best tools for developing a kernel).

3.1.8 test_suite

This directory was developed to ensure a stable development of the code. It contains a set of test files written in python. All of them use the script test.py. They are easy to understand. The main idea is: if your recent developments pass all the tests, then at least your new improvements do not interfere with the main code and do not alter its behavior.

3.1.9 utils

Here is a set of routines to analyze the data. The content is self-documented. Feel free to explore it.

3.1.10 doc

The doc directory is where these documentation files reside. It also contains some files related to the licence of the code.

3.2 setups/SETUP directory

The directory setups/SETUP is one of the most
137. re the fields are created is in CreateFields(), inside src/LowTasks.c. Internally, all fields are cubes, written as 1D arrays. So we need indices to work with the 3D data. We have a set of helpers defined in src/define.h. They are:
• l: The index of the current zone.
• lxp, lxm: lxplus/lxminus, the right/left x-neighbor.
• lyp, lym: lyplus/lyminus, the right/left y-neighbor.
• lzp, lzm: lzplus/lzminus, the right/left z-neighbor.
These helpers must be used with the proper loop indices:

    int i,j,k;
    for (k=z_lower_bound; k<z_upper_bound; k++) {
      for (j=y_lower_bound; j<y_upper_bound; j++) {
        for (i=x_lower_bound; i<x_upper_bound; i++) {
          field[l] = 3.0;
          field2[l] = (field1[lxp]-field1[l])/(xmed(ixp)-xmed(i)); // obviously some gradient...

where kji always means zyx-direction.

Warning: Do not change the order of the indices. The definition of l, lxp, lxm, etc. assumes the following correspondence: i -> x, j -> y, k -> z.

These helpers are extremely useful. No explicit algebra has to be performed on the indices within a loop, but never use or define a variable called l or lxp. Besides, the definition of l is also correct within GPU kernels, for which the index algebra is slightly different owing to memory alignment considerations, and this is totally transparent to the user, who should never have to worry about it. In practice, a loop is similar to i
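To make the role of the index helpers concrete, here is a hedged Python sketch of how a 3D zone index can be flattened into a 1D array index, with x varying fastest. It is an illustration under stated assumptions, not the macro definitions of src/define.h: it assumes that the pitch equals nx and the stride equals nx*ny on the CPU, that the x direction is periodic, and the function names are hypothetical.

```python
def make_helpers(nx, ny):
    """Build index helpers for a cube stored as a 1D array (x fastest)."""
    pitch = nx          # assumed distance between two consecutive j rows
    stride = nx * ny    # assumed distance between two consecutive k planes

    def l(i, j, k):
        return i + j * pitch + k * stride

    def lxp(i, j, k):
        # Right x-neighbor, wrapping around (x assumed periodic).
        return l((i + 1) % nx, j, k)

    def lyp(i, j, k):
        return l(i, j, k) + pitch

    def lzp(i, j, k):
        return l(i, j, k) + stride

    return l, lxp, lyp, lzp


l, lxp, lyp, lzp = make_helpers(nx=4, ny=3)
print(l(3, 2, 1), lxp(3, 0, 0), lyp(1, 1, 1), lzp(1, 1, 1))
```

Hiding this arithmetic behind helpers is what allows the same loop body to remain valid when the memory layout changes, as it does on the GPU.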
138. red along each direction that does not appear explicitly in the centering definition. Let us have a look at some lines of centering.txt:

    Density:
        Staggering: C
    Bz:
        Staggering: z
    Emfz:
        Staggering: xy

The first two lines specify that the density is a cube-centered quantity. The following two lines specify that the magnetic field in the z direction is centered in x and y, but staggered in z (i.e. defined at the center of an interface in z). Finally, the last two lines indicate that the EMF in z is centered in z, but staggered in x and y. It is therefore a quantity defined at half-height of the lower vertical edge (in x and y) of a cell. All primitive variables and the EMF fields are defined by default in std/centering.txt. If you create a new primitive field in the code, you should specify its correct centering in std/centering.txt if you want to create boundary conditions for this field.

5.5 boundary_template.c

This file is taken as a template for building automatic boundary condition C files. You should never have to deal with it. It also contains all the information for the subsequent building of the CUDA files. If you need special boundary conditions, you could try modifying this file first. As long as you do not alter the lines beginning with you can modify this template. This kind of modification should be made by an advanced user.

5.6 boundaries.txt

This file is the core file for the boundary conditions. The main idea is to provide a
139. rmation. The coordinates of each cell are stored in additional files, called domain_[xyz].dat (see the section below). When you use MPI, the situation becomes more complex, because each processor writes its own piece of the mesh. If you want to merge the files manually, you need the information of the grid files, detailed below. In practice, all runs are done with the run-time flag -m (merge), in order to avoid the need for a manual merge. If your cluster does not have global storage, you have to do the merge manually, after having copied all files to a common directory. The fields may also be written in a different output format, called VTK. This format is a little more complicated and is discussed in the VTK section.

[Figure with labels: Mesh, Binary file]

Here are some minimalist reading examples with different tools (additional material can be found in the First Steps section and in the utils directory):

c:

    FILE *fi;
    double f[nx*ny*nz];
    fi = fopen(filename, "r");
    fread(f, sizeof(double), nx*ny*nz, fi);
    fclose(fi);

python:

    from pylab import *
    rho = fromfile("gasdens120.dat").reshape(nz,ny,nx)

GDL or IDL:

    GDL> openr, 10, 'gasdens10.dat'
    GDL> rho = dblarr(nx,ny)
    GDL> readu, 10, rho
    GDL> close, 10

fortran:

    real*8 :: data(nx*ny*nz)
    open(unit=100, status="old", file=filename, form="unformatted", access="direct", recl = NX*NY*NZ*8
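The reading examples above can be complemented with a version that relies only on the Python standard library. The sketch below writes a small dummy file and reads it back; it assumes double-precision values in native byte order, stored with x varying fastest (consistent with the reshape(nz,ny,nx) call above). The file name and both function names are hypothetical.

```python
import array
import os
import tempfile


def read_field(path, nx, ny, nz):
    """Read nx*ny*nz doubles from a raw binary file into a flat array."""
    data = array.array("d")
    with open(path, "rb") as handle:
        data.fromfile(handle, nx * ny * nz)
    return data


def value_at(data, i, j, k, nx, ny):
    """Value of zone (i, j, k), with x fastest, then y, then z."""
    return data[i + j * nx + k * nx * ny]


nx, ny, nz = 4, 3, 2
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_field.dat")   # hypothetical file name
    with open(path, "wb") as handle:
        # Dummy content: each zone stores its own flat index.
        array.array("d", range(nx * ny * nz)).tofile(handle)
    rho = read_field(path, nx, ny, nz)

print(len(rho), value_at(rho, 1, 2, 1, nx, ny))
```

Since the dummy file stores each zone's flat index, the value read at (i, j, k) directly shows which position in the file that zone occupies.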
140. rum/fargo3d

1.1 A foreword about the terminology used in this manual

We are aware that most potential FARGO3D users come from a FORTRAN background, and for this reason we have avoided as much as possible an uncontrolled use of C and CUDA jargon. We have tried to explain specific terms every time we used them. We give hereafter a very short list of the main terms that you may encounter in this manual.
• What is called a routine in FORTRAN is a function in C. We have nonetheless frequently used the incorrect term routine, even for C functions. A function is always referred to with trailing empty parentheses, for instance the main() function.
• What is called a function in C is called a kernel in CUDA. This is not to be confused with the operating system's kernel, of course. A kernel is therefore a function (generally lightweight, owing to memory constraints on board the streaming multiprocessors of a GPU) that spawns a huge number of threads on the GPU cores.
• In GPU jargon, the CPU and its RAM are usually designated as the host, whereas the GPU is called the device. Uploading data to the GPU is therefore called a host to device communication.
• The video RAM of a GPU is called the global memory of the device.
• Finally, in MPI parlance, each instance of a job should be called a processing element (PE in short). It should be distinguished from the processor, as several PEs can run on one processor or even on one core. N
141. s very similar on other Linux systems with a package manager.

Warning: The -m flag that we have used so far on the command line instructs FARGO3D to merge all data from all processes when writing them to disk, which makes their subsequent reduction much easier. In general, it is a good idea to always use this flag.

2.5 First GPU run

Warning: We assume you have installed CUDA and the proper driver on your system. You can test whether the driver works correctly by running nvidia-smi in a terminal. It is also a good idea to run a few examples of the NVIDIA suite to ensure that your installation is fully functional.

Running FARGO3D in GPU mode is similar to the first parallel run. The only important thing is to know where CUDA is installed. FARGO3D finds CUDA through the environment variable CUDA defined on your system. If you do not have the CUDA variable defined, FARGO3D assumes that CUDA is installed in the default path /usr/local/cuda. If this is not the right place, modify your .bashrc file and add the following line:

    export CUDA=<your CUDA directory>

Warning: The above example assumes that you use the bash shell.

After that, we are ready to compile the code:

    make PARALLEL=0 GPU=1

or the corresponding shortcut:

    make PARALLEL=0 gpu

Warning: you cannot combine shortcuts on the command line. You could, for instance, issue make seq
142. s case sensitive. The name is converted to upper case for the C code. You must help the parser to automagically guess the type of the variable. Here, 0.1 works, but 1 would not.

20.2 I forgot to run the code with the -m flag. Is there a way to merge the outputs?

Yes, you can. Assume you ran on 8 processors, and you want to merge output 5. You may issue, on the command line, in FARGO3D's main directory:

    mpirun -np 8 ./fargo3d -s 5 -m -0 yourparfile

It will do the trick: restart from fragmented output (lower-case s), then merge outputs and exit. Edit the above line according to your needs, or insert it into a bash loop to merge the outputs of an entire directory.

Note: This technique is non-destructive, in the sense that it neither changes nor removes the individual outputs.

Warning: The above instruction requires that the whole simulation fit in your platform's memory. If you issue it on a small laptop for a simulation that ran on a large cluster, it may fail if there is not enough RAM.

20.3 I see lots of or instead of dots at execution. What does that mean?

This happens when you have a GPU build and GPU-GPU communications are not fully optimized. See the sections GPU/CPU communications and CUDA-aware MPI implementations. Note that periodic boundary conditions are handled like normal MPI communications, even on a one-process run, so that
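The loop suggested above for merging all the outputs of a directory can be prepared as follows. This Python sketch only builds the command lines and does not run them. The spelling of the flags (-np, -s, -m, -0) is reconstructed from the garbled command line in this excerpt and should be treated as an assumption to check against your version of the code; the function name is hypothetical and yourparfile is the placeholder used in the text.

```python
def merge_command(output_number, nprocs, parfile):
    """Command line restarting from one fragmented output and merging it."""
    return ["mpirun", "-np", str(nprocs), "./fargo3d",
            "-s", str(output_number), "-m", "-0", parfile]


# One command per output number, here outputs 0 to 5 of a run on 8 processes.
commands = [merge_command(n, 8, "yourparfile") for n in range(6)]
for command in commands:
    print(" ".join(command))
```

Each printed line can be pasted into a shell, or the lists can be handed to subprocess.run() one at a time, bearing in mind the memory warning above.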
143. s were developed for simplifying the parsing process. Here you have a complete example:

    //<FLAGS>
    //#define __GPU
    //#define __NOPROTO
    //<\FLAGS>

    //<INCLUDES>
    #include "fargo3d.h"
    //<\INCLUDES>

    void ComputePressureFieldIso_cpu () {

    //<USER_DEFINED>
      INPUT(Energy);
      INPUT(Density);
      OUTPUT(Pressure);
    //<\USER_DEFINED>

    //<EXTERNAL>
      real* dens = Density->field_cpu;
      real* cs = Energy->field_cpu;
      real* pres = Pressure->field_cpu;
      int pitch = Pitch_cpu;
      int stride = Stride_cpu;
      int size_x = Nx;
      int size_y = Ny+2*NGHY;
      int size_z = Nz+2*NGHZ;
    //<\EXTERNAL>

    //<INTERNAL>
      int i;
      int j;
      int k;
      int ll;
    //<\INTERNAL>

    //<MAIN_LOOP>
    #ifdef Z
      for (k=0; k<size_z; k++) {
    #endif
    #ifdef Y
        for (j=0; j<size_y; j++) {
    #endif
    #ifdef X
          for (i=0; i<size_x; i++) {
    #endif
    //<#>
            ll = l;
            pres[ll] = dens[ll]*cs[ll]*cs[ll];
    //<\#>
    #ifdef X
          }
    #endif
    #ifdef Y
        }
    #endif
    #ifdef Z
      }
    #endif
    //<\MAIN_LOOP>
    }

As you see, all the main blocks are identified by some special comments (C comments on one line begin with //), but there are also two additional blocks. We can make an abstract portrait of a general FARGO3D's C mesh fun
144. s you see, you have complete freedom to implement your own rules within this routine, with tests similar to those already written. It would probably be better to have tests using an environment variable, or to use #ifdef directives that would rely on some variable defined in the platform-specific section of the makefile. We might implement such features in the future. The device eventually adopted by the process is determined as follows:
• If an explicit rule is defined for your platform, the device defined in this rule is adopted.
• If you explicitly specify the device with the -D option on the command line, the device thus chosen has priority in any case (in particular, it overrides the device given by your platform rule, if any). Note: If your run is MPI and you use the option -D, a warning is issued, since all your processes run on the same GPU.
• If no rule is found for your platform and you have not specified any device on the command line, CUDA chooses the device for you (the rules for this selection are those of the function cudaChooseDevice()). DO NOT RELY ON THIS AUTOMATIC SOLUTION to decide for you in an MPI run. The different processes will see that device 0 is available when they simultaneously enter the function select_device, and they will all select this device.
Finally, a message is issued in any case, stating the process rank and the device on which it runs.

13.3.3 CUDA-aware MPI implementations

As advertised earlier, recent implementations o
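The priority order listed above can be summarized in a short sketch. It is written in Python for illustration and is not the logic of src/select_device.c: the function name, its arguments and the returned labels are hypothetical, and the automatic case is represented by None because the choice is then left to CUDA.

```python
def choose_device(myrank, platform_rule=None, command_line_device=None):
    """Return (device, origin) following the priority order of the text."""
    if command_line_device is not None:
        # The -D option has priority in any case.
        return command_line_device, "command line"
    if platform_rule is not None:
        # An explicit rule exists for this platform.
        return platform_rule(myrank), "platform rule"
    # No rule and no option: the choice is left to CUDA.
    return None, "automatic"


def swapped(rank):
    """Example rule for a two-GPU host: rank 0 -> device 1, rank 1 -> device 0."""
    return 1 - (rank % 2)


print(choose_device(0, platform_rule=swapped))
print(choose_device(0, platform_rule=swapped, command_line_device=0))
print(choose_device(3))
```

In an MPI run, the second call illustrates why a warning is warranted: every rank given the same command-line device ends up on the same GPU.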
145. sothermal equation of state:

    int i,j,k;
    for (k=0; k<Nz+2*NGHZ; k++) {
      for (j=0; j<Ny+2*NGHY; j++) {
        for (i=0; i<Nx; i++) {
          pres[l] = dens[l]*cs[l]*cs[l];
        }
      }
    }

Note: The lines of code above neither evaluate nor define l, which is used straight out of the box, since it is a preprocessor macrocommand.

6.3 Working with fields

A field structure is defined as follows (in src/structs.h):

    struct field {
      char *name;
      real *field_cpu;
      real *field_gpu;
    };

where we have stripped the definition of all extra lines not relevant at this stage. The name is a string that is used to determine the name of output files. field_cpu is a pointer to a double or float 1D array, which has been duly allocated in RAM prior to any invocation. Similarly, field_gpu is a pointer to a double or float 1D array, which has been duly allocated in the video RAM prior to any invocation. The user should never have to invoke this field directly. Rather, C files always make use of field_cpu, which is automatically translated to field_gpu as needed during the C to CUDA conversion. Accessing a field value is generally done as follows:

    struct Field *Density;
    // Definition at the beginning of a function
    real *density;   // real is either double or float
    density = Density->field_cpu;
    // later on, in a loop:
    density[l]

Note: Note that we define an
146. t it must preserve dependencies. This may be frustrating, as you cannot have a look at the CUDA files or boundary source codes produced by the Python scripts. This can sometimes be useful to understand unexpected behaviors. Here we indicate the manual procedure to produce all the intermediary files used during the build process. Do not forget to remove them prior to a full build of the code, or dependencies may be broken.

Creation of var.c. In src/, issue:

    python ../scripts/par.py ../setups/mri/mri.par ../std/stdpar.par

Naturally, substitute the mri setup in the above line with your own setup.

Creation of param.h, param_noex.h and global_ex.h. In src/, issue:

    python ../scripts/varparser.py

Creation of rescale.c. In src/, issue:

    python ../scripts/unitparser.py mri

The rescale.c file is produced in the src/ directory.

Creation of boundary source code. In src/, issue:

    python ../scripts/boundparser.py ../std/boundaries.txt ../std/centering.txt ../setups/mri/mri.bo

If you issue the command ls -ltr, you will see new files with names [y/z][min/max]_bound.c that you can examine. Note that since periodic boundary conditions are not dealt with as other BCs, but rather with communications, they do not generate such files. For instance, with the mri setup, you only have the y files, not the z files.

Creation of the CUDA source code. It can be d
147. tation rate has rotated in total. 8. The inclination i of the orbit, in radians. 9. The longitude ω of the ascending node, with respect to the actual x axis. 10. The position angle α of perihelion (the angle of the projection of perihelion onto the x-y plane, with respect to the actual x axis). Note that in the limit of vanishing inclination, we have α ≈ ω + ψ.

The information of column 7 is very useful to determine precession rates whenever the frame is non-inertial. For instance, the precession rate of the line of nodes is given by d(Ψ + ω)/dt.

Note: the file(s) planet[i].dat are emptied every time a new run is started. This is because these files are needed for restart, so we want to prevent out-of-date, incorrect information from being used upon restart. In contrast, lines accumulate in the files orbit[i].dat and bigplanet[i].dat until those (or the directory containing them) are manually deleted.

12.6 Datacubes

All primitive variables (density, velocity components, internal energy density, or sound speed for isothermal setups, and magnetic field components for MHD setups) are written every NINTERM steps of length DT (each of those being sliced in as many timesteps as required by the CFL condition). In addition, some selected arrays can be written every NSNAP steps of length DT. These arrays' names are controlled by the boolean parameters WriteDensity, Writ
148. ternal SelectDevice function, you will be able to work with FARGO3D on a cluster of GPUs. You could try to run with MPI over one GPU. Of course, it is a bad idea for performance reasons, but it shows whether all the parallel machinery inside FARGO3D works. Actually, for developing an MPI-CUDA code, you do not need more than one thread and one GPU card. Try the following:

    make mrproper
    make PARALLEL=1 GPU=1
    mpirun -np 2 ./fargo3d -m setups/fargo/fargo.par

If all goes fine (that is to say, if the output looks correct), FARGO3D is working with MPI and GPUs. The next step is to configure it to run on a cluster of GPUs, using as many different GPUs as possible. We will learn how to do that later on in this manual.

CHAPTER THREE: DIRECTORY TREE

FARGO3D was developed as a general code. It solves a set of coupled differential equations on a mesh. Because it was developed as a general solver, it is necessary to keep certain general features isolated from other, more specific ones that correspond to a specific problem. A simple example of that is the initial condition (IC) of a problem. Obviously, the IC is a problem-dependent feature, and a mechanism is needed to keep it isolated from another problem. Touching the main structure of the code only to change the IC is not a good idea (yes, you can, but this is ugly, error-prone, and wreaks havoc with versioning systems). This kind of problem is solved using the c
149. the FARGO_OUT environment variable is defined (for instance in your job script or in your .bashrc file), its value substitutes the @ sign in your output path. If, for instance, you have:

    OutputDir @mri/beta150

in your .par file, and you have defined the following in your .bashrc file:

    export FARGO_OUT=/data/myname/fargo3d

then your run will output its data in /data/myname/fargo3d/mri/beta150. This trick is version-control friendly: you define once and for all the sub-directory where you want your data to be, and the prefix part is specified in the environment variable, out of the version control, so that if different persons work on the parameter file, it will not trigger a sequence of different versions. You may have on one platform:

    export FARGO_OUT=/data2/pablo/fargo3d

and on another one:

    export FARGO_OUT=/scratch3/frederic/fargo3d

and the same parameter file may be used without any editing.

20.8 I see that there are .par files in each setup directory, and the same .par files are found in the in/ sub-directory. What is this for?

The .par files found in the setup directories are necessary to build FARGO3D: a Python script uses them to determine the set of global (upper case) variables that will be available everywhere throughout the C code, with the value that the user defines for them in his .par f
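The substitution of the @ sign can be illustrated with a short C sketch. It is not the routine FARGO3D uses: the function name expand_output_dir is hypothetical, the prefix is passed in explicitly rather than read from the environment, and the behavior when no prefix is available (the @ is simply dropped) is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a leading '@' in outdir with prefix.
 * Writes the result into out (capacity outsize).
 * Returns 0 on success, -1 if the result does not fit. */
int expand_output_dir(const char *outdir, const char *prefix,
                      char *out, size_t outsize)
{
    int n;
    assert(outdir != NULL && out != NULL);
    if (outdir[0] != '@') {
        /* No '@': the path is used as written. */
        n = snprintf(out, outsize, "%s", outdir);
    } else if (prefix == NULL || prefix[0] == '\0') {
        /* Assumption: without a prefix, the '@' is simply removed. */
        n = snprintf(out, outsize, "%s", outdir + 1);
    } else {
        /* Avoid a doubled separator if the prefix already ends with '/'. */
        size_t len = strlen(prefix);
        const char *sep = (prefix[len - 1] == '/') ? "" : "/";
        n = snprintf(out, outsize, "%s%s%s", prefix, sep, outdir + 1);
    }
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

A caller would typically pass getenv("FARGO_OUT") as the prefix, so that the same parameter file resolves to a different absolute path on each platform.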
150. thermal fluid with periodic boundary conditions in X (as required in FARGO3D) and reflecting boundary conditions in Y. We have three primitive variables to which we must apply BCs. Those will only apply in Y (not in X because of the periodicity, nor in Z because we have a 2D X-Y setup). The setup.bound file should therefore look similar to:

    Density:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
    Vx:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
    Vy:
        Ymin: ANTISYMMETRIC
        Ymax: ANTISYMMETRIC

where Vy is ANTISYMMETRIC in Y because we have reflection in the y direction. At build time, the script boundparser.py goes through this file. The first three lines instruct it to generate C code to implement a boundary condition for the density field in ymin and ymax. It goes to the definition of SYMMETRIC found in std/boundaries.txt. Two definitions are available: one for centered fields, the other one for staggered fields. In std/centering.txt it finds that the density is centered in y, and therefore generates the C code corresponding to the centered case. The same thing occurs for Vx: it finds that this field is centered in Y. Finally, the last three lines instruct it to generate C code for an ANTISYMMETRIC boundary condition of Vy in Y. It finds in std/centering.txt that this field is staggered in Y, and generates the C code corresponding to the definition |-value|0|value|. The C func
151. tions thus produced contain all the comments required for further conversion to CUDA, should the user request a GPU build.

2D Y-Z reflecting problem

Now, a more complex example, with all directions (yet in 2D):

    Density:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
        Zmin: SYMMETRIC
        Zmax: SYMMETRIC
    Vx:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
        Zmin: SYMMETRIC
        Zmax: SYMMETRIC
    Vy:
        Ymin: ANTISYMMETRIC
        Ymax: ANTISYMMETRIC
        Zmin: SYMMETRIC
        Zmax: SYMMETRIC
    Vz:
        Ymin: SYMMETRIC
        Ymax: SYMMETRIC
        Zmin: ANTISYMMETRIC
        Zmax: ANTISYMMETRIC

Note: notice how the staggering is implicit in the Vy and Vz boundaries. An extensive list of examples can be found in the setups/ directory.

5.8 Common errors

This section will be developed later, based on user feedback. The boundparser.py script is at present a bit taciturn and may silently ignore errors, which can be difficult to spot afterwards. Among them:

- Incorrect names or misprints.
- An incorrect centering.

CHAPTER SIX: MESH AND FIELDS

6.1 Mesh

The mesh consists of NX cells in X (hence azimuth in cylindrical and spherical geometries), NY+2*NGHY cells in Y (radius in cylindrical and spherical geometries), and NZ+2*NGHZ cells in Z (colatitude in spherical geometry). Here NGHY and NGHZ stand for the number of ghost or buffer zones next to the active mesh. If a direction is not included in the setup, for i
152. to restart the run. Each normal output could be used as a restart file. It works for Dat & VTK files. The output files used for the restart are separate, i.e. they have the shape fieldn_m.dat, and must have been produced by a run without the -m flag.

-S: Restart from merged files. Same as before. It works for Dat & VTK files. The only difference is that the output files used for the restart are merged, i.e. they have the shape fieldn.dat, and must have been produced by a run with the -m flag.

Restart and expand a run along the X dimension. This is typically used to run a prior 2D calculation (radius and colatitude or Z) to enable a disk to relax toward some equilibrium state. Once this equilibrium state has been reached, the data is used to make up axisymmetric three-dimensional data cubes. Note that this is not considered as a restart inside the code, but rather as a fresh start, except that the arrays are not filled with the initial condition, but rather with the data at output number n, expanded as necessary in X.

    fargo3d -m initial_2D_run.par
    fargo3d -S 100 -m -o Nx=628 initial_2D_run.par

In the above example, the 2D output number 100 is used to fill the arrays. The second run is this time 3D, as the number of zones in X (azimuth) is now larger than one. Since technically this is not a restart, the run will output its data at numbers 0, 1, etc., as for any fresh start. The flag
153. ts have a well known value (if you are not convinced, try to work out what is the value of Stefan's constant in a system of units where the solar mass is the mass unit, one astronomical unit the length unit, such that G, the gravitational constant, has value one, and where the ratio of the ideal gas constant over the mean molecular weight also has value one). Finally, the output of FARGO3D may be used by third-party codes, such as radiative transfer codes that produce a simulated image, and having the data in a standard unit system may prove useful.

Specifying the unit system

The unit system must be specified at build time. The different systems are defined in the file named fondam.h. The unit system used to build the code depends on whether the preprocessor variable MKS or CGS is defined. If neither of them is defined, a trivial unit system, dubbed scale-free, is adopted. From the makefile, activating one or another of these preprocessor variables is done as follows:

    make UNITS=MKS

or

    make UNITS=CGS

Finally, to use the scale-free system of units, issue:

    make UNITS=0

Note: like other build options, the UNITS flag is sticky: it implicitly keeps its previous value until it is changed explicitly.

We note that specifying the unit system in the FARGO3D code is done by giving a numerical value to five constants that have linearly independent powers of M, L, T, θ and I (mass, length
154. tum in cylindrical and spherical coordinates. This example also shows that a given variable may be used simultaneously for different flavors of monitoring: MOM_X (the angular momentum) is used both for scalar monitoring and 2D maps.

The answer to the second question above (file naming conventions) is as follows:

- Unique files are written directly in the output directory. Their name has a radix which indicates which quantity is monitored (e.g. mass, momx, etc.), then a suffix which indicates the kind of integration or averaging performed (_1d_Z_raw or _1d_Y_raw, or nothing for scalar monitoring), and the extension .dat.
- Monitoring flavors that require new files at each fine grain output do not write the files directly in the output directory, in order not to clutter this directory. Instead, they are written in subdirectories named like FG000 (FG for Fine Grain, plus the number of the current coarse grain output, with zero padding on the left). In these directories, the files are written following similar conventions as above, plus a unique, zero-padded fine grain output number. It is a good idea to have a look at one of the outputs of the public distribution (choose a setup that requests some monitoring, by looking at its .opt file) in order to understand these file naming conventions in depth.
- Some monitoring functions depend on the planet, such as the torque
155. u should be able to develop any complex kernel. First, we will keep a clean version of FARGO3D. In order to do that, you must copy the file std/func_arch.cfg to your setup directory (setups/explosions). We will also need the file src/change_arch.c:

    cp std/func_arch.cfg setups/explosions/
    cp src/change_arch.c setups/explosions/

We will alter the files of the setup directory, but we will leave the files of the main distribution untouched. If we issue an ls command inside setups/explosions, we should see something similar to:

    algogas.c  condinit.c  explosions.bound  explosions.opt  explosions.units
    change_arch.c  edamp.c  explosions.objects  explosions.par  func_arch.cfg
    main.c

Now we will add the prototype of this function in src/prototypes.h. After the line:

    ex void init_var(char*, char*, int, int, char);

add:

    ex void Edamp_cpu(real);

and after the line:

    ex void addviscosity_sph_gpu(real);

add:

    ex void Edamp_gpu(real);

Warning: the declaration of the _gpu functions must not be in the same block as the _cpu functions, otherwise the code will not build. Keep them grouped as they are.

Note: we cannot copy the file prototypes.h to our setup directory, because in the present version header files are parsed from the src/ directory. However, it is harmless to declare extra functions in prototypes.h. If they are unused, w
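For orientation, the update that the Edamp routine performs can be written in plain C, stripped of the translator tags. This is a sketch under assumptions rather than the routine itself: it follows the implicit solution quoted at the start of the chapter, e_new = e / (1 + edamp * dt), the function name edamp_update is hypothetical, real is assumed to be double, and the array is treated as a flat list of n cells.

```c
#include <assert.h>
#include <stddef.h>

typedef double real;  /* assumption: double precision build */

/* Implicit damping of the internal energy over one timestep dt:
 * each cell is multiplied by 1 / (1 + edamp * dt), which stays
 * stable for any positive dt. */
void edamp_update(real *energy, size_t n, real edamp, real dt)
{
    size_t l;
    assert(energy != NULL);
    assert(edamp >= 0.0 && dt >= 0.0);
    for (l = 0; l < n; l++) {
        energy[l] *= 1.0 / (1.0 + edamp * dt);
    }
}
```

Because the factor never exceeds one and never changes sign, the energy decays monotonically whatever the timestep, which is the point of choosing the implicit form.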
156. unications, which are a clue that our new function still runs on the CPU. This is because the func_arch.cfg file is taken by default from std/. In order to change that, you must include the value of the FuncArchFile parameter in explosions.par:

    FuncArchFile setups/explosions/func_arch.cfg

Now, if you run the code, you will see lines similar to:

    OUTPUTS 0 at Physical Time t = 0.000000 OK
    TotalMass = 2.0000000000
    ELE

And we have no more device <=> host communications between outputs. The symbol on the last line above means that we actually still have some less expensive communications between host and device, for the periodicity. This can be avoided with a proper build, but no further implementation is required at this stage. See CUDA-aware MPI implementations.

Figure 19.1: A snapshot of the density field (gasdens) with the explosions setup.

19.5 Using FARGO_DEBUG

We can use the macro FARGO_DEBUG to check that the GPU kernel and its CPU counterpart yield the same results to machine accuracy. Here this is done as follows: you simply have to wrap the invocation of Edamp(dt) in algogas.c within the macro. As for the macro FARGO_SPEEDUP presented in the section GPU vs CPU Benchmarking, we need to insert a comma at the end of the function name, so as to help the C preprocessor, which cannot do string analysis.
157.
    FARGO_DEBUG (Edamp, (dt));   << Notice the comma

We then compile the code with a GPU build:

    make SETUP=explosions GPU=1

and run it:

    ./fargo3d -m setups/explosions/explosions.par

    ****** Check point created ******
    Executing Edamp_cpu(dt)
    Dumping at 999 divb Emfy bz by bx ORight gasenergy gasdens Pressure Qs DensStar potential Moment
    ****** Secondary Check point created ******
    ****** Check point restored ******
    Executing Edamp_gpu(dt)
    Dumping at 998 divb Emfy bz by bx ORight gasenergy gasdens Pressure Qs DensStar potential Moment
    List of fields that differ:
    Skipping comparison of field Emfz used as a temporary work array in file as declared at line 0
    Skipping comparison of field DivRho used as a temporary work array in file src/reduction_generic_gpu.cu as declared at line 86

What the code does is as follows:

- Prior to entering Edamp_cpu(dt), it creates a check point of all HD/MHD arrays.
- It then executes Edamp_cpu(dt) and dumps all arrays with the arbitrary output number 999, so that we can examine them in case of problem. All arrays are dumped, not only the primitive variables.
- It creates a secondary checkpoint with all the data updated by Edamp_cpu(dt).
- It rewinds the execution flow by restoring the first check point, created prior to the execution of Edamp_cpu
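The comparison step at the heart of this procedure can be pictured with a small C sketch. It is not the code behind FARGO_DEBUG: the function name count_differences is hypothetical, real is assumed to be double, and the relative tolerance test is an assumption, since the excerpt only states that results should agree to machine accuracy.

```c
#include <assert.h>
#include <stddef.h>

typedef double real;  /* assumption: double precision build */

/* Absolute value without pulling in the math library. */
static real abs_real(real x)
{
    return (x < 0.0) ? -x : x;
}

/* Count the cells in which two versions of a field (for instance one
 * produced on the CPU and one on the GPU) differ by more than a
 * relative tolerance tol. A return value of 0 means the field would
 * not appear in the list of fields that differ. */
size_t count_differences(const real *a, const real *b, size_t n, real tol)
{
    size_t l, count = 0;
    assert(a != NULL && b != NULL);
    assert(tol >= 0.0);
    for (l = 0; l < n; l++) {
        real diff = abs_real(a[l] - b[l]);
        real scale = (abs_real(a[l]) > abs_real(b[l])) ? abs_real(a[l])
                                                        : abs_real(b[l]);
        if (diff > tol * scale) {
            count++;
        }
    }
    return count;
}
```

A field would be reported only if the count is non-zero; work arrays whose contents are not meaningful are better skipped, as the log above does for Emfz and DivRho.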
158. verage is done over the two other dimensions only, so as to get respectively radial or vertical profiles in each output. Besides, the 1D monitoring itself comes in two flavors:

- a raw format, for which a unique file is written, in which a row of bytes is appended at every fine grain output. This file can be readily used, for instance, with IDL (using openr & readu commands) or Python (using numpy's fromfile command). For example, this allows one to plot a map of the vertically and azimuthally averaged Maxwell's tensor, as a function of time and radius. This map allows one to estimate when the turbulence has reached a saturated state at all radii. In another vein, one can imagine a map of the azimuthally and radially averaged torque, which provides the averaged torque dependence on time and on z.
- a formatted output. In this case a new file is written at each fine grain output. It is a two-column file, in which the first column represents the Y or Z value, as appropriate, and the second column the integrated or averaged value.

The simultaneous use of both formats is of course redundant. They have been implemented for the user's convenience.

- 2D monitoring. In this case the integral or averaging is performed exclusively in X (or azimuth), so that 2D maps in Y and Z of the quantity are produced. In this case a new file in raw format is written at each fine grain output.

12.7.3 Mon
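Reading such a raw file back amounts to reading fixed-length rows until the file ends. The C sketch below shows the idea under assumptions: the function name read_raw_rows is hypothetical, the file is assumed to contain nothing but values of type real (taken to be double) in the machine's native binary format, and ncells, the number of values in one row, must be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef double real;  /* assumption: double precision build */

/* Read up to max_rows rows of ncells values each from a raw 1D
 * monitoring file. rows must have room for max_rows * ncells values.
 * Returns the number of complete rows read (0 if the file cannot be
 * opened); each row corresponds to one fine grain output. */
size_t read_raw_rows(const char *path, size_t ncells,
                     real *rows, size_t max_rows)
{
    FILE *fp;
    size_t nrows = 0;
    assert(path != NULL && rows != NULL && ncells > 0);
    fp = fopen(path, "rb");
    if (fp == NULL) {
        return 0;
    }
    while (nrows < max_rows &&
           fread(rows + nrows * ncells, sizeof(real), ncells, fp) == ncells) {
        nrows++;
    }
    fclose(fp);
    return nrows;
}
```

The resulting table, with one row per fine grain output and one column per cell, is exactly the time versus radius (or time versus z) map mentioned above.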
159. way to the user to define a boundary prescription in as user-friendly a manner as possible. Let us begin with a simple example. Assume that we want to define a boundary condition that we call SYMMETRIC, which simply consists in copying the data of an active cell into the ghost zone. We represent schematically what is intended on this diagram:

    |        |        |
      ghost    active
      zone     zone

The active zone contains a value that we arbitrarily call active:

    |        | active |
      ghost    active
      zone     zone

We want to set the value of the ghost zone to the same value, that is, we want:

    | active | active |
      ghost    active
      zone     zone

Therefore we represent this boundary condition with the following, rather intuitive line of code:

    |active|active|

Assuming for the moment that this boundary condition applies to centered variables, we would finally write the following piece of code in boundaries.txt to define our SYMMETRIC boundary condition:

    SYMMETRIC:
        Centered: |active|active|

The right cell always represents the active cell, and the left cell the corresponding ghost cell. The string active could actually be any string:

    SYMMETRIC:
        Centered: |value|value|

If we wish to have an anti-symmetric boundary condition (that is to say, we set the ghost value to the negative of its active counterpart):

    |-active|active|

Should we want to s
