Samβada: User manual - LaSIG
Contents
1. Figure 15: Parameter file for Supervision in case the molecular data of fig. 4 is stored in mol-data.txt and is to be split in blocks of four markers.

(a) File mol-data-env.txt
NAME
ID1
ID2
ID3
ID4
ID5
ID6

(b) File mol-data-mark-0-0.txt
M4 M7 M8 M9
1 1 1 0
0 0 0 0
0 1 0 0
0 0 1 1
0 0 1 0
0 1 0 0

(c) File mol-data-mark-1-4.txt
M16 M17 M18
1 1 1
0 0 NaN
0 0 1
0 1 1
0 1 0
0 1 0

Figure 16: Molecular data of fig. 4 after splitting in blocks of four markers.

In this case, the column NAME is considered as an environmental variable, since it is not a molecular marker. NAME will be copied to the file mol-data-env.txt, which will not be used for SamBada's analysis; the actual environmental data (env-data.txt) is shown on fig. 3. The column NAME cannot be used as an identifier, since it was not copied to the molecular data. Thus it must be indicated as a supplementary column to SamBada (option COLSUPENV, see p. 10).

Example 2. Supervision can split combined data files containing both environmental and molecular information. Figure 17 shows the parameter file used to split the data from fig. 5, stored in combo-data.txt, in blocks of three markers. The splitting is performed as follows:

Supervision param-split-combo.txt

The data files will be named combo-data-env.txt, combo-data-mark-0-0.txt, combo-data-mark-1-3.txt and combo-data-mark-2-6.txt; the latter will have only one column (see fig. 18). In this cas
2. I am lost with the Command Line. Don't panic, there are a couple of online tutorials. For Windows, you can start with … Hint: use PowerShell instead of the Command Line. Mac OS and Linux Terminals are basically the same; see for instance http://www.davidbaumgold.com/tutorials/command-line/.

My results do not make any sense. Check whether the samples in the environmental data are in the same order as those in the molecular data. What could have happened is that one of your files got sorted by some column during data preparation.

To be continued...

References
Anselin, Luc (1995). "Local Indicators of Spatial Association: LISA". Geographical Analysis 27(2), pp. 93-115. GISDATA Specialist Meeting on GIS and Spatial Analysis, Amsterdam, Netherlands, Dec. 1-5, 1993.
Joost, Stéphane, Aurélie Bonin, Michael W. Bruford, et al. (2007). "A Spatial Analysis Method (SAM) to detect candidate loci for selection: towards a landscape genomics approach to adaptation". Molecular Ecology 16(18), pp. 3955-3969.
Purcell, Shaun (2009). PLINK 1.07. …/purcell/plink/
Stucki, Sylvie (2014). "Développement d'outils de géo-calcul haute performance pour l'identification de régions du génome potentiellement soumises à la sélection naturelle: analyse spatiale de la diversité de panels de polymorphismes nucléotidiques à haute densité (800k) chez Bos taurus et B. indicus en O
3. Results files are named as follows: constant models are saved in the file data-Out-0.ext, univariate models in the file data-Out-1.ext, bivariate models in the file data-Out-2.ext, and so on.

0  Success
1  Exponential divergence (X·β is diverging)
2  Singular matrix (impossible to invert the information matrix)
3  Too large β divergence
4  Maximal number of iterations reached without convergence
5  Monomorphic marker (appears in the output file for constant models)
6  Significant model with non-significant parents (multivariate analysis with option SIGNIF)
Figure 10: List of possible errors for logistic models.

Marker Loglikelihood AverageProb Beta_0 NumError
Hapmap43437-BTA-101873_AA -228.2100569 0.082089552 -2.414289083 0
Hapmap43437-BTA-101873_AG -542.450042 0.404228856 -0.387875415 0
Hapmap43437-BTA-101873_GG -556.9893006 0.513681592 0.054740033 0
ARS-BFGL-NGS-16466_AA -44.84132815 0.009950249 -4.600157644 0
ARS-BFGL-NGS-16466_AG -389.8189189 0.189054726 -1.456164041 0
ARS-BFGL-NGS-16466_GG -401.2120224 0.800995025 1.392524911 0
Hapmap34944-BES1_Contig627_1906_AA -456.4590694 0.254975124 -1.072251619 0
Hapmap34944-BES1_Contig627_1906_AC -555.856645 0.470149254 -0.119545151 0
Hapmap34944-BES1_Contig627_1906_CC -472.7907257 0.274875622 -0.970024485 0
Figure 11: Example of SamBada's results for constant models; there is one marker per line. The first column is the name of the molecular marker, here th
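The constant-model columns of fig. 11 are linked by the intercept-only logistic model: the maximum-likelihood estimate of the carrier probability is the observed frequency, β0 is its logit, and the log-likelihood follows directly. The sketch below is a textbook illustration, not SamBada's code. The counts it uses (66 carriers out of 804 samples) are an assumption inferred from the first line of fig. 11 (0.082089552 × 804 = 66) and from NUMINDIV 804 in fig. 6.

```python
import math


def constant_model(carriers: int, n: int) -> tuple[float, float, float]:
    """Intercept-only logistic model for one binary marker.

    Returns (log-likelihood, average probability, beta_0), i.e. the
    Loglikelihood, AverageProb and Beta_0 columns of fig. 11.
    """
    if not 0 < carriers < n:
        # Every sample identical: beta_0 would be infinite (cf. error 5, monomorphic marker)
        raise ValueError("monomorphic marker")
    p = carriers / n                    # MLE of the carrier probability
    beta0 = math.log(p / (1.0 - p))     # logit of the observed frequency
    loglik = carriers * math.log(p) + (n - carriers) * math.log(1.0 - p)
    return loglik, p, beta0


# Hypothetical counts reproducing Hapmap43437-BTA-101873_AA
loglik, p, beta0 = constant_model(66, 804)
```

The three values returned should match the first line of fig. 11 up to rounding. This also explains why all log-likelihoods in that table are negative.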
4. (a) File combo-data-env.txt
NAME ENV1 ENV2 ENV3 ENV4 ENV5
ID1 46 972 236 230 132
ID2 32 987 238 232 83
ID3 32 987 238 232 83
ID4 32 987 NaN 232 83
ID5 32 987 238 232 83
ID6 35 1021 235 230 87

(b) File combo-data-mark-0-0.txt
M4 M7 M8
1 1 1
0 0 0
0 1 0
0 0 1
0 0 1
0 1 0

(c) File combo-data-mark-1-3.txt
M9 M16 M17
0 1 1
0 0 0
0 0 0
1 0 1
0 0 1
0 0 1

(d) File combo-data-mark-2-6.txt
M18
1
NaN
1
1
0
0

Figure 18: Molecular data of fig. 5 after splitting in blocks of three markers.

(a) param-a.txt
HEADERS YES
NUMVARENV 6
NUMMARK 4 7
NUMINDIV 6
COLSUPENV NAME
DIMMAX 1
SAVETYPE END ALL 0.01

(b) param-b.txt
HEADERS YES
NUMVARENV 6
NUMMARK 3 7
NUMINDIV 6
COLSUPENV NAME
DIMMAX 1
SAVETYPE END ALL 0.01

Figure 19: Parameter files for distributed analysis with SamBada. File param-a.txt is used for data from fig. 3 and 16b. Usually there are more blocks of molecular data, and all blocks containing the same number of markers would be analysed with this set of parameters. File param-b.txt is used for the last block of molecular data when it contains fewer markers (data from fig. 3 and 16c).

The analysis is launched with two commands:

Sambada param-a.txt env-data.txt mol-data-mark-0-0.txt
Sambada param-b.txt env-data.txt mol-data-mark-1-4.txt

Example 2. Fig. 20 shows the parameter files needed to analyse molecular data from fig. 18b, c and d wi
5. be recoded. These variables will be copied to a separate file. If the molecular file contains no ID or environmental column, numEnv is to be set to 0. Please also note that the fifth parameter counts the total number of lines, including the header line if any.

Figure 13: Workflow of analysis for distributed computing. Rectangles stand for data and round-cornered figures stand for programs. Grey elements are mandatory and white ones are optional. Arrows show processing order for environmental data (dashed line) and for molecular data (solid line). Supervision is used before SamBada to split molecular data in blocks, and afterwards to merge results.

New data files are named automatically. The molecular file names contain two numbers: the first one is the block number (starting from 0), and the second one is the number of the first marker in the block (starting from 0 in the first block).

Example 1. Let's assume the data of fig. 4 is stored in mol-data.txt and that we want blocks of four markers. The corresponding parameter file is shown on fig. 15, and the splitting is performed as follows:

Supervision param-split.txt

The resulting data files mol-data-env.txt, mol-data-mark-0-0.txt and mol-data-mark-1-4.txt are displayed on fig. 16.

mol-data.txt
thingy.txt
1    numEnv
7    numMark
7    numLines
4    blockSize
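The splitting rule and the naming scheme described above can be illustrated with a small Python sketch. It is not Supervision's implementation: `split_table` is a hypothetical helper, rows are assumed to be already tokenised, the hyphen-separated file names are an assumption based on the example names, and the first `num_env` columns are sent to the `-env` file as in both examples.

```python
def split_table(rows: list[list[str]], num_env: int, block_size: int,
                data_file: str) -> dict[str, list[list[str]]]:
    """Hypothetical sketch of the split step.

    rows: tokenised lines of the data file (header line included, if any).
    Returns a mapping from output file name to the rows it would contain.
    """
    stem, _, ext = data_file.rpartition(".")
    # Non-molecular columns (identifiers, environmental data) go to one file
    files = {f"{stem}-env.{ext}": [row[:num_env] for row in rows]}
    num_mark = len(rows[0]) - num_env
    # First number: block index from 0; second number: index of the block's first marker
    for block, first in enumerate(range(0, num_mark, block_size)):
        start = num_env + first
        files[f"{stem}-mark-{block}-{first}.{ext}"] = [
            row[start:start + block_size] for row in rows  # last block may be shorter
        ]
    return files


mol_data = [line.split() for line in """\
NAME M4 M7 M8 M9 M16 M17 M18
ID1 1 1 1 0 1 1 1
ID2 0 0 0 0 0 0 NaN
ID3 0 1 0 0 0 0 1
ID4 0 0 1 1 0 1 1
ID5 0 0 1 0 0 1 0
ID6 0 1 0 0 0 1 0""".splitlines()]

blocks = split_table(mol_data, num_env=1, block_size=4, data_file="mol-data.txt")
```

With seven markers and blocks of four, the last block holds only M16, M17 and M18, as in fig. 16c.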
6. the spatial autocorrelation takes some time to compute. Provided you have the file tree shown on fig. 1, if you launch a shell (Command Line or PowerShell on Windows, Terminal on Mac OS and Linux; see p. 24) and navigate to the folder RandomData or SubsetCattleSNP, the examples can be run using

../../Binaries/Sambada.exe input-Sample100.txt Sample100.txt

or

../../Binaries/Sambada.exe paramTest.txt TableEnvUG.csv extrait_marq.txt

Sambada-Manual.pdf
Binaries
    Sambada.exe
    Supervision.exe
    RecodePLINK.exe
Examples
    DataFromManual
        OneDataFile
            combo-data.txt
            param-combo.txt
        TwoDataFiles
            env-data.txt
            mol-data.txt
            param.txt
    RandomData
        input-Sample100.txt
        Sample100.txt
    SubsetCattleSNP
        extrait_marq.txt
        paramTest.txt
        TableEnvUG.csv
Figure 1: Suggested file tree to run the examples.

3 Analysis overview
Three programs are available:
SamBada processes univariate and multivariate logistic models for the landscape genomics analysis, and optionally measures the spatial autocorrelation in environmental and molecular datasets.
Supervision can split molecular data in blocks in order to run the analysis on several computers, and can merge the results afterwards.
RecodePLINK can translate molecular data from PLINK's format to SamBada's format.
The user must provide SamBada with a parameter file to set up the analysis, as well as environmental and m
7. there are two input files. NUMMARK counts the total number of columns in the molecular data file. For distributed analysis, this parameter must indicate the number of markers for the current block of data, followed by the total number of markers. When environmental and molecular data are combined in a single file, NUMVARENV + NUMMARK is the total number of columns in that file: the first NUMVARENV columns contain environmental data and the following NUMMARK columns hold molecular markers. The total number of markers is used to compute the Bonferroni correction; thus, ignored data should be excluded from this total.

NUMINDIV (Mandatory, int)
Number of samples included in the data file(s).

Active and inactive columns

IDINDIV (Optional, string or int)
Name(s) of the column(s) containing sample identifiers. These optional identifiers are used to label samples in the output files for local spatial autocorrelation; otherwise, the line numbers are used. If there are two data files, two names or numbers can be provided: the first one is for the environmental data and the second one for the molecular data. The identifier columns are automatically set as inactive. Moreover, if this option is specified with two data files, the two identifiers must match on each line: samples must be in the same order in each file.

COLSUPENV (Optional, string or int)
Name(s) of the column(s) in the environmental data to be excluded from the analysis. These columns are
8. EST 20
AUTOCORR BOTH MARK 1000
DIMMAX 1
SAVETYPE END BEST 0.01
Figure 6: Example of a parameter file for setting up SamBada's analysis. Each line contains an option for the computation; those marked in the margin are mandatory. The line order has no influence. In this example, the first two lines indicate that the data files contain a header line and that columns are separated by spaces. The next lines state the number of environmental variables, the number of molecular markers and the number of individuals (samples). The option IDINDIV indicates which columns contain identifiers of individuals; here, environmental and molecular data are recorded in two separate files. The next two lines address the measure of the spatial autocorrelation: the names of the coordinates (which are spherical), the weighting scheme and the bandwidth (here the 20 nearest neighbours are taken into account). The analysis will include both global and local autocorrelation (BOTH) of molecular markers (MARK), and the significance will be assessed with 1000 permutations. The next option means that the detection of selection signatures will rely on univariate models (DIMMAX 1). The last line indicates that results will be stored at the end of the process, that only significant models with a significant parent will be stored, and that the threshold for significance is set to 1% before Bonferroni's correction.

HEADERS YES
NUMVARENV 6
NUMMARK 8
9. NUMINDIV 6
IDINDIV NAME
DIMMAX 1
SAVETYPE END ALL 0.01
Figure 7: Parameter file to analyse data from fig. 3 and 4 (param.txt). NUMVARENV and NUMMARK count the total number of columns in the data files.

HEADERS YES
NUMVARENV 6
NUMMARK 7
NUMINDIV 6
IDINDIV NAME
DIMMAX 1
SAVETYPE END ALL 0.01
Figure 8: Parameter file to analyse data from fig. 5 (param-combo.txt). Sample names are provided once; thus there is one molecular column fewer.

Sambada parameterFile envFile molecularFile

Therefore, examples 1 and 2 would be launched with

Sambada param.txt env-data.txt mol-data.txt

and

Sambada param-combo.txt combo-data.txt

respectively. Further examples are provided on p. …

5.1.3 List of options
This section presents the available parameters for SamBada. The options are presented in the following way: the first line shows the parameter name, whether it is mandatory, the list of possible values or the expected type (in parentheses) and the default value. The paragraph is completed by a description of the option.

Data files and format

INPUTFILE (Optional, string)
Name(s) of the data file(s). If there are two files, indicate first the environmental file, then the molecular file. This information may also be given as an argument to the program.

OUTPUTFILE (Optional, string)
Base name(s) for the results file(s). If this option is omitted, the output files will be named after the molec
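The column-counting rule of fig. 7 and 8 can be checked mechanically from the header lines. A minimal sketch with hypothetical helper names, assuming whitespace-separated columns (pass another delimiter if WORDDELIM differs):

```python
from typing import Optional


def counts_two_files(env_header: str, mol_header: str,
                     delim: Optional[str] = None) -> tuple[int, int]:
    """NUMVARENV and NUMMARK when environmental and molecular data are in two files."""
    return len(env_header.split(delim)), len(mol_header.split(delim))


def counts_combined(header: str, num_env_columns: int,
                    delim: Optional[str] = None) -> tuple[int, int]:
    """NUMVARENV and NUMMARK for one combined file whose first columns are non-molecular."""
    total = len(header.split(delim))
    return num_env_columns, total - num_env_columns


two_files = counts_two_files("NAME ENV1 ENV2 ENV3 ENV4 ENV5",
                             "NAME M4 M7 M8 M9 M16 M17 M18")
combined = counts_combined(
    "NAME ENV1 ENV2 ENV3 ENV4 ENV5 M4 M7 M8 M9 M16 M17 M18", 6)
```

The two results reproduce fig. 7 (6 and 8) and fig. 8 (6 and 7). The difference of one column is the NAME column, which appears only once in the combined file.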
10. Samβada User manual
Sylvie Stucki and Stéphane Joost
April 13, 2015
Version v0.5.1

Contents
1 What is SamBada
2 Installation
3 Analysis overview
4 Data format
5 Program use
5.1 SamBada
5.2 Supervision
5.3 RecodePLINK
6 Frequently Asked Questions

Laboratory of Geographic Information Systems (LASIG), School of Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), Bâtiment GC, Station 18, 1015 Lausanne, Switzerland
Webpage: lasig.epfl.ch/sambada
Contact: sylvie.stucki@epfl.ch, stephane.joost@epfl.ch

1 What is SamBada
SamBada is an integrated software for landscape genomic analysis of large datasets. The key features are the study of local adaptation in relationship with the environment, and the measure of spatial autocorrelation in environmental and molecular datasets. On the one hand, SamBada uses logistic regressions to estimate the probability that an individual carries a specific genetic marker, given the habitat that characterises its sampling site. On the other hand, spatial autocorrelation is measured with Moran's I and local indicators of spatial association, in order to assess whether the observed data in each location depends on the values in the neighbouring locations. Underlying models are kept simple to put emphasis on process efficiency and use
11. (Figure 12: table of SamBada's results for univariate models.)
12. at is read by any GIS software, for instance QuantumGIS.

5.2 Supervision
For large molecular datasets, the computation workload may be distributed among several computers. To this end, Supervision is called prior to the analysis to split the molecular dataset in blocks (see fig. 13). The new files are named automatically on the basis of the molecular data file name. The last file will contain fewer markers if the total number is not divisible by the block size. Each share of data is processed separately, either on the same multi-core computer or on distinct computers. Environmental data must be provided to each processing node. Results are gathered afterwards so that Supervision can merge them and produce the same output as if the whole analysis were run on a single node.

5.2.1 Split process
Information about splitting is provided by a parameter file, and the process is launched as follows:

Supervision parameterFile

The parameter file must contain the six following lines, in the same order:
dataFile: name of the data or molecular file (beware of trailing tabulations)
paramFile: name of the parameter file (not used yet)
numEnv: number of environmental parameters
numMark: number of molecular markers
numLines: number of lines in the data file
blockSize: size of blocks of molecular data
Figure 14: Instance of parameter file for Supervision.

numEnv indicates the number of non-molecular columns in the file, to
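Because the six lines are positional, the Supervision parameter file is easy to generate. A minimal sketch: `split_parameters` is a hypothetical helper, and numLines is derived as the number of samples plus one when there is a header line, following the note on the fifth parameter.

```python
def split_parameters(data_file: str, param_file: str, num_env: int,
                     num_mark: int, num_samples: int, block_size: int,
                     has_header: bool = True) -> str:
    """Return the six positional lines read by `Supervision parameterFile`."""
    num_lines = num_samples + (1 if has_header else 0)  # the header line counts
    values = [data_file, param_file, num_env, num_mark, num_lines, block_size]
    return "\n".join(str(v) for v in values) + "\n"


# Values of Example 1 (fig. 15): one non-molecular column, 7 markers, 6 samples
text = split_parameters("mol-data.txt", "thingy.txt", num_env=1, num_mark=7,
                        num_samples=6, block_size=4)
```

The result can be written with `open("param-split.txt", "w").write(text)`. Generating it avoids stray trailing tabulations.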
13. e combo-data-env.txt will contain environmental data and can be used in SamBada's analysis. As in the previous example, the column NAME cannot be used as an identifier, since it was not copied to the molecular data. Thus it must be indicated as a supplementary column to SamBada (option COLSUPENV, see p. 10).

combo-data.txt
thingy.txt
6    numEnv
7    numMark
7    numLines
3    blockSize
Figure 17: Parameter file for Supervision in case the molecular data of fig. 5 is stored in combo-data.txt and is to be split in blocks of three markers.

5.2.2 Analysis with SamBada
The distributed analysis follows the same process as the single-node one. The parameter file has to be modified to include both the current and the total number of molecular markers (parameter NUMMARK, see sec. 5.1.3). Thus, if the last block has fewer markers than the other ones, there will be a common parameter file for the first blocks and another one for the last block. The total number of markers is used to adjust the significance threshold with the Bonferroni correction, if relevant.

Example 1. Fig. 19 shows the parameter files needed to analyse molecular data from fig. 16b and 16c with environmental data from fig. 3. For comparison, the parameter file for single-node analysis is shown on fig. 7. Besides the change in the number of markers, the column NAME has to be indicated as supplementary data (COLSUPENV), since it is not available in the molecular files.

NAME ENV1 ENV2
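Why each block must know the total number of markers can be seen with a small sketch of Bonferroni's correction. Assumption: for illustration, the number of tests is taken as the number of markers times the number of combinations of active environmental variables at the given dimension. The manual only states that the total number of markers enters the correction, so SamBada's exact count may differ.

```python
import math


def bonferroni_threshold(alpha: float, total_markers: int,
                         active_env_vars: int, dimension: int = 1) -> float:
    """Per-model significance threshold after Bonferroni's correction.

    Assumption (illustrative): one test per marker and per combination of
    `dimension` active environmental variables.
    """
    num_tests = total_markers * math.comb(active_env_vars, dimension)
    return alpha / num_tests


# Example 1: 7 markers in total, ENV1 to ENV5 active (NAME is set with COLSUPENV)
correct = bonferroni_threshold(0.01, total_markers=7, active_env_vars=5)
# Counting only the 4 markers of the first block would give a too lenient threshold
block_only = bonferroni_threshold(0.01, total_markers=4, active_env_vars=5)
```

If each node used only its own block size, models in smaller blocks would face a looser threshold than in the single-node analysis, and the merged results would no longer match.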
14. e locus name combined with the allele name. The following columns are the log-likelihood, the frequency of the marker, the estimate of parameter β0 for the logistic model and the error code (0 if success). Constant models are not sorted, and thus are in the same order as the markers in the input file. When the considered markers are SNPs, like here, there are three binary markers per locus.

Concerning the measure of spatial autocorrelation, results are stored separately for environmental data and molecular markers. In each case there are three output files. The first one is named Data-AS-Env.ext or Data-AS-Mark.ext and stores Moran's I and local indicators of spatial association (Anselin 1995). If provided, the sample names appear in the first column. If both Moran's I and LISAs are computed, the first line is the global index and each subsequent line is a local index (samples are in the same order as in the data file). The second file is named either Data-AS-Env-Sim.ext or Data-AS-Mark-Sim.ext and stores the simulated values of the global

(Figure 12, continued: further columns include the AIC (Akaike information criterion) and the BIC (Bayesian information criterion). The last columns contain the β parameters of the regression: one constant parameter and one corresponding to the environmental variable. Results for multivariate models contain additional columns for the environmental variables and regression parameters.)
15. ental data must be provided in the first columns (left part of the table) and molecular data in the last columns (right part). The identifier, if any, is considered as an environmental variable.

5 Program use
5.1 SamBada
5.1.1 Input files
The required input for SamBada consists of environmental and molecular data, formatted as explained in sec. 4. A parameter file is needed as well to set up the analysis. The parameter file contains one line per parameter, and parameters can be specified in any order. Each line begins with the name of the parameter, followed by its values, separated by spaces. Some parameters are mandatory; for the optional ones, the entire line may be omitted. Any line beginning with a hash character (#) will be ignored. Fig. 6 presents a working parameter file.

Example 1. Let's assume we want to analyse mol-data.txt (fig. 4) with env-data.txt (fig. 3). The simplest parameter file is shown on fig. 7.

Example 2. When environmental and molecular data are provided together in one file, slight changes in the parameter file reflect the new data size (see fig. 8).

5.1.2 Program launch
SamBada is launched as follows if environmental and molecular data are stored in the same file:

Sambada parameterFile dataFile

The command changes slightly if there are two separate input files:

HEADERS YES
WORDDELIM
NUMVARENV 24
NUMMARK 120103
NUMINDIV 804
IDINDIV short_name ID_indiv
SPATIAL longitude latitude SPHERICAL NEAR
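The parameter-file syntax described in sec. 5.1.1 (one option per line, name first, values separated by spaces, lines starting with # ignored) can be read with a few lines of Python. A minimal sketch: `parse_parameters` is hypothetical, blank lines are assumed to be skippable, values containing spaces are not handled, and only NUMINDIV, DIMMAX and SAVETYPE, which the list of options marks as mandatory, are checked.

```python
MANDATORY = {"NUMINDIV", "DIMMAX", "SAVETYPE"}  # extend if other options are mandatory


def parse_parameters(text: str) -> dict[str, list[str]]:
    """Map each option name to its list of values (hypothetical reader)."""
    params: dict[str, list[str]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        name, *values = stripped.split()
        params[name] = values
    return params


def missing_mandatory(params: dict[str, list[str]]) -> list[str]:
    """Names of mandatory options absent from the parsed file."""
    return sorted(MANDATORY.difference(params))


param_txt = """\
# Parameter file of fig. 7
HEADERS YES
NUMVARENV 6
NUMMARK 8
NUMINDIV 6
IDINDIV NAME
DIMMAX 1
SAVETYPE END ALL 0.01
"""
params = parse_parameters(param_txt)
```

The line order does not matter here, consistent with the format: options are collected into a dictionary keyed by name.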
16. i correction when all models were saved during the analysis, or to select a subset of models, for instance for a post-processing analysis in R. selScore and scoreThreshold must be provided together. sortScore indicates which score is used to sort the models; possible values are G, Wald (the default), AIC and BIC. wordDelim gives the current word delimiter; space is the default.

Optional arguments may be omitted from right to left. Thus the possible sets of arguments are:

Supervision base-name.txt numBlock blockSize maxDimension
Supervision base-name.txt numBlock blockSize maxDimension selScore scoreThreshold
Supervision base-name.txt numBlock blockSize maxDimension selScore scoreThreshold sortScore
Supervision base-name.txt numBlock blockSize maxDimension selScore scoreThreshold sortScore wordDelim

5.3 RecodePLINK
This tool allows the recoding of PLINK's .ped and .map files to SamBada's format (see Purcell 2009 for further information on this format). RecodePLINK is called as follows:

RecodePLINK nbSamples nbSNPs inputFile outputFile

In case only a subset of the samples is to be used, the list of sample names may be provided in a separate file, one name per line:

RecodePLINK nbSamples nbSNPs inputFile outputFile subsetFile

Please note that RecodePLINK does not recognise comment lines in .ped/.map files at the moment; please remove them before recoding.

6 Frequently Asked Questions
17. me columns may be excluded from the analysis, for instance phenotypical information stored with the environmental data. If there is a single data file, environmental data must be provided in the first columns and molecular data in the last ones. Sample names and coordinates are considered as environmental data. If data is split between two files, samples must be in the same order in both files. Missing data can be coded as any character string, for instance NaN or …

Fig. 3 and 4 are examples of environmental and molecular files; fig. 5 is the combined file for the same data.

NAME ENV1 ENV2 ENV3 ENV4 ENV5
ID1 46 972 236 230 132
ID2 32 987 238 232 83
ID3 32 987 238 232 83
ID4 32 987 NaN 232 83
ID5 32 987 238 232 83
ID6 35 1021 235 230 87
Figure 3: Example of environmental file (env-data.txt).

NAME M4 M7 M8 M9 M16 M17 M18
ID1 1 1 1 0 1 1 1
ID2 0 0 0 0 0 0 NaN
ID3 0 1 0 0 0 0 1
ID4 0 0 1 1 0 1 1
ID5 0 0 1 0 0 1 0
ID6 0 1 0 0 0 1 0
Figure 4: Example of molecular file (mol-data.txt).

NAME ENV1 ENV2 ENV3 ENV4 ENV5 M4 M7 M8 M9 M16 M17 M18
ID1 46 972 236 230 132 1 1 1 0 1 1 1
ID2 32 987 238 232 83 0 0 0 0 0 0 NaN
ID3 32 987 238 232 83 0 1 0 0 0 0 1
ID4 32 987 NaN 232 83 0 0 1 1 0 1 1
ID5 32 987 238 232 83 0 0 1 0 0 1 0
ID6 35 1021 235 230 87 0 1 0 0 0 1 0
Figure 5: Example of combined file for environmental and molecular data, corresponding to fig. 3 and 4 (combo-data.txt).

Environm
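Because samples must be in the same order in both files, a quick check before the analysis can prevent the confusing results described in the FAQ. A minimal sketch with a hypothetical helper, assuming the identifier is the first column of each file and both files have a header line:

```python
def order_mismatches(env_lines: list[str], mol_lines: list[str]) -> list[tuple[int, str, str]]:
    """Return (line number, env id, mol id) for every line where identifiers differ."""
    if len(env_lines) != len(mol_lines):
        raise ValueError("files have a different number of lines")
    mismatches = []
    # Skip the header line; line numbers are 1-based and count the header
    for number, (env, mol) in enumerate(zip(env_lines[1:], mol_lines[1:]), start=2):
        env_id, mol_id = env.split()[0], mol.split()[0]
        if env_id != mol_id:
            mismatches.append((number, env_id, mol_id))
    return mismatches


env = ["NAME ENV1 ENV2", "ID1 46 972", "ID2 32 987"]
mol = ["NAME M4 M7", "ID1 1 1", "ID2 0 0"]
mol_sorted = ["NAME M4 M7", "ID2 0 0", "ID1 1 1"]  # e.g. sorted by a marker column
```

An empty list means the identifiers agree on every line; any entry points at the first rows to inspect.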
18. olecular data. The workflow is summarised on fig. 2, and the data format is presented in the next section.

Figure 2: Workflow of analysis. Rectangles stand for data and round-cornered figures stand for programs. Grey elements are mandatory and white ones are optional. SamBada computes correlative models and spatial autocorrelation. The two other features are optional: Supervision enables distributed computing, while RecodePLINK transforms .ped/.map files to comply with SamBada's format. Arrows show processing order. SamBada's input consists of environmental data (dashed line) and molecular data (solid line). The zigzag line indicates that Supervision is used before and after the main analysis.

4 Data format
SamBada's input consists of molecular and environmental data. They can be provided as a single file or as two separate files. Files may have any name and extension. Each line provides information for an individual; each column contains an environmental variable or a binary molecular marker. Examples are provided on fig. 4 and 5. Information about data format and analysis options is specified separately in the parameter file. Data files are organised as follows: the header line is optional and the column separator is up to the user. Sample names (identifiers) are optional and so
19. olecular or combined data file which was split in blocks. numBlock and blockSize refer to the number of data blocks (including the last one) and to the size of the complete ones. maxDimension is the maximum number of environmental parameters included in the models (1 for univariate analysis, 2 for bivariate analysis, and so on). Supervision merges all results, discards unconverged models (error numbers 1 to 5) and sorts the models according to their Wald score. One output file is produced for each dimension of modelling, as in the single-node SamBada analysis. If the original data file is called base-name.txt, the results files are named base-name-res-0.txt, base-name-res-1.txt, base-name-res-2.txt, and so on.

Example 1. Data file mol-data.txt was split in two blocks of four markers, and the analysis involved univariate models:

Supervision mol-data.txt 2 4 1

Example 2. Data file combo-data.txt was split in three blocks of three markers, and the analysis involved univariate models:

Supervision combo-data.txt 3 3 1

Supervision also takes some optional arguments. The complete call is:

Supervision base-name.txt numBlock blockSize maxDimension selScore scoreThreshold sortScore wordDelim

selScore indicates which score(s) is/are used to select significant models; possible values are G, Wald and BOTH (the default). scoreThreshold indicates the minimum score for which a model is considered as significant. This option can be used either to apply Bonferron
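The merge rules stated above (discard error codes 1 to 5, optionally keep models whose selected scores reach scoreThreshold, sort by the chosen score) can be sketched as follows. This is not Supervision's code: the dictionary keys G, Wald and NumError are hypothetical stand-ins for the result columns, and sorting AIC and BIC in ascending order (lower is better) is an assumption.

```python
def merge_results(blocks, threshold=None, sel_score="BOTH", sort_score="Wald"):
    """Merge per-block lists of models following the described merge step."""
    # Drop unconverged models (error codes 1 to 5); codes 0 and 6 are kept
    models = [m for block in blocks for m in block if not 1 <= m["NumError"] <= 5]
    if threshold is not None:
        scores = {"G": ("G",), "Wald": ("Wald",), "BOTH": ("G", "Wald")}[sel_score]
        models = [m for m in models if all(m[s] >= threshold for s in scores)]
    lower_is_better = sort_score in ("AIC", "BIC")  # assumption for information criteria
    return sorted(models, key=lambda m: m[sort_score], reverse=not lower_is_better)


block_a = [{"Marker": "M4", "G": 3.0, "Wald": 2.5, "NumError": 0},
           {"Marker": "M7", "G": 9.0, "Wald": 8.0, "NumError": 2}]
block_b = [{"Marker": "M16", "G": 12.0, "Wald": 10.0, "NumError": 0},
           {"Marker": "M17", "G": 1.0, "Wald": 0.5, "NumError": 6}]

merged = merge_results([block_a, block_b])
selected = merge_results([block_a, block_b], threshold=2.0)
```

Filtering happens before sorting, so the merged file is ordered as if every model had been computed on a single node.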
20. ording to their Wald scores before saving.
(2) Model selection (ALL, SIGNIF, BEST): ALL saves all models, SIGNIF saves significant models according to the G and Wald scores, and BEST saves significant models with at least one significant parent.
(3) Significance threshold (double): p-value for options SIGNIF and BEST. The Bonferroni correction is applied on this threshold.
Example: SAVETYPE END BEST 0.01

UNCONVERGEDMODELS (Optional; YES, NO; default NO)
This option controls the back-up of unconverged models. If enabled, these models are saved in a separate file with the suffix unconvergedModels.

Spatial autocorrelation

SPATIAL (Optional; 5 values, see description)
(1) Column name or number for longitude (string or int).
(2) Column name or number for latitude (string or int).
(3) Type of coordinates (SPHERICAL, CARTESIAN): spherical or projected.
(4) Type of weighting scheme (DISTANCE, GAUSSIAN, BISQUARE, NEAREST); see fig. 9.
(5) Bandwidth of the weighting function (double or int): double for the first three cases (in km), int for the last.
Example: SPATIAL X Y CARTESIAN BISQUARE 10

AUTOCORR (Optional; 3 values, see description)
This entry requires the specification of SPATIAL.
(1) Type of indices to compute (GLOBAL, LOCAL, BOTH): Moran's I for the global spatial autocorrelation, LISA for the local one.
(2) Variables for the analysis (ENV, MARK, BOTH).
(3) Number of permutations for com
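The five positional values of SPATIAL, and the rule that the bandwidth is an integer only for NEAREST, can be validated with a small sketch. `parse_spatial` is a hypothetical helper, not part of SamBada.

```python
COORDINATES = {"SPHERICAL", "CARTESIAN"}
SCHEMES = {"DISTANCE", "GAUSSIAN", "BISQUARE", "NEAREST"}


def parse_spatial(line: str) -> dict:
    """Split a SPATIAL line into its five typed values."""
    name, lon, lat, coords, scheme, bandwidth = line.split()  # ValueError unless 5 values
    if name != "SPATIAL":
        raise ValueError("not a SPATIAL line")
    if coords not in COORDINATES or scheme not in SCHEMES:
        raise ValueError("unknown coordinate type or weighting scheme")
    # Number of neighbours for NEAREST; a distance (in km per the description) otherwise
    value = int(bandwidth) if scheme == "NEAREST" else float(bandwidth)
    return {"longitude": lon, "latitude": lat, "coordinates": coords,
            "scheme": scheme, "bandwidth": value}


bisquare = parse_spatial("SPATIAL X Y CARTESIAN BISQUARE 10")
nearest = parse_spatial("SPATIAL longitude latitude SPHERICAL NEAREST 20")
```

Rejecting a fractional neighbour count early (int("2.5") raises ValueError) is cheaper than discovering it after a long run.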
21. Figure 12: Example of SamBada's results for univariate models. There is one marker per line. The first column is the name of the molecular marker (the locus name combined with the allele name) and the second column is the name of the environmental variable. The following columns are the log-likelihood, the G score, the Wald score and the error code (0 if success). The next columns are goodness-of-fit measures for the regression.
22. puting the pseudo p-values (default 99).
Example: AUTOCORR GLOBAL BOTH 999

SHAPEFILE (Optional; YES, NO; default NO)
With this option, the LISA are saved as a shapefile in addition to the usual output. This format is composed of three files (.shp, .shx and .dbf). These files can be loaded together in any GIS software to map the local autocorrelation. This entry requires the specification of SPATIAL.

5.1.4 Output
SamBada produces several output files. To illustrate the naming scheme, let us assume that the molecular data file is named data.ext. If the log is saved for future reference, the corresponding file is named data-log.ext. For logistic regressions, there is one file for constant models, which are not sorted (see fig. 11). There is also one file per distinct number of parameters (univariate, bivariate, trivariate models and so on; see fig. 12). In these files, models are sorted according to their Wald scores. Fig. 10 lists possible errors for logistic models. If the back-up of unconverged models is enabled, the output file is named data-unconvergedModels.ext.

Moving window:
$$w_{ij} = \begin{cases} 1 & \text{if } d_{ij} < b \\ 0 & \text{otherwise} \end{cases}$$
Gaussian kernel:
$$w_{ij} = \exp\left(-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right)$$
Bisquare kernel:
$$w_{ij} = \begin{cases} \left(1 - \left(\frac{d_{ij}}{b}\right)^{2}\right)^{2} & \text{if } d_{ij} < b \\ 0 & \text{otherwise} \end{cases}$$
N nearest neighbours:
$$w_{ij} = \begin{cases} 1 & \text{if } j \text{ is amongst the } N \text{ nearest neighbours of } i \\ 0 & \text{otherwise} \end{cases}$$
Here $d_{ij}$ is the distance between samples $i$ and $j$, and $b$ is the bandwidth set with SPATIAL.
Figure 9: Weighting schemes available for measuring spatial autocorrelation.
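The four weighting schemes of fig. 9 translate directly into code. A minimal sketch, assuming that the SPATIAL keyword DISTANCE corresponds to the moving window and that ties among nearest neighbours are broken by sample order; the function names are hypothetical.

```python
import math


def kernel_weight(scheme: str, d: float, b: float) -> float:
    """Weight w_ij for distance d between samples i and j and bandwidth b."""
    if scheme == "DISTANCE":  # moving window (assumed keyword mapping)
        return 1.0 if d < b else 0.0
    if scheme == "GAUSSIAN":
        return math.exp(-0.5 * (d / b) ** 2)
    if scheme == "BISQUARE":
        return (1.0 - (d / b) ** 2) ** 2 if d < b else 0.0
    raise ValueError(f"unknown scheme {scheme!r}")


def nearest_weights(distances: list[float], n: int) -> list[float]:
    """Weights from sample i to the other samples: 1 for its n nearest neighbours."""
    order = sorted(range(len(distances)), key=lambda j: distances[j])  # stable: ties by index
    nearest = set(order[:n])
    return [1.0 if j in nearest else 0.0 for j in range(len(distances))]
```

The Gaussian kernel never reaches zero, whereas the moving window and the bisquare kernel ignore every pair farther apart than the bandwidth. This is why a bandwidth in km is needed for the first three schemes and a neighbour count for the last.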
23. Moran's I for each variable. This file can be used to plot their distribution and compare it to the actual value of the index. Simulation results are not stored for LISAs, in order to save disk space. The third file is named Data-AS-Env-pVal.ext or Data-AS-Mark-pVal.ext and stores the pseudo p-values for the permutation-based significance tests. For $R$ permutations, of which $M$ give a simulated index $I_{sim}$ equal to or more extreme than the observed $I$, the pseudo p-value is
$$p = \frac{M + 1}{R + 1}.$$
If requested, LISAs are also stored as a shapefile whose parts are named data.shp, data.shx and data.dbf for data file data.ext. This form
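The pseudo p-value formula can be illustrated with global Moran's I. A minimal sketch, not SamBada's implementation. It assumes the standard, non-standardised form of Moran's I and counts as "more extreme" the simulated values lying at least as far as the observed one on the same side of the expected value -1/(n-1).

```python
import random


def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for values observed at n locations with weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


def pseudo_p_value(values, weights, permutations=99, seed=0):
    """Permutation test returning (observed I, (M + 1) / (R + 1))."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    expected = -1.0 / (len(values) - 1)  # expected I under no autocorrelation
    shuffled = list(values)
    m = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # random reassignment of values to locations
        simulated = morans_i(shuffled, weights)
        if observed >= expected:
            m += int(simulated >= observed)
        else:
            m += int(simulated <= observed)
    return observed, (m + 1) / (permutations + 1)


# Eight samples along a transect, each linked to its immediate neighbours
values = [1, 2, 3, 4, 5, 6, 7, 8]
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(8)] for i in range(8)]
observed, p = pseudo_p_value(values, weights, permutations=99)
```

The smallest attainable pseudo p-value is 1/(R + 1), so 99 permutations can never give less than 0.01. Fig. 6 uses 1000 permutations for finer resolution.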
24. r available options.

2 Installation
Software, source code, documentation and examples are available on our webpage: lasig.epfl.ch/sambada.

Executable binaries
Compiled versions of SamBada are already packaged for Windows, Mac OS and Ubuntu. Download and expand the archive at the location of your choice.

Compiling from sources
Just run make in the source folder; the executables are placed in the subfolder binaries.

Documentation
Software usage is explained in this manual. Theoretical background is covered by SamBada's release article (in preparation) and Joost et al. (2007). Extended information on methods and implementation can be found in Stucki (2014, in French).

Examples
Three sample cases are provided with the software.
DataFromManual is a tiny set of six samples with five environmental variables and seven molecular markers. The examples from this manual are built on this dataset in order to illustrate the data format, the analysis workflow and distributed computing.
RandomSample is a random set of 100 georeferenced points with two environmental variables and a molecular marker. The first environmental parameter is random, while the second one is correlated to longitude in order to provide some spatial autocorrelation. The molecular marker is random.
SubsetSNP Cattle contains 386 SNPs from Ugandan cattle [appropriate link]. They are already recoded for SamBada, so there are three binary markers per locus. In this case
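SNP genotypes become three binary markers per locus, as in the SubsetSNP Cattle example and in the marker names of fig. 11. The sketch below illustrates that encoding for one locus. It is not RecodePLINK's code: treating PLINK's allele code 0 as missing (written NaN), listing only observed genotypes and joining the locus and genotype with an underscore are assumptions.

```python
def recode_locus(locus: str, genotypes: list[tuple[str, str]], missing: str = "0"):
    """One binary column per observed genotype of a biallelic locus."""
    observed = sorted({"".join(sorted(g)) for g in genotypes if missing not in g})
    header = [f"{locus}_{label}" for label in observed]
    rows = []
    for g in genotypes:
        if missing in g:
            rows.append(["NaN"] * len(observed))  # missing genotype
        else:
            key = "".join(sorted(g))  # "GA" and "AG" are the same genotype
            rows.append(["1" if key == label else "0" for label in observed])
    return header, rows


header, rows = recode_locus("ARS-BFGL-NGS-16466",
                            [("A", "A"), ("G", "A"), ("G", "G"), ("0", "0")])
```

Each sample carries exactly one of the genotype markers of a locus, which is why the three frequencies of a locus in fig. 11 sum to one.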
25. set as inactive. For instance, COLSUPENV can indicate columns such as the sampling date or the name of the area.
COLSUPMARK (optional; string or int): Name(s) of the column(s) in the molecular data to be excluded from the analysis. These columns are set as inactive.
SUBSETVARENV (optional; string or int): Name(s) of the column(s) in the environmental data to be included in the analysis, while the other columns are set as inactive. The different options cumulate: the active columns are those listed here minus those specified with COLSUPENV as well as IDINDIV.
SUBSETMARK (optional; string or int): Name(s) of the column(s) in the molecular data to be included in the analysis, while the other columns are set as inactive. The different options cumulate: the active columns are those listed here minus those specified with COLSUPMARK as well as IDINDIV.
The column numbers replace their names in case there is no header line.
Logistic model and results storage
DIMMAX (mandatory; int): Maximum number of environmental variables included in the logistic models. The models with fewer parameters are computed as well. Use 1 for univariate models, 2 for univariate and bivariate models.
SAVETYPE (mandatory; 3 values, see description): Saving method and model selection. Storage mode (REAL or END): REAL saves results during processing, while END writes them upon completion of computation. The second option enables sorting the models acc
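The way the column options cumulate can be sketched as a small set computation. This is a minimal illustration, not Samβada's code. The helper names and the example header are hypothetical. The model count assumes that every combination of active environmental variables up to DIMMAX is fitted, not counting any constant-only reference model.

```python
from math import comb
from typing import Iterable, List, Optional


def active_columns(all_columns: List[str],
                   subset: Optional[Iterable[str]] = None,
                   colsup: Iterable[str] = (),
                   idindiv: Optional[str] = None) -> List[str]:
    """Columns left active after SUBSET*, COLSUP* and IDINDIV are applied."""
    # Without a subset option, every column starts out as a candidate.
    candidates = set(all_columns if subset is None else subset)
    excluded = set(colsup)
    if idindiv is not None:
        excluded.add(idindiv)
    # Keep the file order; drop supplementary columns and the identifier.
    return [c for c in all_columns if c in candidates and c not in excluded]


def models_per_marker(num_active_env: int, dimmax: int) -> int:
    """Models per marker if all variable combinations up to DIMMAX are fitted."""
    return sum(comb(num_active_env, d) for d in range(1, dimmax + 1))


env = ["ID", "NAME", "ALT", "TEMP", "RAIN", "SLOPE"]  # hypothetical header
print(active_columns(env, colsup=["NAME"], idindiv="ID"))
print(active_columns(env, subset=["NAME", "ALT", "TEMP"], colsup=["NAME"], idindiv="ID"))
print(models_per_marker(4, 2))
```

With four active variables and DIMMAX 2, this gives 4 univariate plus 6 bivariate models per marker, which shows why raising DIMMAX quickly increases computation time.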
26. th environmental data from fig. For comparison, the parameter file for single-node analysis is shown on fig. Besides the change in the number of markers, the column NAME has to be indicated as supplementary data (COLSUPENV), since it is not available in the molecular files.
a) param-combo-a.txt
HEADERS YES
NUMVARENV 6
NUMMARK 3 7
NUMINDIV 6
COLSUPENV NAME
DIMMAX 1
SAVETYPE END ALL 0.01
b) param-combo-b.txt
HEADERS YES
NUMVARENV 6
NUMMARK 1 7
NUMINDIV 6
COLSUPENV NAME
DIMMAX 1
SAVETYPE END ALL 0.01
Figure 20: Parameter files for distributed analysis with Samβada. Data file combo-data.txt has been split as shown in fig.; all computations use environmental data from subfig. a). File param-combo-a.txt is used to analyse the markers from subfig. b) and c), while file param-combo-b.txt is used for the last block of molecular data (subfig. d)).
The analysis is launched with three commands:
Sambada param-combo-a.txt combo-data-env.txt combo-data-mark-0-0.txt
Sambada param-combo-a.txt combo-data-env.txt combo-data-mark-1-3.txt
Sambada param-combo-b.txt combo-data-env.txt combo-data-mark-2-6.txt
5.2.3 Merge process
Once all blocks of markers have been analysed with Samβada, Supervision can merge the results. Copy every output file whose name contains Out to a single folder, then launch the program as follows:
Supervision base-name.txt numBlock blockSize maxDimension
base-name.txt is the name of the m
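With more blocks, writing the commands by hand becomes tedious. The sketch below generates them from the number of markers and the block size. It assumes the naming pattern seen above, where the block files end with the block index and the index of the block's first marker (0-0, 1-3, 2-6). It also assumes that a shorter final block uses its own parameter file. The helper `block_commands` and its default file names are hypothetical.

```python
from typing import List


def block_commands(num_markers: int, block_size: int,
                   env_file: str = "combo-data-env.txt",
                   mark_prefix: str = "combo-data-mark",
                   full_param: str = "param-combo-a.txt",
                   last_param: str = "param-combo-b.txt") -> List[List[str]]:
    """One Sambada command line per block of markers."""
    commands = []
    for block, first in enumerate(range(0, num_markers, block_size)):
        size = min(block_size, num_markers - first)
        # A shorter final block needs its own parameter file (different NUMMARK).
        param = full_param if size == block_size else last_param
        mark_file = f"{mark_prefix}-{block}-{first}.txt"
        commands.append(["Sambada", param, env_file, mark_file])
    return commands


for cmd in block_commands(num_markers=7, block_size=3):
    print(" ".join(cmd))
```

Each list could be passed to Python's `subprocess.run(cmd, check=True)` on a machine where Sambada is on the path, or submitted as a separate job on a cluster. Because every block is independent, the jobs can run in parallel before the merge step.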
27. uganda. Thèse de doct., Lausanne: École Polytechnique Fédérale de Lausanne. DOI: 10.5075/epfl-thesis-6014.
28. ular input file. The different output files are distinguished by adding suffixes (Out, AS), so the input files are untouched. With this option, the results can be saved in a different folder from the data.
HEADERS (optional; Yes/No; default No): Presence or absence of variable names. If present, they are read on the first line of the data file; otherwise, the environmental variables are labelled P1, P2, P3, etc. and the molecular markers are labelled M1, M2, M3, etc.
WORDDELIM (optional; char): Word delimiter; it must be a single character. This option applies to both molecular and environmental data, while the parameter file is assumed to be space-separated.
LOG (optional; 1 value, see description; default BOTH): Location of log information. TERMINAL prints the log on the standard output; FILE writes the log in a file with the suffix log; BOTH uses both methods.
Data size
NUMVARENV (mandatory; int): Number of columns with environmental variables, including ignored variables and the column of identifiers, if any. If there is one input file, this counts the number of columns that don't concern molecular data. If there are two input files, NUMVARENV counts the total number of columns in the environmental data file.
NUMMARK (mandatory; int): Number of columns with molecular data, including ignored data and identifiers if applicable. If there is one input file, this counts the number of columns concerning molecular markers. If
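The interplay of HEADERS, WORDDELIM and the default labels P1, P2, … / M1, M2, … can be sketched with a small reader. This is a hypothetical helper, not Samβada's parser. It assumes one keyword per line followed by its values, that the first value of NUMVARENV or NUMMARK is the column count, and that the first `count` fields of the header line are the ones being labelled.

```python
from typing import Dict, List


def parse_param_file(text: str) -> Dict[str, List[str]]:
    """Read 'KEYWORD value [value ...]' lines into a dict of string lists."""
    params: Dict[str, List[str]] = {}
    for line in text.splitlines():
        tokens = line.split()  # the parameter file itself is space-separated
        if tokens:
            params[tokens[0].upper()] = tokens[1:]
    return params


def column_labels(params: Dict[str, List[str]], first_line: str,
                  count_key: str, prefix: str) -> List[str]:
    """Labels for NUMVARENV (prefix 'P') or NUMMARK (prefix 'M') columns."""
    count = int(params[count_key][0])
    # HEADERS defaults to No: fall back to P1, P2, ... or M1, M2, ...
    if params.get("HEADERS", ["NO"])[0].upper() != "YES":
        return [f"{prefix}{i}" for i in range(1, count + 1)]
    # WORDDELIM is a single character; None makes str.split() use any whitespace.
    delim_values = params.get("WORDDELIM")
    delim = delim_values[0] if delim_values else None
    return first_line.strip().split(delim)[:count]


params = parse_param_file("HEADERS YES\nWORDDELIM ,\nNUMVARENV 3\nNUMMARK 2\n")
print(column_labels(params, "ID,ALT,TEMP", "NUMVARENV", "P"))
no_header = parse_param_file("NUMVARENV 3\nNUMMARK 2\n")
print(column_labels(no_header, "", "NUMMARK", "M"))
```

Keeping every value as a string list mirrors the fact that some keywords (e.g. SAVETYPE END ALL 0.01) take several values of different types; converting them is left to the code that uses each keyword.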