6. Updates to the Network 4.6.1.1 User Guide - fluxus

1. menu, select Optional Postprocessing > MP Calculation. The MP calculation options window will appear. Leave the default radio button active ("Network containing all shortest trees and list of some shortest trees sufficient to generate this network"). Click the Start button to select the out file and run the MP calculation. The MP calculation results are saved into the same folder, in a file with the extension sto. Fig. 19: MP Calculation (Polzin et al.) is recommended after every network calculation. Displaying the results: Select the Network Draw subprogram or start the Network Publisher software. File menu > Open, select the sto file. At the pop-up window, click the No button if you want to compare the original network with the cleaned-up network. When the network graphic is completed, you can compare the cleaned-up network with the original network by switching between the radio buttons Network and Original Network. [Fig. 20: Draw subprogram, with the radio buttons Network, Original Network and Tree(s)] After activating the radio button Tree(s), you can interactively display a list of shortest trees. This list does not contain all shortest trees in the network. The list only contains all shortest trees which are sufficient to define
2. (see Fig. 53) and then click all descendent nodes. To un-select a node, click it again. The selected nodes are coloured green. Due to a colouring bug, some pie slices are not coloured green correctly. Optionally, median vectors can also be selected to specify the true tree (if known) within the network. [Fig. 53: Calibration, step 2: specify descendent nodes. Mutation rate shown: 1 mutation every 20180 years] Step 3: Click button Calculate time (Fig. 53, and see next page). Step 4: Calibrate the calculated age (Fig. 54) with the known age. For example, if the ancestral node mv4 is known to be 25000 years old, not 59045 years, then the mutation rate needs adjustment from 1 mutation in 20180 years to 1 mutation in (25000 / 59045) x 20180 = 8544 years. [Fig. 54: Calculated time, step 4: adjust mutation rate to fit known age. Results shown: age in mutations (rho statistic) 2.9259, standard deviation (sigma) 0.9757, with the corresponding age and standard deviation in years] After entering the corrected mutation rate (Fig. 54), click the OK button. Then proceed to estimate the ages of nodes within the network. 2.8.2 Age estimation of a node in the network. Step 1 (example Fig. 55): click button Specify ancestral node, then click onto node OR2. Step 2: Click button Specify descendent nodes and click onto nodes OR1 and OR3. Ste
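The calibration arithmetic of step 4 can be sketched in a few lines of Python. This is only an illustration and not part of Network: the function names are hypothetical, and the relation "age in years = rho x years per mutation" is an assumption inferred from the values quoted for Fig. 54.

```python
def age_in_years(rho, years_per_mutation):
    """Convert an age in mutations (rho statistic) into years.

    Assumption: the age in years is rho multiplied by the number of
    years per mutation, as the Fig. 54 values suggest.
    """
    return rho * years_per_mutation


def calibrated_rate(years_per_mutation, calculated_age, known_age):
    """Rescale the mutation rate so that the calculated age matches the known age."""
    return years_per_mutation * known_age / calculated_age


# Worked example from the text: node mv4, starting rate 1 mutation per 20180 years.
calculated = age_in_years(2.9259, 20180)
corrected = calibrated_rate(20180, 59045, 25000)
print(round(calculated), round(corrected))
```

With the values from the text, the corrected rate rounds to 1 mutation every 8544 years, which is the figure entered in step 4.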
3. Changed name length limits in 2.1.2 (Preparation of variable data sets for Network). Changed instructions on entering and saving STR weights in chapter 2.1.3 (Weights) and in chapter 2.5 (STR data: short tandem repeat/microsatellite data). Added sub-chapter 2.1.7 (MP option to clean up networks). Added sub-chapter 2.1.8 (Star Contraction option: Use for network simplification or for identification of population expansion events). Added sub-chapter 2.1.9 (Frequency>1 Criterion for networks with large number of taxa). Added instructions for attributes entry in the Network data editor and node/pie network graphics in Network Publisher in 2.2.4.1 (Node and pie chart colouring in Network Publisher). Deleted former instructions about renaming file extensions in 2.4 (Amino acid nucleotide sequence data). Added notes in chapter 2.5 (STR data: short tandem repeat/microsatellite data). Changed contact name and email in Chapter 5 (Feedback). Several new figures are inserted, many figures are renumbered. 13. Updates to Network 4.2.0.1 User Guide (compared to 3 April 2007): The following user guide changes on 19 September 2007 were due to an update in the Network Publisher add-on software, where emf format was added to solve import problems into MS Office 2003 and newer versions. 1. Added emf format to Fig. 1 (General overview over the work flow). 2. Added emf format to Chapter 2.2.2 (Initial analyses using the MJ
4. into an intra-species alignment (ali file) and export this as an rdf file. We recommend that users first align the intra-species sequences without the outgroup, because the automatic alignment algorithm is designed for closely related sequences. We recommend aligning just the outgroup sequence to the reference sequence in a new session to minimise manual editing work. Finally, merge the ali files using the Import alignment function in DNA Alignment 1.3.2.0, which can introduce additional insertions and deletions. After manually checking that no alignment problems occurred in this merging step, export the rdf file. The outgroup sequence must be named ROOT. The network calculation will ignore ROOT during network construction with the parameter External rooting active. After network construction, the algorithm will search for the root proxy node, i.e. the nearest existing network node to the ROOT sequence, taking the specified character weights into account. Note that the actual network root may lie within an adjacent multi-mutation link, and some mutation re-ordering along this link and the generation of a new median vector may be required. Generation of a new root median vector and re-ordering of mutations has not been implemented due to significant computational complexities. Network Publisher 2.0.0.0 is able to highlight the root proxy node, which is a useful feature in very complex networks. 2.2.3 Discussing a
5. (Table of Contents, continued)
2.5.1 Data entry
2.5.2 Network calculation, analysis, interpretation and graphics
2.6 Endonuclease data (RFLP, restriction fragment length data)
2.6.1 [illegible]
2.6.2 Network calculation, analysis, interpretation and graphics
2.7 Binary data
2.7.1 Data entry
2.7.2 Network calculation, analysis, interpretation and graphics
2.8 Time estimates
2.8.1 Calibration of network mutation rate with a known event
2.8.2 Age estimation of a node in the network
3 Software Limits in Network [version illegible]
4 Network [version illegible]: Present and Future
5 Feedback, Bug Reports and Enhancement Requests
6 Updates to the Network 4.6.1.1 User Guide
7 Update
6. [Fig. 26: ExampleRFLPWeighted.rdf. MJ (epsilon = 0), Frequency>1 criterion active, MP] [Fig. 27: ExampleRFLPWeighted.rdf. MJ calculation (epsilon = 0), MP] 2.1.10 RM-MJ network calculation for reduced complexity. Data sets with a very large number of sequences/taxa/profiles can become very difficult to analyse, even for MJ network calculation with epsilon = 0 followed by MP (see Fig. 27). The RM-MJ technique is used to reduce network complexity as follows: 1. The RM network calculation stage (result: rmf file) splits loci on the basis of how far apart they are in the network. 2. After saving the rmf file, stage 2 of the RM network calculation is not required (in RM parameters > reduction threshold, uncheck "Generate list of links", see Fig. 15). 3. The rmf file is used for the MJ network calculation. Result: the RM-MJ network is often simpler than a pure MJ network because implausible parallelisms have been avoided in step 1 (see Fig. 28, where additionally star contraction preprocessing has been used). The RM-MJ technique can be used for STR data, RFLP data, binary data, and binarised DNA or amino acid data. When binarising DNA or amino acid data with the DNA Alignment software, please read the notes on binary rdf files in the DNA Alignment help pages.
7. 5584 (AG CT) where the RS was cut. Line 2: 15412k, cut at position 15412 (GT AC) where the RS was not cut. Line 3: 1, number of individuals with this sequence. RS: Reference Sequence. For this example file, the Cambridge RS (CRS) was used. 2.6.2 Network calculation, analysis, interpretation and graphics. Both the rdf and tor files can be opened with the Network Calculation options Star Contraction, Reduced Median, Median Joining. However, we suggest importing tor files into the Network Data Editor (see 2.6.1), checking the data import, and saving them as rdf before continuing with the Network Calculation options. If no Ns are present in the manually entered data or after import from the tor file, the rdf file will consist of binary data. These data can be used for RM and for MJ network calculation. If Ns are present in the file, they will be automatically replaced by 0 or 1 (see chapter 2.2.5) before the RM or MJ calculation is carried out. For detailed instructions on the network calculation steps, see chapters 2.2.2, 2.2.5 and 2.1.10. For detailed instruction on analysis, discussion and interpretation, see chapter 2.2.3. For detailed instructions on graphical layout of results, see chapter 2.2.4. 2.7 Binary data. A binary character has only the 2 states 0 or 1 (see chapter 2.1.2 and Fig. 3). Binary data can be used both by the RM and the MJ network calculation options. Note that ambiguous character states are denoted with N. The R
8. [end of the Example 2 table: G C G T G; Doug T C G T C] 2.1.2 Preparation of variable data sets for Network. You can enter small data sets using Network's data editor: Start Network > Data Entry menu > Manual, then select the data type you wish to enter. Example 3 (Network's data editor): Consider the data set in Example 2. You can enter these data in 4 different ways: 1. with the option DNA nucleotide data and nps 16091-16095; 2. with the option DNA nucleotide data and nps 16091, 16093-16095; 3. with the option Binary data and nps 16091-16095; 4. with the option Binary data and nps 16091, 16093-16095. For cases 1 and 3, the network building algorithm will ignore np 16092. Case 2: Choose the option DNA nucleotide data > Continue > Sequences: 4 (i.e. Alice, Bruce, Clarissa, Doug) > Number of characters: 4 (i.e. 16091, 16093, 16094, 16095) > Create. Double-click into the Charact and Sequence cells to enter the np names and sequence names. Note that the Network data editor limits entry of the Character names to 8 (older Network versions: 6) and Sequence names to a length of 15 (old: 6). For STR data, the Locus name length limit is 6 (old: 5). Click into the table cells to enter the nucleotides. You can use the keyboard keys for editing and for moving up/down/left/right. Alternatively, you can right-click a table cell and use the context menu to edit the nucleotide. Fig. 2: Network's Data Editor with dna nucleoti
9. and increase the parameter epsilon (see chapter 2.1.5) to identify where new median vectors are added to the network. Experiment with increasingly higher epsilon settings, and each time save the results under different names, e.g. DNA_multi_MJ_2_epsilon_20.out. Rooting the network (ancestral node, root proxy node): The process of determining the ancestral node is referred to as rooting the network. The ancestral node of a network can be determined by comparing the network nodes with suitable outgroups. For example, to manually find the ancestral node of a network of domestic horses, Jansen et al. used zebra and wild asses as outgroups (Jansen T, Forster P, Levine MA, Oelke H, Hurles M, Renfrew C, Weber J, Olek K. Mitochondrial DNA and the origins of the domestic horse. Proc Natl Acad Sci USA 2002 Aug 6; 99(16):10905-10). We recommend the External rooting active parameter in the MJ network calculation. [Screenshot: Median Joining > Parameters menu, with the entries Change weights, Change Epsilon (currently 0), Frequency>1 criterion (inactive), Weighting transversions/transitions, Criterion (Connection cost), External rooting (active)] First, you need to add your outgroup to the intraspecies rdf file. This procedure is facilitated by the DNA Alignment software (version 1.3.2.0, for release in January 2011), which allows you to merge an outgroup alignment (ali file)
10. be run up to 3 times on the loaded data set. Finally, Network suggests a name for the star contraction results file (sco). The sco file can be used for the network calculation (MJ or RM). After the network calculation, the results file (out) can be used for the MP option to clean up the contracted network. 2.1.9 Frequency>1 Criterion for networks with large number of taxa. Data sets with a very large number of sequences/taxa/profiles can become very difficult to analyse, even for MJ network calculation with epsilon = 0 followed by MP (see Fig. 27). The Frequency>1 criterion simplifies networks (Fig. 26) by ignoring sequences/taxa/profiles which are unique in the data set, because a skeleton network should be obtainable from groups rather than individuals. Furthermore, a group of identical sequences/taxa/profiles is less likely to include random errors (sampling, lab typing); conversely, a group leading to a suspicious network artefact may point to a systematic process error. The Frequency>1 criterion is available both for RM calculations and for MJ calculations after a file is opened, in the Parameters menu. To toggle the criterion on/off, go into the Parameters menu and click onto the line Frequency>1 criterion. Then click Calculation.
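The idea behind the Frequency>1 criterion can be illustrated with a short Python sketch. It is not Network's implementation: the tuple layout and the function name are hypothetical, and it assumes that identical sequences entered under different names count towards the same group.

```python
from collections import Counter


def frequency_gt1(taxa):
    """Keep only sequence types that occur more than once in the data set.

    taxa: list of (name, sequence, frequency) tuples (hypothetical layout).
    Returns a dict mapping each retained sequence to its total frequency.
    """
    totals = Counter()
    for _name, sequence, frequency in taxa:
        totals[sequence] += frequency
    return {seq: n for seq, n in totals.items() if n > 1}


example = [
    ("A1", "TCGAG", 1),
    ("A2", "TCGAG", 1),  # same sequence as A1, so the group has frequency 2
    ("B1", "TCCAC", 3),
    ("C1", "GCGTG", 1),  # unique in the data set, so it is ignored
]
print(frequency_gt1(example))
```

Only the two sequence types that occur more than once remain, which is the kind of skeleton data set the criterion is meant to produce.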
11. genetic distance, the MJ algorithm is guaranteed to yield a full median network. Usually we find epsilon = 10 to be sufficient. The range of unweighted genetic distances can be calculated and displayed with Network's Tools > Mismatch Distribution (Fig. 12). In this example, the maximal pairwise difference is shown as 4. If all character weights are 10 and the transversion/transition weighting is 1:1, then an epsilon value of 40 will guarantee a full median network for this example. [Fig. 12: Calculating genetic distances in Network's Mismatch Distribution Tool; unweighted mean pairwise difference 2.333] What is a median vector? A median network consists of nodes and links which connect the nodes. The nodes are either sequences from the data set or median vectors. The links are character differences. A median vector is a hypothesised (often ancestral) sequence (see Fig. 13, mv1 and mv2) which is required to connect existing sequences within the network with maximum parsimony. Without the median vector, there would be no shortest connection between the data set's sequences. [Fig. 13: Median network showing median vectors mv1 and mv2] Switching the distance calculation method between Connection Cost and Greedy FHP: The switch between the two available distance calculation methods (default: Connection cost method of Röhl et al.; alternative: Greedy FHP method of Foulds, Hendy, Penny et a
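The epsilon bound for the example data can be reproduced with a small Python sketch. The sequences for Alice, Bruce and Doug are taken from the guide's example tables; the row used for Clarissa is an assumption (the table fragment preceding Doug's row), although it does reproduce the quoted maximum of 4 and mean of 2.333. The function names are hypothetical, and the bound assumes equal character weights and a 1:1 transversion/transition weighting.

```python
from itertools import combinations

# Example data set (characters 16091 to 16095).
sequences = {
    "Alice": "TCGAG",
    "Bruce": "TCCAC",
    "Clarissa": "GCGTG",  # assumed row, see the note above
    "Doug": "TCGTC",
}


def unweighted_distance(a, b):
    """Number of characters in which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs):
    """Unweighted distances for every pair of sequences."""
    return [unweighted_distance(a, b) for a, b in combinations(seqs.values(), 2)]


def sufficient_epsilon(seqs, character_weight=10):
    """Greatest weighted distance when all characters share one weight."""
    return max(pairwise_distances(seqs)) * character_weight


distances = pairwise_distances(sequences)
print(max(distances), round(sum(distances) / len(distances), 3))
print(sufficient_epsilon(sequences))
```

With unequal character weights, the maximum would have to be taken over weighted distances instead of multiplying the unweighted maximum by a single weight.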
12. if the character weights are 10 or greater. Conversely, epsilon settings of 10, 20, 30 etc. can be useful if the character weights are 10 or greater. Our experience suggests that epsilon values of 0 or 10 normally result in a good network. For small and clear networks, we suggest that more MP links can be found by activating the MJ square option. Setting the parameter epsilon: The parameter epsilon is set in Median Joining > Parameters menu > Change epsilon (see Fig. 11) after the data file has been opened (File menu > Open). To change the value of epsilon, type a number or click the up or down button, and click OK. Epsilon values may range from 0 to 231. All parameter settings are logged in the first lines of the network calculation out file. The Median Joining option is accessible from the Calculate Network main menu > Network Calculations > Median Joining. [Fig. 11: Setting epsilon parameter in Calculate Network > Median Joining] What does epsilon mean? The parameter epsilon specifies a weighted genetic distance to the known sequences in the data set, within which potential median vectors may be constructed. If epsilon is set less than the greatest weighted genetic distance within the data set, then there is a theoretical possibility that the MJ network will not contain all possible shortest trees. If epsilon is set equal to or greater than the greatest weighted
13. (Table of Contents, continued) option
2.1.7 MP option to clean up networks
2.1.8 Star Contraction option: Use for network simplification or for identification of population expansion events
2.1.9 Frequency>1 Criterion for networks with large number of taxa
2.1.10 RM-MJ network calculation for reduced complexity
2.2 DNA nucleotide sequence data
2.2.1 Data entry
2.2.2 Network calculation using the MJ algorithm with optional external rooting
2.2.3 Discussing, analysing and interpreting network results (MJ and RM)
2.2.4 Graphical layout of results
2.2.4.1 Node and pie chart colouring in Network Publisher 2.0.0.0
2.2.5 Verification using the RM option
2.3 RNA nucleotide sequence data
2.3.1 [illegible]
2.4 Amino acid nucleotide sequence data
2.4.1 [illegible]
2.4.2 Network calculation, analysis, interpretation and graphics
2.5 STR data (short tandem repeat/microsatellite data)
14. option). 3. Added emf format to Chapter 2.2.4 (Graphical Layout of results).
15. pie chart colouring in Network Publisher 1.1.0.6: the old names for taxon attributes (color schemes) Ethnic Group, Region, Haplogroup are replaced by Phenotype, Geography, Lineage, and 3 new color schemes are available which can be renamed. 11. Updates to Network 4.5.0.1 User Guide (compared to Network 4.5.0.0 User Guide of 31 December 2007): 1. Added RM-MJ calculation for network complexity reduction in chapters 1.3 (Further Complexity reduction options), 2.1 (Overview of the general work flow and the RM-MJ work flow), new chapter 2.1.10 (RM-MJ calculation), chapter 2.5.2 (STR: Network calculation, analysis, interpretation and graphics), chapter 2.6.2 (Endonuclease RFLP data: Network calculation, analysis, interpretation and graphics), chapter 2.7.2 (Binary data: Network calculation, analysis, interpretation and graphics). Added note on human Y-STR 389I and 389II loci in chapter 2.5.1 (Data entry). Added general notes and added new limit for Network Draw in chapter 3 (Software Limits in Network 4.5.0.1). Split Fig. 1 into Fig. 1a (General overview of the work flow) and Fig. 1b (Specific work flow for the RM-MJ network calculation). Inserted Fig. 28 and renumbered Figs 28 49. Exchanged Figs 2 3 9 10 32 33 39 41 43 46 48 50 for the slightly modified pictures of Network 4.5.0.1 and Network Publisher 1.1.0.4. 12. Updates to Network 4.5.0.0 User Guide (compared to Network 4.2.0.1 User Guide of 19 September 2007): 1.
16. which module, menu and button or command? 3. With which data file or with which manually entered data? (zip/stuff any file before emailing, to prevent the possibility of email corruption) 4. Reproducibility: Does this happen every time, or did it happen after you did certain things? If it does not happen every time, can you remember what you did before this happened? Did you manage to reproduce this once? 5. Screen setting reproducibility: Is this a graphics problem? If so, under what screen resolution does it happen? Does it happen at a resolution of 1024 x 768? 6. Hardware reproducibility: What computer did you use (CPU, graphics chipset, RAM memory size, free memory on hard disk)? If you have more than one computer: Does this happen only on one computer, or do you see it on others too? 7. Operating system reproducibility: What operating system did you use? If you have more than one operating system: Does this happen only on one operating system, or do you see it on others too? Are you running the operating system in a virtual machine? 8. LAN reproducibility: Is your computer connected to LAN (wire or wireless)? If so, were you working on your own computer's hard drive or partially over the LAN (reading from a non-local folder, writing to non-local folder, or running the software from a non-local folder)? 9. User privileges reproducibility: Did a different user log into windows with different user privileges and everything worked, then you logge
17.
        16091  16092  16093  16094  16095
Alice   T      C      G      A      G
Bruce   T      C      C      A      C
Weight  10     10     10     10     10
Bruce differs from Alice in two characters (16093 and 16095). The weighted distance is 20. Let us assume that there are 100 more sequences in the data set, and that character 16095 is hypervariable within the data set. A frequently changing character is less valuable for network construction than infrequently changing characters. Therefore we downweight 16095:
        16091  16092  16093  16094  16095
Alice   T      C      G      A      G
Bruce   T      C      C      A      C
Weight  10     10     10     10     5
Bruce differs from Alice in two characters (16093 and 16095). The weighted distance is 15. For first network building calculations with a new data set, we suggest that you leave the default weights. If your network turns out to be poor (containing high-dimensional cubes or large cycles), you can change the weights for the next runs as explained on the following pages. Types of weight in Network: In Network you can change two types of weights: 1. weights of characters. This value may range between 0 and 99. A value of 0 instructs Network to ignore the character. 10 is the default value. 2. (in MJ only) weights of single nucleotide mutation types (transversions/transitions). This weight may range from 1:50 to 50:1. The default is 1:1. Guidelines for changing weights (if the calculation with defaults is unsatisfactory): 1. Increase the weight for events that might be much less likely to happen because they are s
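The two weighted distances quoted in the example (20 with default weights, 15 after downweighting character 16095) can be checked with a minimal Python sketch. The function name is hypothetical; the sketch simply sums the weights of the characters in which the two sequences differ, which is what the worked example implies.

```python
def weighted_distance(seq_a, seq_b, weights):
    """Sum the weights of all characters in which two aligned sequences differ."""
    return sum(w for a, b, w in zip(seq_a, seq_b, weights) if a != b)


alice = "TCGAG"  # characters 16091 to 16095
bruce = "TCCAC"

default_weights = [10, 10, 10, 10, 10]
downweighted = [10, 10, 10, 10, 5]  # hypervariable character 16095 reduced to 5

print(weighted_distance(alice, bruce, default_weights))
print(weighted_distance(alice, bruce, downweighted))
```

Setting a weight to 0 removes that character's contribution entirely, which matches the statement that a weight of 0 instructs Network to ignore the character.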
18. [Fig. 28: ExampleRFLPWeighted.rdf. Star contraction, RM-MJ and MP calculation] 2.2 DNA nucleotide sequence data. 2.2.1 Data entry. Small data sets can be entered manually and saved into a file using Network's Data Editor: Start Network > Data Entry menu > Manual > DNA nucleotide data > Continue. See chapter 2.1.2 for details. [Fig. 29: Loading FASTA data into DNA Alignment, with or without alignment option] Larger data sets should be imported into the DNA Alignment software in FASTA format, aligned if required, checked (see chapter 2.1.2) and exported for Network in rdf format. This allows longer sequences to be analysed than with Network's manual data entry. The DNA Alignment software (Fig. 30) can easily export character multistate data (Fig. 2, chapter 2.1.2) rdf files or binary data (Fig. 3, chapter 2.1.2) rdf files from the same FASTA file, allowing both MJ and RM to be run on the same data. After the MJ analysis, RM can be run on the binary data file if an independent verification of the MJ results is required, as the two a
19. M and MJ network calculation options automatically replace these Ns with either 0 or 1 (see chapter 2.2.5) before the RM or MJ calculation is carried out. 2.7.1 Data entry. Small data sets can be entered manually and saved into a binary rdf file using Network's Data Editor: Start Network > Data Entry menu > Manual > Binary data > Continue. [Fig. 50: Network's Data Editor with binary dna sequence data] Larger data sets should be imported into the DNA Alignment software in FASTA format, aligned if required, checked (see chapter 2.1.2) and exported for Network in binary rdf format. This allows longer sequences to be analysed than with Network's manual data entry. 2.7.2 Network calculation, analysis, interpretation and graphics. For detailed instructions on the network calculation steps, see chapters 2.2.2, 2.2.5 and 2.1.10. For detailed instruction on analysis, discussion and interpretation, see chapter 2.2.3. For detailed instructions on graphical layout of results, see chapter 2.2.4. 2.8 Time estimates. The Time estimates sub-program is applied to the finished network, i.e. the network must first be calculated, laid out as a tree-like structure in the Draw sub-program or Network Publisher, and saved in fdi format. There are two conceptual steps in the time estimates sub-program: First, the mutation rate must be obtained, usually by calibration. Then the ages of nodes within the netw
20. Network 4 6 1 1 User Guide Version date 20 December 2012 Copyright 2012 Fluxus Technology Ltd All rights reserved Legal Disclaimer This user guide shall not be interpreted as a warranty of any kind Use of the software is subject to the terms under www fluxus engineering com network_terms htm Table of Contents PR Scoala N oes eee rere rere terete a tree ree tenner eer errr sneer eee re etree eee tree Terres tre 4 Mal SSC OPH OL APLC ALLO ses sete sar ants ase e oa a dost naieton etal aseles 4 L2 INCU Ol ie DUIS OID EONS eea E eee aed sla E sual eines leant 4 1 3 Further complexity reduction Options cccccccccsescccesecceeecceeeecseeeceenceeseseeeenceeseseeeeneesees 4 LA Coniplementary ODMONS siicseteaieussuinde E E ise 4 2 NV ONE EG WY ete ad he eee tated eee eared latices eget ee a dees 5 2 1 Overview of the general work flow and the RM MJ work flow ccccccccsescecseeeeeeeeeeees 5 SL E UO MeeVee neds ence eet ener tice E E peck ore eatin eect E ame eae ence an ene T 7 2 1 2 Preparation of variable data sets for Network ccccccecccsescccseeecneceeeseceenceeeeseeaeneeeees 8 ard lee D oik od E eee ere Seer eae e Ere Sone eee E eer are nner err S er eee etre ere trance etree eer eer etre 11 2 A PVCQUCHICY s crsh cineca shots cerecclanc A EE a a a 15 2 1 5 Epsilon Gn MJ Connection Cost Greedy FHP in MJ MJ square option 16 2 1 6 Reduction threshold r and out file option Gn RM network
21. [Fig. 4: Insertion artefact displayed by DNA Alignment for checking and correcting] In real data, check around each inserted gap, but bear in mind that an artefact will not always be so obvious. Check each nucleotide mismatch (Fig. 4: 16111 T in Nuu5a and Nuu5b) against the sequencing chromatograms to confirm validity of the nucleotide. Investigate each ambiguous nucleotide. Double-check newly discovered mutations against the possibility of contaminations and sequencing errors. If you do not check unknown data or auto-aligned data, you risk that Network will build an incorrect network. Note that the Network Data Editor does not highlight alignment differences and does not allow alignment editing; we recommend the DNA Alignment software to display and manually edit alignments. 2.1.3 Weights. Introduction: Genetic Distances and Weights. A fundamental concept within network building algorithms is the genetic distance between two sequences in a data set. This is calculated as the number of characters that differ between these sequences. To explain the genetic distance, let us look at two sequences:
        16091  16092  16093  16094  16095
Alice   T      C      G      A      G
Bruce   T      C      C      A      C
Bruce differs from Alice in two characters (16093 and 16095). The genetic distance is 2. To take into account that some characters can be more important than others, Network applies a weight to each character. Consider the example with Network's default weight of 10
22. [Fig. 32: Opening the rdf data file for MJ network calculation] We recommend running the MP option on the out file to delete all superfluous median vectors and links which are not contained in the shortest trees in the network: Calculate Network menu > Optional Postprocessing > MP Calculation > "Network containing all shortest trees and list of some shortest trees sufficient to generate this network" > Start > Open Network output file (out). The MP results are saved as sto file format. The sto file can be opened, viewed and analysed in the Draw Network option (see Fig. 32, first menu line) or in the Network Publisher software (Fig. 33). Alternatively, the out file can be opened. Save a screen snap (bmp) or, from Network Publisher only, a vector graphic (emf, wmf, pdf) of the network. [Fig. 33: Opening the sto file or the out file for visual display and analysis] Re-run with different settings (see Fig. 1 in chapter 2.1). For example, if the network is messy, identify nucleotides which have mutated frequently within your data set (see Fig. 5 in chapter 2.1.3), downweight these nucleotides (see chapter 2.1.3), calculate the MJ network and save the results under a new name, e.g. DNA_multi_MJ_2.out. When your network looks clean, keep the successful weight settings
23. [Fig. 6: Editing character weights in Network's Calculation Parameters] In the Median Joining option, for non-binary nucleotide data, you can additionally apply a transversions/transitions weight globally to all characters. For example, for mtDNA you can enter 3 for Weighting transversions and 1 for Weighting transitions. This weighting will be interpreted additionally to character weights, e.g. a character with the character weight 20 and containing transversions will be weighted 20 x 3 = 60. [Fig. 7: Editing transversion weights in Network's MJ Calculation Parameters] [Fig. 8: Transitions are chemically more likely to occur than transversions in human mtDNA] To edit weights and save the changes, go into the Data Entry main menu > Import rdf file, specify the file type and click Continue. This will load the file and open Network's Data editor. To edit a weight, click into the cell (see Fig. 8), type a value and confirm by hitting the Enter key or clicking into a different cell. Finally, click the Save button to update the rdf file, and Exit. Note: For STR data you can now also edit weights in the Network STR editor and save the weights in the new ych format. From Network 4.5.0.0 upw
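How the transversion/transition weighting combines with a character weight can be sketched as follows. This is an illustration under stated assumptions, not Network's code: the function names are hypothetical, and the multiplicative rule is taken from the 20 x 3 = 60 example above, applied here to a single substitution.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(a, b):
    """Classify a nucleotide substitution as a transition or a transversion."""
    if a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different nucleotides out of A, C, G, T")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def effective_weight(a, b, character_weight, tv_weight=3, ts_weight=1):
    """Character weight multiplied by the weight of the mutation type."""
    if mutation_type(a, b) == "transversion":
        return character_weight * tv_weight
    return character_weight * ts_weight


print(effective_weight("G", "C", 20))  # purine to pyrimidine: transversion
print(effective_weight("G", "A", 20))  # purine to purine: transition
```

The defaults of 3 and 1 mirror the mtDNA example; with the guide's default weighting of 1:1 the character weight is left unchanged.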
24. all the network nodes and links (see chapter 2.2.3 for a graphic example). 2.1.8 Star Contraction option: Use for network simplification or for identification of population expansion events. Complex networks containing many sequences, taxa or Y profiles can be very difficult to interpret at first glance. One option to reduce complexity is the star contraction algorithm (another option is described in sub-chapter 2.1.9). Historic demographic expansion events are characterised by star-like clusters of nodes around a founder population node. The star contraction algorithm identifies such clusters and shrinks the nodes of a cluster back towards the founder node. The star contraction algorithm therefore has two separate uses: a) to help analyse networks for historic demographic expansion events, and b) to help simplify complex networks into skeleton networks for a first overview. [Fig. 21: ExampleDNAMultistate.rdf. Star-like cluster around founder node] [Fig. 22: Same data with star contraction preprocessing (mv1 renamed to sco1)] Running the star contraction calculation: Star Contraction is run before running the network calculation (MJ or RM). In the main window of Ne
25. ards [Fig. 9: Editing character weights in Network's Data Editor] 2.1.4 Frequency. Definition of Frequency in Network's data editor: The frequency value allows you to specify the number of times that a sequence or STR profile occurs in your data set. Fasta files (frequency value in files generated by the DNA Alignment software): The frequency value is 1 in the rdf files generated by the DNA Alignment software. This means that one sequence entry corresponds uniquely to one taxon (i.e. to one individual) in the rdf files generated by DNA Alignment, if the sequence names are unique within the FASTA file. DNA Alignment can create duplicate taxa with identical names when truncating long names (longer than the name length 6 permitted by Network), but Network will issue a warning message when such a file is imported. Manual data entry (frequency value): If the same sequence or STR profile occurs several times in your data set, you do not need to enter this several times in Network's Data Editor. Instead, you can click into the cell in the Frequency column (see Fig. 10) and type a value (i.e. the number of times that the sequence or profile occurs in your data set) and press the Enter key or click into a different cell. Duplicated sequences and profiles with different names are allowed. [Fig. 10: Editing sequence frequencies in Network's Data Editor] Note that duplicate taxa, i.e. identical sequences w
26. at high magnification. Note: In the current software version, the fdi file is saved specific to the display setting of the computer on which it was saved. This means that the size and position of the network graphic may change if the file is opened on a different computer. If you plan to exchange fdi files during a project, it would make sense to agree on a consensus setting and to put this information into the file name, e.g. DNA_multi_MJ_2_epsilon_20_display1400x1200.fdi. 2.2.4.1 Node and pie chart colouring in Network Publisher 2.0.0.0. Node colouring and pies are increasingly used within network graphics to display additional information in a network, for example the geographic affiliation, haplogroup or ethnic affiliation of each sequence or STR profile. Colour-coded nodes and pies can also be used to help analysis and interpretation, for example to check whether the geographic origins of sequences correlate with their relationships within the network. Fig. 38: Network graphic with colour-coded nodes for geography, lineage etc. In Network Draw it is possible to define colours and pies after right-clicking a node. However, this is a lot of work, and this work must be performed again after every network calculation, for example when different calculation parameters are explored. In Network, colour coding information may be entered in the Network data editor (Start Network > Data Entry menu > Import rdf file) before running a network calc
27. be opened with the Network Calculation options Star Contraction, Reduced Median and Median Joining. For detailed instructions on the network calculation steps, see chapters 2.2.2 and 2.2.5. For detailed instructions on analysis, discussion and interpretation, see chapter 2.2.3. For detailed instructions on graphical layout of results, see chapter 2.2.4. 2.5 STR data (short tandem repeat / microsatellite data). 2.5.1 Data entry. Small data sets can be entered manually and saved into a file using Network's Data Editor: Start Network > Data Entry menu > Manual > Y-STR data > Continue. Fig. 48: Network's Data Editor with STR data. When the editor is opened for new data entry, the taxa and loci names are defaults; to edit these, click into the cells and edit. For STR data of individual persons, we suggest entering each individual's abbreviated name as a taxon and leaving the frequency value at 1. Note: Human Y-STR loci 389I and II cannot be used for network analysis, as they together comprise the 4 independently mutating DNA stretches 389m, 389n, 389p and 389q, which can each be used for network analysis (see Forster 2000). If only 389I and II are available, these must be omitted. To enter the number of repeats, click into a cell and edit, or use the right mouse button. To edit weights for network calculations, click into a cell and edit. To assign attributes to each taxon for node and pie c
28. button Change scheme names. Renaming is only possible in Network Publisher, not in the Network rdf/STR editor. Attributes can also be imported into a Network Publisher 2.0.0.0 session from Excel or a csv file. First column: sequence names. Second column: attribute information, which is used for the colour scheme. Example csv file (separator: semicolon):
S001;FEMALE
S002;FEMALE
S003;MALE
S004;FEMALE
Note: Network Publisher is an add-on which can be ordered for a fee. 2.2.5 Verification using the RM option. The RM algorithm is separate from, and different to, the MJ algorithm. This makes the RM option suitable for verifying MJ networks. Fig. 43: Opening the rdf data file for RM network calculation. In contrast to MJ, RM can only work with binary data (see Fig. 3). If the data set for MJ consists of binary data, the same file can be used for RM. Otherwise the data set needs to be binarised. For manual data entry, see chapter 2.1.2, example 3, case 4. For FASTA file data, binarisation is automatically carried out by the DNA Alignment software when saving as Network binary data format (see Fig. 29, right). The DNA Alignment software will write N into some positions when the character is multistate. Network RM will give a warning message and automatically replace these Ns in a greedy manner, minimizi
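The two-column, semicolon-separated attribute file described in this snippet (sequence name, then attribute) can be checked before import. The sketch below is an illustration only: `read_attributes` is a hypothetical helper, the sample rows are illustrative, and nothing here is taken from Network Publisher itself. It uses Python's standard `csv` module with `delimiter=";"`.

```python
import csv
import io


def read_attributes(text):
    """Parse 'name;attribute' rows into a dict (hypothetical helper).

    Rows with fewer than two columns are skipped; surrounding blanks
    in either column are stripped.
    """
    attributes = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) < 2:
            continue
        attributes[row[0].strip()] = row[1].strip()
    return attributes


# Illustrative file content, following the layout described above.
sample = "S001;FEMALE\nS002;FEMALE\nS003;MALE\nS004;FEMALE\n"
print(read_attributes(sample))
```

A quick check of this kind can reveal misspelled attribute values (for example "FEMALE" versus "Female") that would otherwise produce separate colour categories.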
29. calculation can be started (Calculate network) and the result file is saved as an out file. We recommend running the MP option on the out file to delete all superfluous median vectors and links which are not contained in the shortest trees in the network: Calculate Network menu > Optional Postprocessing > MP Calculation > "Network containing all shortest trees, and list of some shortest trees sufficient to generate this network" > Start > Open Network output file (out). The MP results are saved in sto file format. Note that for some data sets the MP option may take a very long time to run (up to several days); in this case, kill the MP option. The sto file can be opened, viewed and analysed in the Draw Network option or in the Network Publisher software (Fig. 33). Alternatively, the out file can be opened. Save a screen snap (bmp bitmap, pdf) or a vector graphic (emf, vector pdf) of the network. Re-run with different settings (see Fig. 1 in chapter 2.1). For example, if the network is messy, identify nucleotides which have mutated frequently within your data set (see Fig. 5 in chapter 2.1.3), downweight these nucleotides (see chapter 2.1.3), calculate the RM network and save the results under a new name, e.g. DNA_binary_RM_2.out. When your network looks clean, keep the successful weight settings and increase the parameter r (see chapter 2.1.6) to identify where new median vectors are added to the network. Experiment with increasingly higher r settings and eac
30. d into Windows with your privileges and the software would not work? If so, does the software work again for the other user when he or she logs into Windows again and tries to use the software? Enhancement requests (user wishes): We are happy to include your enhancement requests in our bug tracking databases. We will assign your enhancement requests to Network, DNA Alignment or Network Publisher. If you would like us to program a specific enhancement within a specific time frame on your organisation's budget, please mention this in your email. Otherwise we will simply add your enhancement request to our list. Contact: Network development team, nw at fluxus technology dot com. 6 Updates to the Network 4.6.1.1 User Guide. Compared to the Network 4.6.1.0 User Guide of 31 December 2011: 1. Minor updates to chapters 2.2.2 Network calculation using the MJ algorithm with optional external rooting, 2.2.4 Graphical layout of results, and 2.2.5 Verification using the RM option, to reflect the vector PDF output in Network Publisher 2.0.0.0. 2. Minor updates to chapters 3 Software limits in Network 4.6.1.1, and 4 Network 4.6.1.1 Present and Future, to reflect the improved memory management limits in Network Publisher 2.0.0.0. 7 Updates to the Network 4.6.1.0 User Guide. Compared to the Network 4.6.0.0 User Guide of 31 December 2010: 3. Added MJ square option in chapter 2.1.5 Epsilon in MJ, Connectio
31. de data. Case 4: Choose the option Binary data > Continue. Continue as for case 2 on the previous page, but enter the nucleotide states compared to the first sequence. For example, Doug does not have G at 16095, so enter 0 into his 16095 cell. Fig. 3: Network's Data Editor with binary data. The maximal number of characters allowed in the data editor is 1000. For long sequences with sequencing ranges > 1000, it becomes necessary to omit non-variable characters (here: np 16092). But note that for large data sets, manual data entry and manual alignment are error-prone. For larger data sets in FASTA files, we ask you to use Fluxus' DNA Alignment software. Example 4: DNA Alignment software. DNA and amino acid FASTA files can be imported and prepared for Network using Fluxus' DNA Alignment software. This software has a limit of 99999 on the number of characters and no limit on the number of sequences. The DNA Alignment software can be run with or without the auto-alignment option. Alignment algorithms vary in quality, and poor auto-alignment results will lead to poor network results. The alignment algorithm in Fluxus' commercial DNA Alignment software is a sophisticated pairwise alignment algorithm which compares whole segments of sequences and does not employ gap penalties. If the user chooses to run this algorithm, all sequences in the FASTA file will be auto-aligned under a reference sequence wh
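The binary coding rule in Case 4 (enter each state compared to the first sequence) can be sketched in a few lines. This is an illustration under stated assumptions, not the Data Editor's own logic: the function name `binarise_against_first` is hypothetical, a state equal to the first sequence is coded 1 and any other state 0, and the input is the four-individual table from Example 1 of chapter 2.1.1. The sketch refuses columns with more than two states, since those are not binary.

```python
def binarise_against_first(sequences):
    """Code every state as 1 (same as first sequence) or 0 (different).

    sequences: list of (name, states) pairs; the first pair is the reference.
    Assumption for this sketch: 1 means 'same state as the first sequence'.
    """
    reference = sequences[0][1]
    for name, states in sequences:
        if len(states) != len(reference):
            raise ValueError(f"{name}: length differs from first sequence")
    for pos in range(len(reference)):
        column = {states[pos] for _, states in sequences}
        if len(column) > 2:
            raise ValueError(f"column {pos} has more than two states: {sorted(column)}")
    return [
        (name, "".join("1" if s == r else "0" for s, r in zip(states, reference)))
        for name, states in sequences
    ]


# Positions 16091-16095, taken from the Example 1 table of chapter 2.1.1.
data = [("Alice", "TCGAG"), ("Brenda", "TTCAC"), ("Chris", "GTGTG"), ("Doug", "TCGTC")]
for name, coded in binarise_against_first(data):
    print(name, coded)
```

With this input, Doug differs from the first sequence at 16095 (C instead of G), so his last digit is 0, matching the rule quoted above.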
32. es 30000; Data columns (characters, loci, nucleotide positions): 5000; Frequency: 999. Network Calculations: Data lines (taxa, sequences): 30000; Data columns (characters, loci, nucleotide positions): 5000; Nodes (haplotypes, median vectors): 40000; Links: 500000. Note that data from Y-STR (ych format) files can lead to locus splitting and very complex networks. The practical limit for the number of loci may therefore be reached at 500 loci or less, depending on the number of STRs involved. For RM calculations, limits are much lower than for MJ or RM-MJ calculations, e.g. for typical RM calculations ca. 1500 taxa. Network Draw: Max. number of mutations displayed on a link: 200; Max. number of links: 10000. Mismatch Distribution Tool: Max. distances: 100. 4 Network 4.6.1.1 Present and Future. Network has been freely available since January 2000. New versions are usually released at the beginning of each year. Some interim releases were made in response to urgent requests. Although Network includes a data editor and a graphics program, fast-growing data sizes and enhancement suggestions from our user base motivated the development of two additional software products, which are available for a small fee. DNA Alignment is useful for importing and preparing FASTA files for Network. Network Publisher is useful for producing higher-quality network graphics and for displaying node and pie colours depending on attributes, additiona
33. explore whether and how the network changes (for RM, change r to 3, 4, 5 etc.). If changing the weights also leads to poor networks, use only those taxa which contain >1 individuals. > MP Option (when exploring) > Draw Network / Network Publisher: to lay out final network graphics in high quality, save as emf, bmp or pdf. Import the emf picture into MS Powerpoint, or the wmf picture into publication layout software. Fig. 1a: General overview of the work flow. Prepare your binary variable data (DNA Alignment). Network will ignore loci (e.g. nucleotide positions) which are invariable throughout your data set. Input formats: binary rdf only, ych, tor. > Calculate RM-MJ Network: run RM (switch off out file generation), then run MJ on the rmf file. Weights: default 10; r = 2 (no out file) for RM and epsilon = 0 for MJ. > MP Option: purge superfluous links and median vectors from the network (kill MP if the run time is too long). > Draw Network: is the network clean (no cubes or large cycles)? > Re-Calculate RM-MJ Network: either change epsilon to 10, 20, 30 etc. for MJ to explore whether and how the network changes, or change weights (see detailed notes); if this also leads to poor networks, use only those taxa which contain >1 individuals. > MP Option (when exploring) > Draw Network / Network Publisher: to lay out final network graphics in high quality, save as emf or bmp
34. h time save the results under different names, e.g. DNA_binary_RM_2_r_5.out. Fig. 46: MP option to delete superfluous median vectors and links from networks. See chapter 2.2.3 for discussion, analysis and interpretation of networks, and chapter 2.2.4 for graphical layout of results. 2.3 RNA nucleotide sequence data. 2.3.1 Data entry. RNA nucleotide data currently needs to be entered with a minor work-around. For data containing just the standard 4 RNA bases (A, C, G, U), use the DNA data entry options (manual entry or DNA Alignment software), replacing U by T; this will have no adverse effects on the alignment calculations or the network calculations. Before importing FASTA file data into DNA Alignment, use a text editor to search and auto-replace all occurrences of U by T, then check the sequence names for unintended replacements within the names. For detailed instructions on the network calculation steps, see chapter 2.2 on DNA nucleotide data. For data containing modified RNA bases, resort to the amino acid data entry as a work-around (see chapter 2.4). 2.4 Amino acid nucleotide sequence data. 2.4.1 Data entry. Small data sets can be entered manually and saved into a file using Network's Data Editor: Start Network > Da
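The U-to-T work-around for RNA FASTA files can also be scripted, which avoids the risk of altering sequence names that the text warns about. The sketch below is one possible approach, not a Fluxus tool: the function name `rna_fasta_to_dna` is hypothetical. It leaves header lines (those starting with ">") untouched and replaces U/u only in sequence lines.

```python
def rna_fasta_to_dna(fasta_text):
    """Replace U by T in sequence lines only; header lines are kept as-is."""
    converted = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            converted.append(line)  # sequence name: do not modify
        else:
            converted.append(line.replace("U", "T").replace("u", "t"))
    return "\n".join(converted) + "\n"


# Illustrative record: the name contains the letter U and must stay unchanged.
rna = ">UGANDA_01\nACGUUGCA\nuuga\n"
print(rna_fasta_to_dna(rna), end="")
```

Because headers are skipped, the manual check of sequence names after a global search-and-replace is no longer needed for files converted this way.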
35. he locus name, e.g. D19aa, to distinguish between repeat numbers (e.g. aa = 10 repeats, ab = 13 repeats, depending on data file). For detailed instructions on graphical layout of results, see chapter 2.2.4. 2.6 Endonuclease data (RFLP, restriction fragment length data). 2.6.1 Data entry. Small data sets can be entered manually and saved into a file using Network's Data Editor: Start Network > Data Entry menu > Manual > Endonuclease RFLP data > Continue. Coding: 0 = RFLP site absent; 1 = RFLP site present; N = not determined. Fig. 49: Network's Data Editor with RFLP data. When the editor is first opened for new data entry, the character and sequence names are defaults; to edit these, click into the cells and edit. To enter data, click into a cell and edit, or use the right mouse button. The data is saved in rdf file format. For further details on the Network Data Editor, see chapter 2.1.2. Alternatively, tor files can be imported into the Network Data Editor (Start Network > Data Entry menu > Import rdf file > Endonuclease RFLP data > Continue > Open tor file) and saved in rdf format. Example of the tor file format (see ExampleRFP.tor, from Forster, Torroni, Renfrew, Röhl 2001): MC17 4 990a os0044a F1l5412k i Lines 1-3: definition of sequence MC17. Line 1: name of sequence. Line 2: 4990a: no cut at position 4990 (AG CT where the RS was cut). Line 2: 5584a: no cut at position
36. ich the user can choose. Normally the user will choose an arbitrary sequence from the data set as a reference sequence; for the special case of human mtDNA, the choice can be the Cambridge Reference Sequence (CRS), but nucleotide numbering must be consistent between the data set and the CRS. Alternatively, the DNA Alignment software can import FASTA files and export them in Network rdf format without alignment. This option can be useful if your FASTA data are pre-aligned by other programs, or if your data set fits into the Network limit of 1000 characters without alignment. As a general rule: before using unknown aligned data or auto-aligned data in Network, you must check the quality of the alignments. Fig. 4 shows a poor alignment which we intentionally created manually to demonstrate an insertion artefact. There is a gap inserted at np 16096 in the sequences Nuu1b, Nuu1c and Nuu2a, which shifts nps 16096-16109 right by one np compared to the reference sequence.
37. ignificant when they do happen. 2. Decrease the weight for events that might be much more likely to happen. 3. For characters in which deletions or insertions have occurred, we suggest a double weight (weight value 20). 4. For human mtDNA data, we suggest transversions to be weighted three times as high as transitions. Transversions occur about 20x less often than transitions in human mtDNA (see Fig. 8). 5. For hypervariable sites/characters, including length (repeat) mutations in mixed data, we suggest downweighting the character to 5 or even 0. To identify a hypervariable or fast-mutating character within your network, draw the network and press the statistics button (see Fig. 5, character 176). Fig. 5: Statistics button to identify fast-mutating character for downweighting. Before you change weights, decide whether you want to save the changed weights or not. Note: For all data, including STR data and mixed data, you can save changed weights from Network 4.5.0.0 upwards. To change weights without saving, go into the Network Calculations main menu, into either the Reduced Median (RM) option or the Median Joining (MJ) option. Then open the rdf file and open the Parameters menu > Change weights (see Fig. 6). Click onto the line for which you want to edit the weight, and a New Weight entry field will appear with the current weight. Edit this weight and click OK.
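Suggestion 4 above (transversions weighted three times as high as transitions) can be illustrated with a small classifier. This is a sketch only: the function name `substitution_weight` and its parameters are hypothetical, the default weight of 10 is the default quoted in this guide, and the purine/pyrimidine definition of a transition is standard biochemistry rather than something specific to Network.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_weight(base_a, base_b, transition_weight=10, ratio=3):
    """Return a weight for the change base_a -> base_b.

    A transition stays within purines (A<->G) or within pyrimidines (C<->T);
    anything else is a transversion and gets ratio x the transition weight.
    """
    a, b = base_a.upper(), base_b.upper()
    if a == b:
        raise ValueError("no substitution: both bases are identical")
    if not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError("only A, C, G, T are handled in this sketch")
    is_transition = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition_weight if is_transition else transition_weight * ratio


print(substitution_weight("A", "G"))  # transition
print(substitution_weight("A", "C"))  # transversion
```

With the defaults, a transition receives weight 10 and a transversion weight 30, which is the three-to-one ratio suggested for human mtDNA.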
38. ike meshes (Fig. 16, right) from a full median network. The reasoning is that parallel mutations (Fig. 16, character 19101) are more likely to have occurred between existing sequences (Kay, John, Mary, Nat) than between an existing sequence and a median vector (Lucy, mv1). In this example, the weights of characters 19101, 19102 and 19103 are 10. If the sum of weights on the long side of the ladder (characters 19102 and 19103) is greater than or equal to r times the weight of the parallel mutation (character 19101), the reduction algorithm deletes the parallel mutation between a median vector (mv1) and the other side of the ladder (existing sequence Lucy): sum of character weights (long side of ladder) ≥ r × character weight (parallel mutation). In this example, the sum of weights on the long side of the ladder is 20. For r = 2, r times the weight of 19101 is 20, so mv1 and the links are deleted (Fig. 16, left). For r = 3, r times the weight of 19101 is 30, so mv1 is not deleted (Fig. 16, right). Fig. 16: RM network with r = 2 (left) and full median network with r = 3 (right). Option for not generating an out file (new in Network 4.5.1.6): The RM network calculation produces an rmf file with split loci in a first step, and an out file with a network in the second step. Optionally, the second step can be switched off (checkbox "Generate list of links (out file) for drawing"). The rmf file can then be u
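The deletion rule for the reduction threshold r can be written as a one-line comparison. The sketch below only restates the inequality from the text and reproduces its worked example (weights of 10 for characters 19101, 19102 and 19103); the function name `parallel_mutation_is_deleted` is hypothetical, and the real RM algorithm involves more than this single test.

```python
def parallel_mutation_is_deleted(long_side_weights, parallel_weight, r=2.0):
    """Apply the rule: delete if sum(long side weights) >= r * parallel weight."""
    return sum(long_side_weights) >= r * parallel_weight


# Worked example from the text: long side = characters 19102 and 19103,
# parallel mutation = character 19101, all with weight 10.
long_side = [10, 10]
print(parallel_mutation_is_deleted(long_side, 10, r=2))  # 20 >= 20
print(parallel_mutation_is_deleted(long_side, 10, r=3))  # 20 >= 30
```

The first call corresponds to the reduced network (mv1 and its links deleted), the second to the full median network in which mv1 is kept.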
39. is then also run on suitably prepared data (nucleotide FASTA data are easily prepared with the DNA Alignment software). 1.3 Further complexity reduction options. The star contraction option can simplify complex data. The MP option deletes non-MP links from the network, i.e. links which are not used by the shortest trees in the network. For STR data, RFLP data or binary data, a combined RM-MJ calculation may be performed to simplify the network. 1.4 Complementary options. Network includes a data editor and a graphics program. FASTA files can be imported and prepared for Network using Fluxus' DNA Alignment software. Higher-quality graphics of Network's results files can be prepared using Fluxus' Network Publisher software. 2 Work Flow. 2.1 Overview of the general work flow and the RM-MJ work flow. Prepare your variable data (DNA Alignment). Network will ignore loci (e.g. nucleotide positions) which are invariable throughout your data set. Other accepted formats: ami, ych, tor, nex, phy. > Calculate Network: for rooting, use the MJ method; if you are a new user, use the MJ method. Weights: default 10; epsilon = 0 for MJ or r = 2 for RM. > MP Option: purge superfluous links and median vectors from the network (kill MP if the run time is too long). > Draw Network: is the network clean (no cubes or large cycles)? > Re-Calculate Network: change epsilon to 10, 20, 30 etc. for MJ, or change weights (see detailed notes), to
40. ith identical sequence names, are allowed by the DNA Alignment software when importing FASTA and saving as rdf. Such rdf files can be opened by the Network Data Editor but cannot be processed by the Network Calculation. There will be a warning message, and you can correct the problem in the Network Data Editor. 2.1.5 Epsilon in MJ, Connection Cost / Greedy FHP in MJ, MJ square option. The Median Joining algorithm will build a sparse network if the parameter epsilon is set to zero (default) or other low numbers. This can cut run time for large data sets significantly, allowing a first approximate impression of the network within a short run time. For special cases, an epsilon value of zero or other low numbers may be sufficient to create a complete network. The full median network will be calculated when the parameter epsilon is sufficiently high, but for large data sets this calculation may take a very long time or hit the software's internal limits. Furthermore, a full median network may look very complex and may be difficult to interpret for data sizes larger than trivial data sets. For this reason we suggest experimenting with epsilon settings of 0, 10, 20 etc. to see how the network develops. Note that epsilon is a weighted genetic distance measure. Therefore epsilon increments should be consistent with the weight settings. For example, epsilon settings of 1, 2, 3 ... 9 are not useful because they will give identical networks
41. l is set in Median Joining > Parameters menu > Criterion (see Fig. 14) after the data file has been opened (File menu > Open). To change the distance calculation method, click onto the Criterion line in the Parameters menu. This will change the method and close the menu. When you re-open the Parameters menu, the currently active method is shown, e.g. Criterion: Greedy FHP. Fig. 14: Distance calculation method. Click the Criterion line to change from the default Connection cost method (left) to the alternative Greedy FHP method of Foulds, Hendy and Penny (right). 2.1.6 Reduction threshold r and out file option in RM network option. The Reduced Median algorithm will build a reduced network if the parameter r is set to 2 (default) or other low numbers greater than two. The reason for reducing a full median network is to improve clarity for data sizes larger than trivial data sets, because a full median network can easily contain too many links and median vectors to visualise and interpret. The full median network will be calcu
42. l information for each sequence or taxon defined in the Network Data editor: Phenotype, Geography, Lineage and the generic Group 1, Group 2, Group 3, which can be renamed. A compare tool in Network Publisher helps compare two completed network diagrams (fdi files) and locate nodes or mutations in very obscured networks. Implementation of future developments: We will gradually migrate the Network source code to a different software compiler in order to address memory management limits in larger data sets. This work is quite complex and time-intensive. Extensions to data import for Network will be implemented within DNA Alignment. Extensions to graphics for Network will be implemented within Network Publisher. Continued availability of the software should be no problem for the next 20-30 years if user interest remains. 5 Feedback: Bug Reports and Enhancement Requests. Feedback between software users and software suppliers is important. We read all emails and reply to most. We are always happy to read emails from users who simply wish to express their gratitude (keep these up!). Bug Reports: Sometimes a reported problem is not quite clear to us, leading to delays and question-and-answer emails going back and forth. So before sending your email to us, please take the time to check whether you have included all the information needed to understand and reproduce your problem. Checklist for your bug report: 1. Describe what happens. 2. In
43. lated when the parameter r is sufficiently high, but this network may be difficult to interpret. For this reason we suggest experimenting with r settings of 2, 3, 4, 5 etc. to see how the network complexity increases. Note that r is a weighted genetic distance ratio for the likelihood of parallel mutations (see Fig. 16). Setting the parameter r: The parameter r is set in Reduced Median > Parameters menu > Changing reduction threshold (see Fig. 15) after the data file has been opened (File menu > Open). To change the value of r, type a number and click OK. All parameter settings are logged in the first lines of the network calculation out file. The Reduced Median option is accessible from the Calculate Network main menu: Network Calculations > Reduced Median. The reduction threshold value is a real number which should be set to at least 2. For human mtDNA control region sequences, the value 2 works well. Sensible values for other data need to be determined experimentally by increasing r and seeing whether shorter MP trees are then generated. Generally, the longer the branches in the data set, the higher the r setting should be. Fig. 15: Setting the Reduction threshold parameter in Calculate Network > Reduced Median. What does the Reduction threshold r mean? The Reduction threshold r is a parameter for deleting parallel mutations in ladder-l
44. lgorithms are distinctly different. Fig. 30: Save as character (multistate) data (left) or binary data (right) in DNA Alignment. Note: Please do not use the phy or nex import formats unless you know what you are doing, because Network does not perform any checks and may calculate incorrect results. The phy and nex formats exported by the DNA Alignment software (Fig. 31) are interpreted correctly by Network; prior data checking within DNA Alignment (see chapter 2.1.2) is mandatory. Fig. 31: Saving sequential Phylip (left) or Nexus (right) formats in DNA Alignment. 2.2.2 Network calculation using the MJ algorithm with optional external rooting. Calculating the initial network: In Network (Fig. 32): Calculate Network menu > Network Calculations > Median Joining > File > Open > DNA data file (rdf). Select the rdf file which you manually created in the Network Data editor or exported from the DNA Alignment software. After opening the file, the calculation parameters can be viewed and changed (Parameters menu; see chapters 2.1.3-2.1.5 for details on the parameters). After changing the parameters, the calculation can be started (Calculate network) and the result file is saved as an out file.
45. n Cost / Greedy FHP in MJ, MJ square option, to reflect this new option in the MJ calculation. 8 Updates to the Network 4.6.0.0 User Guide. Compared to the Network 4.5.1.6 User Guide of 31 December 2009: Modified flow chart Fig. 1a in chapter 2.1 Overview of the general work flow and the RM-MJ work flow: added "For rooting use MJ method" to reflect the new External rooting feature in the MJ calculation. In chapter 2.1.5 Epsilon in MJ, Connection Cost / Greedy FHP in MJ, added 2 sentences stating that epsilon values of 0 or 10 are generally recommended, to avoid the misleading impression that higher epsilon values are generally recommended. In chapter 2.2.2 Network calculation using the MJ algorithm with optional external rooting, added the heading "Calculating the initial network" and added the new section "Rooting the network (ancestral node, root, proxy node)" to reflect the new External rooting feature in the MJ calculation. In chapter 2.2.3 Discussing, analysing and interpreting network results (MJ and RM), deleted the paragraph on rooting and ancestral node. In chapter 2.2.4.1 Node and pie chart colouring in Network Publisher 1.3.0.0, added that attributes can be imported into Network Publisher from Excel or a csv file. In chapter 4 Network 4.6.0.0 Present and Future, added that Network Publisher's Compare tool includes a feature to search for nodes and mutations in a comple
46. nalysing and interpreting network results (MJ and RM). Homoplasy, cycle, reticulation, cube, hypercube: Mutation of a genetic site can occur at different times and independently of previous evolution. Figure 34 below shows characters 16094 and 16095 occurring twice in the network; this is referred to as parallel mutations or homoplasy. In this special case, the characters in the network form a cycle (cycles may have more than 4 sides) or reticulation. Box-like cycles are sometimes referred to as a cube, or a hypercube in the case of a box with 4 or more dimensions. Cycles within the central regions of a network are not uncommon. Peripheral cycles are not necessarily incorrect, but they sometimes point at problems in sampling, lab processing, data alignment or data entry. Conversely, networks without peripheral cycles are not sufficient proof of error-free data. Fig. 34: Network showing homoplasy in the form of a cycle (reticulation). Shortest trees in network: Full median networks are designed to contain all possible equally shortest trees for a given data set. The network in Fig. 34 contains 4 such trees (see Fig. 35 below). Networks may contain superfluous links which are not required for any of the possible equally shortest trees. Network's MP option should be run to delete these superfluous links, especially if the network contains homoplasies such as hypercubes or large cycles. Note: Network's MP option does not find all shortest
47. ng the distance to all other sequences within the dataset when the file is opened (Fig. 43). You can also import this binary rdf file (Data Entry > Import rdf file > Binary Data > Continue, select the file and open) into the Network Data Editor. A warning message appears and Ns are highlighted red. To see how Network replaces each N with either 0 or 1, use the Replace and Undo buttons (Fig. 44). If there are more than 5 characters with N, we suggest deleting these characters in the Network Data Editor by right-clicking the cell in the character header row and choosing "Delete character" (Fig. 45), then save and exit. Fig. 45: Delete characters containing N in the Network Data Editor. Verification using the RM option (continued). To run the RM network calculation algorithm: In Network (Fig. 43): Calculate Network menu > Network Calculations > Reduced Median > File > Open > DNA data file (rdf). Select the rdf file which you manually created in the Network Data editor or exported from the DNA Alignment software. After opening the file, the calculation parameters can be viewed and changed (Parameters menu; see chapters 2.1.3, 2.1.4 and 2.1.6 for details on the parameters). After changing the parameters, the
48. o interpret. Work flow, including data preparation and interpretation of results, is described in detail in the next chapters. 1.2 Network building options. The Network software was developed to reconstruct all possible shortest (least complex) phylogenetic trees (all maximum parsimony or MP trees) from a given data set. Two different network-building options are included, which can be used independently of each other. The reduced median or RM network algorithm: RM requires binary data (example: at nucleotide position 16092, each taxon must have either T or C). To allow interpretation of complex data, a reduction parameter is available. If the reduction threshold r is set to a sufficiently high number, RM will yield a full median network containing all MP trees. The median joining or MJ network algorithm allows multi-state data (example: at nucleotide position 16092 there can be A, C, G, T and ambiguities such as N). For larger data sizes, the parameter epsilon can be set low to calculate sparse networks quickly, or incrementally increased to calculate higher-resolution networks at the cost of longer run times and increased network complexity. If epsilon is set to a sufficiently high number, MJ will yield a full median network (software and memory limits permitting). Optionally, MJ allows external rooting of the network using an outgroup. We recommend MJ for general use as first choice. If verification of the MJ results is an issue, we recommend that RM
49. olouring in Network Publisher, click the button Switch input window. The data is saved in the new ych file format (not compatible with Network 2.1 for DOS or Network 4.2). Note that the old ych file format can be used by all Network versions, including Network 2.1 for DOS. However, weights and taxon attributes are not available in old ych format files. For details on the Data Editor, see chapter 2.1.2.4. Example for old ych file format: see ExampleYSTR.ych, from Forster 2000 re Bianchi 1998. [file listing illegible] Line 1: list of loci. Lines 4-6: definition of taxon 0A. Line 4: name of taxon. Line 5: 13 repeats of D19, 10 repeats of D389q etc. Line 6: 5 = number of individuals in this taxon. Note: all line ends are defined by CR LF. 2.5.2 Network calculation, analysis, interpretation and graphics. The old and new ych files can be opened with the Network Calculation options (Star Contraction, Reduced Median, Median Joining). For detailed instructions on the network calculation steps, see chapters 2.2.2, 2.2.5 and 2.1.10. For detailed instructions on analysis, discussion and interpretation, see chapter 2.2.3. Locus names in data editor and mutated position names in network graphics: activating the Display mutated positions checkbox in Network Draw or Network Publisher will display the mutated position names along the network links. Two characters are appended to t
50. or pdf. Import emf picture into MS Powerpoint or wmf picture into publication layout software. Fig 1b: Specific work flow for the RM/MJ network calculation. 2.1.1 Variable data. Network will use only the variable data from your data file or manually entered data set. Network will ignore invariable data if your file or your manually entered data contains such data. What do we mean by variable data? Definition of variable data: by variable data we mean a genetic nucleotide position, or a genetic locus, or a trait, or a linguistic feature, or more generally a character, which allows you to separate your individuals into at least two groups.

Example 1 (variable data): You have an mtDNA data set and your sequencing range included nucleotide position 16092 for all individuals. In your data, some individuals are C, others are T at np 16092. This means that nucleotide position 16092 holds variable data for your set of data.

          16091  16092  16093  16094  16095
Alice     T      C      G      A      G
Brenda    T      T      C      A      C
Chris     G      T      G      T      G
Doug      T      C      G      T      C

Example 2 (some invariable data): All individuals in your data set have C at np 16092. So nucleotide position 16092 is useless for differentiating between the individuals in your data set. This means that np 16092 holds invariable data for your set of data. You can leave out np 16092. You only need to enter nps 16091 16093 16095.

          16091  16092  16093  16094  16095
Alice     T      C      G      A      G
Bruce     T      C      C      A      C
Clarissa
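The definition of variable data given above can be sketched in a few lines of Python. This is only an illustration of the definition, not how Network itself is implemented; the function name is hypothetical. The data are the four rows of Example 1, and the second call simply restricts the data set to two of those rows to show positions becoming invariable.

```python
def variable_positions(positions, taxa):
    """Return the positions at which the taxa show at least two different states."""
    variable = []
    for index, name in enumerate(positions):
        states = {sequence[index] for sequence in taxa.values()}
        if len(states) >= 2:
            variable.append(name)
    return variable


positions = ["16091", "16092", "16093", "16094", "16095"]

# Rows of Example 1 as printed in the guide.
example1 = {
    "Alice": "TCGAG",
    "Brenda": "TTCAC",
    "Chris": "GTGTG",
    "Doug": "TCGTC",
}

# With all four individuals, every position separates them into two groups.
all_four = variable_positions(positions, example1)

# Restricted to Alice and Doug, only the last two positions remain variable.
subset = {name: example1[name] for name in ("Alice", "Doug")}
alice_doug = variable_positions(positions, subset)

print(all_four)
print(alice_doug)
```

Whether a position counts as variable therefore depends on which individuals are in the data set, which is why the same position can be informative in one study and droppable in another.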
51. ork can be estimated. 2.8.1 Calibration of network mutation rate with a known event. The Time estimates subprogram can be used to calibrate the network mutations with a known event (e.g. deglaciation, colonisation of an island, crossing a new land bridge to a previously unpopulated region, archaeologically dated remains). This calibration is necessary if a calibration has not already been performed for exactly the same species, the same data type (e.g. DNA sequences as opposed to RFLPs) and the same loci (for example sequence range 16054-16365 as opposed to 16024-16400). For a discussion of calibration and age estimation, see c Genetic dating on pages 256-257 of P. Forster (2004), Ice Ages and the mitochondrial DNA chronology of human dispersals: A review. The software operation steps are explained in the following example. First load an fdi file. The program will display the network similar to Network Draw (due to a minor bug the node coloring is changed). The buttons and controls are located in the bottom right corner (Fig 51, highlighted box). Fig 51: Layout of Time estimates program. Step 1: Click button Specify ancestral node, then click onto the ancestral node (in Fig 52, example mv4). Fig 52: Calibration step 1, specify ancestral node of known age (screenshot shows Mutation rate: 1 mutation every 20180 years). Step 2: Click button Specify descendent nodes
52. p 3: Click button Calculate time. Fig 55: Specify the node for which the age is to be estimated and its descendents (screenshot shows Mutation rate: 1 mutation every 8544 years; Step 1 Specify ancestral node; Step 2 Specify descendent nodes; Calculate time). The rho age estimation (Fig 56) is independent of demographic parameters. However, the standard deviation is highly dependent on demographic history and the resulting tree shape. The standard deviation calculation in the example of Fig 56 yields 6976 years or 0.8165 mutations. Fig 56: Estimated age of taxon OR2 and standard deviation in mutations and in years (Results panel: age in mutations (rho statistic) 2, age in years, standard deviation sigma, standard deviation in years). 3 Software Limits in Network 4.6.1.1. In the current version, limits are often imposed by memory management rather than available memory on the PC, which means that a Windows out-of-memory error will occur before the theoretical maximum number of links or nodes is reached. We have resolved memory management problems in Network Publisher 2.0.0.0 by migrating its code to a different software compiler, and we plan the same work for Network in the future. Network Data Editor: Data lines (taxa, sequences): 3000. Data columns (characters, loci, nucleotide positions): 1000. Frequency: 999. rdf files from DNA Alignment: Data lines taxa sequenc
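The conversion behind the Fig 56 example above (age and standard deviation in mutations converted to years) can be sketched as follows. This is a minimal illustration, not the Network code: the function names are hypothetical, and the calibrated rate of 1 mutation every 8544 years is an assumption read from the Fig 55 screenshot text.

```python
def age_in_years(rho, years_per_mutation):
    """Convert an age in mutations (rho statistic) into years."""
    return rho * years_per_mutation


def sigma_in_years(sigma, years_per_mutation):
    """Convert a standard deviation in mutations into years."""
    return sigma * years_per_mutation


# Assumption: calibrated rate of 1 mutation every 8544 years (Fig 55).
YEARS_PER_MUTATION = 8544

rho = 2          # age in mutations from the Fig 56 example
sigma = 0.8165   # standard deviation in mutations from the Fig 56 example

age = age_in_years(rho, YEARS_PER_MUTATION)
sd = sigma_in_years(sigma, YEARS_PER_MUTATION)

print(age)        # 17088
print(round(sd))  # 6976
```

The rounded standard deviation agrees with the 6976 years quoted in the text, which suggests that the years figures in the Results panel are simply the mutation counts multiplied by the calibrated rate.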
53. r in the Network Publisher software to lay out your results clearly and attractively, and save a definitive picture of the results. To move nodes of the network, click and drag them. To move a link, click it and drag a node of the link. To change the style (colour etc.) of nodes, right-click a node. To change the style of links, right-click a link. Use the display options to produce a clear and clean-looking graphic. Fig 37: Network graphics before (left) and after manual editing of layout (right). Finally, save the laid-out work in fdi file format for opening again in Draw Network or in Network Publisher. You can also save it in bmp or bitmap pdf format (standard Network) or in emf, wmf or vector pdf formats (Network Publisher). The bmp file can be imported into MS Office or layout software as a limited-resolution, non-editable graphic. The emf file can be imported into MS Office or layout software as a high-resolution, editable graphic. After importing the emf or wmf graphic, ungrouping the graphic object is normally required. Font sizes and graphic re-scaling will depend on the application. For example, Powerpoint may not handle the graphic identically to Word, Corel Draw or Adobe Illustrator; therefore plan to spend some time on tweaking the graphic after import. The reward for using emf or wmf or vector pdf is the high resolution: to see the striking difference in resolution, magnify Fig 36 or Fig 37 and compare this to the bitmap in Fig 5
54. s to the Network 4.6.1.0 User Guide
8 Updates to the Network 4.6.0.0 User Guide
9 Updates to Network 4.5.1.6 User Guide compared to Network 4.5.1.0 User Guide of 27 December 2008
10 Updates to Network 4.5.1.0 User Guide compared to Network 4.5.0.1 User Guide of 24 June 2008
11 Updates to Network 4.5.0.1 User Guide compared to Network 4.5.0.0 User Guide of [date illegible]
12 Updates to Network 4.5.0.0 User Guide compared to Network 4.2.0.1 User Guide of [date illegible]
13 Updates to Network 4.2.0.1 User Guide compared to 3 April 2007

1 Overview. 1.1 Scope of application. Network is used to reconstruct phylogenetic networks and trees, infer ancestral types and potential types, evolutionary branchings and variants, and to estimate datings. The algorithms are designed for non-recombining bio-molecules. Successful applications include mtDNA, Y-STR, amino acid, RNA, virus DNA, bacterium DNA, some effectively non-recombining autosomal DNA, and non-biomolecule data such as linguistic data. By contrast, recombining bio-molecules will deliver high-dimensional networks which will be difficult t
55. sed for the MJ network calculation. This combines the advantages of both methods: locus splitting of RM and improved speed and memory management of MJ. 2.1.7 MP option to clean up networks. A full median network (parameter epsilon in MJ calculation or parameter r in RM calculation set sufficiently high) contains all possible shortest (MP or Steiner) trees. However, the network calculations can also produce unnecessary median vectors and links (see fig 18: mv5, mv8, mv18). The MP option (Polzin et al., see fig 19) identifies the unnecessary median vectors and links, which can be switched off in the results display (see fig 20). Fig 17: ExampleAminoAcids.ami, MJ with epsilon 10, cleaned up with MP. Fig 18: ExampleAminoAcids.ami, MJ with epsilon 10. Note: After MJ calculation of ExampleAminoAcids.ami with epsilon 0, the network looks identical to fig 17 except that the blue links and mv3 are missing, meaning that trees are missing. MJ with epsilon 10 finds all trees (fig 18) and MP cleans up the network (fig 17). Running the MP calculation: We recommend that the MP calculation is generally run on all out files produced by the MJ or RM network calculations. In the main window of Network, select the Calculate Network
56. ta Entry menu > Manual > Amino acid data > Continue. See chapter 2.1.2 for details. Fig 47: Auto-aligned amino acid FASTA data in DNA Alignment with editing option. Larger data sets should be imported into the DNA Alignment software in FASTA format, aligned if required, checked (see chapter 2.1.2) and saved for Network in ami format. This allows longer sequences to be analysed than with Network's manual data entry. The DNA Alignment software (Fig 32) can easily export character (multistate) data (Fig 2, chapter 2.1.2, ami files) or binary data (Fig 3, chapter 2.1.2, ami files) from the same FASTA file, allowing both MJ and RM to be run on the same data. After the MJ analysis, RM can be run on the binary data file if an independent verification of the MJ results is required, as the two algorithms are distinctly different. 2.4.2 Network calculation, analysis, interpretation and graphics. The ami files can
57. trees in the network; it only finds the shortest trees required to build the network (for example, only 2 of the 4 trees below are required to define the network). Fig 35: The 4 equally probable shortest trees contained in the example network. 3. Groups, clades, haplotypes, haplogroup: In phylogenetics, the network nodes are living or extinct sequences with specific mutations. Descendants of a node can be grouped into a cluster or branch, also known as a clade (Greek klados, branch). When the number of characters (loci) under consideration is extended, the sequence may be differentiated into sub-sequences and the branches become longer. Individual human mtDNA sequences and Y chromosomal profiles are often referred to as haplotypes in the literature, although these two loci are necessarily haploid. In this usage, branches and clusters are sometimes referred to as haplogroups. Fig 36: Human mtDNA network showing branching and clustering. Displayed names are for sequences, not haplotypes. See file ExampleRFLPweightedRMMJ.out. 2.2.4 Graphical layout of results. Finally, when you are satisfied with your network, spend some time in the Draw Network option o
58. twork, select the Calculate Network menu, then select Optional Preprocessing > Star Contraction. The Star Contraction window will appear. Click onto File and select the file to be processed. Fig 23: Star Contraction is used before Network calculations. You can then click on Parameters and change the star contraction radius; this is measured in number of mutated positions. For STR (microsatellite) data, the number of mutated positions is not the number of repeats but the number of network characters (to see the difference, run a network calculation and display the mutated positions in the graphics). Fig 24: Star Contraction radius delta, in number of mutated positions. After setting the maximum star radius, click on Calculation to run the star contraction. Network will suggest a name for the protocol file (pro). After a first round of the star contraction calculation, Network will ask whether to continue the contraction; click yes if you want the contracted data to be contracted again. After the second round of contraction, Network allows you to contract the doubly contracted data a final time. (Dialog: There are 22 taxa left. Should the contraction be continued?) Fig 25: Star Contraction can
59. ulation. Click the Switch input window button (fig 39) and enter the information for each sequence in max. 15 characters length (fig 40). Note that you can also use these attributes for storing other information such as home town, food preferences, metabolic disorders etc. Fig 40: Entry of attributes for later colour coding (Network 4.5.0.0 or later versions). In Network Publisher, displaying colour-coded nodes and pies is made easier and faster for network calculation files which contain attributes. After importing the network calculation results, select an attribute type from the Color scheme pull-down menu, assign a colour for each attribute and display the network. Your colour assignments are saved with the network graphic when you save in fdi format. Tip: you may want to cut and paste the 6 last lines defining the colour scheme into other fdi files. To display the color scheme explanation within the graphics, activate the Display legend checkbox. Fig 41: The Network Publisher pull-down menu for colour coding of nodes. Fig 42: The Network Publisher menu for colour assignment. To rename the color schemes Group1 Group 3 in Network Publisher, use the
60. x network. In chapter 5 (Feedback, Bug Reports and Enhancement Requests), added operating system running inside a virtual machine.

9 Updates to Network 4.5.1.6 User Guide compared to Network 4.5.1.0 User Guide of 27 December 2008
11. Modified flow charts in chapter 2.1 (Overview of the general work flow and the RM/MJ work flow). In both flow charts (general work flow, RM/MJ work flow) the sentence "MP can sometimes take several days" is deleted because this is no longer the case. In the second flow chart (RM/MJ work flow) the new RM option "no out file" is mentioned and the recommendation to try different values of reduction threshold r is deleted.
12. In chapter 2.1.6 (Reduction threshold r and out file option in RM network option) there is a new checkbox option to deactivate step 2 of the RM from calculating links for an out file, with an explanation of the reason behind this option.
13. Added chapter 2.8 (Time estimates).
14. In chapter 3 (Software Limits), added note that many memory management limits have been extended in version 4.5.1.6.
15. In chapter 4 (Network 4.5.1.6 Present and Future), added note on new diagram compare tool in Network Publisher.

10 Updates to Network 4.5.1.0 User Guide compared to Network 4.5.0.1 User Guide of 24 June 2008
1. Exchanged Figs 39 41 for the slightly modified pictures of Network 4.5.1.0 and Network Publisher 1.1.0.6.
2. In chapter 2.2.4.1 Node and
