AnGST: The Analyzer of Gene & Species Trees - Alm Lab

Contents

1. AnGST: The Analyzer of Gene & Species Trees

User manual written by Lawrence A. David. Last revised December 2010.

Abstract

AnGST, the Analyzer of Gene & Species Trees, is a phylogenetic algorithm that reconciles any observed differences between a gene tree and a reference tree (species tree). AnGST uses a generalized parsimony criterion to infer a minimal set of evolutionary events, including horizontal gene transfer (HGT), gene duplication (DUP), gene loss (LOS), speciation (SPC), and exactly one gene birth or genesis event (GEN). Inference errors due to phylogenetic uncertainty are minimized by incorporating tree construction into the reconciliation process. Multiple gene tree bootstraps can be provided to AnGST; the algorithm will retain and combine bootstrap subtrees which yield the most conservative reconciliation consistent with the sequence data.

The AnGST software package is implemented in the Python programming language and is available for download. This user manual focuses on explaining how to run AnGST. For a more thorough discussion of the theory behind AnGST and algorithm benchmarking, see the Supplementary Information of the following manuscript:

LA David & EJ Alm. Rapid evolutionary innovation during an Archaean Genetic Expansion. Nature (2010). doi:10.1038/nature09649

About

Lawrence David wrote AnGST during the completion of his PhD in Computational and Systems Biology at the Massachusetts Instit
2. ST.score
    -rw-r--r-- 1 lad staff 676 Dec 21 19:31 AnGST.stats

AnGST.counts: Records the number of gene copies inferred in lineages on the species tree. Ancestral lineages are denoted using dashed concatenations of their leaf species names.

AnGST.events: A list of the inferred evolutionary events: brn (gene family birth), spc (speciation), los (gene loss), dup (gene duplication), hgt (horizontal gene transfer), and cur (an extant gene copy). Ancestral lineages are denoted using dashed concatenations of their leaf species names.

AnGST.leaf: Chronicles the evolutionary history of each gene copy.

AnGST.newick: The reconciled gene tree in Newick format. If multiple bootstraps were provided as input for the gene tree (Section 5), this gene tree may not exactly match any of the individual bootstrap trees. Mappings from internal nodes on the gene tree to nodes on the species tree are recorded where node bootstrap values are conventionally written in Newick format. These mappings are expressed using a concatenation of leaf species names.

AnGST.score: Contains the AnGST reconciliation score.

AnGST.stats: A general log of the AnGST reconciliation. Includes statistics on AnGST running time and memory usage.

5 Optional

Bootstrap trees

Errors or uncertainty in gene phylogenies can lead to the inference of spurious macroevolutionary events [2] and are a particular concern for deeply branching phylogenies [3]. AnGST can account
3. al. Otherwise, please use our online AnGST server (almlab.mit.edu/angst).

AnGST requires Python 2.X to run; it is not compatible with Python 3.X. If you don't already have Python installed, you can download the Python installation package from the official Python website. After installing Python, decompress the AnGST tarball and you're ready to run AnGST.

2 Quick start

A toy dataset has been included with the AnGST distribution. To reconcile the gene and species trees in the toy dataset, navigate to the unzipped AnGST folder and execute on the command line:

    >> python angst_lib/AnGST.py example/AnGST.input

The following sections walk through the various inputs (Section 3) and outputs (Section 4) associated with the toy reconciliation.

3 Inputs

Input file

The only argument directly passed to AnGST on the command line is the AnGST input filename. Tree filenames and reconciliation parameters are stored in this file. Here are the contents of the example AnGST.input file:

    species=example/species.txt
    gene=example/gene.txt
    penalties=example/penalty.file
    output=example/angst

Species tree

Specify the species tree in AnGST.input using the following format:

    species=species.newick

The input species (reference) tree to AnGST should be written in Newick format. Bootstrap values on the species tree will be disregarded. Species names should not contain periods, underscores, or spaces. Th
4. ata. See Section 5 below for details on how to use this feature.

Event penalties

AnGST will use event penalties to find the reconciliation with the lowest overall cost. The file enumerating event penalties is specified in AnGST.input with the line:

    penalties=penalties.file

User penalties in penalties.file should take the following format for horizontal gene transfer (hgt), gene duplication (dup), gene loss (los), and speciation (spc):

    hgt: 3.0
    dup: 2.0
    los: 1.0
    spc: 0.0

Penalties should be real and non-negative. Different event penalties will lead to different reconciliation scenarios. Choosing event penalties is not an easy problem, and you may want to try a range of penalties. We found that, when looking across a broad range of gene families and eukaryotic and prokaryotic genomes, the event penalties listed above minimized the divergence in genome size among related genomes [1].

Output directory

Assign the AnGST output directory in AnGST.input using the following syntax:

    output=angst_output

4 Outputs

An AnGST run generates several output files:

    >> ls -lt example/angst
    total 48
    -rw-r--r-- 1 lad staff  77 Dec 21 19:31 AnGST.counts
    -rw-r--r-- 1 lad staff 140 Dec 21 19:31 AnGST.events
    -rw-r--r-- 1 lad staff 378 Dec 21 19:31 AnGST.leaf
    -rw-r--r-- 1 lad staff 100 Dec 21 19:31 AnGST.newick
    -rw-r--r-- 1 lad staff   4 Dec 21 19:31 AnG
5. e input species tree must be rooted. All branches on the species tree, including the root, must have a branch length. Trees should end with a semicolon. Here is an example of the required species tree format:

    [example species tree illegible in the source]

See Section 5 below if you have a time tree and would like to place temporal constraints on proposed HGT events.

Gene tree

Specify the gene tree in AnGST.input using the following format:

    gene=gene.newick

The input gene tree(s) to AnGST should also be provided in Newick format. The gene tree should be unrooted; AnGST roots the tree while looking for the lowest-scoring reconciliation scenario. All branches on the gene tree should have a branch length. Each leaf of the gene tree must correspond to one leaf on the species tree. All leaves must also have an identifier tag. Two leaves from the same genome should not have the same identifier tag. Identifier tags should be separated from species names by a period or an underscore. Here is an example of an acceptably formatted gene tree:

    [example gene tree illegible in the source]

AnGST can minimize inference errors due to phylogenetic uncertainty by incorporating tree construction into the reconciliation process. Multiple gene tree bootstraps can be provided to AnGST; the algorithm will retain and combine bootstrap subtrees which yield the most conservative reconciliation consistent with the sequence d
6. for phylogenetic uncertainty by simultaneously reconciling and reconstructing a gene tree: the tree with the lowest reconciliation cost can be constructed from an ensemble of trees consistent with the sequence data. Suitable tree ensembles can be generated by the non-parametric bootstrapping step of tree inference algorithms. AnGST can merge subtrees from these bootstraps into a single chimeric tree that does not match any of the input bootstraps exactly, but whose bipartitions can be found in at least one of the input bootstraps. In simulations, we observed these trees to be significantly more accurate than trees based on sequence likelihood alone, although they generally have lower likelihood [1].

AnGST will incorporate multiple bootstraps into a reconciliation if there are multiple trees in the gene tree file. An example bootstrapped gene tree is provided in example/boot.txt:

    >> more example/boot.txt
    [example bootstrap trees illegible in the source]

Reconcile this tree by editing AnGST.input to read:

    gene=example/boot.txt

Ultrametric trees

If the branch lengths on the provided species tree represent times, AnGST can restrict the set of possible inferred gene transfers to only those between contemporaneous lineages. Add to AnGST.input:

    ultrametric=True

Any non-zero chronological overlap
7. is sufficient to allow transfers. But if a gene transfer is inferred from node s1 to node s2, subsequent transfers of the gene copy in s2 may only occur with lineages which exist during the range T1 ∩ T2, where T1 and T2 are the times spanned by the parent edges of s1 and s2, respectively.

6 Currently unsupported

Several additional features have been built into AnGST but have not yet been extensively debugged. These include the ability to:

- Specify an outgroup for the gene trees, or provide rooted gene trees.
- Provide event-specific costs to AnGST (e.g. HGT from A to B costs exactly 2.5).
- Fix the node on the species tree at which the gene family was born.

If there is sufficient user interest, future versions of AnGST may include these features. Please feel free to ask.

Bibliography

[1] Lawrence A. David and Eric J. Alm. Rapid evolutionary innovation during an Archaean genetic expansion. Nature, Dec 2010.
[2] Matthew W. Hahn. Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biol, 8(7):R141, Jan 2007.
[3] J. Bergsten. A review of long-branch attraction. Cladistics, 21:163-193, Jan 2005.
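The contemporaneity rule for ultrametric species trees (Section 5) amounts to intersecting time intervals. The following is a minimal sketch of that idea, not AnGST's own code: the `Interval` type and the functions `overlap` and `transfer_window` are hypothetical names, the edge times are made up, and time is assumed to be measured as age before present. It is written in current Python 3 for illustration, whereas AnGST itself requires Python 2.X.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Time span of a lineage's parent edge (hypothetical representation)."""
    start: float  # younger end of the edge, as age before present
    end: float    # older end of the edge


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the intersection of two spans, or None if it has zero length."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    # Only a non-zero chronological overlap permits a transfer.
    return Interval(start, end) if end > start else None


def transfer_window(donor: Interval, recipient: Interval) -> Optional[Interval]:
    """Window in which later transfers of the received copy may occur.

    After a transfer from s1 (donor) to s2 (recipient), later transfers of
    the copy in s2 are limited to lineages overlapping T1 intersect T2.
    """
    return overlap(donor, recipient)


# Toy example with made-up edge times.
t1 = Interval(2.0, 5.0)  # parent edge of s1
t2 = Interval(3.0, 6.0)  # parent edge of s2
t3 = Interval(0.0, 2.5)  # parent edge of a third lineage, s3

window = transfer_window(t1, t2)
# s3 overlaps s1 directly, but not the window left after the s1 -> s2 transfer.
s3_reachable = window is not None and overlap(window, t3) is not None
```

In this toy case, s3 is contemporaneous with s1 but lies entirely outside the window shared by s1 and s2, so a follow-on transfer of the copy in s2 to s3 would be excluded under the rule as described.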
8. ute of Technology (2005-2010). He is presently a Junior Fellow. User questions, suggestions, or bug reports are welcomed and can be sent to Lawrence by email. Lawrence also maintains a personal homepage.

Citation

AnGST is free for users to run and modify. If you use AnGST in a publication, please cite:

LA David & EJ Alm. Rapid evolutionary innovation during an Archaean Genetic Expansion. Nature (2010). doi:10.1038/nature09649

Acknowledgements

AnGST was developed in collaboration with many individuals. Eric Alm guided the development of the algorithm. Dirk Gevers, Abdoulaye Diallo, Ming Chun Miki Lee, and David Robinson provided helpful initial user feedback. Albert Wang built the online AnGST server (almlab.mit.edu/angst). During his research, Lawrence David was supported by a National Defense Science & Engineering Graduate Fellowship (DoD) and a Whitaker Health Sciences Fund Fellowship.

Contents

1 Installation
2 Quick start
3 Inputs
4 Outputs
5 Optional
6 Currently unsupported
Bibliography

1 Installation

To run AnGST on your own computer or cluster, you will need to have experience running programs on the command line. If you haven't used command-line software before but would like to learn more about this computing environment, you can try perusing this online tutori
