
User Manual for MEGAN V4.69.3

1. typically collected in a metagenomics project. In a preprocessing step, a sequence comparison of all reads with a suitable database of reference DNA or protein sequences must be performed to produce an input file for the program. MEGAN is suitable for DNA reads (metagenome data), RNA reads (metatranscriptome data), peptide sequences (metaproteomics data) and, using a suitable synonyms file that maps SILVA ids to taxon ids, for 16S rRNA data (amplicon sequencing). At startup, MEGAN first reads in the current NCBI taxonomy (consisting of over 670,000 taxa). A first application of the program is that it facilitates interactive exploration of the NCBI taxonomy. However, the main application of the program is to parse and analyze the result of a BLAST comparison of a set of reads against one or more reference databases, typically using BLASTN, BLASTX or BLASTP to compare against NCBI NT, NCBI NR or genome-specific databases. The result of such an analysis is an estimate of the taxonomical content ("species profile") of the sample from which the reads were collected. The program uses a number of different algorithms to place reads into the taxonomy by assigning each read to a taxon at some level in the NCBI hierarchy, based on their hits to known sequences, as recorded in the BLAST file. Alternatively, MEGAN can also parse files generated by the RDP website [4] or the Silva website [13]. Moreover, MEGAN can parse files in SAM format [8]. A
2. Chart SEED item: Chart number of reads assigned to SEED nodes. The Window > Chart KEGG item: Chart number of reads assigned to KEGG nodes. The Window > Network Comparison item: Open a network comparison window. The Window > Rarefaction Analysis item: Compute and chart the species rarefaction curve. The Window > Command Line Syntax item: Show the command-line syntax for commands relevant to the current window. The Window > Input Command item: Enter and execute a command. 12.16 The MEGAN Menu. Under MacOS, there is an additional standard menu associated with the program, called the MEGAN menu. As usual, this contains the Window > About and File > Quit menu items. 12.17 Wheel Mouse and Special Keys. Use of a wheel mouse is recommended for zooming of graphics displayed in different windows. The default is vertical zoom. For horizontal zoom, additionally press the shift key. To scroll the graph, either press and drag the mouse using the right mouse button, or use the arrow keys. To zoom the graph in the vertical or horizontal direction, press the shift key while using the arrow keys. To increase the zoom factor, additionally press the alt key or the control key. To select a region of nodes using the mouse, click, hold for a second until the cursor changes to an arrow, and then drag the mouse to capture the nodes to be selected. 13 SEED Window. The SEED window is used to display a SEED analysis of gene function ba
3. The Layout > Fully Contract item: Contract tree vertically. The Layout > Fully Expand item: Expand tree vertically. The Layout > Draw Circles item: Draw data as circles. The Layout > Draw Pies item: Draw data as pie charts. The Layout > Draw Heatmaps item: Draw data as heat maps. The Layout > Draw Bars item: Draw nodes as bars. The Layout > Cladogram item: Draw the tree as a cladogram with all leaves aligned right. The Layout > Phylogram item: Draw the tree as a phylogram in which all edges have length one. The Layout > Use Magnifier item: Turn magnifier on and off. The Layout > Draw Leaves Only item: Only draw leaves. The Layout > Highlight Differences submenu. 12.11 The Expand/Contract Menu. The Expand/Contract menu contains the following items: The Expand/Contract > Expand Horizontal item: Expand view horizontally. The Expand/Contract > Contract Horizontal item: Contract view horizontally. The Expand/Contract > Expand Vertical item: Expand view vertically. The Expand/Contract > Contract Vertical item: Contract view vertically. 12.12 The Highlight Differences Menu. The Highlight Differences menu contains the following items: The Highlight Differences > Uncorrected item: In a comparison of exactly two datasets, highlight statistically significant differences using no correction. The Highlight Differences > Holm-Bonferroni Corrected item: In a comparison of exactly two datasets, highlight statist
4. 1 The Options > List Microbial Attributes item: List NCBI microbial attributes for selected microbes. The Options > Compare item: Open compare dialog to produce a comparison of multiple datasets. The Options > Reorder or Rename item: Change the order or names of the datasets in a comparison. The Options > Open NCBI Web Page item: Open NCBI Taxonomy web site in browser. The Options > Inspect item: Inspect the read-to-taxon assignments. 12.9 The Taxon Disabling Menu. The Taxon Disabling menu contains the following items: The Taxon Disabling > Enable All item: Enable all taxa. The Taxon Disabling > Disable item: Disable all selected taxa. If none are selected, asks for taxa to disable (comma-separated). The Taxon Disabling > Enable item: Enable all selected taxa. If none are selected, asks for taxa to enable (comma-separated). The Taxon Disabling > List Disabled item: List all disabled taxa. 12.10 The Layout Menu. The Layout menu contains the following items: The Layout > Expand/Contract submenu. The Layout > Layout Labels item: Layout labels. The Layout > Scale Nodes By Assigned item: Scale nodes by number of reads assigned to taxon. The Layout > Scale Nodes By Summarized item: Scale nodes by number of reads assigned to and below a taxon. The Layout > Set Max Node Radius item: Set the maximum node radius in pixels. The Layout > Zoom To Selection item: Zoom to the selection.
5.
    1  @Creator MEGAN version 4.0alpha20 built 14 Oct 2010
    2  @CreationDate Wed Oct 27 17:10:52 CEST 2010
    3  @ContentType Summary4
    4  @Names 155_PE_1_fixed paired ecoli testrun 2000 nr
    5  @Uids 1288068180866 1288190195887
    6  @Sizes 51246 2000
    7  @TotalReads 200000
    8  @Collapse SEED 2000041
    9  @Algorithm Taxonomy tree from summary
    10 @Parameters normalized_to 100000
    11 @NodeStyle KEGG piechart

The first and second lines are optional descriptions of who generated the file and when. The third line identifies the format as Summary4, indicating that this is a summary file in the format introduced with MEGAN 4. The fourth line lists the names of all datasets that are represented by this file. In this case, there are two. Line 5 of this example lists the unique identifier numbers associated with the datasets, if any. Line 6 lists the original sizes of the datasets. Line 7 lists the total number of reads. This is not necessarily the sum of the original sizes. Line 8 specifies, for the SEED classification, which nodes are to be collapsed in the visual representation of the classification. The keyword SEED can be replaced by TAXONOMY or KEGG for the other classifications. Line 9 contains the name of the algorithm used to compute a classification. The second word here is a keyword to identify which classification is meant. Line 10 lists parameters of the computation used to generate the file. Line 11 specifies the style used to draw nodes in a given classification.
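As a rough illustration of how such header lines might be read by a script, here is a minimal Python sketch. It assumes that every header line starts with an `@` character followed by a keyword and whitespace-separated values; the function name `parse_header` is hypothetical and is not part of MEGAN.

```python
def parse_header(lines):
    """Collect '@Keyword value ...' header lines into a dict.

    Assumption: header lines start with '@' and use whitespace as separator.
    """
    header = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("@"):
            continue  # skip anything that is not a header line
        parts = line[1:].split()
        if not parts:
            continue  # a lone '@' carries no keyword
        keyword, values = parts[0], parts[1:]
        header[keyword] = values
    return header


example = [
    "@ContentType Summary4",
    "@Sizes 51246 2000",
    "@TotalReads 200000",
]
header = parse_header(example)
sizes = [int(s) for s in header["Sizes"]]
print(header["ContentType"][0], sizes, int(header["TotalReads"][0]))
```

Because the header lines may occur in any order, a dictionary keyed by keyword is a natural fit; values are kept as strings and converted only where a number is expected.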
6. In the NCBI NR database, names of taxa are placed between square brackets. Otherwise, MEGAN searches for maximal and non-overlapping substrings that can be mapped onto an NCBI taxon id. Again, the taxon id of the match is set to the LCA. Otherwise, the taxon is set to unknown. 32.5 Required Format of Read Files. Reads from sequencing are assumed to be provided in multi-FastA format in a reads file. The first word of a FastA header is assumed to be the read id. The remaining text of the FastA header must contain the length of the read either as length number or as length length. 32.6 Graphics Formats. The following graphics formats are supported: BMP (Bitmap); EPS (Encapsulated PostScript, vector format); GIF (Graphics Interchange Format); JPEG (Joint Photographic Experts Group); PDF (Portable Document Format, vector format); PNG (Portable Network Graphics); SVG (Scalable Vector Graphics, vector format). 32.7 CSV File Format. MEGAN supports importing data from other programs in a comma-separated format from a CSV file, using the File > Import CSV menu item. The input file must be a text file in which either all lines each contain two strings that are separated by a comma, or all lines each contain three strings separated by commas. Importing read assignments: If each line of the CSV file contains two strings separated by a comma, then the first string will be i
8. OTU tables that are generated and used by the QIIME package [3]. 6 The NCBI Taxonomy. The NCBI taxonomy provides unique names and IDs for over 660,000 taxa, including approximately 25,000 prokaryotes, 84,000 animals, 65,000 plants and 17,000 viruses. The individual species are hierarchically grouped into clades at the levels of Superkingdom, Kingdom, Phylum, Class, Order, Family, Genus and Species (and some unofficial clades in between). At startup, MEGAN automatically loads a copy of the complete NCBI taxonomy and then displays the taxonomy as a rooted tree. The taxonomy is stored in an NCBI tree file and an NCBI mapping file, which are supplied with the program. 7 The NCBI NR and NCBI NT Databases. The NCBI NR ("non-redundant") protein sequence database is available from the NCBI website. It contains entries from GenPept, Swissprot, PIR, PRF, PDB and RefSeq. It is non-redundant in the sense that identical sequences are merged into a single entry. The NCBI NT nucleotide sequence database is available from the NCBI website. It contains entries from GenBank and is not non-redundant. It contains untranslated gene coding sequences and also mRNA sequences. 8 Assigning Reads to Taxa. The main problem addressed by MEGAN is to compute a "species profile" by assigning the reads from a metagenomics sequencing experiment to appropriate taxa in the NCBI taxonomy. At present, this program implements the following naive approach to this problem: 1. C
9. and to control the program via the main menus. Initially, at startup, before reopening or creating a new RMA file, the Main window displays the NCBI taxonomy. By default, the taxonomy is only drawn to its second level. Parts of the taxonomy, or the full taxonomy, can be explored using the menu items of the window. Once a data set has been read in, the full NCBI taxonomy is replaced by the taxonomy that is induced by the data set. The size of nodes indicates the number of reads that have been assigned to the nodes using the algorithm described in Section 8. Double-clicking on a node will produce a textual report stating how many reads have been assigned to the corresponding taxon, and how many reads have been assigned in total to the taxon and to any of the taxa below the given node (in summary). Subtrees can be collapsed and expanded, as described below. Most windows in MEGAN provide access to their functionality through menus, a tool bar that contains a selection of the menu items, and popup menus that also provide contextual access to menu items. We now discuss all menus of the Main window. 12.1 The File Menu. The File menu contains the following items: The File > New item: Open a new empty document. The File > Open item: Open a MEGAN file (ending in .rma, .meg or .megan). Under Linux or Windows, multiple files can be selected and opened. To open multiple files simultaneously on a Mac, press the shift key when selecting this menu i
10. as alignment. The Options > Show As Mapping item: Show alignment as mapping. The Options > Unsorted item: Do not sort sequences. The Options > Sort By Names item: Sort sequences by names. The Options > Sort By Start item: Sort sequences by start positions. The Options > Move Up item: Move selected sequences up. The Options > Move Down item: Move selected sequences down. The Options > Sort By Similarity item: Sort sequences by pairwise similarity. The Options > Set Amino Acid Colors item: Set the color scheme for amino acids. The Options > Color Matches item: Color letters that match the reference sequence. The Options > Color Mismatches item: Color letters that do not match the reference sequence. The Options > Chart Diversity item: Opens a chart showing a diversity analysis that aims at estimating the number of distinct genomic sequences corresponding to a given gene. Using a sliding window (default length 25) along the reference sequence, the program records the total number n of reads that cover the window and the number k of such reads that have distinct sequences over the window. These are then depicted in a scatter plot. Using a simple function based on Michaelis-Menten kinetics [14], the program plots a curve for the data that is used to estimate the number of different genomes involved. See [6] for details. The Layout menu contains the following items: The Layout > Show Insertions item: Show insertions in reads
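The sliding-window counting described for the Chart Diversity item can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: reads are given as hypothetical (start, sequence) pairs already aligned to the reference without gaps, and the curve fitting based on Michaelis-Menten kinetics is not attempted. The function name `window_diversity` is not a MEGAN identifier.

```python
def window_diversity(reads, ref_length, window=25):
    """Return one (n, k) point per window position that is covered by reads.

    reads: list of (start, sequence) pairs in reference coordinates
           (assumption: gap-free alignment, 0-based start).
    n: number of reads covering the whole window.
    k: number of distinct sequences among those reads over the window.
    """
    points = []
    for w in range(ref_length - window + 1):
        covering = [
            seq[w - start : w - start + window]
            for start, seq in reads
            if start <= w and start + len(seq) >= w + window
        ]
        if covering:
            points.append((len(covering), len(set(covering))))
    return points


toy_reads = [(0, "ACGTACGT"), (0, "ACGTACGT"), (2, "GTTCGT")]
for n, k in window_diversity(toy_reads, ref_length=8, window=4):
    print(n, k)
```

Each (n, k) pair corresponds to one point of the scatter plot; a saturating curve fitted through such points is what the manual refers to when estimating the number of different genomes.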
11. at least change the parameters which do not affect the memory management. All changes will be performed by editing the postgresql.conf file and will be applied by a restart of the PostgreSQL server. max_connections defines how many parallel connections can be handled by the database server. Even though the interaction of a MEGAN client and the database server can usually be reduced to one connection, there are scenarios when a MEGAN client needs more than one connection. PostgreSQL can manage a couple of hundred connections at the same time; therefore, it is advisable to set this parameter a little higher than the expected number of MEGAN clients accessing the database in parallel. Recommended value is 100. shared_buffers determines how much memory is dedicated to PostgreSQL for caching data. It has been shown that, particularly for a multiuser database system, it is beneficial to increase the value of the parameter to around one quarter of the system's main memory. effective_cache_size describes, in contrast to the previous parameter, how much memory the query planner may assume is available for caching data (in PostgreSQL and in the operating system's disk cache); it does not reserve any memory itself. This is especially important when searching for an optimal query plan. In general, a well-accepted value is around half of the main memory of the system. default_statistics_target is an important parameter for the analysis of data stored in the database. A higher value will increase the quality of the generated database queries and therefore the execution
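The rules of thumb above (100 connections, about one quarter of main memory for shared_buffers, about half for effective_cache_size) can be turned into concrete postgresql.conf lines with a small helper. The following Python sketch is only a convenience illustration; the function name `suggest_settings` and the 8 GB example machine are assumptions, and the resulting values should be reviewed before use.

```python
def suggest_settings(ram_mb, max_connections=100):
    """Derive postgresql.conf values from the rules of thumb in the text.

    ram_mb: main memory of the database server in megabytes (assumed input).
    """
    return {
        "max_connections": str(max_connections),
        "shared_buffers": f"{ram_mb // 4}MB",        # about one quarter of RAM
        "effective_cache_size": f"{ram_mb // 2}MB",  # about half of RAM
    }


# Example for a hypothetical server with 8 GB of main memory.
for name, value in suggest_settings(8192).items():
    print(f"{name} = {value}")
```

The printed lines use the `name = value` syntax of postgresql.conf and can be pasted into that file before restarting the server.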
12. file. Alternatively, a nine column may be supplied that contains either the taxon name or taxon Id associated with the database sequence. The program also scans the subject field for RefSeq identifiers to determine the associated gene. MEGAN can read gzipped BLAST files. For human-readable format, any BLASTX file or BLASTP file is expected to adhere to the format shown in Figure 1. Any BLASTN file is expected to adhere to the format shown in Figure 2. BLASTX text text; followed by 0 or more blocks of the following type: Query query id text length length text, or Query query id text length length text text; followed by 0 or more blocks of the following type: >text NCBI taxon name text (line breaks ok), Score score bits bits, Expect e value, Identities text percent identities, Positives text percent positives, Gaps text percent gaps h, Frame frame; followed by 0 or more blocks of the following type: Query text text, Sbjct text. Figure 1: The required structure of a BLASTX file. Labels shown as label are tokens that must occur verbatim in the file. Labels shown as label are values that are read into the program. The first word in the file must be BLASTX. The header line starting with Query, which is taken from the Fasta header of the query sequence (a read), must start with a one-word unique identifier for the re
13. if the file was recently opened by the program, then it may be contained in the File > Open Recent submenu. By default, when parsing an input file, for each read, taxon and RefSeq id only one best-scoring match is kept. For example, if read R has two equally high-scoring matches M1 and M2 to two sequences from E. coli, say, then MEGAN will discard one of the two matches, unless they have different RefSeq accession numbers or unless exactly one of the two matches does not have a RefSeq accession number. To turn this filter off, use the Window > Input Command menu item to enter the following command: setprop one_match_per_taxon false. 5.1 Blast Files. New input to the program is usually provided as a BLAST file obtained from a BLAST comparison of the given set of reads to a database such as NCBI NR or NCBI NT; see Section 32 for details of the file formats used. MEGAN supports BLASTN, BLASTX and BLASTP standard text format and BLAST XML format. MEGAN can read gzipped BLAST files directly, so there is no need to un-gzip them, although at present MEGAN processes uncompressed files much faster than compressed ones. MEGAN can also parse tabular BLAST output (generated using BLAST option -m 8); however, as this form of output does not contain the subject line for sequences matched, it is unsuitable for MEGAN because MEGAN cannot determine the taxon or gene associated with the database sequence. However, if you add an additional column to this forma
14. in this case KEGG. The main body of a MEGAN text file contains multiple lines as follows, in any order:

    TAX 199310 0 1250
    TAX 1 271 100
    TAX 28216 35
    TAX 32523 8
    TAX 2 8336 1350
    KEGG 7716 12
    KEGG 3859 2
    KEGG 7714 2 100
    SEED 54 6 50

The general format is: classification class count_1 count_2 ... count_n. Here, classification is either TAX (for taxonomy), SEED or KEGG. This is followed by a number indicating a class in the given classification. In the case of taxonomy, this is the NCBI taxon id. This is followed by up to n numbers, where n is the number of datasets mentioned in the header, indicating how many reads in the 1st, 2nd, etc. dataset were assigned to the given class. Because this format is also embedded in RMA files to provide a summary of the data, when opening an RMA file generated by an earlier version of MEGAN, the program automatically updates the summary to the new format. As a consequence, any RMA file that has been opened by MEGAN 4 cannot later be opened by an earlier version of the program. 32.3 Required Syntax of BLAST Files. MEGAN imports data from a BLAST file. MEGAN can parse BLAST files in standard or XML format, obtained using the BLAST output option -m 0 or -m 7, respectively. MEGAN can also parse tabular format (BLAST output option -m 8). For this to work, the subject field must either contain taxon names or GI accession numbers. In the latter case, please use the Load GI Lookup File button to load a GI lookup
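To make the body-line layout of the summary format concrete, the following Python sketch parses one such line into its classification, class number and per-dataset counts. It is a hedged illustration, not MEGAN code: the function name `parse_body_line` is hypothetical, whitespace is assumed as the separator, and padding missing trailing counts with zero is an assumption suggested by lines such as `TAX 28216 35`.

```python
def parse_body_line(line, n_datasets):
    """Parse 'classification class count_1 ... count_n' into a tuple."""
    fields = line.split()
    classification, class_id = fields[0], int(fields[1])
    if classification not in ("TAX", "SEED", "KEGG"):
        raise ValueError(f"unknown classification: {classification}")
    counts = [int(x) for x in fields[2:]]
    if len(counts) > n_datasets:
        raise ValueError("more counts than datasets named in the header")
    # Assumption: counts that are not listed are treated as zero.
    counts += [0] * (n_datasets - len(counts))
    return classification, class_id, counts


for text in ("TAX 199310 0 1250", "TAX 28216 35", "KEGG 7714 2 100"):
    print(parse_body_line(text, n_datasets=2))
```

The number of datasets has to come from the header of the same file, which is why it is passed in explicitly rather than guessed from the line.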
15. internal nodes. The Select > All Intermediate Nodes item: Select all intermediate nodes. The Select > Subtree item: Select subtree. The Select > Leaves Below item: Select all leaves below. The Select > Invert item: Invert selection. The Select > Level submenu. 12.7 The Level Menu. The Level menu contains the following items: The Level > Kingdom item: Select Kingdom. The Level > Phylum item: Select Phylum. The Level > Class item: Select Class. The Level > Order item: Select Order. The Level > Family item: Select Family. The Level > Genus item: Select Genus. The Level > Species item: Select Species. 12.8 The Options Menu. The Options menu contains the following items: The Options > Set Number Of Reads item: Set the total number of reads in the analysis. The Options > Change LCA Parameters item: Rerun the LCA analysis with different parameters. The Options > Taxon Disabling submenu. The Options > List Summary item: List summary of hits for selected nodes of tree. The Options > List Path item: List path from root to selected nodes. The Shannon-Weaver Index computes the Shannon-Weaver diversity index on the set of selected nodes. This is $-\sum_i p_i \log p_i$, where $p_i$ is the proportion of reads assigned to node $i$. The Simpson Reciprocal Index computes the reciprocal Simpson diversity index on the set of selected nodes. This is $1 / \sum_i p_i^2$, where $p_i$ is the proportion of reads assigned to node
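Both diversity indices are easy to check by hand on a few read counts. The Python sketch below computes them from a list of per-node read counts; it assumes the natural logarithm for the Shannon-Weaver index (the text does not state the base) and the usual form 1 / sum of squared proportions for the reciprocal Simpson index. The function names are illustrative only.

```python
import math


def proportions(counts):
    """Convert read counts on the selected nodes into proportions p_i."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon_weaver(counts, base=math.e):
    """-sum_i p_i log p_i (log base is an assumption; default natural log)."""
    return -sum(p * math.log(p, base) for p in proportions(counts))


def simpson_reciprocal(counts):
    """1 / sum_i p_i^2."""
    return 1.0 / sum(p * p for p in proportions(counts))


counts = [10, 10, 10, 10]  # four equally abundant nodes
print(shannon_weaver(counts), simpson_reciprocal(counts))
```

For k equally abundant nodes, the Shannon-Weaver index equals log k and the reciprocal Simpson index equals k, which gives a quick sanity check for any implementation.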
16. match exceeds the win score, only matches exceeding the win score ("winners") are used to place the given read. The hope is that secondary, homology-induced matches are discarded in the presence of stronger primary matches. The Min Complexity item can be used to identify low-complexity reads. These are placed on a special Low Complexity node. To turn this filter off, set the value to 0. A value of 0.3 catches most low-complexity short reads. The Paired Reads item can be used to turn paired-read awareness of MEGAN on and off. In paired-read mode, MEGAN utilizes read-pairing information to enhance the taxonomic assignment of reads. The Use Percent Identity Filter item can be used to turn on an additional filter for assigning reads to a specific taxonomic level. When this is active, the percent identity of a match must exceed the given value of percent identity to be assigned at the given rank: Species 99%, Genus 97%, Family 95%, Order 90%, Class 85%, Phylum 80%. This should only be used when analyzing 16S rRNA sequences. 28 Compare Dialog. The Compare dialog is opened using the Options > Compare item. This dialog provides a list of currently open datasets. To construct a comparison, select at least two different datasets and then press OK. Select Use absolute counts if you want the comparison to use the original counts of reads for each dataset. Select Normalize over all reads if you want all counts to be normalized such that each dataset
18. 6 Message Window. The Message window is opened using the Window > Message Window item. The program writes all messages to this window. The window contains the usual File and Edit menu items. 27 Parameters Dialog. The Parameters dialog is used to control the parameters of the LCA-assignment algorithm. It can be invoked by selecting Options > Change LCA Parameters. The dialog options are: The Min Support item can be used to set a threshold for the minimum support that a taxon requires, that is, the number of reads that must be assigned to it so that it appears in the result. Any read that is assigned to a taxon that does not have the required support is pushed up the taxonomy until a node is found that has sufficient support (version 3.10 onward; earlier versions counted the read as unassigned). By default, the minimum number of reads required for a taxon to appear in the result is 5. The Min Score item can be used to set a minimum threshold for the bit score of hits. Any hit in the input data set that scores less than the given threshold is ignored. The Top Percentage item can be used to set a threshold for the maximum percentage by which the score of a hit may fall below the best score achieved for a given read. Any hit that falls below this threshold is discarded. The Win Score item can be used to try to separate matches due to sequence identity from ones due to homology. If a win score is set, then, for a given read, if any
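The interplay of the Min Score and Top Percentage thresholds can be illustrated on the bit scores of a single read. The Python sketch below is a simplified reading of the two rules stated above, not MEGAN's implementation; the function name `filter_hits` and the example scores and thresholds are hypothetical.

```python
def filter_hits(bit_scores, min_score, top_percent):
    """Keep the hits of one read that pass both thresholds.

    min_score:   hits scoring less than this are ignored.
    top_percent: maximum percentage by which a hit may fall below the
                 best score of the read.
    """
    passing = [s for s in bit_scores if s >= min_score]
    if not passing:
        return []
    cutoff = max(passing) * (1 - top_percent / 100)
    return [s for s in passing if s >= cutoff]


# With a best score of 120 and top_percent=10, the cutoff is 108.
print(filter_hits([120, 110, 100, 40], min_score=50, top_percent=10))
```

Only the hits that survive this step would then be passed on to the LCA placement of the read.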
20. G Chart Window. The KEGG Chart Window can be used to visualize the abundance distribution of the KEGG classes as pie and bar chart and as a heat map. It can be opened using the Window > Chart KEGG menu item. If nodes of the dataset have been selected in the KEGG window, they will be displayed directly in the chart. To change the nodes shown in the chart window, select them in the main window and then press the sync button. 23 Alignment Viewer. The Alignment Viewer can be used to compute and visualize a multiple sequence alignment of all reads that have significant matches to a reference sequence associated with a given taxon, SEED class or KEGG class. It can be opened using the Window > Alignment Viewer menu item. Here is an overview of the menus available in this viewer and those menu items that do not appear in the main viewer. See [6] for details. The File menu contains the following items: The File > Save Alignment item: Save alignment to a file. The File > Save Consensus item: Save consensus to a file. The Edit menu contains the following items: The Edit > Copy Alignment item: Copy selected part of the alignment. The Edit > Copy Consensus item: Copy selected consensus sequence to clipboard. The Edit > Copy Reference item: Copy selected reference sequence to clipboard. The Edit > Translate item: Translate DNA or cDNA sequence. The Options menu contains the following items: The Options > Show As Alignment item: Show
21. User Manual for MEGAN V4.69.3. Daniel H. Huson. June 28, 2012. Contents: 1 Introduction; 2 Getting Started; 3 Obtaining and Installing the Program; 4 Program Overview; 5 Importing, Reading and Writing Files; 6 The NCBI Taxonomy; 7 The NCBI NR and NCBI NT Databases; 8 Assigning Reads to Taxa; 9 Identification of SEED Functional Classes; 10 Mapping of Reads to KEGG Pathways; 11 Comparison of datasets; 12 Main Window; 13 SEED Window; 14 KEGG Window; 15 Network Window; 16 Import Dialog; 17 Inspector Window; 18 Microbial Attributes Window; 19 Rarefaction Window; 20 Taxon Chart Window; 21 SEED Chart Window; 22 KEGG Chart Window; 23 Alignment Viewer; 24 Find Toolbar; 25 Format Dialog; 26 Message Window; 27 Parameters Dialog; 28 Compare Dialog; 29 Extractor Dialog; 30 Export Image Dialog; 31 About Window; 32 File Formats; 33 Command Line Options; 34 Command Line Commands; 35 Examples; 36 Using More Memory; 37 PostgreSQL; 38 Acknowledgments; References; Index. 1 Introduction. Disclaimer: This software is provided "AS IS" without warranty of any kind. This is developmental code, and we make no pretension as to it being bug-free and totally reliable. Use at your own risk. We will accept no liability for any dama
22. ad, and must also contain a statement containing the length of the read in the format length length or as length length. Another important feature is that the comment line of the database sequence must contain an NCBI taxon name. If names are not contained in the comment lines, then the accession lookup support must be used. Finally, the Gaps statement is optional. BLASTN text text; followed by 0 or more blocks of the following type: Query query id text length length text, or Query query id text length length text text; followed by 0 or more blocks of the following type: >text NCBI taxon name text (line breaks ok), Score score bits bits, Expect e value, Identities text percent identities, Gaps text percent gaps h, Strand strand strand; followed by 0 or more blocks of the following type: Query text text, Sbjct text. Figure 2: The required structure of a BLASTN file. Labels shown as label are tokens that must occur verbatim in the file. Labels shown as label are values that are read into the program. The first word in the file must be BLASTN. The header line starting with Query, which is taken from the Fasta header of the query sequence (a read), must start with a one-word unique identifier for the read, and must also contain a statement containing the length of the read in the format length length. Another impo
23. are initially assigned to a taxon that is not deemed present are pushed up the taxonomy until a node is reached that has enough reads. Taxa in the NCBI taxonomy can be excluded from the analysis. For example, taxa listed under root - unclassified sequences - metagenomes may give rise to matches that force the algorithm to place reads on the root node of the taxonomy. This feature is controlled by the Options > Taxon Disabling menu. At present, the set of disabled taxa is saved as a program property and not as part of the MEGAN document. Note that the LCA-assignment algorithm is already used on a smaller scale when parsing individual BLAST matches. This is because an entry in a reference database may have more than one taxon associated with it. For example, in the NCBI NR database, an entry may be associated with up to 1000 different taxa. This implies, in particular, that a read may be assigned to a high-level node (even the root node), even though it only has one significant hit, if the corresponding reference sequence is associated with a number of very different species. Note that, as of version 4.60.1, the list of disabled taxa is also taken into consideration when parsing a BLAST file. Any taxa that are disabled are ignored when attempting to determine the taxon associated with a match, unless all recognized names are disabled, in which case the disabled names are used. 9 Identification of SEED Functional Classes. The SEED classification of gen
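The two ideas discussed here, placing a read on the lowest common ancestor (LCA) of the taxa it hits and pushing reads up from taxa with too little support, can be sketched on a toy taxonomy. The Python code below is an illustration under stated assumptions, not MEGAN's algorithm: the tree is a hypothetical child-to-parent dictionary, support is taken to be the number of reads assigned directly to a node, and the names `lca` and `apply_min_support` are made up for this example.

```python
# Toy taxonomy as a child -> parent map (hypothetical, for illustration only).
PARENT = {
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia coli": "Enterobacteriaceae",
    "Salmonella enterica": "Enterobacteriaceae",
}


def path_from_root(taxon, parent):
    """Return the list of nodes from the root down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lca(taxa, parent):
    """Lowest common ancestor of the taxa hit by one read."""
    common = None
    for level in zip(*(path_from_root(t, parent) for t in taxa)):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def apply_min_support(assigned, parent, min_support=5):
    """Push reads on under-supported nodes up until support suffices."""
    counts = dict(assigned)
    nodes = set(parent) | set(parent.values())
    # Visit deepest nodes first so pushed-up reads accumulate on ancestors.
    for node in sorted(nodes, key=lambda t: -len(path_from_root(t, parent))):
        n = counts.get(node, 0)
        if 0 < n < min_support and node in parent:
            counts[parent[node]] = counts.get(parent[node], 0) + n
            del counts[node]
    return counts


print(lca(["Escherichia coli", "Salmonella enterica"], PARENT))
print(apply_min_support({"Escherichia coli": 2, "Salmonella enterica": 7}, PARENT))
```

A read hitting both species lands on their common family, and the two reads on the under-supported species keep moving up because no ancestor reaches the default support of 5, ending on the root in this toy case.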
24. cherichia coli CFT073 with a bitscore of 100. The read r01 matches Escherichia coli K-12 with a bitscore of 110, and the read r01 matches Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 with a bitscore of 120. The read r02 matches Caldicellulosiruptor saccharolyticus DSM 8903 with a bitscore of 90. To import this data into MEGAN, so as to analyze it using the LCA algorithm, produce the following CSV file:

    r01,Escherichia coli CFT073,100
    r01,Escherichia coli K-12,110
    r01,Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67,120
    r02,Caldicellulosiruptor saccharolyticus DSM 8903,90

MEGAN can also import SEED or KEGG counts. In addition, MEGAN is able to map entries consisting of a RefSeq Id and counts to KEGG or SEED. 32.8 Tree and Map Format. The NCBI taxonomy is loaded by MEGAN at startup. It is contained in an NCBI tree file in the standard Newick tree format. The mapping from taxon IDs to taxon names is loaded by MEGAN at startup. It is contained in an NCBI mapping file, in a line-based format in which each line has three entries: taxon ID, taxon name and then a number indicating the size of the genome, or 1 if the size is unknown. 33 Command Line Options. MEGAN has the following command line options. Program usage: g <switch> default true Use GUI f <String> default MEGAN file separate multiple files using fs <String> default S
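A three-column CSV file of the kind used in the import example (read id, taxon name, bit score) can be produced with Python's standard csv module. This is a minimal sketch; the helper name `to_csv` is hypothetical, and it is an assumption that MEGAN accepts fields written without padding spaces. Taxon names that themselves contain commas would be quoted by the csv module, which MEGAN may not expect.

```python
import csv
import io


def to_csv(matches):
    """Render (read_id, taxon_name, bit_score) triples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for read_id, taxon_name, bit_score in matches:
        writer.writerow([read_id, taxon_name, bit_score])
    return buffer.getvalue()


matches = [
    ("r01", "Escherichia coli CFT073", 100),
    ("r02", "Caldicellulosiruptor saccharolyticus DSM 8903", 90),
]
print(to_csv(matches), end="")
```

Writing the returned text to a file gives something that can be passed to the File > Import CSV menu item.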
25. d level below the root. Clicking on such a taxon node will open a new level of nodes, each read node representing a read that has been assigned to the named taxon. Clicking on a read node will then open a new level of nodes, each such read hit node representing an alignment of the given read to a sequence associated with some taxon. Finally, double-clicking on a read hit node will display the actual BLAST alignment provided to deduce the relationship.

17.1 Inspector Menus
Here we describe those menu items that are different from the main window.

17.2 The Inspector Edit Menu
The Edit menu contains the following item:
• The Edit→Show Taxon item: Show the named taxon and all reads assigned to it.

17.3 The Inspector Options Menu
The Inspector Options menu contains the following items:
• The Options→Collapse item: Collapse the selected nodes, or all if none are selected.
• The Options→Expand item: Expand the selected nodes, or all if none are selected.
• The Options→Ignore Hit item: Ignore all selected hits.
• The Options→Use Hit item: Use all selected hits.
• The Options→Use All Hits item: Use all hits.

18 Microbial Attributes Window
The Microbial Attributes window can be used to analyze the composition of microbial taxa (Bacteria and Archaea) and their various physiological features. A taxon must have at least one read assigned to it to be considered. The classification and its nomenclature are based on the NCBI's
26. d to contain all reads and matches. Moreover, it has better locality and thus updating it is much faster. MEGAN 4 can read RMA files generated by earlier versions of MEGAN, but not vice versa.

An RMA file is generated using the File→Import From BLAST menu item, from a BLAST file and a reads file (or from multiple files, if the reads are spread across multiple files). Depending on which of the three possible text storage policies is chosen, the RMA file may contain all reads and matches in a compressed form, or these may be stored in a separate RMAZ file, or otherwise only links to their locations in the original reads and match files are stored. In the first case, the size of such a file is around 10-20% of the size of the original input files and is thus usually smaller than the file that one obtains by simply compressing the BLAST file. The file is indexed and thus provides MEGAN with fast access to the data stored in it. The reads and matches can be extracted from the file, and so the MEGAN file provides a means of keeping all reads, BLAST matches and analysis in one document. RMA is an open format, which we will describe in a separate document.

32.2 The Text File (Summary) Format
As of version 4, the MEGAN text file format has been completely rewritten to accommodate SEED and KEGG classifications. A MEGAN file starts with a number of header lines, each starting with a @. These lines can occur in any order. This is best illustrated by an example: 1
27. different formats, see Section 32.6. The format is chosen from a menu. There are two radio buttons: Save whole image, to save the whole image, and Save visible image, to save only the part of the image that is currently visible in the main viewer. If the chosen format is EPS, then selecting the Convert text to graphics check box will request the program to render all text as graphics, rather than fonts. Pressing the Apply button will open a standard file save dialog to determine where to save the graphics file.

31 About Window
The About window is opened using the Window→About item. It shows the program's splash screen.

32 File Formats
MEGAN uses its own file formats to store the data describing the result of a sequence comparison computation between a file of DNA reads and a database of reference sequences, such as computed by BLASTX, BLASTP or BLASTN [1].

32.1 RMA Files
Files ending in .rma are in a compressed binary format called RMA (read-match archive), which is a new open format that we will describe in a separate document. MEGAN 1 used a text format (files ending in .megan or .meg), which is now deprecated and will not be supported by future versions of the program. By convention, we use the suffix .megan for MEGAN text files and .rma for binary read-match archive files. With MEGAN 4, we have introduced a new version of the RMA format, internally known as RMA 2. This format is more flexible, as it does not necessarily nee
29. • The Layout→Contract Gaps item: Contract all columns consisting only of gaps.
• The Layout→Show Nucleotides item: Show nucleotides in alignment.
• The Layout→Show Amino Acids item: Show amino acids in alignment.
• The Layout→Show Reference item: Show reference sequence.
• The Layout→Show Consensus item: Show consensus sequence.
• The Layout→Show Unaligned item: Show the unaligned prefix and suffix of reads.

24 Find Toolbar
The Find toolbar can be opened using the Edit→Find item. Its purpose is to find taxa, genes or other strings in a window. Use the following check boxes to parameterize the search:
• If the Whole words only item is selected, then only taxa or reads matching the complete query string will be returned.
• If the Case sensitive item is selected, then the case of letters is distinguished in comparisons.
• If the Regular Expression item is selected, then the query is interpreted as a Java regular expression.
Press the Close, Find First or Find Next buttons to close the toolbar, or to find the first or next occurrence of the query, respectively. Press the Find All button to find all occurrences of the query. Press the From File button to load a set of queries, one per line, from a file.

25 Format Dialog
The Format dialog is opened using the Edit→Format item. This is used to change the font, color, size and line width of all selected nodes and edges. Also, it can be used to turn labels on and off.

2
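How the three check boxes of the Find toolbar interact can be sketched in a few lines. This is only an approximation under stated assumptions: the function find_matches is hypothetical, and it uses Python's re module, whose syntax differs in places from the Java regular expressions that MEGAN interprets.

```python
import re


def find_matches(labels, query, whole_words=False, case_sensitive=False, regex=False):
    """Return the labels matching the query under the three search options."""
    # Without the regular expression option, the query is matched literally.
    pattern = query if regex else re.escape(query)
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    # "Whole words only" requires the complete label to match the query.
    matcher = compiled.fullmatch if whole_words else compiled.search
    return [label for label in labels if matcher(label)]
```

For example, with the labels Escherichia coli, Escherichia and escherichia albertii, the query escherichia matches all three by default, but only Escherichia when whole_words is set.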
30. e current input line have been processed. For example, if you want to open a MEGAN file and then save a picture of the taxonomical analysis in a PDF file, then the two commands should be entered on separate lines, because otherwise the taxonomy will be drawn before the data from the MEGAN file has been processed. Here is an example of the correct way to produce a picture of a taxonomic analysis:

open file='/Users/huson/data/megan/x.rma';
exportimage file='/Users/huson/data/megan/x.pdf' format=PDF replace=true;
quit;

Alternatively, the update command can be used to explicitly force MEGAN to update all data structures. In this case, the commands should appear together on one line, e.g.:

open file='x.rma'; update; exportimage file='x.pdf' format=PDF replace=true;

As described below, the update command takes a number of different parameters that can be used to determine exactly what type of update is required.

All commands supplied using the -x command line option are parsed as if they were contained in one line. So here the update command must be used to ensure that commands are completed when necessary. To open a file, print the taxonomical analysis and then close the file using the -x option, enter the following:

-x "open file='x'; update; exportimage file='x.pdf' format=PDF replace=true; quit;"

To save an image of a SEED analysis after loading a file, we must then open the SEED viewer, change the command context to the se
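The difference between the multi-line and the single-line form of the export example can be captured in a small helper that generates the command text. This is a sketch only: the function build_export_commands is hypothetical, and the exact punctuation of the commands (quotes, equals signs, semicolons) is an assumption about MEGAN's command syntax rather than something confirmed by this section.

```python
def build_export_commands(rma_file, pdf_file, one_line=False):
    """Build the open / exportimage / quit command sequence as text."""
    # Assumed command punctuation; adjust if your MEGAN version differs.
    open_cmd = "open file='%s';" % rma_file
    export_cmd = "exportimage file='%s' format=PDF replace=true;" % pdf_file
    if one_line:
        # On a single line, an explicit update is needed before exporting.
        return " ".join([open_cmd, "update;", export_cmd, "quit;"]) + "\n"
    # On separate lines, each line is fully processed before the next one.
    return "\n".join([open_cmd, export_cmd, "quit;"]) + "\n"
```

The multi-line result is suitable for a command file or standard input, while the single-line result corresponds to what would be passed with the -x option.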
31. e format. In this case, MEGAN expects the format to be a tab-separated file in which each line corresponds to one read:

read-name taxon-name rank-name score taxon-name rank-name score

In more detail, the first token is a string that identifies the read. The next token is either empty or a minus, in the latter case indicating that the read is reverse complemented. Then all further tokens come in groups of three, indicating the name of a taxon, the name of the rank of the taxon, and a score between 0 and 1, which MEGAN will multiply by 100. MEGAN reports each such taxon as a separate hit for the read.

5.4 Silva Files
Similarly, MEGAN can import rRNA analysis files downloaded from the Silva website at http://www.arb-silva.de [13]. To create a file using the Silva website that can be imported into MEGAN, go to the Aligner tab of the Silva website, upload your sequences and then press the align sequences button. Once the Silva website has computed an alignment, you will be able to download two files, an arb file and a log file. MEGAN requires the log file as input, not the arb file. When importing such a file into MEGAN, one must specify that MEGAN uses the synonyms file called silva2ncbi.map to map Silva accession numbers to NCBI taxa. This file is available from the MEGAN download page.

5.5 CSV Files
MEGAN supports import of data from other programs in a comma-separated format from a CSV file.

5.6 QIIME OTU Files
MEGAN can import
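The tab-separated layout described for the RDP-style format in this section (read name, orientation token, then groups of three) can be parsed as in the following sketch. The function name parse_rdp_line is hypothetical, the rank names in the usage are illustrative, and the sketch assumes well-formed lines with at least two tokens; it is not MEGAN's own parser.

```python
def parse_rdp_line(line):
    """Split one line into read name, orientation flag and a list of hits."""
    tokens = line.rstrip("\n").split("\t")
    read_name = tokens[0]
    # A minus in the second column marks a reverse-complemented read.
    reverse_complemented = tokens[1] == "-"
    rest = tokens[2:]
    hits = []
    # Remaining tokens come in groups of three: taxon, rank, score in [0, 1].
    for i in range(0, len(rest) - len(rest) % 3, 3):
        taxon, rank, score = rest[i:i + 3]
        # The score is scaled by 100, as described in the text.
        hits.append((taxon, rank, round(float(score) * 100, 2)))
    return read_name, reverse_complemented, hits
```

Each tuple in the returned list corresponds to one hit that would be reported for the read.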
32. e function consists of a collection of biologically defined subsystems [11]. The SEED classification can be displayed as a tree containing about 10,000 nodes and edges. Genes are mapped onto functional roles, and these are present in one or more subsystems. The program will attempt to map each read onto a gene that has a known functional role and then into one or more subsystems. To perform this analysis, MEGAN uses a mapping of RefSeq ids to SEED functional roles. Hence, if a SEED-based analysis is desired, then the database that is used in the BLAST comparison must contain RefSeq ids. This is the case for the NCBI-NR database.

10 Mapping of Reads to KEGG Pathways
The KEGG database provides a collection of metabolic pathways and other pathways [7]. The KEGG classification can be displayed as a tree. Genes are mapped onto so-called KO identifiers, and these are present in one or more pathways. The program will attempt to map each read onto a gene that has a valid KO identifier and thus to one or more pathways. To perform this analysis, MEGAN uses a mapping of RefSeq ids to KO identifiers. Hence, if a KEGG-based analysis is desired, then the database that is used in the BLAST comparison must contain RefSeq ids. This is the case for the NCBI-NR database.

11 Comparison of datasets
Multiple datasets can be opened simultaneously and then displayed together in a comparison view.

12 Main Window
The Main window is used to display the taxonomy
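The two-step mapping used for SEED (RefSeq id to functional role, functional role to subsystems) and, analogously, for KEGG (RefSeq id to KO identifier to pathways) can be sketched as follows. All ids, roles and subsystem names are made up, and the rule of taking the highest-scoring hit with a known role is an assumption for illustration, not a statement about how MEGAN selects the gene.

```python
# Hypothetical lookup tables; real RefSeq ids, roles and subsystems differ.
REFSEQ_TO_ROLE = {"REF_0001": "role A", "REF_0002": "role B"}
ROLE_TO_SUBSYSTEMS = {
    "role A": ["subsystem 1", "subsystem 2"],
    "role B": ["subsystem 2"],
}


def classify_read(hits):
    """Map a read's (refseq_id, bitscore) hits to a role and its subsystems."""
    # Assumption: use the best-scoring hit whose RefSeq id has a known role.
    for refseq_id, _score in sorted(hits, key=lambda h: h[1], reverse=True):
        role = REFSEQ_TO_ROLE.get(refseq_id)
        if role is not None:
            return role, ROLE_TO_SUBSYSTEMS.get(role, [])
    return None, []
```

A read whose hits carry no known RefSeq id cannot be classified, which is why the reference database must contain RefSeq ids.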
33. e selected nodes.
• The Export→Summary item: Export as summary file.

12.4 The Edit Menu
The Edit menu contains the following items:
• The Edit→Cut item: Cut.
• The Edit→Copy item: Copy.
• The Edit→Copy Image item: Copy image to clipboard.
• The Edit→Paste item: Paste.
• The Edit→Edit Node Label item: Edit the node label.
• The Edit→Edit Edge Label item: Edit the edge label.
• The Edit→Format item: Format nodes and edges.
• The Edit→Find item: Open the Find toolbar.
• The Edit→Find Again item: Find the next occurrence.
• The Edit→Preferences submenu.

12.5 The Preferences Menu
The Preferences menu contains the following items:
• The Preferences→Show Legend item: Show legend identifying different datasets.
• The Preferences→Edit Comparison Colors item: Edit the color palette used in comparison views.
• The Preferences→Use Alternative Taxonomy item: Allows one to specify an alternative taxonomy. For example, this allows one to use a Silva-based taxonomy.
• The Preferences→Use Default NCBI Taxonomy item: Switches back to the NCBI taxonomy shipped with MEGAN.

12.6 The Select Menu
The Select menu contains the following items:
• The Select→All Nodes item: Select all nodes.
• The Select→None item: Deselect all nodes.
• The Select→From Previous Window item: Select from previous window.
• The Select→All Leaves item: Select all leaves.
• The Select→All Internal Nodes item: Select all
34. ed taxa or the named ones
enable taxa selected <name>: enable all selected taxa or the named ones
list taxa disabled: List all disabled taxa

Layout menu:
set autolayoutlabels true|false: Layout labels
set scaleby assigned: Scale nodes by number of reads assigned to taxon
set scaleby summarized: Scale nodes by number of reads assigned to and below a taxon
set maxnoderadius <num>: Set the maximum node radius in pixels
set zoom selected: Zoom to the selection
set zoom fit: Contract tree vertically
set zoom full: Expand tree vertically
set nodedrawer circle: Draw data as circles
set nodedrawer piechart: Draw data as pie charts
set nodedrawer heatmap: Draw data as heat maps
set nodedrawer barchart: Draw nodes as bars
set drawer Cladogram|Phylogram: Draw tree as cladogram with all leaves aligned right
set drawleavesonly true|false: Only draw leaves

Expand/Contract sub-menu:
expand direction horizontal: Expand view horizontally
contract direction horizontal: Contract view horizontally
expand direction vertical: Expand view vertically
contract direction vertical: Contract view vertically

Highlight Differences sub-menu:
set highlightdifferences true|false correction none|bonferroni|holm_bonferroni: In a comparison of exactly two datasets, highlight statistically significant differences using no correction
set comparison_highlight_color <number>: Set the pairwi
35. ed viewer, then configure the size of the window and how much of the tree to uncollapse. Then we save the image file. Here is an example:

open file Users huson data megan x x rma
show window seedviewer
set context seedviewer
set windowsize 1000 x 1000
select nodes all
uncollapse subtrees
exportimage file Users huson data megan x x pdf format PDF replace true
quit

34 Command Line Commands
Command processing has been completely rewritten for MEGAN 4. Each type of window that can be opened by MEGAN has its own command interpreter. Initially, on startup, the program will open a Main window, and all commands piped to the program will be executed using the command interpreter associated with the main window. The main window provides a number of commands for opening other windows. For example, the command show window seedviewer will open the SEED classification viewer. To pipe commands to the SEED viewer, the command context has to be set to the SEED viewer by entering set context seedviewer. After entering this command, all subsequent commands are handled by the interpreter associated with the SEED viewer.

To obtain a list of all commands available for the current interpreter, enter help. To obtain help on a particular command, for example on export, enter help export. All command description lines that contain the word export (case-insensitive) will be listed. In the following, we list all commands available in the Mai
37. en to. Use a list of IP addresses the server should listen to, or a * to determine that all possible IP addresses are allowed. One should additionally define which users are allowed to connect to the database server and which method is used to authenticate them. The pg_hba.conf file contains information on which hosts are allowed to connect, how clients are authenticated, which PostgreSQL user names the clients can use, and which databases they can access. If PostgreSQL is installed only in order to deal with MEGAN's database, a simple way to implement this is to allow all users who are registered with the database server to log in. In order to do so, add the following line to the file:

host all all 0.0.0.0/0 md5

The first parameter allows TCP/IP connections. The second and the third define the database that can be accessed and the user allowed to log in. In the example above, both parameters are set to all, so that every registered user can log in to every database. The fourth parameter is a CIDR address determining an allowed IP range. Here all IPs are accepted. The last parameter defines the login method, here md5. For further information and other options for user authentication, see http://www.postgresql.org/docs/9.0/static/client-authentication.html.

37.4 Optimization
In order to increase the speed of the PostgreSQL database server, certain configurations can be adapted to the host system. This is an optional step, however it is recommended to
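To see why the CIDR address 0.0.0.0/0 in the pg_hba.conf line accepts every IPv4 client, and how a narrower range would behave, the following sketch uses Python's standard ipaddress module. The function is_allowed and the example addresses are illustrative assumptions; PostgreSQL performs this check itself.

```python
import ipaddress


def is_allowed(client_ip, cidr="0.0.0.0/0"):
    """Check whether a client address falls inside the given CIDR range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)
```

With the default range every IPv4 address passes, whereas a range such as 192.168.1.0/24 would restrict logins to a single subnet.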
39. file (alternatively, the output of a number of other tools can also be specified), then a reads file, and finally the name of the new RMA file to be created. The program allows one to open more than one BLAST file or reads file, for the case that reads and matches are spread across multiple files. If the reads are from a paired-read project, then selecting the Paired reads check box will request MEGAN to perform a paired-read analysis, see [10]. Once this information has been collected, the user can press the Apply button to import the data.

The other four panes are for advanced settings. The second tabbed pane, titled the Content pane, can be used to specify whether the SEED or KEGG content shall be analyzed, in addition to an analysis of the taxonomical content.

The third tabbed pane, titled the Files pane, can be used to set up the location of files. The first two items are used to specify the location of the input files to be read, namely the BLAST file and the reads file. The third item is used to specify the location of the new RMA file. This pane provides two options. The Max number of matches per read field specifies how many matches per read to save in the RMA file. A small value will reduce the size of the RMA file, but may exclude some important matches. By default, the 100 highest-scoring matches per read are saved.

The fourth tabbed pane, titled the LCA Parameters pane, contains all items of the Parameters dialog, which allows one to set t
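The effect of the Max number of matches per read setting can be pictured as keeping only the highest-scoring matches of each read. The sketch below is an illustration under assumptions: the function keep_top_matches and the (reference, score) tuple layout are hypothetical, and MEGAN's own tie-breaking is not specified here.

```python
import heapq


def keep_top_matches(matches, max_matches=100):
    """Keep the max_matches highest-scoring (reference, score) matches."""
    # nlargest returns the retained matches in descending score order.
    return heapq.nlargest(max_matches, matches, key=lambda match: match[1])
```

A smaller limit shrinks the stored data but can drop lower-scoring matches that would otherwise influence the assignment of the read.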
40. ges incurred through the use of this software. Use of MEGAN is free, however the program is not open source.

Type-setting conventions: In this manual we use e.g. Edit→Find to indicate the Find menu item in the Edit menu.

How to cite: If you publish results obtained in part by using MEGAN, then we require that you acknowledge this by citing the program as follows:
• D.H. Huson, S. Mitra, H.-J. Ruscheweyh, N. Weber and S.C. Schuster, Integrative analysis of environmental sequences using MEGAN 4, Genome Res. 2011, 21:1552-1560, software freely available for academic purposes from www-ab.informatik.uni-tuebingen.de/software/megan.

The first version of the program, as described in [5], was designed by Daniel H. Huson and Stephan C. Schuster. The program was written by Daniel H. Huson. Suparna Mitra, Daniel C. Richter, Paul Rupek, Hans Ruscheweyh and Nico Weber contributed many ideas and some supporting code.

The term metagenomics has been defined as "The study of DNA from uncultured organisms" (Jo Handelsman), and approximately 99% of all microbes are believed to be unculturable. A genome is the entire genetic information of one organism, whereas a metagenome is the entire genetic information of an ensemble of organisms. Metagenome projects can be as complex as large-scale vertebrate projects in terms of sequencing, assembly and analysis.

The aim of MEGAN is to provide a tool for studying the taxonomical content of a set of DNA reads
41. has 100,000 reads. Select Ignore all unassigned reads if you want all reads assigned to the three special nodes labeled Not Assigned, No Hits and Low Complexity (if present) to be ignored. To change the order in which the datasets appear in the comparison, use the Move up and Move down buttons. To change the order of datasets or their names as they appear in the window, select the Options→Reorder or Rename item.

29 Extractor Dialog
This provides an alternative to the Export→Reads item and allows one to save reads from different taxa to files whose names contain the taxon name. The Extractor dialog is opened using the File→Extract Reads item. The dialog is used to extract all reads assigned to selected nodes. For any selected node, all reads assigned to it, or to any node below it in the hierarchy, are saved to a file. Use the Browse button to specify the output directory. As the MacOS X dialog does not support the selection of a directory, select any file inside the desired target directory. Specify the file name for output in the File name field. If the name contains %t, then the program will produce one output file per node, and the name of the file is generated by replacing %t by the node name. Otherwise, all reads are written to one file.

30 Export Image Dialog
The Export Image dialog is opened using the File→Export Image item. This dialog is used to save a picture of the current tree in a number of
42. he command line option -c. For example, under MacOS X, type the following:

/Volumes/MEGAN/MEGAN Installer.app/Contents/MacOS/JavaApplicationStub -c

4 Program Overview
In this section, we give an overview of the main design goals and features of this program. Basic knowledge of the underlying design of the program should make it easier to use the program.

MEGAN is written in the programming language Java. The advantage of this is that we can provide versions that run under the Linux, MacOS, Windows and Unix operating systems.

Typically, after generating an RMA file (read-match archive) from a BLAST file, the user will then interact with the program, using the Find toolbar to determine the presence of key species, collapsing or un-collapsing nodes to produce summary statistics, and using the Inspector window to look at the details of the matches that are the basis of the assignment of reads to taxa. The assignment of reads to taxa is computed using the LCA assignment algorithm, see [5] for details.

The program is designed to operate in two different modes. In GUI mode, the program provides a GUI for the user to interact with the program. In command-line mode, the program reads commands from a file or from standard input and writes output to files or to standard output.

5 Importing, Reading and Writing Files
To open an existing RMA file or MEGAN text file, select the File→Open menu item and then browse to the desired file. Alternatively
43. he installation directory depends upon your operating system.

3 Obtaining and Installing the Program
MEGAN is written in Java and requires a Java runtime environment version 1.5 or newer, freely available from www.java.org. MEGAN is installed using an installer program that is freely available from www-ab.informatik.uni-tuebingen.de/software/megan. There are four different installers, targeting different operating systems:
• MEGAN_windows_4.69.3.exe provides an installer for a 32-bit version of MEGAN for Windows XP.
• MEGAN_windows-64x_4.69.3.exe provides an installer for a 64-bit version of MEGAN for Windows 7.
• MEGAN_macos_4.69.3.dmg provides an installer for MacOS X.
• MEGAN_unix_4.69.3.sh provides a shell installer for Linux and Unix.

The 32-bit Windows version of MEGAN is configured to use 1.1 GB of memory. For all other versions of the software, the installer will allow you to set the maximal amount of memory during the installation process. By default, the program will suggest using 2 GB. If your computer has more memory available, then it is a good idea to set this limit higher, because the program runs faster the more memory it is given. For example, if you have 4 GB of main memory, then set the limit for MEGAN to 3 GB. To change the maximum amount of memory used after installation of the program, see Section 36.

To install MEGAN using a command-line dialog, launch the installer from the command line and pass t
44. he parameters used by the LCA algorithm. Because re-computation of an analysis can take quite long on a very large dataset, it is recommended to set these values at this stage.

The last tabbed pane, titled the Advanced Options pane, controls how MEGAN attempts to identify the taxon associated with a given BLAST hit. By default, MEGAN looks for the name of a taxon in the header line of the subject sequence, which is the fastest option.

The Load Synonyms File button can be used to load a file of customized synonyms to help identify taxa, e.g. human for Homo sapiens. Each line of a synonyms file should contain two strings separated by a tab: the synonym, followed by the NCBI taxon name or id. The Use Synonyms check box item is used to turn the use of synonyms on and off.

The Load GI Lookup File button can be used to load a file that maps GI accession numbers to taxon ids. Due to the large size of this lookup table, the file is preprocessed so as to contain this data in a binary format suitable for direct access, so that MEGAN does not need to read in the whole table. This file should be used when importing matches that do not contain the names of taxa in a text format. To use this feature, please download the file gi_taxid_nucl.zip or the file gi_taxid_prot.zip from the MEGAN website and then unzip the file. Please note that the unzipped file gi_taxid_nucl.bin or gi_taxid_prot.bin is over one gigabyte in size. The Use GI Lookup check box item is used to turn the
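The synonyms file layout described in this section (one synonym per line, followed by a tab and an NCBI taxon name or id) can be read with a few lines of code. The function parse_synonyms and the example entries are illustrative assumptions; MEGAN loads such files itself through the Load Synonyms File button.

```python
def parse_synonyms(text):
    """Parse synonym<TAB>taxon lines into a dictionary."""
    synonyms = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        synonym, target = line.split("\t", 1)
        target = target.strip()
        # A purely numeric target is treated as a taxon id, otherwise as a name.
        synonyms[synonym.strip()] = int(target) if target.isdigit() else target
    return synonyms
```

For instance, a file containing the lines "human<TAB>Homo sapiens" and "E. coli<TAB>562" yields one name-based and one id-based entry.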
45. ically significant differences using Holm-Bonferroni correction.
• The Highlight Differences→Bonferroni Corrected item: In a comparison of exactly two datasets, highlight statistically significant differences using Bonferroni correction.

12.13 The Tree Menu
The Tree menu contains the following items:
• The Tree→Collapse item: Collapse selected nodes.
• The Tree→Collapse at Level item: Collapse all nodes at given depth in tree.
• The Tree→Collapse At Taxonomic Level submenu.
• The Tree→Uncollapse item: Uncollapse selected nodes.
• The Tree→Uncollapse Subtree item: Uncollapse whole subtree beneath selected nodes.
• The Tree→Show Taxon Names item: Display the full names of taxa.
• The Tree→Show Taxon Ids item: Display the NCBI ids of taxa.
• The Tree→Show Number of Reads Assigned item: Display the number of reads assigned to a taxon.
• The Tree→Show Number of Reads Summarized item: Display the total number of hits to a taxon and its descendants.
• The Tree→Node Labels On item: Show labels for selected nodes.
• The Tree→Node Labels Off item: Hide labels for selected nodes.
• The Tree→Show Intermediate Labels item: Show intermediate labels at nodes of degree 2.

12.14 The Collapse At Taxonomic Level Menu
The Collapse At Taxonomic Level menu contains the following items:
• The Collapse At Taxonomic Level→Kingdom item: Collapse Kingdom.
• The Collapse At Taxonomic Level→Phylu
46. ies: BrowserLauncher2-10rc4, Jama-1.0.2, MRJAdapter, axis, batik, colt, h2, jcommon-1.0.16, jfreechart-1.0.13, mds-1.0.0 and postgresql-9.0-801.jdbc4. These libraries and their licenses are located in the jars folder of the MEGAN installation directory.

References
[1] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25:3389-3402, 1997.
[2] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. GenBank. Nucleic Acids Res, 1(33):D34-38, 2005.
[3] J. Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, Antonio G. Pena, Julia K. Goodrich, Jeffrey I. Gordon, Gavin A. Huttley, Scott T. Kelley, Dan Knights, Jeremy E. Koenig, Ruth E. Ley, Catherine A. Lozupone, Daniel McDonald, Brian D. Muegge, Meg Pirrung, Jens Reeder, Joel R. Sevinsky, Peter J. Turnbaugh, William A. Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse Zaneveld, and Rob Knight. QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5):335-336, April 2010.
[4] J.R. Cole, Q. Wang, E. Cardenas, J. Fish, B. Chai, R.J. Farris, A.S. Kulam-Syed-Mohideen, D.M. McGarrell, T. Marsh, G.M. Garrity, and J.M. Tiedje. The ribosomal database project: improved alignments and new tools fo
48. indow to an image file
show window pagesetup: Setup the page for printing
show window print: Print the main panel
extract what document file <megan filename> sparsefile false|true data Taxonomy|SEED|KEGG ids <numbers> names <names> allbelow false|true: Extract all reads and matches on or below selected node(s) to a new document
extract what reads outdir <directory> outfile <filename template> data Taxonomy|SEED|KEGG ids <SELECTED numbers> names <names> allbelow false|true: Extract reads for the selected nodes
import csv reads|summary separator comma|tab file <fileName> toppercent <num> taxonomy true|false seed false|true kegg false|true useRefSeq false|true minscore <num> minsupport <num>: Load data in comma-separated values (CSV) format: READ_NAME, CLASS-NAME, SCORE or CLASS, COUNT, COUNT
import format qiime file <fileName>: Import data from a table in format used by QIIME
show window properties: Show document properties
close: Close the window

Export sub-menu:
export what CSV format readname_taxonname readname_taxonid readname_taxonpath taxonname_count taxonpath_count taxonid_count taxonname_readname taxonpath_readname taxonid_readname taxonname_length taxonpath_length taxonid_length readname_refseqid readname_seedname readname_seedpath seedname_count seedpath_count seedname_length
49. ing> -server -Xms2000M -Xmx2000M </string> <I4J_INSERT_VMOPTIONS>

and replace them by

<key>VMOptions</key> <string> -server -Xms2000M -Xmx8000M </string> <I4J_INSERT_VMOPTIONS>

to run using 8 gigabytes, for example.

To run MEGAN with more than 2 GB on a 64-bit Unix/Linux system, open the file installation-dir/MEGAN.vmoptions in a text editor. Find the current memory specification, e.g. -Xmx1600M, and replace it by -Xmx8G to run with 8 gigabytes of memory, say.

37 PostgreSQL

One of the new features of version 4 is the ability to work on data provided by a local or global PostgreSQL database server. For this purpose, a PostgreSQL database server must be installed, configured and, where required, optimized.

37.1 Installation

Please obtain the installation packages or the source code from http://www.postgresql.org and follow the provided installation instructions.

37.2 User Management

During the installation process, the superuser postgres is created, who has the privilege to create and delete databases as well as tables within the databases. Normal users, however, should have more restricted rights. A user should be allowed to log in to the database and to query, insert, delete and update data in the tables, but should not be allowed to change the structure of the database. Hence, user accounts that meet these requirements have to be created. First, create a new role named restricted_u
50. m item: Collapse Phylum.
• The Collapse At Taxonomic Level→Class item: Collapse Class.
• The Collapse At Taxonomic Level→Order item: Collapse Order.
• The Collapse At Taxonomic Level→Family item: Collapse Family.
• The Collapse At Taxonomic Level→Genus item: Collapse Genus.
• The Collapse At Taxonomic Level→Species item: Collapse Species.

12.15 The Window Menu

The Window menu contains the following items:
• The Window→About item: Information about the program (Windows and Linux only).
• The Window→How to Cite item: Show how to cite the program.
• The Window→Website item: Go to the program website.
• The Window→Register item: Register program (for free).
• The Window→Message Window item: Open the message window.
• The Window→Set Window Size item: Set the window size.
• The Window→Inspector Window item: Open inspector window.
• The Window→Alignment Viewer item: Open alignment viewer for the selected nodes.
• The Window→Main Viewer item: Brings the main viewer to the front.
• The Window→SEED Analyzer item: Opens the SEED Analyzer.
• The Window→KEGG Analyzer item: Opens the KEGG Analyzer.
• The Window→Microbial Attributes Window item: Open Microbial Attributes window.
• The Window→Chart Taxa item: Chart number of reads assigned to taxa.
• The Window→Chart Microbial Attributes item: Chart attributes of all found microbes in datasets.
• The Window
51. n viewer. Other viewers support many of these commands too, but also other, viewer-specific ones. To determine which commands are available for a given window, run MEGAN in GUI mode, open the window of interest and then select the Window→Command Line Syntax item to obtain a listing of all commands available for the given window. Here are the commands that are available in the Main viewer:

Available commands (context mainviewer):

File menu:

new
    Open a new empty document
open file <filename> readonly false true fixlinks true false
    Open a MEGAN file (ending on rma, meg or megan)
import blastfile <name> <name> <name> fastafile <name> <name> <name> meganfile <name> maxmatches <num> minscore <num> toppercent <num> winscore <num> minsupport <num> mincomplexity <num> useseed true false usekegg true false paired false true suffix1 <string> suffix2 <string> textstoragepolicy 0 1 2 blastformat GUESS BLASTX BLASTN BLASTP BLASTXML BLASTTAB RDP Assignment Detail RDP Standalone SILVA SAM
    Import BLAST (or RDP or Silva or SAM) and reads files to create a new MEGAN file
save file <filename> summary false true
    Save current data set
exportimage file <filename> format eps svg gif png jpg pdf replace false true textasshapes false true
    Export content of w
52. nk Phylum
    Select Phylum
select rank Class
    Select Class
select rank Order
    Select Order
select rank Family
    Select Family
select rank Varietas
    Select Varietas
select rank Genus
    Select Genus
select rank Species_group
    Select Species_group
select rank Subspecies
    Select Subspecies
select rank Species
    Select Species

Options menu:

recompute minsupport <number> minscore <number> toppercent <number> winscore <number> mincomplexity <number> pairedreads false true useseed false true usekegg false true
    Rerun the LCA analysis with different parameters
set totalreads <num>
    Set the total number of reads in the analysis (will initiate recalculation of all classifications)
list summary all selected
    List summary of hits for selected nodes of tree
compare mode absolute relative merge ignore_unassigned false true pid <number> meganfile <filename>
    Open compare dialog to produce a comparison of multiple datasets
set order <number> <number>
    Change the order of datasets in a comparison view
show window colorpalette
    Edit the color palette used in comparison views
show webpage taxon <name id>
    Open NCBI Taxonomy web site in browser
inspector taxa selected
    Inspect the read-to-taxon assignments

Taxon Disabling sub menu:

enable taxa all
    Enable all taxa
disable taxa selected <name> disable all select
53. nterpreted as a taxon name or taxon id, and the second string will be interpreted as an integer specifying the number of reads assigned to the named taxon. MEGAN will assume that this is the result of some analysis and thus will produce a summary file from it and will simply display it on the NCBI taxonomy, with no further analysis. For example, assume that you have performed a metagenome analysis using some other method and have obtained the following result:
• Gammaproteobacteria, 55 reads
• Mollicutes, 400 reads
• Escherichia coli K-12, 42 reads
• Unknown, 100 reads
To import this data into MEGAN so as to visualize the taxonomical assignments, produce the following CSV file:

Gammaproteobacteria, 55
Mollicutes, 400
Escherichia coli K-12, 42
Not assigned, 100

MEGAN will draw a tree with four nodes, one for each of the named taxa.

Importing read matches

Otherwise, if each line of the CSV file contains three strings separated by commas, the first string will be interpreted as a read id, the second one as a taxon name or id, and the third one will be interpreted as a bit score for this assignment. MEGAN will assume that this data describes a collection of reads and their matches. This data will be analysed using the LCA algorithm and the result will be displayed on the NCBI taxonomy. For example, assume that you have done a database search using some method other than BLAST and have obtained the following result:
• The read r01 matches Es
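The distinction between the two CSV layouts described above (two fields for a summary, three fields for read matches) can be illustrated with a small Python sketch. This is not MEGAN code: the function name read_megan_style_csv is hypothetical, and the three-column sample values in the usage note are made up for illustration.

```python
import csv
from io import StringIO
from typing import List, Tuple

# The two-column summary example from the text above.
EXAMPLE_SUMMARY = """Gammaproteobacteria, 55
Mollicutes, 400
Escherichia coli K-12, 42
Not assigned, 100
"""


def read_megan_style_csv(text: str) -> Tuple[List[tuple], List[tuple]]:
    """Split CSV text into summary rows and read-match rows.

    Two fields are read as (taxon, read count); three fields are read as
    (read id, taxon, bit score). Any other field count raises ValueError.
    """
    summary, matches = [], []
    for row in csv.reader(StringIO(text)):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip blank lines
        if len(fields) == 2:
            summary.append((fields[0], int(fields[1])))
        elif len(fields) == 3:
            matches.append((fields[0], fields[1], float(fields[2])))
        else:
            raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {row}")
    return summary, matches


summary_rows, match_rows = read_megan_style_csv(EXAMPLE_SUMMARY)
```

A three-column line such as "r01, Escherichia coli, 50.5" (hypothetical values) would land in the second list instead, which is the data that is then passed on to the LCA analysis.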
54. ompare a given set of DNA reads to a database of known sequences, such as NCBI-NR or NCBI-NT [2], using a sequence comparison tool such as BLAST [1].
2. Process this data to determine all hits of taxa by reads.
3. For each read r, let H be the set of all taxa that r hits.
4. Find the lowest node v in the NCBI taxonomy that encompasses the set of hit taxa H and assign the read r to the taxon represented by v.

We call this the LCA-assignment algorithm (LCA = lowest common ancestor). In this approach, every read is assigned to some taxon. If the read aligns very specifically only to a single taxon, then it is assigned to that taxon. The less specifically a read hits taxa, the higher up in the taxonomy it is placed. Reads that hit ubiquitously may even be assigned to the root node of the NCBI taxonomy.

If a read has significant matches to two different taxa a and b, where a is an ancestor of b in the NCBI taxonomy, then the match to the ancestor a is discarded and only the more specific match to b is used.

The program provides a threshold for the bit score of hits. Any hit that falls below the threshold is discarded. Secondly, a threshold can be set to discard any hit whose score falls below a given percentage of the best hit. Finally, a third threshold can be used to report only taxa that are hit by a minimal number of reads. By default, the program requires at least five reads to hit a taxon before the taxon is deemed present. All reads that
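The LCA assignment of a single read, as outlined in the steps above, can be sketched in Python as follows. This is only an illustration under stated assumptions, not MEGAN's implementation: the toy taxonomy PARENT, the function names and the default values min_score=35.0 and top_percent=10.0 are chosen for the example, and the minimum-support threshold is left out because it applies across all reads.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Toy taxonomy (assumption for this example): child -> parent, root -> None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
}


def path_to_root(taxon: str) -> List[str]:
    """Return [taxon, parent, ..., root]."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(taxa: Iterable[str]) -> str:
    """Lowest node of the taxonomy whose subtree contains all given taxa."""
    paths = [path_to_root(t)[::-1] for t in taxa]  # root-first paths
    result = "root"
    for level in zip(*paths):  # walk down while all paths agree
        if len(set(level)) != 1:
            break
        result = level[0]
    return result


def assign_read(
    hits: Sequence[Tuple[str, float]],
    min_score: float = 35.0,     # illustrative bit-score threshold
    top_percent: float = 10.0,   # keep hits within this percentage of the best
) -> Optional[str]:
    """Assign one read, given its (taxon, bit score) hits; None if no hit survives."""
    kept = [(t, s) for t, s in hits if s >= min_score]
    if not kept:
        return None
    best = max(s for _, s in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    taxa = {t for t, s in kept if s >= cutoff}
    # A hit to an ancestor of another hit taxon is discarded.
    ancestors = set()
    for t in taxa:
        ancestors.update(path_to_root(t)[1:])
    return lca(taxa - ancestors)
```

With this toy taxonomy, a read that hits both Escherichia coli and Firmicutes with similar scores is placed on Bacteria, whereas a read that hits Escherichia coli and its ancestor Gammaproteobacteria stays on Escherichia coli.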
55. ort the BLAST file into MEGAN using the File→Import From BLAST menu item. The Import wizard will ask you to enter the name of the BLAST file, a reads file containing all the read sequences in multi-FastA format (if available), and the name of the new output RMA file. As of version 4, you can also specify more than one BLAST file and more than one reads file. Alternatively, instead of supplying a BLAST file, one can also specify a file obtained from the RDP website or from the Silva website. In addition, MEGAN can also parse files in SAM format.

Some implementations or output formats of BLAST suppress those reads for which no alignments were found. In this case, use the Options→Set Number Of Reads menu item to set the total number of reads in the analysis.

Clicking on a node will cause the program to display the exact number of hits of any given node and the number of hits in the subtree rooted at the node. Right-clicking on a node will show a popup menu, and selecting the first item there, Inspect, will open the Inspector window, which is used to explore the hits associated with any given taxon. A node is selected by clicking on it. Double-clicking on a node will select the node and the whole subtree below it. Double-clicking on the label of a node will open the node in the Inspector window.

Example files are provided with the program. They are contained in the examples subdirectory of the installation directory. The precise location of t
56. out MEGAN and the authors
show window checkforupdate
    Check for an update of the program
show window cogs
    Open COG window
show window comparisonstats
    Open dialog to produce a statistical comparison of two datasets
show window fixlinks
    Fix missing links to source BLAST and reads files
show window webservice
    Open metagenomic files from the MEGAN DB website
tofront
    Bring window to front
update reprocess false true reset false true reinduce false true
    Update data. If nothing specified, assumes reinduce true
version
    Show version info

35 Examples

Example files can be downloaded from the MEGAN website.

36 Using More Memory

The MEGAN installer allows you to specify the amount of memory that the program can use. We recommend at least 2 GB on a 64-bit machine and recommend 8 GB on a desktop. MEGAN is a memory-hungry application. When importing BLAST files, we recommend that you use a machine that allows you to run MEGAN with at least 4 GB of main memory. Using less memory will work, but Java will be forced to perform frequent garbage collection, which will slow the program down. Also, because the program is I/O-intensive, it is best to have all files on local disks, as this will increase the speed of the program.

To run MEGAN with more than 2 GB under MacOS X on an Intel Mac, edit the file /Applications/MEGAN/MEGAN.app/Contents/Info.plist as follows. Find the lines

<key>VMOptions</key> <str
57. prokaryotic attribute table derived from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi. The window can be opened using the Window→Microbial Attributes Window menu item when data has been loaded into the program.

19 Rarefaction Window

The Rarefaction Window can be used to compute and draw a species rarefaction plot. This operates by repeatedly sampling subsets from a set of reads and computing the number of leaves to which taxa have been assigned. This analysis uses the current leaves of the taxonomy; in other words, collapsing or uncollapsing nodes will lead to a different result.

20 Taxon Chart Window

The Taxon Chart Window can be used to visualize the abundance distribution of the taxa as a pie chart, as a bar chart and as a heat map. It can be opened using the Window→Chart Taxa menu item. If nodes of the dataset have been selected in the main MEGAN window, they will be displayed directly in the chart. To change the taxa shown in the chart window, select them in the main window and then press the sync button.

21 SEED Chart Window

The SEED Chart Window can be used to visualize the abundance distribution of the SEED classes as a pie chart, as a bar chart and as a heat map. It can be opened using the Window→Chart SEED menu item. If nodes of the dataset have been selected in the SEED window, they will be displayed directly in the chart. To change the nodes shown in the chart window, select them in the main window and then press the sync button.

22 KEG
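The idea behind the rarefaction analysis of Section 19 (repeatedly subsampling the reads and counting how many leaves are hit) can be sketched in Python as follows. This is a minimal illustration, not MEGAN's implementation: the function name, the number of steps and repeats, and the counting rule (a leaf counts if at least one sampled read is assigned to it) are assumptions.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def rarefaction_curve(
    read_leaves: Sequence[str],
    steps: int = 10,
    repeats: int = 20,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (sample size, mean number of distinct leaves) pairs.

    read_leaves holds, for every read, the leaf it is assigned to in the
    current (possibly collapsed) tree.
    """
    rng = random.Random(seed)
    n = len(read_leaves)
    curve = []
    for k in range(1, steps + 1):
        size = n * k // steps
        # Sample without replacement and count the leaves that were hit.
        counts = [len(set(rng.sample(read_leaves, size))) for _ in range(repeats)]
        curve.append((size, mean(counts)))
    return curve


# Hypothetical data set: 100 reads assigned to four leaves.
reads = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5
curve = rarefaction_curve(reads)
```

Because the input is the leaf assignment of each read, collapsing nodes (which merges leaves) changes the input and hence the curve, in line with the remark above.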
58. r rRNA analysis. Nucleic Acids Research, 37(suppl 1):D141-D145, January 2009.
[5] D. H. Huson, A. F. Auch, J. Qi, and S. C. Schuster. MEGAN analysis of metagenomic data. Genome Res., 17(3):377-386, March 2007.
[6] Daniel H. Huson and Chao Xie. Reference-guided multiple sequence alignment of metagenomic data. Under review, 2012.
[7] M. Kanehisa and S. Goto. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28(1):27-30, Jan 2000.
[8] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25:2078-9, 2009.
[9] S. Mitra, J. A. Gilbert, D. Field, and D. H. Huson. Comparison of multiple metagenomes using phylogenetic networks based on ecological indices. ISME J, 2010. doi:10.1038/ismej.2010.51.
[10] Suparna Mitra, Max Schubach, and Daniel H. Huson. Short clones or long clones? A simulation study on the use of paired reads in metagenomics. BMC Bioinformatics, 11(Suppl 1):S12, 2010.
[11] Ross Overbeek, Tadhg Begley, Ralph M. Butler, Jomuna V. Choudhuri, Han-Yu Chuang, Matthew Cohoon, Valérie de Crécy-Lagard, Naryttza Diaz, Terry Disz, Robert Edwards, Michael Fonstein, Ed D. Frank, Svetlana Gerdes, Elizabeth M. Glass, Alexander Goesmann, Andrew Hanson, Dirk Iwata-Reuyl, Roy Jensen, Neema Jamshidi, Lutz Krause, Michael Kubal, Niels Larsen, Burkhard Linke, Alice C. McHardy, Folker Meyer, Heiko Neuweger, Gary Olsen, Robert Olson, Andrei Osterman, Vasiliy Por
59. re and map files (e.g. ncbi.tre and ncbi.map)
mp analyzer what lca ranks compare infile <filename> outfile <filename>
    Compute the rank at which the LCA is found for each mate pair, or preprocess comparison
quit
    Quit the program
replacelinks old <filename> new <filename>
    Replace links to source files
select ids <ids>
    Select the nodes for the given ids
select name <names>
    Select the named nodes
set context <window name>
    Choose command context, i.e. the window that should parse the subsequent commands
set dir <directory>
    Set the current directory
set margin left <number> right <number> bottom <number> top <number>
    Set margins used in tree visualization
set proxy <string> port <number> user <string> password <string>
    Set proxy credentials
set scaleby none
    Do not scale nodes
set usekegg true false
    Turn KEGG analysis on or off
set usepercentidentity false true
    Adjust assignment based on best percent identity of matches, using the following minimum requirements: Species 97%, Genus 95%, Family 90%, Order 85%, Class 80%, Phylum 75%
set useseed true false
    Turn SEED analysis on or off
setprop <name> <value>
    Set a property
show chart taxavsseed
    Chart taxa vs SEED
show histogram taxonid <num>
    Shows the distribution of matches for a given taxon
show window about
    Ab
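The minimum percent identity requirements listed for set usepercentidentity can be read as a lookup from the best percent identity of a read's matches to the most specific rank it may be assigned at. The Python sketch below illustrates that reading only. The function name is hypothetical, the thresholds are assumed to be inclusive, and how MEGAN actually moves an assignment up the taxonomy is not shown.

```python
from typing import List, Optional, Tuple

# Minimum percent identity per rank, as listed for "set usepercentidentity",
# ordered from the most specific rank to the least specific one.
MIN_IDENTITY: List[Tuple[str, float]] = [
    ("Species", 97.0),
    ("Genus", 95.0),
    ("Family", 90.0),
    ("Order", 85.0),
    ("Class", 80.0),
    ("Phylum", 75.0),
]


def most_specific_rank(best_percent_identity: float) -> Optional[str]:
    """Most specific rank whose minimum identity is met, or None below 75%."""
    for rank, minimum in MIN_IDENTITY:
        if best_percent_identity >= minimum:
            return rank
    return None
```

For instance, a read whose best match has 96% identity would not qualify for a species-level assignment under this reading, but would qualify at genus level.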
60. rtant feature is that the comment line of the database sequence must contain an NCBI taxon name. If names are not contained in the comment lines, then the accession lookup support must be used.

32.4 How MEGAN Parses Taxon Names

MEGAN uses the following algorithm to determine the taxon from the header line of a reference sequence:
• If the string consists only of an integer, then this is interpreted as a taxon id.
• Otherwise, if Use Synonyms is turned on, then MEGAN attempts to match an entry in the given synonyms file. The longest matching synonym is used to determine the taxon.
• Otherwise, if Use GI Lookup is turned on, then MEGAN searches for an occurrence of the string gi| followed by a number and tries to use the number as a GI accession to determine the taxon.
• Otherwise, if the header line contains a semicolon, then MEGAN assumes that a list of taxon names is given, e.g. Bacteria;Proteobacteria;Alpha-proteobacteria, as present, for example, in the Silva database. In this case, MEGAN uses the right-most name to determine the taxon id.
• Otherwise, if the header line contains the text TAXON_ID, then MEGAN will attempt to read a taxon id following the text. This syntax is used in BLAST files obtained from the CAMERA website.
• Otherwise, MEGAN searches for all pairs of disjoint square brackets and attempts to parse the strings between such brackets to obtain a set of taxon ids. The taxon id for the match is then set to the LCA of the ids
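The cascade of rules above can be sketched in Python as follows. This is only an illustration, not MEGAN's parser. The function and parameter names are hypothetical, the lookup tables are plain dictionaries supplied by the caller, a rule that finds nothing is assumed to fall through to the next rule, and the default lca placeholder simply returns the NCBI root id 1 whenever the ids differ.

```python
import re
from typing import Callable, Dict, Optional, Sequence


def _placeholder_lca(ids: Sequence[int]) -> int:
    """Stand-in for a real LCA: the common id, or the root (1) if ids differ."""
    return ids[0] if len(set(ids)) == 1 else 1


def parse_taxon(
    header: str,
    name_to_id: Dict[str, int],
    synonyms: Optional[Dict[str, int]] = None,     # "Use Synonyms" is on if given
    gi_to_taxid: Optional[Dict[int, int]] = None,  # "Use GI Lookup" is on if given
    lca: Callable[[Sequence[int]], int] = _placeholder_lca,
) -> Optional[int]:
    header = header.strip()
    # Rule 1: the header is just an integer, read as a taxon id.
    if header.isdigit():
        return int(header)
    # Rule 2: the longest synonym that occurs in the header.
    if synonyms:
        found = [s for s in synonyms if s in header]
        if found:
            return synonyms[max(found, key=len)]
    # Rule 3: "gi|" followed by a number, looked up as a GI accession.
    if gi_to_taxid:
        m = re.search(r"gi\|(\d+)", header)
        if m and int(m.group(1)) in gi_to_taxid:
            return gi_to_taxid[int(m.group(1))]
    # Rule 4: semicolon-separated list of names; use the right-most name.
    if ";" in header:
        names = [n.strip() for n in header.split(";") if n.strip()]
        if names and names[-1] in name_to_id:
            return name_to_id[names[-1]]
    # Rule 5: the text TAXON_ID followed by a number.
    m = re.search(r"TAXON_ID\W*(\d+)", header)
    if m:
        return int(m.group(1))
    # Rule 6: names in square brackets; the result is the LCA of their ids.
    ids = [name_to_id[n] for n in re.findall(r"\[([^\[\]]+)\]", header) if n in name_to_id]
    return lca(ids) if ids else None
```

For example, a header ending in "[Escherichia coli]" would resolve through the last rule, given a name table that maps Escherichia coli to its taxon id.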
61. s of version 4, MEGAN provides functional analysis using both the SEED classification [11] and KEGG pathways [7]. For an example of its application, see [12], where an early version of this software (called GenomeTaxonomyBrowser) was used to analyze the taxonomical content of a collection of DNA reads sampled from a mammoth. This document provides both an introduction and a reference manual for MEGAN. Follow MEGAN on Facebook at http://www.facebook.com/meganMetagenomeAnalyzer.

2 Getting Started

This section describes how to get started. First, download an installer for the program from www-ab.informatik.uni-tuebingen.de/software/megan (see Section 3 for details). Upon startup, the program will automatically load its own version of the NCBI taxonomy and will then display the first three levels of the taxonomy. To explore the NCBI taxonomy further, leaves of this overview tree can be uncollapsed. To do so, first click on a node to select it. Then use the Tree→Uncollapse item to show all nodes on the next level of the taxonomy, and use the Tree→Uncollapse Subtree item to show all nodes in the complete subtree below the selected node or nodes.

To analyze a data set of reads, first BLAST the reads against a database of reference sequences, such as NCBI-NR [2] (using BLASTX [1] or BLASTP) or NCBI-NT [2] (using BLASTN [1]). In addition, the output of a number of other programs can also be parsed, for example RapSearch2 [15]. Then imp
62. se comparison highlight color

Tree menu:

collapse nodes selected
    Collapse selected nodes
collapse level <num>
    Collapse all nodes at given depth in tree
uncollapse nodes all selected subtree
    Uncollapse selected nodes
nodelabels names true false
    Display the full names of taxa
nodelabels ids true false
    Display the NCBI ids of taxa
nodelabels assigned true false
    Display the number of reads assigned to a taxon
nodelabels summarized true false
    Display the total number of hits to a taxon and its descendants
show labels selected
    Show labels for selected nodes
hide labels selected
    Hide labels for selected nodes
show intermediate <bool>
    Show intermediate labels at nodes of degree 2

Collapse At Taxonomic Level sub menu:

collapse rank Kingdom
    Collapse Kingdom
collapse rank Phylum
    Collapse Phylum
collapse rank Class
    Collapse Class
collapse rank Order
    Collapse Order
collapse rank Family
    Collapse Family
collapse rank Varietas
    Collapse Varietas
collapse rank Genus
    Collapse Genus
collapse rank Species_group
    Collapse Species_group
collapse rank Subspecies
    Collapse Subspecies
collapse rank Species
    Collapse Species

Window menu:

show window howtocite
    Show how to cite the program
show window website
    Go to the program website
show window register
    Show registration window
show window message
    Open the message window
set windowsize <width> x l
63. sed on [11]. The SEED classification is displayed as a tree. Genes are mapped onto functional roles, and these are present in one or more subsystems. Modes of interaction and available menu items are similar to those of the main window.

The window is split into two panes. The right pane contains a tree-based display of the result of the SEED classification. The left pane contains two tabs, one containing a textual tree-based view and the other using a heat-map style listing of the current leaf nodes of the tree displayed in the right pane.

14 KEGG Window

The KEGG window is used to display a KEGG analysis of gene function based on [7]. The KEGG classification is displayed as a tree. Genes are mapped onto enzymes, and these are present in one or more pathways. Modes of interaction and available menu items are similar to those of the main window.

The window is split into two panes. The right pane contains a tree-based display of the result of the KEGG classification. The left pane contains two tabs, one containing a textual tree-based view and the other using a heat-map style listing of the current leaf nodes of the tree displayed in the right pane.

Additionally, in the KEGG viewer, the right pane of the window is tabbed. Initially, only the tree-based display of the KEGG classification is visible. However, by double-clicking on any item in the left pane for which a KEGG pathway diagram exists, a new pathway tab is opened, containing the corre
64. seedpath_length seedname_readname seedpath_readname readname_keggname readname_keggpath keggname_count keggpath_count keggname_length keggpath_length keggname_readname keggpath_readname separator comma tab file <filename>
    Export assignments of reads to nodes to a CSV (comma-separated values) file
export what reads data Taxonomy SEED KEGG file <filename>
    Export all reads to a text file (or only those for selected nodes, if any selected)
export what matches data Taxonomy SEED KEGG file <filename>
    Export all matches to a text file (or only those for selected nodes, if any selected)

Edit menu:

show window formatter
    Format nodes and edges
show findtoolbar true false
    Open the Find toolbar

Preferences sub menu:

set db <string> user <string> password <string>
    Set postgres database name and user authorization
set showlegend true false
    Show legend identifying different datasets

Select menu:

select nodes all
    Select all nodes
select nodes none
    Deselect all nodes
select nodes previous
    Select from previous window
select nodes leaves
    Select all leaves
select nodes internal
    Select all internal nodes
select nodes intermediate
    Select all intermediate nodes
select nodes subtree
    Select subtree
select nodes subleaves
    Select all leaves below
select nodes invert
    Invert selection

Level sub menu:

select rank Kingdom
    Select Kingdom
select ra
65. ser which defines appropriate rights by executing the following commands in a psql shell:

CREATE ROLE restricted_user LOGIN;
GRANT SELECT, UPDATE, INSERT, DELETE ON header, aux, classificationblock, read, classread, match, blasthit TO GROUP restricted_user;

Then create concrete user accounts and assign them to the previously introduced role:

CREATE USER user IN GROUP restricted_user PASSWORD 'password';

Change user to the concrete name of the user and password to a password, given either in plaintext or as an md5 checksum. The checksum is calculated from the concatenation of the password and the user string, and begins with the letters md5. For example, the md5 password of user USER and password PASSWORD is:

'md5' concat md5('PASSWORD' concat 'USER') = md558c0bba0050180fc35346bcfce1b1ef3

Other login strategies are supported as well. Please follow the instructions in the PostgreSQL documentation: http://www.postgresql.org/docs/9.0/static/sql-createuser.html

37.3 Access Configuration

The initial installation of PostgreSQL is very restrictive in terms of external access options. In order to use PostgreSQL as a non-local database server, some parameters must be changed in the server's config files postgresql.conf (the location of this file is determined by the SQL command SHOW config_file) and pg_hba.conf. In postgresql.conf, the parameter listen_addresses needs to be altered to the addresses the server should list
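The md5 password string described in Section 37.2 (the letters md5 followed by the MD5 checksum of the password concatenated with the user name) can be computed with a few lines of Python. This is a small sketch using the standard hashlib module; the function name postgres_md5_password is hypothetical.

```python
import hashlib


def postgres_md5_password(user: str, password: str) -> str:
    """Return 'md5' followed by the hex MD5 digest of password + user."""
    digest = hashlib.md5((password + user).encode("utf-8")).hexdigest()
    return "md5" + digest


# Example with the placeholder names used in the text.
encoded = postgres_md5_password("USER", "PASSWORD")
```

The resulting 35-character string can be used in place of the plaintext password in the CREATE USER command.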
66. sponding pathway. Different shades of green are used to indicate how many reads were assigned to any given enzyme or gene product in the pathway. Another way to open a pathway tab is to use the following menu item, which is available in the Options menu and from context menus associated with nodes:
• The Options→Show KEGG Pathway item: Show the specified KEGG pathway.

15 Network Window

The Network window provides methods for comparing multiple datasets. It is available after generating a comparison of multiple datasets containing at least four datasets. The network window allows one to compute a distance matrix of the compared datasets using a number of different ecological indices. The calculation can be based on the results of a taxonomic, SEED or KEGG analysis. If no nodes are selected, then the distances will be based on the number of reads assigned to the current leaves of the analysis. If some nodes are selected, then only those nodes are used in the calculation. The distance matrix can be visualized either using a split network calculated using the neighbor-net algorithm, or using a multi-dimensional scaling plot. See [9] for details.

16 Import Dialog

The Import dialog is used to import new data from BLAST or a similar tool and to create a new RMA file. The dialog has five tabbed panes. The first tabbed pane, titled the Wizard pane, provides an Import wizard for creating a new RMA file. The user is first asked to specify a BLAST
67. t containing the associated taxon name or numerical NCBI taxon id for each line, then MEGAN will parse these and use them as input. For unknown taxa, write either unknown or 1 in the column. Note that in all cases the reads file should be given to use the full potential of the program.

The BLAST file and reads file are supplied to MEGAN when setting up a new MEGAN project. Both files are parsed and all information is stored in the project file. The input data is then analyzed and can be interactively explored. All reads and BLAST matches are contained in the project file, and MEGAN provides different mechanisms for extracting them again.

A MEGAN project file contains all reads and all significant BLAST matches (by default, up to 100 matches per read) in a binary and incrementally compressed format. The size of such a project file is around 20% of the size of the original input files and is thus usually smaller than the file that one obtains by simply compressing the BLAST file. As of version 4, MEGAN provides more control over whether and how BLAST matches and reads are stored; see the discussion of the Import window.

As of version 4.41, MEGAN uses a new algorithm for determining the taxon associated with a given reference sequence. In all previous versions, the program looked in the header line of a reference sequence for the longest substring that matches some valid taxon name or synonym in the NCBI taxonomy. This determined which taxon to assign
68. t height>
    Set the window size
show window inspector
    Open inspector window
show window mainviewer
    Brings the main viewer to the front
show window seedviewer
    Opens the SEED Analyzer
show window keggviewer
    Opens the KEGG Analyzer
show window attributes
    Open Microbial Attributes window
show chart taxa
    Chart number of reads assigned to taxa
show chart attributes
    Chart attributes of all found microbes in datasets
show chart seed
    Chart number of reads assigned to SEED nodes
show chart kegg
    Chart number of reads assigned to KEGG nodes
show window network
    Open a network comparison window
show chart rarefaction data taxonomy seed kegg
    Compute and chart a rarefaction curve based on the leaves of the tree shown in this window
help keyword
    Shows syntax help for commands

Additional commands:

exportimage old file <filename> format eps svg gif png jpg pdf replace false true textasshapes false true
    Export content of window to an image file
list assignments
    List the number of reads assigned to each level of the taxonomy
load colorfile <filename>
    Load dataset colors from a file (format: one RGB color per line)
load gi2taxfile <filename>
    Load the GI mapping file gi_taxid_nucl.bin, downloaded from the MEGAN website
load synonymsfile <filename>
    Load a file of taxon name synonyms
load treefile <filename> mapfile <filename>
    Load the taxonomy t
69. tem so as to obtain a different file dialog that allows selection of multiple files.
• The File→Open Recent submenu.
• The File→Import From BLAST item: Import BLAST file (RDP or Silva) and reads files to create a new MEGAN file.
• The File→Save As item: Save current data set.
• The File→Export submenu.
• The File→Export Image item: Export the tree to an image file.
• The File→Page Setup item: Setup the page for printing.
• The File→Print item: Print the main panel.
• The File→Extract Reads item: Extract reads for the selected nodes.
• The File→Import CSV item: Load data in comma-separated values (CSV) format: taxon,sum or readid,taxon,score.
• The File→Import QIIME item is used to import data from a QIIME OTU table [3].
• The File→Properties item: Show document properties.
• The File→Close item: Close the window.
• The File→Quit item: Exit the program (Windows and Linux only).

12.2 The Open Recent Menu

The Open Recent menu contains a list of recently opened documents.

12.3 The Export Menu

The Export menu contains the following items:
• The Export→Assignments To CSV item: Export assignments of reads to taxa to a CSV (comma-separated values) file.
• The Export→Reads item: Export reads to a FastA file.
• The Export→Matches item: Export matches to a file.
• The Export→Alignments item: Export all multiple sequence alignments associated with th
70. time will decrease. Its value should be increased to at least 100.
• work_mem describes the size of the memory PostgreSQL assigns to execute a single query. In some cases, queries need to sort data, which can, if this parameter is set to a small value, lead to temporary disk caching and decrease the execution speed. When using MEGAN, a value of around 10MB should withstand almost all possible scenarios.
• maintenance_work_mem is a parameter defining the memory allocation for internal processes which clean up and analyze the database at regular intervals. A reasonable setting for this parameter is around 256MB.
• random_page_cost defines the cost of getting a random disk page from the hard disk. This is especially important when, as in MEGAN, most data is retrieved by using an index and not sequentially. In order to advise PostgreSQL to prefer index usage over sequential scans, set this parameter to a value of 2.

37.5 Firewall and Kernel settings

So far, we have described a list of both necessary and optional changes within the PostgreSQL database system. Now we discuss necessary changes to the host system. The local firewall must be adjusted to accept TCP/IP connections at PostgreSQL's port (usually 5432). In order to change caching parameters in PostgreSQL, the kernel resources must be changed as well. To do so, please follow the documentation of your operating system.

38 Acknowledgments

This program uses the following Java libar
71. tnoy, Gordon D. Pusch, Dmitry A. Rodionov, Christian Rückert, Jason Steiner, Rick Stevens, Ines Thiele, Olga Vassieva, Yuzhen Ye, Olga Zagnitko, and Veronika Vonstein. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res., 33(17):5691-5702, 2005.
[12] Hendrik N. Poinar, Carsten Schwarz, Ji Qi, Beth Shapiro, Ross D. E. Macphee, Bernard Buigues, Alexei Tikhonov, Daniel H. Huson, Lynn P. Tomsho, Alexander Auch, Markus Rampp, Webb Miller, and Stephan C. Schuster. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science, 311(5759):392-394, Jan 2006.
[13] E. Pruesse, C. Quast, K. Knittel, B. Fuchs, W. Ludwig, J. Peplies, and F. O. Glöckner. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nuc. Acids Res., 35(21):7188-7196, 2007.
[14] Wikipedia. Michaelis-Menten kinetics. http://en.wikipedia.org/wiki/Michaelis-Menten_kinetics, 2012.
[15] Yongan Zhao, Haixu Tang, and Yuzhen Ye. Rapsearch2: a fast and memory-efficient protein similarity search tool for next generation sequencing data. Bioinformatics, 28(1):125-126, 2012.
72. to the match. However, because many entries in the NR database mention multiple different species for a given match, the program now determines only maximal matching names in the header line and assigns the match to the LCA of the taxa mentioned. So, in particular, the LCA algorithm is used twice in MEGAN, namely once to figure out which taxon to assign to a match and then, based on this, again to determine which taxon to assign to a given read.

5.2 SAM Files

MEGAN can now parse files in SAM format [8]. Note, however, that SAM files usually do not contain the names of the taxa associated with the reference sequences, and so one must supply a synonyms file that maps identifiers used for the reference sequences to NCBI taxon names or ids.

5.3 RDP Files

In addition, MEGAN can import rRNA analysis files downloaded from the RDP website at http://rdp.cme.msu.edu [4]. Go to the website, upload your rRNA sequences and then let the website process them for you. Please note that the RDP website allows one to download two types of files, namely a hierarchy as text file from its Classifier Hierarchy View window and a text file obtained from its Classifier Assignment Detail window. Input to MEGAN must be of the latter type. The RDP website recommends using a Min Score setting of 80. MEGAN calls this the RDP Assignment Detail format. If you use the standalone RDP classifier, then the output has a different format. MEGAN calls this the RDP standalon
73. use of this feature on or off. MEGAN supports three different text storage policies. Select Save in main file to have all reads and BLAST matches embedded in the computed RMA file. This provides the best portability of files. If the Save in separate file button is selected, then all reads and matches are stored in a separate RMAZ file. In this case, the RMA file will be much smaller and can be used independently of the RMAZ file, unless one wants to access the reads or matches, in which case the program will ask for the RMAZ file. Finally, if Don't save is selected, reads and matches are not stored explicitly. If they are requested by the user, then the program will obtain them from the original files. This mode leads to the smallest RMA files and the shortest computation time, but is less portable.

17 Inspector Window

The Inspector Window can be used to inspect the alignments that are the basis of the assignment of reads to taxa. It can be opened either using the Window > Inspector Window menu item or by right-clicking on a taxon and then selecting the Inspect popup item. This window displays data hierarchically using a data tree. The root node of this tree represents the current input file. This window can only be opened when data has been loaded into the program. Any taxon added to the window, either by right-clicking a taxon and then selecting the Inspect popup item in the main viewer or by using the Options > Show Taxon item, is shown at a secon
75. ynonyms file
  -fg <String> (default: ""): GI lookup file
  -p <String> (default: /Users/huson/Library/Preferences/Megan.def): Properties file
  -m <int> (default: 0): minimum score
  -w <switch> (default: true): show message window
  -x <String> (default: ""): Execute this command at startup
  -E <switch> (default: false): Quit if exception thrown in non-gui mode
  -V <switch> (default: false): show version string
  -S <switch> (default: false): silent mode
  -d <switch> (default: false): debug mode
  -s <switch> (default: true): show startup splash screen
  -h <switch> (default: false): Show usage

Launching the program with option -g will make the program run in non-GUI command-line mode, first executing any command given with the -x option and then reading additional commands from standard input.

Please be aware that the command-line version of the program uses the same properties file as the interactive version. So any preferences set using the interactive version of the program will also apply to the command-line version of the program. If this is not desired, then please use the -p option to supply a different properties file.

Another important thing to note is that the command parser operates in a line-by-line fashion. When processing commands in a given line, the parser makes note of required updates to the taxonomy and data structures. These updates are not executed until all commands in th
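As a hedged illustration of the non-GUI launch described in this excerpt, the following Python sketch assembles an argument list that combines the g, x, and p options. It assumes the options are written with a leading dash on the command line; the helper name `build_megan_args`, the executable path, the properties file name, and the command string are placeholders for illustration and are not taken from the manual.

```python
import shlex


def build_megan_args(executable, startup_command=None, properties_file=None):
    """Assemble an argument list for a non-GUI run.

    executable      -- path to the MEGAN launcher (placeholder, installation-specific)
    startup_command -- command passed with -x, executed before standard input is read
    properties_file -- alternative properties file passed with -p, so that
                       preferences of the interactive version are not reused
    """
    args = [executable, "-g"]  # -g: run in non-GUI command-line mode
    if startup_command is not None:
        args += ["-x", startup_command]
    if properties_file is not None:
        args += ["-p", properties_file]
    return args


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    args = build_megan_args(
        "/path/to/MEGAN",
        startup_command="<first command>",
        properties_file="batch.def",
    )
    print(shlex.join(args))
    # To actually launch the program and feed further commands on standard
    # input, one could pass `args` to subprocess.run(args, input=..., text=True).
```

Because the command parser works line by line, any further commands supplied on standard input would typically be written one group per line.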
