Discovery Environment Manual

Contents

1. The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRIMP, LASTZ, MAQ, and many others. However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, manipulating the sequences to produce better mapping results. The FASTX Toolkit tools perform some of these preprocessing tasks. (Above description from the FASTX Toolkit website, http://hannonlab.cshl.edu/fastx_toolkit.) The following are currently enabled: Barcode Splitter splits a FASTQ file containing multiple samples; Clipper removes sequencing adapters/linkers from FASTQ files; Groomer (Quality Rescaler) converts FASTQ files from Illumina 1.3 and Solexa formats to Sanger (PHRED) format, and although it is not listed on the main FASTX Toolkit page it is a part of the suite (see http://main.g2.bx.psu.edu/root?tool_id=fastq_groomer); Quality Filter filters FASTQ-formatted sequences based on quality; Trimmer trims (cuts) barcodes or noise from FASTQ sequences. Author: Hannon Lab at Cold Spring Harbor Laboratory, http://hannonlab.cshl.edu. This tool was identified for inclusion by the iPlant Genotype to Phenotype working group. The 0.3.x release of the Discovery Environment uses FASTX Toolkit version 0.0.13. R Language and Environment
2. [Screenshot: FASTX Clipper, Trim 3' adapters tab. Create 3' adapter file for future use (filename); Enter 3' adapters used in library; Minimum sequence length after clipping: 28; Discard sequences with unknown (N) bases; Output options: Output only clipped sequences.] Choose Create 3' Adapter File from the drop-down menu if you are going to create one now. [Screenshot: FASTX Clipper, Trim 3' adapters tab. Select 3' Adapter File; Browse previously created 3' adapter files; Minimum sequence length after clipping: 28; Discard sequences with unknown (N) bases; Output options: Output only clipped sequences.] Choose Select 3' Adapter File from the drop-down menu if you are going to use a previously uploaded file. The output options are: Output only clipped sequences; Output only non-clipped sequences; Output both clipped and non-clipped sequences. Keep or modify the default settings. Choose your desired output option from the Output options drop-down menu. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. FASTX Quality Filter. An overview of FASTX Analyses is available. [Screenshot: Choose Analysis dialog.]
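The Clipper settings described above (3' adapter, minimum length after clipping, discarding reads with unknown bases, and the clipped/non-clipped output choice) can be illustrated with a minimal Python sketch. This is not the FASTX Toolkit implementation: it assumes an exact adapter match, assumes the N check applies to what remains after clipping, and the adapter sequence used in the usage note is hypothetical.

```python
from typing import Optional, Tuple


def clip_read(seq: str, adapter: str, min_length: int = 28,
              discard_unknown: bool = True) -> Optional[Tuple[str, bool]]:
    """Clip a 3' adapter from one read (simplified illustration).

    Returns (kept_sequence, was_clipped), or None if the read is discarded.
    Assumption: the adapter is located by exact substring match only.
    """
    pos = seq.find(adapter)
    was_clipped = pos != -1
    # Everything from the start of the adapter onward is removed.
    kept = seq[:pos] if was_clipped else seq
    # Form setting "Minimum sequence length after clipping" (28 on the form).
    if len(kept) < min_length:
        return None
    # Form setting "Discard sequences with unknown (N) bases".
    if discard_unknown and "N" in kept.upper():
        return None
    return kept, was_clipped
```

A caller could then implement the three output options by keeping only results where was_clipped is True, only those where it is False, or both. For example, clip_read("A" * 30 + "TGGAATTC" + "GGG", "TGGAATTC") keeps the first 30 bases and reports the read as clipped.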
3. iPlant Collaborative: Empowering a New Plant Biology. Discovery Environment Manual. 2011 iPlant Collaborative. The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (EF-0735191). Getting Started: 1.1 Accessing the Discovery Environment; 1.2 Discovery Environment Overview; 1.3 Manage Data; 1.4 Perform Analyses; 1.5 Viewing and Deleting Notifications. Analyses: 2.1 Ancestral Character Estimation (ACE) Overview; 2.2 Continuous Ancestral Character Estimation (CACE); 2.3 Discrete Ancestral Character Estimation (DACE); 2.4 Burrows-Wheeler Aligner Single End Reads; 2.5 Burrows-Wheeler Aligner Paired End Reads; 2.6 Cufflinks Transcript Quantification; 2.7 FASTX Analyses Overview; 2.8 FASTX Barcode Splitter Single End; 2.9 FASTX Clipper; 2.10 FASTX Quality Filter; 2.11 FASTQ Quality Rescaler; 2.12 FASTX Trimmer; 2.13 Find SNPs Overview; 2.14 Find SNPs; 2.15 Independent Contrasts Overview; 2.16 Independent Contrasts; 2.17 Taxonomic Name Resolution Service (TNRS) Demo; 2.18 TopHat Single End for Illumina; 2.19 TopHat Paired End for Illumina. Tools: 3.1 Tools Overview; 3.2 Analysis of Phylogenetics and Evolution (ape); 3.3 Burrows-Wheeler Aligner (BWA); 3.4 Contrast; 3.5 Cufflinks; 3.6 FASTX Toolkit; 3.7 R Lan
4. [Screenshot: log-in box with Password, Log In, Lost Password, and Request Access.] The log-in page contains a box on the left with some links for information related to the Discovery Environment, and the Log In box. To the right of this you will find a definition of a Discovery Environment. Enter your username and password in the boxes provided on the left of the page. Click the Log In button to enter the environment. Click Lost Password if you need to reset your password. Click Request Access to access the same web form described earlier to request access to the Discovery Environment. Discovery Environment Overview. [Screenshot: the Workspace.] The Discovery Environment provides a consistent user interface and access to the high-performance computing resources needed for specialized scientific analyses. The Menu. [Screenshot: menu listing User Preferences, Help (User Manual), About Discovery Environment, and TR Demo.] The Menu, available from the lower left corner of the Discovery Environment, is where you access some basic functions: User Preferences lets you update personal, institutional, and account information; Help (User Manual) brings up the current version of this file; About Discovery Environment lists software details; TR Demo launches a demonstration (preview) version of our Tree Reconciliation tool; Logout will end your session. Disc
5. Project Data Processing Subgroup. The tool was identified for inclusion by the iPlant Genotype to Phenotype working group. The 0.3.x release of the Discovery Environment uses SAMtools version 0.1.12a. TopHat. TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns reads using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. (Above description from the TopHat website, http://tophat.cbcb.umd.edu.) Authors: TopHat is a collaborative effort between the University of Maryland Center for Bioinformatics and Computational Biology and the University of California, Berkeley Departments of Mathematics and Molecular and Cell Biology. It incorporates work from Cole Trapnell, Daehwan Kim, Geo Pertea, Lior Pachter, and Steven Salzberg. The tool was identified for inclusion by the iPlant Genotype to Phenotype working group. The 0.3.x release of the Discovery Environment uses TopHat version 1.2.0. Tree Reconciliation Demo. Tree Reconciliation uses an estimate of the species tree to infer the history of gene duplication and loss, lineage sorting, lateral transfer, and other events in a gene family's history. The tool uses Muscle to align sequences, TreeBeST to build a tree, and PriMETV to display it. Author information for the component tools is available at each component's website, listed above. The t
6. R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. (The above description is from http://www.r-project.org.) More information about R is available from http://www.r-project.org. The tool was identified for inclusion by the iPlant Tree of Life working group for use with ape. The 0.3.x release of the Discovery Environment uses R version 2.12.0. SAMtools. SAMtools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. (Above description from the SAMtools website, http://samtools.sourceforge.net.) Authors: H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome
7. delete files or folders or to view or download file contents. If you select one or more checkboxes, you may use the More Actions box, or select an option in the menu to the right of any one of the selected items, to perform the same tasks on single files or folders, or some actions on multiple files or folders at the same time. Options are made available as follows: one file selected enables renaming, deleting, viewing, or downloading the selected file; one folder selected enables renaming the selected folder or deleting it and all of its contents; more than one file selected enables deleting or viewing all selected files; more than one folder, or a combination of files and folders, selected enables deleting all the selected items. View specific data. [Screenshot: TNRS Results 2011 01 26 09 55 15AM txt, a table listing each Submitted Name beside its Selected Match (default is the name with the best score), for example Macrolobium acaciifolium Benth matched to Macrolobium acaciifolium, and Tabebuia obtusiifolia matched to Tabebuia obtusifolia.] Depending on the file selected, diffe
8. [Screenshot: parameter form. false positive spliced alignment filtration: 0.01; This fraction of a spliced read must span an exon junction: 0.12; Minimum fragments per transfrag: 10; Number of importance samples generated for each locus: 1000; Iterations allowed during Maximum Likelihood Estimation: 5000; Select library type: not strand specific; Launch Job.] Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. FASTX Analyses Overview. [Screenshot: Choose Analysis dialog with Quality Control and Manipulation, Single End Reads expanded, listing FASTX Barcode Splitter Single End, FASTX Trimmer, FASTQ Quality Rescaler, FASTX Quality Filter, and FASTX Clipper.] The FASTX Toolkit is a collection of command-line tools for preprocessing of DNA and RNAseq short reads. Several of these are available as analyses in the Discovery Environment. They are found in Perform Analyses under Choose Analysis. Each of these is described in a separate section. FASTX Barcode Splitter Single End. An overview of FASTX Analyses is available. The FASTX Barcode Splitter splits a FASTQ file into several files using barcodes a
9. select datatypes. Expansion of the datatypes supported is in the requirements phase. This issue will be addressed in a future release. Import from URL: Import from sites with a self-signed certificate fails; a fix for this is being evaluated. Import from the Sequence Read Archive is no longer supported due to a change in their format from fastq to sra; this issue will be addressed in a future release. Display of file size: This functionality has not yet been incorporated and is being evaluated. Sort order of files/folders: The display of files and folders in the Manage Data window is inconsistent and may change with each opening of the Manage Data window; a fix for this is being evaluated. Description of files: Users are provided the ability to create a description for their data at import. This functionality is expected in a future release of the DE. Auto-detection of file types, which is what the description field currently displays, is inconsistent as well; a fix for this is being evaluated. Filter/search: The ability to filter or search for particular files is currently not available. This functionality is in the requirements phase of development. File consolidation at upload/import: Currently, users need to upload files one at a time; a fix for this is in the requirements phase. Zipped file upload: This functionality is in the requirements phase. Large file deletion/upload: This is suboptimal in the current version. Fi
10. that it is unique to their work. Session-based Guest account: This will provide users a preview of the functionality that is available with a full account. It will have limits, such as no way to save work and return to retrieve it later. Partial saving of parameters: Users will be able to save partial entry of parameters to be used for an analysis and run at a later time. Data management. Improvements from 0.2.1. Menu bar: Data import and upload were obscured behind a file menu. This has been made more apparent to the user by exposing the functionality on a menu bar. Data management window: Categorization of actions a user may wish to perform on data files or folders has begun. This allows appropriate services to be tied more efficiently to functionality and limits user actions to those that are appropriate for the hierarchy selected. The data management window is a work in progress; this issue will be more completely addressed in a future release. Known issues. File movement between folders: This functionality is currently not supported but is a high priority on our roadmap. This issue will be addressed in a future release. Expansion of all folders at once: This functionality is not enabled with the current view of the Manage Data window. This topic is under discussion for integration. Upload data from desktop: Support for upload from a user's local environment is limited to
11. titles and codes with a space. Click Launch Job. Manage Barcodes (select). [Screenshot: FASTX Barcode Splitter Single End, Manage barcodes tab. Select Barcode File(s); Browse previously created barcode files; Number of allowed mismatches: 1.] Choose Select Barcode File if you have previously uploaded one to the Discovery Environment. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. FASTX Clipper. An overview of FASTX Analyses is available. [Screenshot: Choose Analysis dialog with Quality Control and Manipulation, Single End Reads expanded.] Select FASTX Clipper from within Perform Analyses as described in that section. Click Ok. Single end read data input. [Screenshot: FASTX Clipper, Select file tab.] Click Browse to select your previously uploaded file. Click Trim 3' Adapters. FASTX Clipper: Select file, Trim 3' adapters, Create 3' Adapter
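The barcode settings mentioned above (a barcode file with titles and codes separated by a space, and a number of allowed mismatches) can be sketched in Python as follows. This is an illustration only, not the FASTX Barcode Splitter code: it assumes barcodes sit at the start of each read, that ambiguous reads are left unassigned, and the barcode titles and sequences in the usage note are hypothetical.

```python
from typing import Dict, Optional


def parse_barcodes(text: str) -> Dict[str, str]:
    """Parse 'title code' lines (whitespace separated) into a dict."""
    barcodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        barcodes[parts[0]] = parts[1].upper()
    return barcodes


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_barcode(read: str, barcodes: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the title of the best-matching barcode, or None if unmatched.

    Assumption: the barcode occupies the first bases of the read, and a tie
    between two equally good barcodes leaves the read unassigned.
    """
    best_title, best_score, tie = None, max_mismatches + 1, False
    for title, code in barcodes.items():
        prefix = read[:len(code)].upper()
        if len(prefix) < len(code):
            continue  # read is shorter than this barcode
        score = count_mismatches(prefix, code)
        if score < best_score:
            best_title, best_score, tie = title, score, False
        elif score == best_score and score <= max_mismatches:
            tie = True
    return None if tie else best_title
```

With barcodes = parse_barcodes("BC1 GATCT\nBC2 ATCGT"), a read starting with GATCA is assigned to BC1 because it differs from GATCT at one position, which is within the allowed mismatch count of 1.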
12. [Screenshot: file selection dialog listing trait nex, 2011 02 07 07 19 10 076, with Ok and Cancel.] Highlight your desired file and click Ok. Match Data. [Screenshot: Tree Data Species and Trait Data Species columns listing truncated names such as Acinonyx_j, Alces_alce, Bison_biso, Camelus_dr, and Canis_aure.] Drag and drop species within the tree and trait columns for matching. Grab a name in either the Tree Data Species or the Trait Data Species column and move it up or down in the list until all names in this column match those in the other column. Hold the left mouse button to drag and swap, moving species data up and down until all tree species and trait species are matched. When the text above the table shows All tree species are matched to trait species, click Select output details. Select Output Details. [Screenshot: Independent Contrasts, Select output details tab, with checkboxes for Output correlations and regressions and Output contrasts.] Next, click Select output details. You can select Output correlations and regressions, Output contrasts, or both if desired. Neither is required. Click Launch Job. See Perform Analyses for information about monitoring the process and where to find your results. Taxonomic Name Resolution Service (TNRS) Demo. Acc
13. type may appear to be truncated. This can be fixed by moving the heading bar in the View Notifications window to allow for more room in the column. Future plans. Additional notification types: In a subsequent release, general notifications related to iPlant services and announcements will be added. Examples of such notifications include system downtime, community data availability, new tool/analysis capability, and others. Icon highlighting for notification type: A feature that informs users of new notifications is being designed. It will highlight the appropriate icon to indicate job or data upload/import completion. The current proposal is similar to notification behavior on Facebook, where the icons are enabled when a notification is available, with a numeric representation of the number of notifications. Email notifications: The initial version will appear soon, but expanded and additional features are planned for future releases. Collaboration notifications: A new notification icon will be created for collaborations. The details of this notification type are still in the requirements-gathering phase. Analyses (jobs that are run in the DE). Improvements from 0.2.1. 0.3.0 release goal: The goal of the 0.3.0 release was to enable submitting a job to a Condor cluster in a uniform manner for tools integrated into the DE. A service was created, and hardcoded executables from 0.2.1 were rewritten. Creation of an OSM No
14. 11 01 31 07 19 15 431. Match Data. [Screenshot: Tree Data Species and Trait Data Species columns listing truncated names such as Acinonyx_j, Alces_alce, Bison_biso, Camelus_dr, and Canis_aure.] Drag and drop species within the tree and trait columns for matching. Grab a name in either the Tree Data Species or the Trait Data Species column and move it up or down in the list until all names in this column match those in the other column. Hold the left mouse button to drag and swap, moving species data up and down until all tree species and trait species are matched. When the text above the table shows All tree species are matched to trait species, click Select output details. Set parameters. [Screenshot: Discrete Ancestral Character Estimation, Set parameters tab. Initial value (starting rate) for ML estimation: 0.1.] You may change the initial value for ML estimation or leave the default value in place. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. Burrows-Wheeler Aligner Single End Reads. This analysis uses the Burrows-Wheeler Aligner. Choose Analysis: Select from available analyses to use with your dat
15. [Screenshot: TNRS results table, continued, pairing submitted names with selected matches, for example Lithocarpus catleyanus matched to Lithocarpus cantleyanus (1 more), Pouruma cucura matched to Pourouma cucura, and ISCHNOSIPHON PUBERULUS matched to Ischnosiphon puberulus.] When the main results list shows the names you want to accept, click Download to download a csv file of your results. Note that when no author was entered, "no authority returned" indicates a case when there are multiple records having the same scientific name but different authorities. Each item listed in this instance is a synonym. A future release will add support to return the authority for the accepted name even when no author is entered, as well as the ability to match from family to variety. TopHat Single End for Illumina. This analysis uses TopHat. The configuration options are set to be optimal for single-end reads derived from Illumina sequencing technology (not 454, ABI, or PacBio). A similar analysis is available for paired-end reads. [Screenshot: Choose Analysis dialog with Transcriptomics/Genomics, Short Read Aligners expanded.]
16. 6 interface, interactive tree functionality, and a more generalized display of details for the user to make an informed decision regarding the gene family of interest. TreeBest algorithm evaluation: A review of the TreeBest algorithm is underway to determine if this provides the best representation of the reconciliations. The database will also be populated with the data generated by the 1KP group, as opposed to the limited subset of data that is currently available. The goal is to provide users with other data the ability to utilize the pipeline for generating reconciliations and loading this data into a uniform schema for visualizations. Ultra High Throughput Sequencing (UHTS). New functionality in 0.3.0. Converted and split analyses: Many analyses that were hard-coded in 0.2.1 used multiple tools to perform extended and complex tasks. All UHTS tools were reformatted from hard-coded inclusion to instead use the new metadata format for tool integration. Then analyses were rewritten using the new metadata format and split into discrete analyses, each focused on a specific task, often corresponding to a step in a previous analysis. This will allow for greater flexibility when user-defined multi-step analyses functionality is added in a future release. Known issues. FASTX-related analyses are currently available only for single-end reads: Paired-end read analyses are planned. Future plans. Additional tool integration and created analyses: M
17. [Screenshot: TNRS input form. Create file for future use; File Name: test names for TNRS; Enter names for analysis: Syagrus M1, Cheiloclinium M2, Annona M1, Lithocarpus catleyanus King Rehd, Pouruma cucura, Coussarea HCC 148, ISCHNOSIPHON PUBERULUS, and others; Include family names in output.] You may enter a list of names directly into the tool by selecting Create File from the drop-down menu. If you check the box next to Create file for future use, you can then enter a file name, and the file will be available to you in Manage Data. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. View your results. [Screenshot: TNRS Results 2011 01 11 01 17 39PM txt, listing each Submitted Name beside its Selected Match (default is the name with the best score), for example ESCHWEILERA RUFIFOLIA matched to Eschweilera rufifolia, and Pouruma cucura matched to Pourouma cucura.]
18. [Screenshot: Choose Analysis dialog, Short Read Aligners list, including Bowtie Single End for Illumina, Bowtie Paired End for Illumina, and TopHat Paired End for Illumina.] Select TopHat Single End for Illumina from within Perform Analyses as described in that section. Click Ok. Select input data. [Screenshot: TopHat Single End for Illumina, Select input data tab. Select read file; Select reference genome: Arabidopsis lyrata.] Click Browse to choose the previously uploaded read file you wish to align to a reference genome. Select Reference Genome. [Screenshot: reference genome drop-down listing Arabidopsis lyrata, Arabidopsis thaliana v10, Arabidopsis thaliana v9, Brachypodium distachyon, Oryza indica, Oryza japonica, Physcomitrella patens V1, Physcomitrella patens V1.1, Populus trichocarpa, Sorghum bicolor, Vitis vinifera, Zea mays v2, and Zea mays v1.] Select the reference genome. Select Parameters, part one. TopHat Single End for Illumina, Select parameters tab: Anchor length: 8; Splice mismatches: 0; Minimum intron length: 70; Maximum intron length: 500000; Select input quality scale: Input quals are from GA Pipeline ver > 1.3; Minimum isoform fra
19. Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. Burrows-Wheeler Aligner Paired End Reads. This analysis uses the Burrows-Wheeler Aligner. [Screenshot: Choose Analysis dialog with Transcriptomics/Genomics, Short Read Aligners expanded.] Select Burrows-Wheeler Aligner Paired End Reads from within Perform Analyses as described in that section. Click Ok. An analysis is available for single-end reads. Select reads. [Screenshot: Burrows-Wheeler Aligner Paired End Reads form. Select reads; Select mate file; Select reference genome: Arabidopsis lyrata; Launch Job.] Click Browse next to Select reads and Select mate file to select the previously uploaded and preprocessed DNA sequence read file and mate file that you want to align to a reference genome. Select reference genome. Burrows-Wheeler Aligner Paired End Reads: Select reads, Select mate file, Select refe
20. Launch Job. Name Job. [Screenshot: Launch Job dialog. Job Name: IndContrastjob1; Description: test for documentation; Email when complete; Ok; Cancel.] Enter a name for the job and write a description of it. The description is optional. File name restrictions: File names must be unique and may be a maximum of 250 characters. All alphanumeric characters are permitted, along with these special characters: the dash (-), underscore (_), or period (.). Spaces are allowed but are not permitted as the first, last, or only character. Click Ok to initiate your analysis. View Analysis Status. [Screenshot: Perform Analyses window with columns Name, Description, Start Date, End Date, and Status, showing Independent Contrasts, IndContrastjob1, Tue Feb 01 2011.] When you run an analysis other than TNRS or TR, it will appear in Perform Analyses. The Status will update as the analysis is completed. View Analysis Output(s). [Screenshot: Perform Analyses window with the View Output(s) and Delete menu options.] After a completed run of an analysis, you can view the results. Select the analysis and then select View Output(s) from the drop-down menu at the right. You can also find View Output(s) under More Actions. To delete a completed analysis
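The file name restrictions stated above translate directly into a small validation routine. The Python sketch below is only an illustration of those rules, not the Discovery Environment's own check; it assumes "alphanumeric" means ASCII letters and digits, and the function name and the existing-names argument are hypothetical.

```python
import re

# Letters, digits, dash, underscore, period, and space (position checked separately).
_ALLOWED = re.compile(r"[A-Za-z0-9._\- ]+")


def is_valid_name(name: str, existing=frozenset()) -> bool:
    """Check a name against the restrictions listed in the manual."""
    if not name or len(name) > 250:
        return False  # empty, or longer than the 250-character maximum
    if name[0] == " " or name[-1] == " ":
        return False  # a space may not be the first, last, or only character
    if _ALLOWED.fullmatch(name) is None:
        return False  # contains a character outside the permitted set
    return name not in existing  # names must be unique
```

For instance, is_valid_name("IndContrastjob1") passes, while a name with a trailing space or a slash is rejected.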
21. [Screenshot: Choose Analysis dialog, Quality Control and Manipulation list, including FASTX Barcode Splitter Single End, FASTX Trimmer, FASTX Quality Filter, and FASTX Clipper.] Select FASTQ Quality Rescaler from within Perform Analyses as described in that section. Click Ok. [Screenshot: FASTQ Quality Rescaler, Select file tab.] Click Browse to select the previously uploaded file you want to convert. Click Convert scoring to continue. [Screenshot: FASTQ Quality Rescaler, Convert scoring tab. My sequence reads are the following scoring type: Convert FASTQ int format to the standard Sanger FASTQ; Convert Solexa/Illumina <1.3 FASTQ to the standard FASTQ; Convert Solexa/Illumina >1.3 FASTQ to the standard FASTQ; Convert FASTA to the standard FASTQ; Convert various FASTQ-like format to FASTA; Convert Solexa export format to Sanger FASTQ; Convert AB SOLID read format to Sanger FASTQ.] Specify the scoring type used in your read library from the drop-down menu. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. FASTX Trimmer. An overview of FASTX Analyses is available. [Screenshot: Choose Analysis dialog.]
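To make the scoring conversions listed above concrete, here is a minimal Python sketch of two of them. It relies on the widely documented encodings (Sanger quality characters are Phred scores plus 33; Illumina 1.3 and later are Phred scores plus 64; older Solexa scores use a different log-odds scale) and is not the code the Rescaler itself runs. The function names are hypothetical.

```python
import math


def illumina13_to_sanger(quality: str) -> str:
    """Re-encode an Illumina 1.3+ (Phred+64) quality string as Sanger (Phred+33)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode the Phred score from the +64 encoding
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64 quality")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa quality score to the nearest Phred quality score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))
```

The two scales agree closely at high quality (a Solexa score of 40 maps to Phred 40) and diverge at low quality (a Solexa score of 0 maps to Phred 3).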
22. [Screenshot: Choose Analysis dialog with Transcriptomics/Genomics, Short Read Aligners expanded, listing Burrows-Wheeler Aligner Paired End Reads, Bowtie Single End for Illumina, Bowtie Paired End for Illumina, TopHat Single End for Illumina, and TopHat Paired End for Illumina.] Select Burrows-Wheeler Aligner Single End Reads from within Perform Analyses as described in that section. Click Ok. An analysis is available for paired-end reads. Select reads. [Screenshot: Burrows-Wheeler Aligner Single End Reads form. Select read file(s); File Name: No files to display; Select reference genome; Launch Job.] Click Add to select the previously uploaded and preprocessed DNA sequence read file that you want to align to a reference genome. Select reference genome. [Screenshot: reference genome drop-down listing Arabidopsis lyrata, Arabidopsis thaliana v10, Arabidopsis thaliana v9, Brachypodium distachyon, Oryza indica, Oryza japonica, Physcomitrella patens V1, Physcomitrella patens V1.1, Populus trichocarpa, Sorghum bicolor, Vitis vinifera, Zea mays v2, and Zea mays v1.] Click the arrow to open a drop-down box listing available reference genomes. Click one to select it. Click Launch
23. [Screenshot: Choose Analysis dialog with Transcriptomics/Genomics expanded to Short read aligners, Quality Control and Manipulation, RNA Seq, and Variant Detection.] Select Find SNPs from within Perform Analyses as described in that section. Click Ok. Select SAM File(s). [Screenshot: Find SNPs, Select SAM File(s) tab. File Name: No files to display.] Click Add to choose the previously uploaded SAM files in which you are seeking variants from the reference genome. There is no limit to the number of files you may select here, but files must be selected one at a time. To remove a file selected earlier in this step, select it and click Delete before launching the job. Select Reference Genome. [Screenshot: Find SNPs, Select the reference genome tab, listing Arabidopsis Lyrata, Arabidopsis Thaliana v10, Arabidopsis Thaliana v9, Brachypodium Distachyon, Oryza Indica, Oryza Japonica, Physcomitrella Patens V1, Physcomitrella Patens V1.1, Populus Trichocarpa, Sorghum Bicolor, Vitis Vinifera, and Zea Mays v1.] Select the reference genome to which you will compare your SAM files. Base Calling. Find SNPs: Select SAM File(s), Select the reference genome, Base Calling: Theta parameter error dep
24. [Screenshot: import dialog; the field for details about your experimental data is optional, 255 chars max; Import; Cancel.] Enter the URL for the data file you wish to upload. Enter details about the data. Click Import. Import from Data Source. [Screenshot: Phylota Import dialog with Taxon Name, Cluster, Taxon ID, and Name fields.] You may currently import data from the Phylota database provided by the Sanderson lab at the University of Arizona. Enter the Taxon Name and click Search. Find the data you wish to import from the list and click Import. Confirm successful file import. [Screenshot: Manage Data window, Available Files, showing TNRS_test_names, CSV Name List, 2011 01 12 08 44 44.] Your file will appear in the list of available files in the folder you had open when you imported. There are 3 verifications of successful import: a popup that flashes in the bottom right of the main screen, a notification in the Notifications list, and the file that appears in the selected folder in Manage Data. More Actions. [Screenshot: Manage Data window with a file's drop-down menu open.] Mark the check box to the left of an item in this window to expose the drop-down menu shown at the right. Choose the appropriate entry to rename or
25. ality Control and Manipulation [Screenshot: Choose Analysis window listing, under Single End Reads, FASTX Barcode Splitter, FASTX Trimmer, FASTQ Quality Rescaler, FASTX Quality Filter, and FASTX Clipper]

Select FASTX Quality Filter from within Perform Analyses, as described in that section. Click Ok.

Select file

[Screenshot: FASTX Quality Filter window, Select file panel]

Click Browse to select your previously uploaded file. Click Quality filtering.

[Screenshot: FASTX Quality Filter window, Quality filtering panel]
Quality cut-off value: 20
Percent of bases in sequence that must have quality equal to or higher than cut-off value: 90

Keep or modify the default settings. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results.

FASTQ Quality Rescaler

An overview of FASTX Analyses is available.

The FASTQ Quality Rescaler updates the base quality scores in your sequence data to use the Phred33 scale adopted by the Sanger Centre and the NCBI Sequence Read Archive. Conversion from Illumina 1.3 and Solexa is supported.

[Screenshot: Choose Analysis window]
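The two Quality Filter settings described above (a quality cut-off of 20, and 90 percent of bases required to meet it) can be illustrated with a minimal Python sketch. This is not the FASTX Toolkit's own code: the function name passes_quality_filter is hypothetical, and the sketch assumes the quality string already uses the Sanger Phred+33 encoding (that is, it has been through the Quality Rescaler if needed).

```python
def passes_quality_filter(quality_string, cutoff=20, min_percent=90, offset=33):
    """Return True if at least min_percent of the bases have quality >= cutoff.

    Assumption: quality_string is the quality line of one FASTQ record,
    encoded with the Sanger Phred+33 offset. The defaults mirror the
    values shown in the Quality filtering panel (20 and 90).
    """
    if not quality_string:
        # An empty read has no bases that could satisfy the threshold.
        return False
    # Decode each character into its numeric Phred score.
    scores = [ord(ch) - offset for ch in quality_string]
    good = sum(1 for score in scores if score >= cutoff)
    # Integer comparison avoids floating-point rounding at the boundary.
    return good * 100 >= min_percent * len(scores)
```

In this encoding the character I decodes to quality 40 and # to quality 2, so a 10-base read with nine I characters and one # sits exactly on the 90 percent boundary and is kept, while a read that is half # is discarded.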
26. cs.washington.edu/phylip/doc/contrast.html

Author: J. Felsenstein. The tool was identified for inclusion by the iPlant Tree of Life working group. The 0.3.x release of the Discovery Environment uses PHYLIP version 3.69.

Cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. Above description from the Cufflinks website: http://cufflinks.cbcb.umd.edu

Authors: Cufflinks is a collaborative effort between the Laboratory for Mathematical and Computational Biology, led by Lior Pachter at UC Berkeley; Steven Salzberg's group at the University of Maryland Center for Bioinformatics and Computational Biology; and Barbara Wold's lab at Caltech. The tool was identified for inclusion by the iPlant Genotype to Phenotype working group. The 0.3.x release of the Discovery Environment uses Cufflinks version 0.9.3.

FASTX Toolkit

The FASTX Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. Next-Generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-reads sequences, possibly with quality information
27. ction: 0.15
Number of threads to launch: 2
Allow this many hits per read

Select your desired options (continued in following images).

Select Parameters part two

[Screenshot: TopHat Single End for Illumina window, Select parameters panel]
Allow this many hits per read: 40
Minimum isoform fraction: 0.15
Look for reads incident to microexons (check box)
Use a slower but more sensitive algorithm (check box)
Select library type: Not strand specific
Segment mismatches: 2
Segment subdivide length: 25
Length of exonic hops in splice graph: 50
Minimum intron length found during closure search

Select Parameters part three

[Screenshot: TopHat Single End for Illumina window, Select parameters panel, continued]
Minimum intron length found during closure search: 50
Maximum intron length found during closure search: 5000
Minimum intron length found during coverage search: 50
Maximum intron length found during coverage search: 20000
Minimum intron length found during split segment search: 50
Maximum intron length found during split segment search: 500000
Preserve intermediate files from TopHat (check box)
Only inspect junctions specified in reference annotation (check box)

Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results.
28. d require no further action.

Analyses

Ancestral Character Estimation (ACE) Overview

An ancestral character is a biological trait that is present in a group of related organisms and is thus inferred to have been present in the most recent common ancestor of these organisms. Traits of interest, for example fruit size or the presence of parasite resistance, can therefore be traced back in time along a known phylogeny. Estimating ancestral character values is a phylogenetic analysis that can be used to test evolutionary hypotheses, such as the temporal sequence of evolutionary events or the appearance of adaptive traits. Because ancestral character values are not observed, it is more rational to consider them as parameters in a model in which the character values of recent species are the observed values.

It is possible to perform both continuous and discrete ancestral character estimations in the Discovery Environment. Both use a software package called ape, which is based on R, to perform estimation based on a fully resolved phylogeny.

Continuous ancestral character estimation (CACE) assumes that traits evolve according to a Brownian motion process. Under this model, the expected difference between two taxa can be computed as a function of the time separating the taxa from their most recent common ancestor, which is obtained from the phylogenetic tree. Maximum Likelihood is then u
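As a rough illustration of the Brownian motion assumption described above: under that model the difference between the trait values of two taxa has mean zero, and its variance (the expected squared difference) grows linearly with the branch time separating each taxon from their most recent common ancestor. The Python sketch below is not taken from ape; the function name expected_squared_difference and the rate parameter sigma2 are hypothetical names introduced here.

```python
def expected_squared_difference(sigma2, time_a, time_b):
    """Variance of (trait_a - trait_b) under a Brownian motion model.

    Assumptions (names are illustrative, not from ape):
      sigma2 - Brownian motion rate (variance accumulated per unit time)
      time_a - branch time from the most recent common ancestor to taxon a
      time_b - branch time from the most recent common ancestor to taxon b

    Each lineage drifts independently after the split, so the variances
    accumulated along the two branches add.
    """
    if sigma2 < 0 or time_a < 0 or time_b < 0:
        raise ValueError("rate and branch times must be non-negative")
    return sigma2 * (time_a + time_b)
```

For example, with a rate of 0.5 and branch times of 2.0 and 3.0 read from the tree, the expected squared difference is 0.5 * (2.0 + 3.0) = 2.5.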
29. data from details tab
• Users do not get a notification that data is being saved. This issue will be addressed in a future release.

Interface for folder selection for saving of data
• This interface is inconsistent with the current look and feel of the Manage Data window. This issue will be addressed in a future release.

Saving of NHX files in Manage Data window
• NHX files are identified as Nexus files upon saving in the DE. A fix for this is underway. Downloading the file provides proper NHX format, but uploading the downloaded file to the DE will again cause it to be interpreted as a Nexus file.

Tree visualization of saved NHX files
• The image displayed by the DE tree renderer for tree files cuts off the text at the leaves. This issue exists for all tree files with lengthy names at the leaves and will be fixed with the incorporation of new tree visualization tools.

Display of GO annotations
• The full annotation is being truncated. This issue will be addressed in a future release.

Search performance
• For searches that return a large listing of gene families (for example, the GO term cytoplasm), performance is not optimal. A fix for this is being discussed.

Future plans

User capabilities
• The TR application is undergoing a complete rework to enable publication of the 1KP dataset currently housed at TACC. Included in this rework is a basic advanced search
30. e addressed in a future release.

Job folders display
• The folder containing the outputs of analysis executions has a long name, and its contents are displayed in random order. To view job outputs, users can identify the correct folder by locating the folder with the name given to the user at runtime. This issue will be addressed in a future release.

Ability to stop a running job
• This functionality is not currently available. Users can remove the representation of the job from the View Analysis window; however, this does not stop a running job. Consequently, outputs will still be generated and displayed in the Manage Data window. This issue will be addressed in a future release.

Use of invalid file types for some analyses
• The integrated tools currently allow some invalid file types to be selected as inputs. These analyses will execute, and invalid or empty files will be generated as outputs. The fix for this issue involves changes to file handling, as opposed to a re-tooling of the tools included in the DE. This issue will be addressed in a future release.

Display of description with outputs
• The viewer for the outputs contains a tab for the description given by the user at the time of execution. This issue will be addressed in a future release.

Performance
• Window loading and population of the window with information is not instantaneous. This issue will be addressed in a future release.

Inconsistency in the extension for outputs
• The file ext
31. e analyses to use with your data [Screenshot: Choose Analysis window with ACE expanded under Phylogenetic Systematics, listing Discrete Ancestral Character Estimation and Continuous Ancestral Character Estimation]

Select Continuous Ancestral Character Estimation (CACE) from within Perform Analyses, as described in that section. Click Ok.

Select data

[Screenshot: Continuous Ancestral Character Estimation window. Selected Tree(s): PDAP.tree.nex. Selected Trait Dataset: PDAP.trait.nex. The Tree Data Species and Trait Data Species columns (Acinonyx_j, Aepyceros_, Alcelaphus, Alces_alce, Antilocapr) line up, with the message: All tree species are matched to trait species]

Data needs to be uploaded to the Discovery Environment in advance. Click Add in Selected Tree(s) and Selected Trait Dataset to choose appropriate tree and trait files from the boxes shown next.

Select Tree or Trees

[Screenshot: Select Tree(s) window with a search box, listing aq.tree.nex and shorebirds.tree]
32. e not a part of the botanical database. As a result, algae, fungi, mosses, and other groups may not match appropriately in this application. We anticipate this will be fixed with updates to the database in a future release of the DE.

Download of match results
• Some browsers will request that users turn off pop-up blockers to allow download of results from the window showing the matched names (selection of the download button). Downloading from the Manage Data window does not present this problem; however, the list downloaded from the Manage Data window is a txt file, whereas the list downloaded from the window displaying the results is a csv file. This issue will be addressed in a future release.

Future plans

Extending full names
• The algorithm will be extended to allow matching of full names.

Similar names
• Synonymous name resolution will be integrated.

Additional sources
• Sources of data will be added to the database for resolution, and users will be able to specify which sources they would like to check their names against.

Tree Reconciliation (TR)

New functionality in 0.3.0

Gene family search
• This application is used to search for gene families of interest and view a reconciliation of that gene family tree with a species tree that contains those genes. For the first release of this application, a pipeline that includes MUSCLE, TreeBest, and PriMETV was described. Databa
33. e will be addressed in a future release.

Display of items in View Analysis window
• Due to the length of some of the items displayed in the View Analysis window, longer items may appear to be truncated. Users can adjust the width of column headings to view all details. Users can also maximize the View Analysis window to view these items in greater detail. Adjustments to this display are being discussed.

Display of outputs
• The user will be notified of a completed job in View Notifications, as well as via a completed status in the View Analysis window. The user can then select view outputs from the View Analysis window, or select the job name from the View Notifications window, and will be directed to the location of the outputs in the Manage Data window. These outputs will be located in a generated folder whose name contains the name of the job and a key identifier. The key identifier is currently a large sequence. This will need to be modified to provide a user-friendly interface. This issue will be addressed in a future release.

Same name for output file and job
• The name of the output file should be the name of the job executed, with outputs or out appended. This issue will be addressed in a future release.

Selection of folder for outputs
• The current workflow automatically generates a folder for outputs. Future implementations will allow a user to specify the location for those outputs. This issue will b
34. endency coefficient (enter a number between 0 and 1): 0.85
Number of haplotypes in sample: 2
Expected fraction of differences between a pair of haplotypes: 0.001
Probability of an indel in sequencing (PHRED scale): 30

Select the base calling parameters. The theta parameter, or error dependency coefficient, uses the maq consensus calling model and defines how much difference will be tolerated when calculating variance, assuming these differences to be natural fluctuations or other error rather than different sequences. For more details on these parameter settings, please see SAMtools and Maq.

Filtering part one

[Screenshot: Find SNPs window, Filtering panel]
Minimum read depth: 3
Maximum read depth: 100
SNPs within X base pairs around a gap should be excluded (X): 10
Window size for filtering dense SNPs: 10
Maximum number of SNPs in a window: 2

Enter your desired filtering parameters here and below. For details on the filtering parameters, please see SAMtools.

Filtering part two

[Screenshot: Find SNPs window, Filtering panel, continued]
Window size for filtering adjacent gaps: 30
Minimum SNP quality (PHRED based)
Minimum RMS mapping quality for SNPs: 25
Minimum RMS mapping quality for gaps: 10
Minimum indel score for nearby SNP filtering: 25
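To make the effect of a few of the filtering settings above more concrete, here is a minimal Python sketch that applies the read depth limits (3 and 100), the minimum RMS mapping quality for SNPs (25), and the dense-SNP window (no more than 2 SNPs in a 10 bp window) to a list of candidate SNPs. It is a simplified illustration and not the SAMtools implementation: the Snp record, the filter_snps function, and the exact window semantics (all SNPs in an over-dense window are dropped) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical record for one candidate SNP (not a SAMtools type)."""
    position: int
    read_depth: int
    rms_mapping_quality: int


def filter_snps(snps, min_depth=3, max_depth=100, min_rms_mapq=25,
                window=10, max_in_window=2):
    """Keep SNPs that pass the depth, mapping quality, and density checks."""
    # Step 1: per-site thresholds on read depth and RMS mapping quality.
    kept = [s for s in snps
            if min_depth <= s.read_depth <= max_depth
            and s.rms_mapping_quality >= min_rms_mapq]
    kept.sort(key=lambda s: s.position)

    # Step 2: sliding window over sorted positions. Assumed rule: if more
    # than max_in_window SNPs fall within `window` bp, drop all of them.
    dense = set()
    start = 0
    for end in range(len(kept)):
        while kept[end].position - kept[start].position >= window:
            start += 1
        if end - start + 1 > max_in_window:
            dense.update(range(start, end + 1))
    return [s for i, s in enumerate(kept) if i not in dense]
```

With these defaults, three SNPs at positions 100, 103, and 105 would all be removed as a dense cluster, while an isolated SNP at position 700 with depth 50 and RMS mapping quality 40 would be kept.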
35. ensions applied to the job outputs are not consistent across tools. For example, QC preprocessing jobs will deliver different outputs depending upon which tools the user actually utilizes in the analysis pipeline. This functionality is inherent in the tool itself. This issue will be addressed in a future release.

Perpetual running jobs
• There is a situation in the execution framework where communication with the monitor is lost. This results in a job showing a status of running perpetually. These jobs will not complete. This issue is currently being handled, and a resolution is being worked on.

Future endeavors

User customized workflows
• We will allow a user to create workflows based upon integrated tools. These workflows can be generated, saved, modified, and shared with groups for future analyses.

Provenance tracking
• Users will be provided a file that contains details of the analysis being executed. Included in that file will be a description of the parameters used in the analysis, the data inputs, and the date and time of the execution.

Default value configuration
• Users will be able to save a selected analysis with parameters that they expect to utilize on different datasets. These values may differ from the default values provided by the author of the original analysis. Users will be able to save their modified version with a name that differs from the original analysis name to indicate
36. epts a list of taxa and checks them against a database of canonical names to return both exact and possible matches. It uses exact matching (via database queries) and fuzzy matching (via Taxamatch) to compare a list of submitted names with a standardized database. Author data and further information are available at http://tnrs.iplantcollaborative.org. The tool was identified for inclusion by the iPlant Tree of Life working group.

[Screenshot: Choose Analysis window with Standardization expanded]

Select TNRS Demo from within Perform Analyses, as described in that section. Click Ok.

Submit a list of names

[Screenshot: TNRS Demo window with Select File chosen in the drop-down menu, a Browse field for previously uploaded files, and an Include family names in output check box]

This tool only works for Genus and Species names. If your list contains Family names, they must be removed prior to submitting the job.

You may submit a previously uploaded list of names by selecting Select File from the drop-down menu. Click Launch Job.

Enter a list of names

[Screenshot: TNRS Demo window with Create File chosen in the drop-down menu, repeating the notice above and adding: This tool will submit your list of names for matching to the iPlant database]
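The exact-then-fuzzy matching idea described above can be sketched in a few lines of Python. This is only an illustration of the general approach: it uses the standard library's difflib similarity ratio as a stand-in for Taxamatch (which is a different, taxonomy-aware algorithm), an in-memory list in place of the TNRS database, and a hypothetical function name, resolve_names, with an arbitrarily chosen cutoff of 0.8.

```python
import difflib


def resolve_names(submitted, canonical, cutoff=0.8):
    """Classify each submitted name as exact, fuzzy, or unmatched.

    Assumptions: `canonical` is a plain list of accepted names, and
    difflib's similarity ratio stands in for Taxamatch fuzzy matching.
    Returns a dict mapping each submitted name to (status, matches).
    """
    canonical_set = set(canonical)
    results = {}
    for name in submitted:
        if name in canonical_set:
            # Exact match: the name appears verbatim in the reference list.
            results[name] = ("exact", [name])
            continue
        # Fuzzy match: up to three candidates whose similarity >= cutoff.
        close = difflib.get_close_matches(name, canonical, n=3, cutoff=cutoff)
        results[name] = ("fuzzy", close) if close else ("unmatched", [])
    return results
```

For example, against a reference list containing Zea mays, Oryza sativa, and Vitis vinifera, the misspelling Oryza sativ is reported as a fuzzy match for Oryza sativa, while a name with no close counterpart is reported as unmatched.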
37. [Screenshot: Choose Analysis window, continued]

Select Discrete Ancestral Character Estimation from within Perform Analyses, as described in that section. Click Ok.

Select data

[Screenshot: Discrete Ancestral Character Estimation window, Select input data panel, with empty Selected Tree(s) and Selected Trait Dataset lists, empty Tree Data Species and Trait Data Species columns, and Select Traits and Set parameters panels]

Data needs to be uploaded to the Discovery Environment in advance. Click Add in Selected Tree(s) and Selected Trait Dataset to choose appropriate tree and trait files from the boxes shown next.

Select Tree or Trees

[Screenshot: Select Tree(s) window with a search box, listing shorebirds.tree, aq.tree.nex, and PDAP.tree.nex]

Select Traits

[Screenshot: Select Traits window with a search box, listing shorebirds.trait.nex, aq.trait.nex, and PDAP.trait.nex]
38. generated using a protocol that may result in a terminal 3' sequence adapter, and is useful to learn and test our QC preprocessing.

shorebirds.trait.nex
This file represents a set of continuous traits for the 70 bird species supported in the tree file shorebirds.tree.nex. This file can be used for an Independent Contrasts analysis.

shorebirds.tree.nex
This file represents a phylogenetic tree for 70 species of birds that can be used as an input to an Independent Contrasts analysis.

SRR026996.zmv2.sam
This is a SAM file produced from a BWA alignment of SRR026996.fastq (Mo17 genomic DNA from SRX010829) to the Zea mays v2 genome, and can be used for variant detection.
39. guage and Environment
3.8 SAMtools
3.9 TopHat
3.10 Tree Reconciliation Demo
Reference
4.1 Discovery Environment 0.3.0 Release Notes
4.2 Tool Integration
4.3 Creating a New Analysis in the Discovery Environment
4.4 TestData folder contents

Getting Started

Accessing the Discovery Environment

Account request and creation

[Screenshot: Discovery Environment tab menu]

Create an account from the iPlant Collaborative website at http://www.iplantcollaborative.org by moving your mouse cursor over the Discovery Environment tab and selecting Request Access to DE from the drop-down menu. Fill out the form and click Submit. When access is granted, you will receive a confirmation email that includes a link to create your password. You will not be able to log in until you create a password. This link can also be used to change your password.

Access the Log In Page

[Screenshot: Discovery Environment tab menu with Request Access to DE]

Access the Discovery Environment either by selecting the Discovery Environment link from the Tools window near the top of the iPlant homepage, or by hovering your mouse cursor over the Discovery Environment Preview tab on the home page and clicking Log In to DE.

Login

[Screenshot: Discovery Environment login page] Username
40. h: 5000
Minimum intron length found during coverage search: 50
Maximum intron length found during coverage search: 20000
Minimum intron length found during split segment search: 50
Maximum intron length found during split segment search: 500000
Preserve intermediate files from TopHat (check box)
Only inspect junctions specified in reference annotation (check box)
Expected mean inner distance between mate pairs

Select Parameters part four

[Screenshot: TopHat Paired End for Illumina window, Select Parameters panel, continued]
Maximum intron length found during closure search: 5000
Minimum intron length found during coverage search: 50
Maximum intron length found during coverage search: 20000
Minimum intron length found during split segment search: 50
Maximum intron length found during split segment search: 500000
Preserve intermediate files from TopHat (check box)
Only inspect junctions specified in reference annotation (check box)
Expected mean inner distance between mate pairs: 200

Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results.

Tools

Tools Overview

Tools are software packages that perform specific tasks. We do not run tools directly in the DE; instead, we create analyses for specific uses of installed tools. A
41. Select Traits

[Screenshot: Select Traits window with a search box, listing shorebirds.trait.nex]

Match Data

[Screenshot: Tree Data Species and Trait Data Species columns with the instruction Drag and Drop species within tree and trait columns for matching, annotated: Grab a name in either the Tree Data Species or Trait Data Species column and move it up or down in the list until all names in this column match those in the other column]

Hold the left mouse button to drag and swap species data up and down until all tree species and trait species are matched. When the text above the table shows All tree species are matched to trait species, click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results.

Discrete Ancestral Character Estimation (DACE)

An overview of Ancestral Character Estimation is available.

[Screenshot: Choose Analysis window with ACE expanded under Phylogenetic Systematics]
42. lignment for Gene Tree DNA; Multiple Sequence Alignment for Gene Tree Amino Acid; NHX File for Gene Tree; Newick File for Species Tree; NHX File for Reconciled Tree

View and download a fat tree representation under the Reconciliation tab, a gene tree representation under the Gene Tree tab, a species tree representation under the Species Tree tab, and more details under the Details tab. Click underlined listed items in Details to see and download the data.

Reference

Discovery Environment 0.3.0 Release Notes

This document summarizes known issues in the Discovery Environment (DE). The list is not all-inclusive but includes the larger issues. The CORE SOFTWARE project in iPlant's JIRA has a comprehensive listing: https://pods.iplantcollaborative.org/jira

Each section is broken down into improvements from the 0.2.1 release to the 0.3.0 release, known issues, and future work. This list also includes information about the Tree Reconciliation (TR) and Taxonomic Name Resolution Service (TNRS) projects. Information about Ultra High Throughput Sequencing (UHTS) and Trait Evolution (TE) is forthcoming.

Notifications

Improvements from 0.2.1

Triggered notifications
• Users can now view triggered notifications from within the View Notifications window.

Categorized notifications
• Notifications have been categorized as either transient or persistent.
• Pe
43. links Transcript Quantification [Screenshot: Select Reference Annotation drop-down listing Arabidopsis lyrata, Arabidopsis thaliana v10, Arabidopsis thaliana v9, Brachypodium distachyon, Oryza indica, Oryza japonica, Physcomitrella patens V1, Physcomitrella patens V1.1, Populus trichocarpa, Sorghum bicolor, Vitis vinifera, Zea mays v2, Zea mays v1]

Select the reference genome.

Parameters part one

[Screenshot: Cufflinks Transcript Quantification window, Parameters for analysis panel]
Maximum Intron Length: 300000
Minimum isoform fraction: 0.05
Expected pre-mRNA fraction: 0.05
Minimum SAM mapping quality score to include in analysis: 0
Exclude the contribution of the top 25 percent most highly expressed genes from FPKM denominator (check box)
Alpha value for the binomial test used during false positive spliced alignment filtration: 0.01
This fraction of a spliced read must span an exon junction: 0.12

Select your desired parameters (continued in following image).

Parameters part two

[Screenshot: Cufflinks Transcript Quantification window, Parameters for analysis panel, continued]
Exclude the contribution of the top 25 percent most highly expressed genes from FPKM denominator (check box)
Alpha value for the binomial test used during
44. n analysis may be created to use only one tool, or many tools, using outputs from one as inputs to another. See Tool Integration and Creating a new Analysis in the Discovery Environment for more information.

Analysis of Phylogenetics and Evolution (ape)

Analysis of Phylogenetics and Evolution (ape) provides functions for reading, writing, plotting, and manipulating phylogenetic trees; analyses of comparative data in a phylogenetic framework; analyses of diversification and macroevolution; computing distances from allelic and nucleotide data; reading nucleotide sequences; and several tools such as Mantel's test, computation of minimum spanning tree, generalized skyline plots, estimation of absolute evolutionary rates and clock-like trees using mean path lengths, non-parametric rate smoothing, and penalized likelihood. Phylogeny estimation can be done with the NJ, BIONJ, and ME methods. The above description is from http://cran.r-project.org/web/packages/ape/index.html. More information about ape is available from http://ape.mpl.ird.fr.

ape uses the R environment. The tool was identified for inclusion by the iPlant Tree of Life working group. The 0.3.x release of the Discovery Environment uses ape version 2.6-2.

Burrows-Wheeler Aligner (BWA)

Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence s
45. name details
• The user is also provided links to the TROPICOS database, housed by the Missouri Botanical Gardens, for additional details.

The current algorithmic pipeline includes use of the GNI parser by Dmitry Mozzherin and TaxaMatch by Tony Rees.

Known issues

Matching limitations
• The current implementation only allows matching of genus and species. Work is underway to incorporate matching for full names (family through variety). A revised algorithm is needed. This issue will be addressed in a future release.

Resolving similar names
• The current implementation does not provide synonymous resolution of names. This issue will be addressed in a future release.

Entry of names
• Currently, names that are entered directly into the application must NOT contain family names. The application will not work until the GNI parser is able to accept family names.

Entry of invalid names
• The only indication that a name has not matched is a return of all parts of the name in the Unmatched column of the application. A fix to identify the name as having no match is desired. This issue will be addressed in a future release.

Multiple same name return with same score
• The current version of TNRS only performs a match to the name entered; it does not resolve synonyms that exist in the TROPICOS database. This information is available in the database; however, at this time all names that match the submitted name will be
46. TopHat Paired End for Illumina

This analysis uses TopHat. The configuration options are set to be optimal for paired-end reads derived from Illumina sequencing technology (not 454, ABI, or PacBio). A similar analysis is available for single-end reads.

[Screenshot: Choose Analysis window with Short Read Aligners expanded under Transcriptomics/Genomics, listing Burrows Wheeler Aligner Single End, Burrows Wheeler Aligner Paired End, Bowtie Single End for Illumina, Bowtie Paired End for Illumina, TopHat Single End for Illumina, and TopHat Paired End for Illumina]

Select TopHat Paired End for Illumina from within Perform Analyses, as described in that section. Click Ok.

Select input data

[Screenshot: TopHat Paired End for Illumina window, Select input data panel, with Select read file and Select mate file fields (each with a Browse button) and a Reference Genome drop-down showing Arabidopsis Lyrata]

Click Add to choose the previously uploaded read and mate files you wish to align to a reference genome.

Select Reference Genome

[Screenshot: TopHat Paired End for Illumina window, Select input data panel, with the Reference Genome drop-down open] Arabidopsis Lyrata, Arabid
47. nment Quick Start guide on the iPlant wiki to begin.

TestData folder contents

A quick description of each of the sample data files provided in the iPlant Discovery Environment.

accepted_hits.sam
This is a SAM file produced from aligning s_8_sequence.clipper.sanger.txt to the Arabidopsis thaliana v9 reference genome, and can be used as input for Cufflinks Transcript Quantification.

aq.trait.nex
This file contains the supporting continuous traits for the phylogenetic tree described in aq.tree.nex, and can be used with aq.tree.nex for Independent Contrasts analysis.

aq.tree.nex
This file represents a 30 character phylogenetic tree that can be used with aq.trait.nex for Independent Contrasts analysis.

PDAP.trait.nex
This file contains supporting continuous traits for the phylogenetic tree described in PDAP.tree.nex, and can be used with PDAP.tree.nex for Independent Contrasts analysis.

PDAP.tree.nex
This file contains a phylogenetic tree for 49 mammals that can be used with PDAP.trait.nex for Independent Contrasts analysis.

s_8_sequence.clipper.sanger.txt
This is a clipped, rescaled FASTQ file produced by removing the terminal 3' sequence adaptor from s_8_sequence.txt, followed by conversion of the quality score scale to Sanger (PHRED 33), and is useful to learn and test our alignment mechanism.

s_8_sequence.txt
This is a dataset comprised of 6632564 100 bp reads from Arabidopsis that were
48. ool was identified for inclusion by the iPlant Tree of Life working group.

Select TR Demo from the menu.

[Screenshot: Discovery Environment menu with User Preferences, Help, User Manual, and About Discovery Environment entries]

Select search type

[Screenshot: Tree Reconciliation window with the Search Type drop-down listing Gene Identifier, BLAST, GO Term, and GO Accession, and no results to display]

Choose a Search Type from the drop-down box. You may search by Gene Identifier, BLAST, GO Term, or GO Accession. The genes that are currently available are Arabidopsis, Cucumber, Grape, Papaya, Poplar, and Soybean. The 0.3.x release of the Discovery Environment uses BLAST version 2.2.24.

View search results

[Screenshot: Tree Reconciliation window showing search results for V01G0952, with pg00892 listed]

Enter your search term in the box and click Search. Highlight returned search results and click View.

View results

[Screenshot: Gene Cluster pg00892 view with Reconciliation and Gene Tree tabs. Number of Duplication Events: 4. Number of Speciation Events: 3. Number of Genes: 8. Number of Species: 6. GO Annotations include cytoplasm, transcription, translation, nucleolus, meiosis, gene silencing by RNA, virus induced gene silencing, and response to auxin stimulus]

DNA Sequences for Gene Family; Amino Acid Sequences for Gene Family; Multiple Sequence A
49. opsis Thaliana v10 Arabidopsis Thaliana v9 Brachypodium Distachyon Oryza Indica Oryza Japonica Physcomitrella Patens V1 Physcomitrella Patens V1 1 Populus Trichocarpa Sorghum Bicolor Vitis Vinifera Zea Mays v1 Zea Mays v2 Select the reference genome Discovery Environment Manual 97 Select Parameters part one TopHat Paired End for Illumina Select input data Select Parameters Anchor length 8 Splice mismatches 0 Minimum intron length 70 Maximum intron length 500000 Select input quality scale Input quals are from GA Pipeline ver gt 1 3 gt Minimum isoform fraction 0 15 Number of threads to launch 2 Allow this many hits per read Select your desired options continued in following images Discovery Environment Manual 98 Select Parameters part two O TopHat Paired End for Illumina Select input data Select Parameters Allow this many hits per read 40 Minimum isoform fraction 0 15 C Look for reads incident to microexons C Use a slower but more sensitive algorithm Select library type Not strand specific 4 Segment mismatches 2 Segment subdivide length 25 Length of exonic hops in splice graph Discovery Environment Manual 99 Select Parameters part three TopHat Paired End for Illumina Select input data Select Parameters Minimum intron length found during closure search 50 Maximum intron length found during closure searc
50. ore tools will be integrated and basic analyses for each tool will be created. Trait Evolution (TE). New functionality in 0.3.0. Ancestral character estimation (ACE): This uses an R-based package called ape, which was installed as a tool using the new metadata method. Analyses for both the continuous and discrete versions of ACE were then added to the DE using the new metadata format. Phylogenetic Independent Contrasts (PIC): This analysis was hard-coded as a function in 0.2.1 and was rewritten for 0.3.0 using the new metadata methods for both tool integration and creation of analyses. Known issues. File parsing: Some file formats are not uploading correctly. A fix is in progress and is expected shortly. Future plans: being researched. Tool Integration. If you have a tool that you would like to have integrated into the iPlant Discovery Environment (DE), this can be done in just a few steps. Please contact us if you are interested in collaborating with us to do so. The basic steps include: deploying the software tool to our cyberinfrastructure; providing us with sample data for testing and a clear description of expected output; and authoring metadata that tells our system about the tool and how it is used (we have samples and a clear tutorial). Finally, to expose the tool for use, an analysis must be created. Please see Creating a new Anal
51. ort More Actions v from Desktop from URL from Data Source p Phylota. Import provides a drop-down menu from which you can upload data from your computer, import data from a URL, or import data from external repositories that have been enabled for direct access from the Discovery Environment. Navigate to the folder into which you want to import data and click a menu option to import. Each method is described below. Import from Desktop (dialog: Upload your data, Browse, File Type, Select file type, Cancel): Click Browse to choose the file to import from your computer. Select the appropriate file type from the drop-down list. Choices include Phylogenetic data, List of names for resolution, Sequence data, and Barcode file. Click Upload. File name restrictions for imported files: File names must be unique and may be a maximum of 250 characters. All alphanumeric characters are permitted, along with these special characters: the dash (-), underscore (_), and period (.). Spaces are allowed but are not permitted as the first, last, or only character. If a file is imported that has the same name as an existing file, the user is told that the file already exists and is asked whether to overwrite it. If yes, the file is imported as a new file. Import from URL (dialog: Import from URL, Enter URL below, http or ftp, http yourdatasource edu location): Enter details
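The file name restrictions listed above can be expressed as a small check. This is a minimal Python sketch, not the Discovery Environment's own validation code: the function name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and uniqueness is checked against a caller-supplied set of existing names.

```python
import re
from typing import List, Set

MAX_NAME_LENGTH = 250
# Assumption: ASCII letters and digits, plus dash, underscore, period, and space.
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9._ -]+")


def check_import_name(name: str, existing_names: Set[str]) -> List[str]:
    """Return a list of rule violations for a proposed file name (empty if none)."""
    problems = []
    if not name:
        return ["name is empty"]
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than 250 characters")
    if ALLOWED_CHARS.fullmatch(name) is None:
        problems.append("name contains a character that is not permitted")
    # A name consisting of a single space fails this rule as well.
    if name[0] == " " or name[-1] == " ":
        problems.append("a space may not be the first, last, or only character")
    if name in existing_names:
        problems.append("a file with this name already exists")
    return problems


if __name__ == "__main__":
    print(check_import_name("my tree_v2.nex", {"other.nex"}))
    print(check_import_name(" bad*name", set()))
```

Returning every violation at once, rather than stopping at the first, makes it easier to show the user a complete message.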
52. overy Environment Manual 8 Icons: Manage Data, Perform Analyses. Icons enable easy access to data and analyses. Notifications: Click Show all notifications to show messages from the system about the status of data file imports and status updates for all analyses for your current session. The icons next to the text will sort those notifications by type (analysis or data). Manage Data Introduction: Click the Manage Data icon to upload and manipulate data files. The window displays all files that you have uploaded or imported into the Discovery Environment, as well as some sample data provided to you by iPlant in the TestData folder. Home icon: The Home icon at the lower left corner will always return you to the top level. When browsing folders, your current folder will appear next to this icon. Up icon: When browsing within a folder, an Up icon will appear to the upper left of the list of files and folders. Click this to navigate one level above your current location. Create a folder: Click New Folder to create a new folder in your current location. Import data New Folder 2 Imp
53. ptions Find SNPs Enter the sample name genotype20110301 Launch Job. Enter a name for your genotype sample to make it easier to keep track of multiple VCF data records. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. Independent Contrasts Overview. Phylogenetic Independent Contrasts (PIC) is one of the phylogenetic comparative methods, which use information on the evolutionary relationships of organisms (phylogenetic trees) to test for correlated evolutionary changes in two or more traits. PIC is a statistical approach that uses the phylogenetic tree and its evolutionary branch lengths as a guide to determine whether two or more quantitative characters are evolutionarily correlated. PIC can help users distinguish characters that are similar because of a common evolutionary history from those that are similar for other reasons, such as an adaptive response to environmental conditions. For someone doing data analysis, the contrasts can be regarded as a new set of characters with evolutionary history subtracted, so the correlation between two or more sets of contrasts becomes meaningful. PIC uses the Contrast program from PHYLIP. This method originated in this paper: Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist 125: 1-15. Discove
54. rdization (Choose Analysis dialog: TNRS Demo; Quality Control and Manipulation; Single End Reads; FASTX Barcode Splitter Single End; FASTQ Quality Rescaler; FASTX Quality Filter; FASTX Clipper; Transcriptomics Genomics; Cancel). Select FASTX Trimmer from within Perform Analyses as described in that section. Click Ok. Select file: Click Browse to select your previously uploaded file. Click Remove non-biological sequences. Remove non-biological sequences (First base to keep: 1; Last base to keep: 28): Keep or modify the default settings. Click Launch Job. Enter a name and description for the job and click Ok. See Perform Analyses for information about monitoring the process and where to find your results. Find SNPs Overview. Find SNPs uses SAMtools. Find SNPs finds variants, or single nucleotide polymorphisms (SNPs), in DNA datasets. You may upload your own existing SAM alignment files that have been derived from one of the supported reference genomes and use them to identify SNPs. The output of this analysis is a listing of variants in VCF 3.3 format. Find SNPs: An overview of Find SNPs is available. Choose Analysis x Select from avail
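The effect of the two FASTX Trimmer settings described above (First base to keep, Last base to keep) can be sketched in Python. The sketch assumes both positions are 1-based and inclusive, and it uses the defaults shown in the dialog (1 and 28); the function name is hypothetical and this is an illustration rather than the tool's implementation.

```python
from typing import Tuple


def trim_read(sequence: str, quality: str,
              first_base: int = 1, last_base: int = 28) -> Tuple[str, str]:
    """Keep bases first_base..last_base (1-based, inclusive) of a FASTQ read.

    The quality string is trimmed with the same coordinates so that it
    stays aligned with the sequence.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    start = first_base - 1  # convert the 1-based position to a 0-based index
    return sequence[start:last_base], quality[start:last_base]


if __name__ == "__main__":
    seq, qual = trim_read("ACGTACGTAC", "IIIIIHHHHH", first_base=3, last_base=6)
    print(seq, qual)
```

Reads shorter than the last base to keep are simply returned up to their end in this sketch.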
55. rence genome Arabidopsis Lyrata Arabidopsis Thaliana v10 Arabidopsis Thaliana v9 Brachypodium Distachyon Oryza Indica Oryza Japonica Physcomitrella Patens V1 Physcomitrella Patens V1 1 Populus Trichocarpa Sorghum Bicolor Vitis Vinifera Zea Mays vl Zea Mays v2 Zea Mays v5a Launch Job Click the arrow to open a drop down box listing available reference genomes Click one to select it Click Launch Job Enter aname and description for the job and click Ok See Perform Analyses for information about monitoring the process and where to find your results Discovery Environment Manual 43 Cufflinks Transcript Quantification This analysis uses Cufflinks Choose Analysis x Select from available analyses to use with your data gt Phylogenetic Systematics Standardization 4 Transcriptomics Genomics gt Short Read Aligners gt Quality Control and Manipulation 4 RNA Seq K Cufflinks Transcript Quantification gt Variant Detection Cancel Select Cufflinks Transcript Quantification from within Perform Analyses as described in that section Click Ok Discovery Environment Manual 44 Select SAM File s 9 Cufflinks Transcript Quantification Select SAM File s File Name No files to display Select Reference Annotation x Parameters for analysis y Click Add to choose your previously uploaded SAM file s Discovery Environment Manual 45 Select Reference Annotation O Cuff
56. rent tabs will appear in the new window For example viewing a TNRS results file shows a list of names and matches with links Viewing a nex file will show Raw and Tree tabs Viewing a sam file will show Preview and Description tabs Other file types display their contents in appropriate ways Discovery Environment Manual 16 Perform Analyses a 7a RAN UN Perform Analyses Analyses take implemented tools and enable them to be executed in the Discovery Environment Click the Perform Analyses icon on the main page to start Perform Analyses Overview gp Choose Analysis f More Actions Name Description No Analyses to display Perform Analyses is where you initiate analyses as well as view or delete completed analyses Click Choose Analysis to initiate an analysis Discovery Environment Manual 17 Choose Analysis Choose Analysis x Select from available analyses to use with your data 4 Phylogenetic Systematics ii TNRS Demo 4 PHYLIP Independent Contrasts 4 ACE i Discrete Ancestral Character Estimation Continuous Ancestral Character Estimatic Standardization Transcriptomics Genomics Cancel Analyses are categorized into logical groups to make specific tasks easier to find Click the arrow next to a category to show what it contains Select an analysis and click Ok to start When you have finished setting up your chosen analysis by following the steps it requires click
57. returned. TROPICOS does have a reference for which of these synonyms is the accepted name, and this name is the one that is selected as the best match for a user. Upon navigation to the TROPICOS web interface, this name is identified by an exclamation point. This issue will be addressed in a future release. TNRS does not show results in View Analysis window: The View Analysis window currently displays a representation of a chosen analysis and its execution for jobs that use the Job Execution Framework. TNRS is a web service call and does not use this framework to execute; therefore, it does not appear in the View Analysis window. Results from an execution appear in the Manage Data window with a timestamp. This issue will be addressed in a future release. TNRS does not use the Notifications framework: This issue will be addressed in a future release. TNRS job name: The name entered is not displayed in the Manage Data window with the outputs. The user is able to identify the job only by a timestamp and a description of Taxamatch Result. This issue will be addressed in a future release. TNRS Manage Data window population: The results for a TNRS job do not display consistently with other jobs and do not use the job execution framework. No folder is generated, and the outputs are returned in the root folder for data. This issue will be addressed in a future release. Other matching issues: Information cannot be matched to names that ar
58. rsistent notifications are related to file import or upload (success or failure) and analysis (success or failure). These appear in View Notifications and remain until a user chooses to delete them. To filter these notifications by type, users may select the data icon or the analysis icon, or use the drop-down menu in View Notifications. Transient notifications are related to file deletion, job submission, and issues where the ability to view an output (success or failure) is not available. These are presented to the user as a pop-up window in the lower right corner of the DE. Adjustable notification display: Notifications are displayed in descending date/time order by default; however, this is adjustable. Pointing your cursor at the right-hand side of the Created Date column header causes a down arrow to appear. Select the arrow to choose a sorting preference from the drop-down menu. The menu also offers the ability to limit which columns are displayed. Known issues. Email notifications: There is currently an interface available to receive an email notification for long-running jobs; however, support services for this are not currently integrated. This issue will be addressed in a future release. Notification persistence: Currently, refreshing the browser will eliminate transient notifications from the main window. This issue will be addressed in a future release. Display: Some of the text for the notification
59. ry Environment Manual 76 Independent Contrasts An overview of Independent Contrasts is available Choose Analysis x Select from available analyses to use with your data 4 Phylogenetic Systematics ti TNRS Demo 4 PHYLIP ACE Standardization Transcriptomics Genomics Cancel Select Independent Contrasts from within Perform Analyses as described in that section Click Ok Discovery Environment Manual 77 Select input data O Independent Contrasts Select input data Selected Tree s File Name No trees to display Selected Trait Dataset File Name No traits to display Select output details Data needs to be uploaded to the Discovery Environment in advance Click Add in Selected Tree s and Selected Trait Dataset to choose appropriate tree and trait files from the boxes shown next Discovery Environment Manual 78 Select Tree or Trees Select Tree s Enter a search string such as vio x File Name Label Uploaded Date Time E aq tree nex UNKNOWN 1 2011 02 07 07 19 14 802 PDAP tree nex UNKNOWN 1 2011 02 07 07 18 57 305 shorebirds tree UNKNOWN 1 2011 02 07 07 19 06 024 Ok Cancel Highlight your desired file s and click Ok Discovery Environment Manual 79 Select Traits Select Traits Enter a search string such as vio x File Name Uploaded Date Time shorebirds trait nex 2011 02 07 07 19 01 249 PDAP trait nex 2011 02 07 07 18 52 751 aq
60. s the split criteria. Barcode files are simple text files. Each line should contain an identifier (a descriptive name for the barcode) and the barcode itself (A, C, G, T), separated by a TAB character or a space. An example is given in an image on the FASTX documentation website. (Choose Analysis dialog: Phylogenetic Systematics; Standardization; TNRS Demo; Quality Control and Manipulation; FASTX Trimmer; FASTQ Quality Rescaler; FASTX Quality Filter; FASTX Clipper; Transcriptomics Genomics; Cancel.) Select FASTX Barcode Splitter from within Perform Analyses as described in that section. Click Ok. Select file: Click Browse to select your previously uploaded file. Click Manage Barcodes. Manage Barcodes, create (example entries: Create barcode file for future use: barcode file for testing; Enter barcodes used in library: Barcode3 CTCGT, Barcode6 CGACT, Barcode9 TAGCT; Number of allowed mismatches: 1): Choose Create Barcode File from the drop-down menu if you are going to create one now. Create a name for the file to help you locate it later. Enter your barcodes each on a new line separate
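The barcode file layout described above (one identifier and one barcode per line, separated by a TAB or a space) and the "Number of allowed mismatches" setting can be sketched in Python. This is an illustration under stated assumptions, not the FASTX Barcode Splitter's code: the function names are hypothetical, barcodes are assumed to be compared against the start of each read, and a read that matches two barcodes equally well is treated as unmatched.

```python
from typing import Dict, Optional


def parse_barcodes(text: str) -> Dict[str, str]:
    """Parse 'identifier<TAB or space>barcode' lines into {identifier: barcode}."""
    barcodes = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # splits on tabs and spaces alike
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected identifier and barcode")
        name, barcode = fields[0], fields[1].upper()
        if set(barcode) - set("ACGT"):
            raise ValueError(f"line {line_number}: barcode must contain only A, C, G, T")
        barcodes[name] = barcode
    return barcodes


def assign_read(read: str, barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Optional[str]:
    """Return the identifier of the best-matching barcode, or None if unmatched."""
    best_name, best_count, tied = None, max_mismatches + 1, False
    for name, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        count = sum(1 for a, b in zip(prefix, barcode) if a != b)
        if count < best_count:
            best_name, best_count, tied = name, count, False
        elif count == best_count:
            tied = True
    if best_name is None or tied:
        return None
    return best_name


if __name__ == "__main__":
    table = parse_barcodes("Barcode3\tCTCGT\nBarcode6 CGACT\nBarcode9\tTAGCT\n")
    print(assign_read("CTCGAAAAA", table))  # one mismatch against Barcode3
```

The three barcodes used in the usage example are the ones shown in the Manage Barcodes dialog.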
61. scagnia lasiandra Hyeronima oblonga Click the name of a Selected Match to view the database entry for the item on TROPICOS Matches are given a percent score based on the probability of the match Further details are available by clicking details When more than one item is found as a possible match this is noted Click details to view more details about possible matches found to determine which match is best Discovery Environment Manual 86 Choose from among possible matches Submitted Name Lithocarpus catleyanus King Rehd x j Lowest Scientific Author Genus Matched Specific Epithet Author Unmatched Overall Name Matched and Score Attributed and Score Matched and Score Match and Score Annotation Terms Match Select Lithocarpus cantleyanus 9 King ex Hoo Lithocarpus 100 cantleyanus 90 King ex Hook f 75 Lithocarpus cathayanus 95 Seemen R Lithocarpus 100 cathayanus 90 Seemen Rehd 74 O Ok Cancel When more than one item is found as a possible match you may view details in the TROPICOS database by clicking each matched name Denote which one you want to appear in your final list by placing a mark in the circle to the right Click Ok Download results TNRS Results 2011 01 11 01 17 39PM txt Submitted Selected Match Name default is name with the best score ESCHWEILERA RUFIFOLIA Eschweilera rufifolia Bauhinia glabra Bauhinia glabra 1 more Syagrus M1 Syagrus Cheiloclinium M
62. se search: Users are able to search the database, which includes gene family clusters identified by John Bowers, by selecting a gene family identifier, GO term, or accession, or by performing a BLAST search for a gene of interest. Search results and images: A listing of gene families that meet the search criteria is returned, and a family can be selected to view images of the gene, species, and fat tree representations of this data. Download results: Users can also download all files associated with that gene family and view a summary of the family details. Known issues. Search interface: The working group has redefined the items that should be available as search parameters. A rework of this interface is underway to clarify the available options and to allow direct selection of the family for display, rather than selecting View to choose a family. Tree visualization. Fat tree image: Some of the text in this image appears to be cut off. Users can scroll to see the complete image. Gene tree image: Curved lines and the bars for the speciation and duplication events are not standard and will be fixed when a new tree visualization tool is incorporated. Species tree image: Curved lines are not standard and will be fixed with incorporation of new tree visualization tools. Download of images: Images are not in the same format at download. A fix for this is in progress. Saving of
63. sed to obtain the ancestors' trait values, which minimizes the sum of squared changes along the branches. The output is a table of ancestral trait values and the corresponding 95% confidence intervals. These value estimates can be plotted on the phylogenetic tree using a color gradient. Additionally, the function outputs an estimate of the Brownian motion parameter σ² and the log-likelihood of the model. Discrete ancestral character estimation (DACE) describes evolutionary trait changes using a continuous-time Markov model. In this model, the probability of change from one state to another depends only on the transition rate and the evolutionary time, which is obtained from the phylogeny. Maximum Likelihood is then used to estimate the transition rates and the proportional likelihoods of the ancestors' states. The output is a table of proportional likelihoods for all possible states at the internal nodes. These value estimates can be plotted on the phylogenetic tree using pie charts to represent the likelihoods. Additionally, the function outputs an estimate of the transition rate with its associated uncertainty and the log-likelihood of the model. More details about ape can be found at http://cran.r-project.org/web/packages/ape/index.html and http://ape.mpl.ird.fr. Continuous Ancestral Character Estimation (CACE): An overview of Ancestral Character Estimation is available. Choose Analysis x Select from availabl
64. select Delete from the drop down menu at the right or from More Actions Discovery Environment Manual 21 View Output s alternate O Manage Data Available Files t Up Name g New Folder amp Import More Actions Uploaded Data TestData OO mhelmke_iplantcollaborative org IndContrastjob1 je2e Analysis output s are automatically placed in a folder in Manage Data and may be viewed from there at any time after a completed run Discovery Environment Manual 22 Viewing and Deleting Notifications COO e Show all notifications K Click Show all notifications near the top right corner of the Discovery Environment screen to show messages from the system The icons next to the text will sort those notifications by type analysis or data View Notifications Filter By All v Category Messages Created Date No notifications to display Notifications are shown in View Notifications and may be filtered by type using the Filter By drop down menu Discovery Environment Manual 23 More Actions O View Notifications ka BigTree tre uploaded successfully Use the checkboxes to select notifications Notifications that include other data such as successful data imports and analysis results may be viewed or deleted from the More Actions drop down Notifications that merely inform such as delete success notices that only appear as popups in the main window are temporary an
65. tification Agent and JEX: An Object State Management system (OSM), a Notification Agent, and a Job Execution Framework (JEX) were created. Metadata tool description: The ability to describe tools with metadata in JSON format was implemented. Flexible tool integration: The Job Execution Framework (JEX) allows collaborators to integrate their own tools by describing them with metadata in JSON format, which is sent to the JEX and stored by the Object State Management system (OSM). This change enables an easily repeatable process and a relatively simple mechanism for users to integrate tools, and customized implementations or uses of those tools (which we call analyses), into the DE. Core Software personnel are still needed to perform part of the process, but we have completed the first step toward making this easier for end users. Known issues. Progress monitoring: Low-level progress reporting (e.g., Job is 50% complete) is not currently available. However, states like running or completed display in the View Analysis window for a submitted job. Low-level progress reporting is being discussed. Job naming: The name of the job given by the user is displayed in the View Analysis window; however, the description applied to the job is not displayed. This issue will be addressed in a future release. End date: The user is currently not given an end date (completion time) for the executed job. This issu
66. uch as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp. Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates. Above description from the BWA website: http://bio-bwa.sourceforge.net. Authors: H. Li, R. Durbin. The tool was identified for inclusion by the iPlant Genotype to Phenotype working group. The 0.3.x release of the Discovery Environment uses BWA version 0.5.9. Contrast: Contrast compares information on the evolutionary relationships of organisms (phylogenetic trees) to test for correlated evolutionary changes in two or more traits uploaded in Newick format. Contrast reads a tree from a tree file and a data set with continuous characters data, and produces the independent contrasts for those characters for use in any multivariate statistics package. Contrast will also produce covariances, regressions, and correlations between characters for those contrasts, and can also correct for within-species sampling variation when individual phenotypes are available within a population. Contrast is a part of PHYLIP. Above description partially from http://evolution.genetics.washington.edu/phylip/progs.data.cont.html. More information is available at http evolution genetics washington edu phylip http evolution geneti
67. xes are under evaluation. Import from Phylota: This functionality is suboptimal and improvements are in the planning stages. There are a number of issues related to the way data is displayed as well as the general import functionality. The import from Phylota fails in the current version of the DE. Files with duplicate names do not import: If a file is imported that has the same name as an existing file, the import will fail. Ideally we would add an extension to the new file's filename, such as filename 2. Users cannot always tell where a file will be imported: Imported files are brought in to the folder currently selected by the user; however, this is not always clear to the user. A note has been added to the help documentation. Future plans. Data and file management: Improvement to data and file management is slated for the next release of the DE. As more issues are discovered through testing of the 0.3.0 release, they will be added for evaluation for the Data Management project. Taxonomic Name Resolution Service (TNRS). New functionality in 0.3.0. Desired name selection: This application performs exact and fuzzy matching of a list of plant taxonomic names against a database provided by the Missouri Botanical Garden and returns all names within a set variance. When more than one potential match is returned, the user is allowed to select the name that best reflects the intended entered name. Selected
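The workaround suggested above for duplicate file names (adding an extension such as "filename 2" to the new file's name) could look like the following Python sketch. It is a hypothetical illustration, not planned DE behavior: the function name and the choice of a dash before the number are assumptions.

```python
import os
from typing import Set


def next_available_name(name: str, existing_names: Set[str]) -> str:
    """Return name unchanged if unused, otherwise append -2, -3, ... before the extension."""
    if name not in existing_names:
        return name
    stem, extension = os.path.splitext(name)
    counter = 2
    candidate = f"{stem}-{counter}{extension}"
    while candidate in existing_names:
        counter += 1
        candidate = f"{stem}-{counter}{extension}"
    return candidate


if __name__ == "__main__":
    print(next_available_name("tree.nex", {"tree.nex", "tree-2.nex"}))
```

A dash is used here because it is one of the special characters the manual permits in file names.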
68. ysis in the Discovery Environment for more information. Please contact us if you would like to collaborate with us to integrate a tool and/or create an analysis. Please see the Tool Integration and Creating an Analysis in the Discovery Environment Quick Start guide to begin. Creating a New Analysis in the Discovery Environment. Tools are software packages that perform specific tasks. Once tools have been integrated into the Discovery Environment, an analysis must be created. Analyses are the means by which tools are used in the DE. An analysis may include only one tool, or several tools chained together into a workflow. Tools are integrated into the Discovery Environment using a metadata description of the tool and a metadata description of the interface to that tool. All metadata is in JSON format. Please see Tool Integration for more information. An analysis takes a tool interface description and customizes the settings in it to suit a specific task. The analysis may use all of the default values it inherits, or it may set new default values, reduce parameters, or change validation criteria to define how the tool is to be used in the analysis. Analyses may be modeled for one tool or for a combination of several tools. Please contact us if you would like to collaborate with us to integrate a tool and/or author an analysis. Please see the Tool Integration and Creating a new Analysis in the Discovery Enviro
