User Manual

1. is not permitted

• If a user has multiple BAMs for a sample, the user needs to preprocess all the BAMs to create a merged BAM for the sample. This can be done using the Samtools merge module:

    <TREAT_HOME>/bin/samtools-0.1.12a/samtools merge

    Usage:   samtools merge [-nr] [-h inh.sam] <out.bam> <in1.bam> <in2.bam> [...]

    Options: -n       sort by read names
             -r       attach RG tag (inferred from file names)
             -u       uncompressed BAM output
             -R STR   merge file in the specified region STR [all]
             -h FILE  copy the header in FILE to <out.bam> [in1.bam]

MAYO BIC PI Support Page 13

• The user should make sure the quality values in the FASTQ files are in the Sanger format; if not, the user needs to convert them to Sanger quality values.

• All the reference files are pre-processed, so we assume that the user is using the references provided in the package.

Results Navigation

Tools needed for results visualization:
a. IGV
b. Adobe PDF
c. Excel 2007 (for larger data sets)
d. Web browser (IE, Firefox, Safari, or Chrome)

Appendix

Column header description for the SNV report

IGV Link: IGV link for the variant call. Clicking the link takes you to the variant position in IGV. Example: chr1:100089177
Chr: Chromosome index. Example: chr1
Start: Genomic position. Example: 100089177
dbSNP130: SNV id from dbSNP 130 (an rsID is displayed if the alleles are the same). Example: rs2307130
Ref: Reference allele. Example: A
HapMap CE
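The Sanger-format requirement in the limitations above can be illustrated with a short sketch. It assumes the input qualities use the Phred+64 encoding found in older Illumina pipeline output; that encoding is an assumption here, not something the manual states, and the function name `phred64_to_sanger` is hypothetical and not part of TREAT. Solexa-scaled qualities would need a different conversion.

```python
def phred64_to_sanger(quality: str) -> str:
    """Convert a Phred+64 quality string to Sanger (Phred+33) encoding.

    Assumption: every character encodes a Phred score with an ASCII
    offset of 64. This helper is illustrative and not part of TREAT.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # recover the numeric Phred score
        if score < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' is Phred 40 in Phred+64; the Sanger character for Phred 40 is 'I'.
    print(phred64_to_sanger("hhhh"))
```

Applying the function to the fourth line of every FASTQ record, and leaving the other three lines unchanged, yields a Sanger-encoded file.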
2. <files and folder structure description>
   <IGV Setup document>
   <TREAT workflow image>
   <Column Description for variant Reports>
   <amazon cloud tutorial>
|_ <example configuration files>
|_ <test_data>
|_ <fastq>
     |_ <sample FASTQs>
|_ <bam>
     |_ <sample BAMs>
|_ <variant>
     |_ <sample variants files>
|_ <sample_output>
     |_ <output structure from all module>
|_ <example>
     |_ <configuration files for example data sets for all four modules>
     |_ <how to run TREAT various modules>

The Validation of the Installation with the example data set

The workflow can be run in two different modes: Sun Grid Engine (SGE) mode, which requires a cluster environment, and single workstation mode.

Single workstation mode: under the <TREAT_HOME> directory, run the following command to check the validity of the installation:

    scripts/non_sge/treat.sh example/run_info_all_non_sge.txt

SGE mode: if the user has an SGE environment, run the following command under the <TREAT_HOME> directory to check the validity of the installation:

    scripts/non_sge/treat.sh example/run_info_all_non_sge.txt

Upon successful completion of the test run, you will receive an email notification stating that the workflow is completed and the results are ready. The results from the test run are stored in the following folder structure:

<TREAT
3. 0 supportedRead

Examples are listed in the order: insertion example, deletion example.

ReadDepth: Total reads stacked at the INDEL. Examples: 8, 5
inDBSNPOrNot: Whether the SNP is in the dbSNP database and/or 1000 Genomes, either echoing the user input (in the case of a Maq file) or, with no user input, whether represented in these datasets. Examples: none, none
Accession: NCBI or CCDS transcript identifier. Examples: NM_001009611, CCDS2229.1
functionGVS: GVS class of SNP function, using only hg18 and your submitted alleles. Examples: frameshift, intron
rsID: dbSNP identifier for the SNP. Examples: 0, 0
aminoAcids: List of amino acids for the codon, starting with that of the reference base. Examples: none, none
proteinPosition: The position of the amino acid in the protein, beginning at the N-terminal with the first amino acid at position 1, followed by the total number of amino acids in the protein (the total includes a count for the stop codon). Examples: NA, NA
polyPhen: polyPhen amino acid substitution impacts. Examples: unknown, unknown
geneList: Gene name (UCSC). Examples: PRAMEF4, SPC25
Entrez_id: GeneID (UCSC). Examples: 400735, 57405
Gene_title: Gene description (UCSC). Examples: PRAME family member 4; SPC25, NDC80 kinetochore complex component, homolog (S. cerevisiae)
closest_transcript_id: Splice variant (2bp). Example: NM_020675
Tissue_specificity: Tissue specificity information related to the GeneID. Examples: Link, Link
Pathway: Pathway information related to the GeneID. Example: link
4. 26 22

                                        sampleA    sampleB
Missense                                  2,468      1,888
coding-synonymous                         2,482      1,781
(label missing)                             122        105
(label missing)                           3,672      2,645
(label missing)                           5,078      3,643

NOVEL VARIANTS (Not in dbSNP130)
Transition To Transversion Ratio           1.71       2.03
Nonsense                                     12
Missense                                    261        178
(label missing)                             127         95
coding-notMod3                               12          5
Homozygous                                  659        408
Heterozygous                                 84         49

Figure: Sample Statistics table for all module

References

Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]

Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.

DePristo M., Banks E., Poplin R., Garimella K., Maguire J., Hartl C., Philippakis A., del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T., Kernytsky A., Sivachenko A., Cibulskis K., Gabriel S., Altshuler D. and Daly M. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011 Apr; 43(5):491-498.

Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25:2078-9. [PMID: 19505943]

Goya R., Sun M.G., Morin R.D., Leung G., Ha G., Wiegand K.C., Senz J., Crisan A., Marra M.A., Hirst M., Huntsman D., Murphy K.P., Aparicio S., Shah S.P. SNVMix: predicting sin
5. 9 20 21 22 X Y M

LANEINDEX: lanes are separated; one is required.

Run TREAT

For non-SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/non_sge/treat.sh <PATH TO RUN INFO>/run_info.txt

For SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/sge/treat.sh <PATH TO RUN INFO>/run_info.txt

<OUTPUT_DIR>
|_ <PI>
   |_ <TOOL>
      |_ <OUTPUT_FOLDER>
         |_ <Reports_per_Sample>
         |    <sample.SNV.cleaned_annot.xls>
         |    <sample.SNV.cleaned_annot_filtered.xls>
         |    <sample.INDEL.cleaned_annot.xls>
         |    <sample.INDEL.cleaned_annot_filtered.xls>
         |_ <Main_Document.html>
         |_ <igv_session.xml>

<OUTPUT_DIR>/<PI>/<TOOL>/<OUTPUT_FOLDER>: The above structure is created from the information supplied by the user in the run info file. The rest of the folder structure depends on the analysis module the user specifies. The folders and files in the structure above are the output from this module. TREAT also creates other intermediate folders and files that can be useful for tertiary analysis. The user can run the following command to remove the intermediate files:

    scripts/sge/cleanspace.sh <full path to OUTPUT_FOLDER>

Limitations to the workflow

• Sample names should not start with a number, and special character H
6. HOME>
|_ <test_data>
   |_ <LastName_FirstName>
      |_ <exome>
         |_ <allModule>
            |_ <Reports>
            |    <SNV.cleaned_annot.xls>
            |    <SNV.cleaned_annot_filtered.xls>
            |    <INDEL.cleaned_annot.xls>
            |    <INDEL.cleaned_annot_filtered.xls>
            |_ <Reports_per_Sample>
            |    <sample.SNV.cleaned_annot.xls>
            |    <sample.SNV.cleaned_annot_filtered.xls>
            |    <sample.INDEL.cleaned_annot.xls>
            |    <sample.INDEL.cleaned_annot_filtered.xls>
            |_ <Main_Document.html>
            |_ <igv_session.xml>

TREAT creates other files and folders as well; they contain intermediate files that are useful for tertiary analysis. A document named overviewFilesAndFolder.pdf, in the <doc> folder under <TREAT_HOME>, describes each folder and file format.

To view the local sequence alignment in IGV (every row of the first column of each variant report is hyperlinked to the IGV viewer), go to the IGV home page at http://www.broadinstitute.org/software/igv/home, download the IGV application, and load the igv_session.xml file. Alternatively, the doc folder (<TREAT_HOME>/<doc>) contains a tutorial with the steps to set up IGV (it takes less than 5 minutes) and to use this feature.

Step by Step Instruction to run TREAT

The user needs to prepare two configuration files to run TREAT: one for the sa
7. LIGNER BWA
SNV_CALLER SNVmix
PAIRED 1
READLENGTH 100
DISEASE NONE
VARIANT_TYPE BOTH
PI baheti_saurabh
OUTPUT_DIR TREAT1.0/test
EMAIL baheti.saurabh@mayo.edu
SAMPLENAMES sampleA sampleB
TOOL_INFO TREAT1.0/example/tool_info_sge.txt
SAMPLE_INFO TREAT1.0/example/sample_info_all.txt
ANALYSIS all
OUTPUT_FOLDER allModule
CENTER MAYO
PLATFORM illumina
GENOMEBUILD hg18
SAMPLEINFORMATION There are 2 samples for this study
CHRINDEX 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M
LANEINDEX 1 2
NUM_SAMPLES 2
QUEUE 1-day
INPUT_DIR TREAT1.0/test_data/fastq

Description of the identifiers in the run information file

TOOL: to create folder structure
ALIGNER
PAIRED: 1 for PE or 0 for SR
DISEASE: Name of the disease
PI: lastname_firstname; name of the person running TREAT
OUTPUT_DIR: output directory location
EMAIL: Email address
SAMPLENAMES: Name of the sample, same as in the sample info file, separated. Example: sampleA sampleB
SAMPLE_INFO: path to sample info file
OUTPUT_FOLDER: name for the output folder
SAMPLEINFORMATION: Free text for the HTML report; information about the samples
CHRINDEX: list of chromosomes the user needs to analyze, separated. Example: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M
LANEINDEX: lanes are separated

NOTE: bo
8. TREAT User Guide, version 1.0
Department of Biomedical Statistics and Informatics, Mayo Clinic
Sept 07, 2011

Contents

1 Introduction
2 System Requirements
3 Supported sequencing platforms and file formats
  3.1 Illumina Sequencing
    3.1.1 FASTQ
    3.1.2 BAM
    3.1.3 Called Variants
      3.1.3.1 SNV
      3.1.3.2 INDEL
  3.2 Other sequencing format
4 Installation
5 TREAT workflow pipeline
6 Validation of installed TREAT tool
7 Step by Step Instruction to run TREAT
8 Results Navigation
9 Appendix
10 References
11 Contact Information

Introduction

TREAT (Targeted RE-sequencing Annotation Tool) offers an end-to-end solution for analyzing and interpreting targeted re-sequencing data.

Three modules of TREAT:
• Sequence alignment
• Variant calling
• Variant annotation and filtering, as well as visualization

The source code and the executable of TREAT are available for download at http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm

An Amazon Cloud Image of TREAT is also provided for researchers with no access to local bioinformatics infrastructure. See the separate document on our website (AmazonCloudTutorial PDF).

System Requirements

To use TREAT, the user needs to meet these requirements:

1. A CentOS Linux workstation with 4 cores and at least 16 GB of RAM. We do not currently support running TREAT on a Windows platform.
2. 175 gigabytes (GB) of storage space to down
9. U_allele_freq: Allele frequency for HapMap CEU samples (phase; caucasian). Example: G/A 0.492/0.508
1kgenome_CEU_allele_freq: Allele frequency for 1kgenome CEU samples (Release 6; caucasian). Example: G/A 0.483/0.517
HapMap_YRI_allele_freq: Allele frequency for HapMap YRI samples (phase; Yoruban). Example: G/A 0.183/0.817
1kgenome_YRI_allele_freq: Allele frequency for 1kgenome YRI samples (Release 6; Yoruban). Example: G/A 0.212/0.788
HapMap_JPT_CHB_allele_freq: Allele frequency for HapMap JPT and CHB samples (phase; Japanese and Chinese). Example: A/G 0.478/0.522
1kgenome_JPT_CHB_allele_freq: Allele frequency for 1kgenome JPT and CHB samples (Release 6; Japanese and Chinese). Example: A/G 0.474/0.526
Alt: Alternate allele. Example: G
GenotypeClass: SNV class, either Homozygous Alternate (AltAlt) or Heterozygous Alternate (AltRef). Example: GG
Alt SupportedReads: Number of reads supporting the alternate allele (min mapping and base quality 20).
Ref SupportedReads: Number of reads supporting the reference allele (min mapping and base quality 20).
ReadDepth: Total reads stacked at the variant position (min mapping and base quality 20).
probability/quality: For SNVmix we get the posterior probability, and for GATK we get the quality. Example: probability > 0.8
Codons: Codon that has been changed; the bases are given with respect to mRNA orientation.
Transcript ID: Ensembl Transcript ID. Example: ENST00000294724
Protein ID: Ensembl Protein ID. Example: ENSP00000294724
Substitution: Amino acid substitution wi
10. created by TREAT that can be useful for tertiary analysis. The user can run the following command to remove the intermediate files:

    scripts/sge/cleanspace.sh <full path to OUTPUT_FOLDER>

Run TREAT using BAM Files

Create sample information file

NOTE: The sample name is followed by the separator sign; then specify the name of the BAM file.

    sampleA Name_of_the_BAM_file_forSampleA
    sampleB Name_of_the_BAM_file_forSampleB

Create Run information file

The run information file contains parameters that are used by the workflow. The user should make sure that each column identifier remains the same as given in the example file, followed by the separator sign.

TOOL exome
DATE 5/26/2011
ALIGNER BWA
SNV_CALLER SNVmix
PAIRED 1
READLENGTH 100
DISEASE NONE
VARIANT_TYPE BOTH
PI baheti_saurabh
OUTPUT_DIR TREAT1.0/test
EMAIL baheti.saurabh@mayo.edu
SAMPLENAMES sampleA sampleB
TOOL_INFO TREAT1.0/example/tool_info_sge.txt
SAMPLE_INFO TREAT1.0/example/sample_info_variant.txt
ANALYSIS variant
OUTPUT_FOLDER variantModule
CENTER MAYO
PLATFORM illumina
GENOMEBUILD hg18
SAMPLEINFORMATION There are 2 samples for this study
CHRINDEX 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M
LANEINDEX 1 2
NUM_SAMPLES 2
QUEUE 1-day
INPUT_DIR TREAT1.0/test_data/bam

Description of the identifiers in the run information file

TOOL: to create folder structure
ALIGNER: Aligner used t
11. e each column identifier remains the same as given in the example file, followed by the separator sign. Below is an example of the run information file.

TOOL exome
DATE 5/26/2011
ALIGNER BWA
SNV_CALLER SNVmix
PAIRED 1
READLENGTH 100
DISEASE NONE
VARIANT_TYPE BOTH
PI baheti_saurabh
OUTPUT_DIR TREAT1.0/test
EMAIL baheti.saurabh@mayo.edu
SAMPLENAMES sampleA sampleB
TOOL_INFO TREAT1.0/example/tool_info_sge.txt
SAMPLE_INFO TREAT1.0/example/sample_info_annotation.txt
ANALYSIS annotation
OUTPUT_FOLDER annotationModule
CENTER MAYO
PLATFORM illumina
GENOMEBUILD hg18
SAMPLEINFORMATION There are 2 samples for this study
CHRINDEX 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M
LANEINDEX 1 2
NUM_SAMPLES 2
QUEUE 1-day
INPUT_DIR TREAT1.0/test_data/variant

Description of the identifiers in the run information file

TOOL: to create folder structure
Optional; specify if using SGE mode
PI: lastname_firstname; name of the person running TREAT
INPUT_DIR: path to input directory
SAMPLENAMES: Name of the sample, same as in the sample info file, separated. Example: sampleA sampleB
TOOL_INFO: path to tool info file
ANALYSIS: annotation
SAMPLEINFORMATION: information about the samples
CHRINDEX: list of chromosomes the user needs to analyze, separated. Example: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1
12. ents

[Figure: Sample HTML page. The page lists its contents: 1 Project Title; 2 Project Description (2.1 Background, 2.2 Study design); 3 Analysis plan; 4 Received Data (4.1 Sample Summary); 5 Results Summary (5.1 Statistics based on per sample analysis); 6 Results and Conclusions; 7 Results Delivered; 8 Useful Links. The example shows the project title "NGS Bioinformatics for exome sequencing"; a background table with the items Disease Type, Number of Samples, Read Length, PairedEnd/SingleRead, StartDate, and EndDate; the study design ("There are 2 randomly picked samples for alignment module"; goal: "Aligning sequencing samples using BWA"); and the analysis plan.]

Figure: Sample HTML page

                                        sampleA              sampleB
Total Reads                             2,915,968            2,057,218
Mapped Reads                            2,838,835 (97.4%)    1,999,956 (97.2%)
Mapped Reads (Q > 20)                   2,608,965 (89.5%)    1,860,886 (90.5%)
Used Reads                              2,609,035 (89.5%)    1,860,947 (90.5%)
Mapped Reads in the Target region       1,564,172 (53.6%)    1,103,742 (53.7%)
Called SNVs (SNVmix)                    16,468               11,205
  in Target                             9,493                6,745
  Transition To Transversion            2.57                 2.63
  In dbSNP130                           8,750                6,288
  Not in dbSNP130                       743                  457
Called Indels (GATK)                    35178
  in Target                             193                  102
  Indels leading to frameshift          23                   11
  Indels in coding regions not          17                   7
  (label missing)                       2                    1

KNOWN VARIANTS (in dbSNP130)
Total Known SNVs                        8,750                6,288
Transition To Transversion              2.66                 2.69
Nonsense
13. gle nucleotide variants from next generation sequencing of tumors. Bioinformatics. 2010 Mar 15; 26(6):730-6.

Contact information

MAYO BIC PI Support
Asif Hossain: Hossain.Asif@mayo.edu
Saurabh Baheti: Baheti.Saurabh@mayo.edu
14. hether represented in these datasets.
Accession: NCBI or CCDS transcript identifier. Example: NM_000644
functionGVS: GVS class of SNP function, using only hg18 and your submitted alleles. Example: splice-3
rsID: dbSNP identifier for the SNP. Example: 2307130
aminoAcids: List of amino acids for the codon, starting with that of the reference base. Example: none
proteinPosition: The position of the amino acid in the protein, beginning at the N-terminal with the first amino acid at position 1, followed by the total number of amino acids in the protein (the total includes a count for the stop codon). Example: NA
polyPhen: polyPhen amino acid substitution impacts. Example: unknown
geneList: Gene name (UCSC). Example: AGL
Entrez_id: GeneID (UCSC). Example: 178
Gene_title: Gene description (UCSC). Example: amylo-alpha-1,6-glucosidase, 4-alpha-glucanotransferase
closest_transcript_id: Splice variant (2bp). Example: NM_000644
Tissue_specificity: Tissue specificity information related to the GeneID. Example: Link
Pathway: Pathway information related to the GeneID. Example: link

Column header description for the INDEL report

IGV Link: IGV link for the variant call. Clicking the link takes you to the variant position in IGV. Examples: chr1:12862133, chr2:169436315
Chr: Chromosome index. Examples: chr1, chr2
Start: Genomic start position. Examples: 12862133, 169436315
Stop: Genomic stop position. Examples: 12862133, 169436318
Ref: Reference allele. Example: AAA
Alt: Alternate allele. Example: G
Base Length: Length of the INDEL. Examples: 1, 3
Indel: Number of reads supporting the INDEL. 8
15. ld ones are our recommendations

Run TREAT

For non-SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/non_sge/treat.sh <PATH TO RUN INFO>/<run info file>

For SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/sge/treat.sh <PATH TO RUN INFO>/<run info file>

<OUTPUT_DIR>
|_ <PI>
   |_ <TOOL>
      |_ <OUTPUT_FOLDER>
         |_ <realigned_data>
         |  |_ <sample>
         |     |_ <sample.igv-sorted.bam>
         |     |_ <sample.igv-sorted.bam.bai>
         |_ <Reports_per_Sample>
         |  |_ <sample.SNV.cleaned_annot.xls>
         |  |_ <sample.SNV.cleaned_annot_filtered.xls>
         |  |_ <sample.INDEL.cleaned_annot.xls>
         |  |_ <sample.INDEL.cleaned_annot_filtered.xls>
         |_ <Reports>
         |    <SNV.cleaned_annot.xls>
         |    <SNV.cleaned_annot_filtered.xls>
         |    <INDEL.cleaned_annot.xls>
         |    <INDEL.cleaned_annot_filtered.xls>
         |    <variantLocation_SNVs>
         |    <variantLocation_INDELs>
         |_ <Main_Document.html>
         |_ <igv_session.html>

<OUTPUT_DIR>/<PI>/<TOOL>/<OUTPUT_FOLDER>: The above structure is created from the information supplied by the user in the run info file. The rest of the folder structure depends on the analysis module the user specifies. The folders and files in the structure above are the output from this module. There are other intermediate folders and files
16. load and install TREAT.
3. Additional storage space for user data: 1 terabyte (TB) for one flow cell run using Illumina HiSeq2000 (as of April 2011).

User Supplied Files

Illumina sequencing platform: for the Illumina platform, TREAT accepts 3 different types of input files.

a. FASTQ (to run all 3 modules of TREAT)

FASTQ is a text-based format that normally uses four lines per sequence:

Line 1 begins with a '@' character and is followed by a sequence identifier.
Line 2 is the raw sequence letters (ATGC).
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier again.
Line 4 encodes the quality values for the sequence in Line 2.

    @SEQ_ID
    TCAGTGTTACCCAGGCTAGACTGCAGTTGCACAATCTCTGCTCACTGCAAGCTCTACCTCCCTGGTTCAAGTGATTCTCCTGCCT
    +SEQ_ID
    daddcc _abbb_ V U Q YORXW Q V dad_c _ Y_ccccXaYTabaabc c aBBBBBBBBBBBBBBB

b. BAM (to run the variant calling and annotation modules only)

BAM is a compressed, binary file format that contains sequences aligned to a reference genome. Because it is binary and compressed, it requires less storage space. Indexing of BAM aims
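The four-line FASTQ layout described above can be checked with a small reader. This is a minimal sketch, not part of TREAT: the function name `read_fastq` is hypothetical, and the check that the sequence and quality lines have equal length is a general property of FASTQ rather than something this manual states.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) for each four-line FASTQ record."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1}: line 1 must begin with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record at line {i + 1}: line 3 must begin with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record at line {i + 1}: sequence and quality lengths differ")
        yield header[1:], seq, qual


if __name__ == "__main__":
    example = ["@SEQ_ID\n", "TCAG\n", "+SEQ_ID\n", "IIII\n"]
    for identifier, sequence, quality in read_fastq(example):
        print(identifier, sequence, quality)
```

Because the function is a generator, validation errors surface as the records are iterated, so a large file can be checked without loading every record at once.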
17. mple information and the other for the run information. These files can be located anywhere on the file system as long as they are accessible by the treat.sh shell script. Examples of both files can be found in the <example> folder under the <TREAT_HOME> directory.

Run TREAT starting with FASTQ files

Create sample information file

NOTE: The sample name is followed by the separator sign; then specify the names of the FASTQ files. Read1 and read2 are tab separated.

Option 1: One Paired End sample per lane, i.e. one fastq per sample per read

    sampleA NameOf_FASTQ_file_Read1ForSampleA NameOf_FASTQ_file_Read2ForSampleA
    sampleB NameOf_FASTQ_file_Read1ForSampleB NameOf_FASTQ_file_Read2ForSampleB

Option 2:

    sampleA NameOf_FASTQ_file_ReadForSampleA
    sampleB NameOf_FASTQ_file_ReadForSampleB

Option 3: Multiple Lanes per sample for Paired End

    sampleA NameOf_FASTQ_file_Read1ForSampleA NameOf_FASTQ_file_Read2ForSampleA
    sampleA NameOf_FASTQ_file_NextRead1ForSampleA NameOf_FASTQ_file_NextRead2ForSampleA

    sampleA NameOf_FASTQ_file_ReadForSampleA
    sampleA NameOf_FASTQ_file_NextReadForSampleA

Create Run information file

The run information file contains parameters that are used by the workflow. The user should make sure that each column identifier remains the same as given in the example file, followed by the separator sign. Below is an example of the run information file.

TOOL exome
DATE 5/26/2011
A
18. not.xls>
    <INDEL.cleaned_annot_filtered.xls>
    <variantLocation_SNVs>
    <variantLocation_INDELs>
|_ <Main_Document.html>
|_ <igv_session.html>

<OUTPUT_DIR>/<PI>/<TOOL>/<OUTPUT_FOLDER>: The above structure is created from the information supplied by the user in the run info file. The rest of the folder structure depends on the analysis module the user specifies. The folders and files in the structure above are the output from this module. TREAT also creates other intermediate folders and files that can be useful for tertiary analysis. The user can run the following command to remove the intermediate files:

    scripts/sge/cleanspace.sh <full path to OUTPUT_FOLDER>

Run TREAT from Called Variants

Create sample information file

NOTE: The variant identifier (SNV or INDEL) is followed by the sample name and the separator sign; then specify the name of the file.

Option 1: User has both SNVs and INDELs

    SNV sampleA nameOfTheVariantFile
    INDEL sampleA nameOfTheVariantFile

Create Run information file

The run information file contains parameters that are used by the workflow. User should make sure that th
19. o generate BAM
PAIRED: 1 for PE or 0 for SR
DISEASE: Name of the disease
VARIANT_TYPE: Type of variant (BOTH, SNV, INDEL)
OUTPUT_DIR: output directory location
EMAIL: Email address
SAMPLENAMES: Name of the sample, same as in the sample info file, separated. Example: sampleA sampleB
CENTER: Where the sequencing is done (MAYO, TGEN, BMC, XXX)
GENOMEBUILD: Build of genome
SAMPLEINFORMATION: Free text for the HTML report; information about the samples
CHRINDEX: list of chromosomes the user needs to analyze, separated. Example: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M
LANEINDEX: lanes are separated

NOTE: bold ones are our recommendations

Run TREAT

For non-SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/non_sge/treat.sh <PATH TO RUN INFO>/<run information file>

For SGE mode, run the following command under the <TREAT_HOME> directory:

    scripts/sge/treat.sh <PATH TO RUN INFO>/<run information file>

<OUTPUT_DIR>
|_ <PI>
   |_ <TOOL>
      |_ <OUTPUT_FOLDER>
         |_ <realigned_data>
         |  |_ <sample>
         |     |_ <sample.igv-sorted.bam>
         |     |_ <sample.igv-sorted.bam.bai>
         |_ <Reports_per_Sample>
         |    <sample.SNV.cleaned_annot.xls>
         |    <sample.SNV.cleaned_annot_filtered.xls>
         |    <sample.INDEL.cleaned_annot.xls>
         |    <sample.INDEL.cleaned_annot_filtered.xls>
         |_ <Reports>
              <SNV.cleaned_annot.xls>
              <SNV.cleaned_annot_filtered.xls>
              <INDEL.cleaned_an
20. on with high occurrence rate, and vice versa for a positive value.
conservation: A score of 1 indicates that the variant position overlaps evolutionarily conserved regions in 17/28/44 vertebrates, including mammalian, amphibian, bird, and fish species.
Regulation: A score of 1 indicates that the variant position overlaps regulatory potential regions, based on short alignment patterns between known regulatory elements and neutral DNA of human, chimpanzee, macaque, mouse, rat, dog, and cow.
Tfbs (TranscriptionFactorBindingSite): A score of 1 indicates that the variant position overlaps transcription factor binding sites conserved in the human/mouse/rat alignment.
Tss (TranscriptionStartSite): A score of 1 indicates that the variant position overlaps transcription start sites (TSS) on the human genome. The TSSs of a gene are important landmarks that help define the promoter regions of a gene.
Enhancer: A score of 1 indicates that the variant position overlaps distant-acting transcriptional enhancers in the human genome (UCSC). The mouse transgenesis enhancer assay couples the identification of evolutionarily conserved non-coding sequences with a moderate throughput
inDBSNPOrNot: Whether the SNP is in the dbSNP database and/or 1000 Genomes, either echoing the user input (in the case of a Maq file) or, with no user input, w. Example: dbSNP_1000Genomes
21. th the position information.
Region: Genomic region. Example: 5'UTR
dbSNP ID: If dbSNP has a variant overlapping at the same position, the rsID is displayed. However, the alleles may not be the same. Example: rs2307130:G
SNP Type: Nonsynonymous or Synonymous.
Prediction: SIFT prediction (Damaging / Tolerated / DAMAGING, Warning: Low confidence / Not scored). Example: NA
Score: Ranges from 0 to 1. The amino acid substitution is predicted damaging if the score is <= 0.05 and tolerated if the score is > 0.05. Example: NA
Median Info: Ranges from 0 to 4.32; ideally the number would be between 2.75 and 3.5. This is used to measure the diversity of the sequences used for prediction. A warning will occur if this is greater than 3.25, because this indicates that the prediction was based on closely related sequences. Example: NA
Gene ID: Ensembl Gene ID. Example: ENSG00000162688
Gene Name: Gene name. Example: AGL
OMIM Disease: OMIM disease from NCBI. Example: GLYCOGEN STORAGE DISEASE III
Average Allele Freqs: CEU HapMap populations. Example: A 0.60, G 0.40
User Comment
SynonymousCodonUsage: For every synonymous codon change (as shown in the SIFT column Codons), this column indicates the percentage occurrence of the codons in Homo sapiens. The higher the percentage, the more frequently the codon appears in gene sequences.
Difference: This column indicates the difference in percentage between the codon changes. A negative value indicates a synonymous change from a codon with low occurrence rate to a cod
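The Score and Median Info cutoffs in the column descriptions above can be written as two small checks. This is an illustrative sketch under stated assumptions: the cutoff is taken as inclusive (a score of exactly 0.05 counts as damaging), and the function names `sift_prediction` and `median_info_warning` are hypothetical, not TREAT or SIFT APIs.

```python
def sift_prediction(score: float) -> str:
    """Classify a SIFT score: 'damaging' if score <= 0.05, else 'tolerated'.

    Assumption: the 0.05 cutoff is inclusive on the damaging side.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return "damaging" if score <= 0.05 else "tolerated"


def median_info_warning(median_info: float) -> bool:
    """Return True when Median Info exceeds 3.25.

    A value above 3.25 suggests the prediction was based on closely
    related sequences, so the result deserves less confidence.
    """
    if not 0.0 <= median_info <= 4.32:
        raise ValueError("Median Info ranges from 0 to 4.32")
    return median_info > 3.25


if __name__ == "__main__":
    print(sift_prediction(0.01), median_info_warning(3.40))
```

Rows whose Score or Median Info is reported as NA, as in the example column, carry no numeric value and would need to be skipped before calling either function.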
22. to achieve fast retrieval of alignments overlapping a specified region without going through the whole alignment.

c. Called Variants (to run the variant annotation module only)

Called variants include SNVs and INDELs. The file specifications for both follow.

i. SNVs (single nucleotide variants): The required file format is a tab-delimited text file with four columns: chromosome, genomic location, reference allele, and alternate allele.

    100 A T
    1000 E T
    100 G C

ii. INDELs: The required file format is a tab-delimited text file with four columns: chromosome, genomic start, genomic stop, and information about the indel. For insertions, genomic start equals genomic stop; for deletions, stop minus start equals the number of bases deleted. The last column starts with the sign for insertion or deletion, followed by the base(s) inserted or deleted, followed by the supported reads and the read depth.

    chr1 100 100 A 34 45
    chr1 1000 1001 T 23 34

Other sequencing platforms: for all non-Illumina platforms (SOLiD, Roche 454, etc.), TREAT accepts only called variants as input, for variant annotation. The format is described earlier in the document. Sequence alignment and variant calling are not supported.

Installation

The user can download the latest version of TREAT from http://ndc.mayo.edu/ma
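The four-column SNV file described above can be validated line by line before it is handed to the annotation module. The sketch below is not TREAT code: the names `Snv` and `parse_snv_line` are hypothetical, and restricting alleles to the single letters A, C, G, and T is an assumption based on the definition of a single nucleotide variant.

```python
from typing import NamedTuple


class Snv(NamedTuple):
    """One row of the called-SNV file (illustrative container)."""
    chrom: str
    position: int
    ref: str
    alt: str


def parse_snv_line(line: str) -> Snv:
    """Parse one tab-delimited SNV row: chromosome, location, ref, alt."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-delimited columns, got {len(fields)}")
    chrom, position, ref, alt = fields
    for allele in (ref, alt):
        # Assumption: each allele is exactly one of the four bases.
        if allele not in ("A", "C", "G", "T"):
            raise ValueError(f"unexpected allele {allele!r}")
    # int() raises ValueError if the location is not a whole number.
    return Snv(chrom, int(position), ref, alt)


if __name__ == "__main__":
    print(parse_snv_line("chr1\t100\tA\tT"))
```

Running every line of a variant file through the parser catches space-delimited rows, missing columns, and malformed alleles early, before a long annotation run starts.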
23. yo/research/biostat/stand-alone-packages.cfm

The file to download: TREAT_1.0.1.tar.gz

Move the file to an appropriate directory (<your_directory>) and run the following command under <your_directory> to uncompress it:

    tar -zxvf TREAT_1.0.1.tar.gz

Note that after uncompressing the tar.gz file, a new folder named TREAT_1.0.1 is created under <your_directory>.

TREAT setup: under the installation directory, run the following command:

    source setup.sh <TREAT_HOME> <EMAIL>

The two input parameters required by setup.sh are:

1. <TREAT_HOME>: the complete path to the TREAT_1.0.1 folder (<your_directory>/TREAT_1.0.1)
2. The user's email address

The setup script does the following:

• It makes all the scripts executable and sets all the environment variables required by TREAT.
• It creates the three configuration files that TREAT requires to run the example data set. Users who want to run their own data set can then modify these files.

After installation, the following directory structure is created automatically:

<TREAT_HOME>
|_ <bin>
|  |_ <all the tools>
|_ <resource>
|  |_ <all the references>
|_ <scripts>
|  |_ <sge>
|  |  |_ <source code for sge mode>
|  |_ <non_sge>
|     |_ <source code for non sge mode>
|_ <docs>
   |_ <user manual>
