User Guide for SplicingTypesAnno Package

Contents

1. gtfGeneValue g g 1 gtfTranscriptLabel NULL Parent transcript_id exonString exon selectGenes NULL gtfColnames c chr source feature start end score strand frame geneName geneOverlap both exon verlap yes

Translates a gtf/gff file to a GRanges object with gene, exon and intron structure. It is a preprocessing tool for the function splicingGene. Use ?translateGTF for details about the parameters.

7.2 Count the reads for exon-intron structure

7.2.1 splicingCount

    splicingCount(selectGene, bamFile = "accepted_hits.sort.bam", select.GRange, sampleName = "sample", sampleID = 1, type = "any")

Counts the reads within the gene, exons and introns of one gene. The select.GRange object comes from translateGTF. Use ?splicingCount for details about the parameters.

7.2.2 combineCount

    combineCount(sample.list)

Combines the results from splicingCount for all samples. Use ?combineCount for details about the parameters.

7.3 Calculate and annotate the major splicing types

7.3.1 splicingGene

    splicingGene(selectGene, bamFile, select.GRange, eventType = c("ri.1", "ri.2", "es.1", "es.2", "adleft.1", "adleft.2", "adright.1", "adright.2", "adboth.1", "adboth.2"), sampleName = "sample", sampleID = 1, reportSummary = TRUE, minReadCounts = 10, ratioNorm = FALSE, novel = "both")

This function belongs to the gene-level analysis. It only works on a single gene and helps users to process multiple bam fi
2. information. The genomic coordinates are the two internal boundaries of the junction reads. For intron retention, the mergeID refers to an inferred intron, which is the retained intron; for exon skipping, the mergeID refers to an inferred exon, which is the skipped exon; for alternative sites, the mergeID refers to the gap between two splicing junctions. It may or may not match the known annotation, e.g. chr2 24907270 24907551_ (24907270 and 24907551 are site 1 and site 2 in Figure 1).
• geneName: the gene name from the gtf/gff file. Specifically, for a gtf file it should be Attributes, e.g. gene_id Pnpla7_chr2_ _intron_30 for intron retention, or gene_id LOC683722 gene_name LOC683722 p_id P10335 transcript_id NM_001101008 tss_id TSS1868 Exon 7 for exon skipping.
• nonjun: the number of non-junction reads involved in this splicing type.
• jun: the number of junction reads involved in this splicing type.
• nonjun.norm: the scaled number of non-junction reads: no × 10^9 / (totalReadsOfBamFile × eventLength).
• jun.norm: the scaled number of junction reads: no × 10^9 / (totalReadsOfBamFile × readLength 2).
• ratio: the proportion of the splicing type. For intron retention, it is nonjun / (jun + nonjun); for exon skipping, it is jun / (nonjun + jun); for alternative sites, it is altersite1 / (altersite1 + altersite2).
• junLeftBorder: the end coordinate of the left junction. It is site 1 in Figure 1.
• junRightBo
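As a rough illustration of the ratio and nonjun.norm columns described above, the following base-R sketch computes them from raw counts. The helper names (splicing_ratio, scale_nonjun), the type labels and the example numbers are hypothetical and are not part of SplicingTypesAnno; the scaling assumes the formula reads no × 10^9 / (totalReadsOfBamFile × eventLength).

```r
# Hypothetical helpers; these are not functions of the SplicingTypesAnno package.

# Ratio of a splicing event, following the definitions of the `ratio` column.
splicing_ratio <- function(type, nonjun = 0, jun = 0,
                           altersite1 = 0, altersite2 = 0) {
  switch(type,
    intron_retention = nonjun / (nonjun + jun),
    exon_skipping    = jun / (nonjun + jun),
    alternative_site = altersite1 / (altersite1 + altersite2),
    stop("unknown splicing type: ", type)
  )
}

# Scaled non-junction read count (assumed layout of the formula).
scale_nonjun <- function(no, total_reads, event_length) {
  no * 1e9 / (total_reads * event_length)
}

# Made-up counts for illustration only.
ir_ratio    <- splicing_ratio("intron_retention", nonjun = 30, jun = 10)
es_ratio    <- splicing_ratio("exon_skipping", nonjun = 30, jun = 10)
alt_ratio   <- splicing_ratio("alternative_site", altersite1 = 10, altersite2 = 14)
nonjun_norm <- scale_nonjun(no = 50, total_reads = 2e7, event_length = 500)
```

With 10 and 14 reads at the two alternative sites, alt_ratio is 10 / (10 + 14), which rounds to 0.4167. An event with no supporting reads gives a zero denominator, so a real implementation would need to handle that case explicitly.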
3. User Guide for SplicingTypesAnno Package

Xiaoyong Sun, Fenghua Zuo
March 24, 2015

Agricultural Big Data Research Center, College of Information Science and Engineering, Shandong Agricultural University, Taian, Shandong 271018, China
Department of Information Science, Taishan Medical University, Taian, Shandong, China
johnsunxl@gmail.com

Contents

1 Introduction
2 How to cite
3 Installation
4 Major splicing types
5 Gene level analysis and global level analysis
  5.1 Gene level analysis
  5.2 Global level analysis
6 Input and output
  6.1 Annotation file: gtf/gff file
  6.2 Alignment file: bam file
  6.3 Outputs
7 Function description
  7.1 Translate gtf/gff file to exon
    7.1.1 translateGTF
  7.2 Count the reads for exon-intron structure
    7.2.1 splicingCount
    7.2.2 combineCount
  7.3 Calculate and annotate the major splicing types
    7.3.1 splicingGene
    7.3.2 splicingReport
8 Demonstration
  8.1 Data sets
  8.2 Analysis pipeline

1 Introduction

Alternative splicing plays a key role in the central dogma. Alternative splicing has four main types: intron retention, exon skipping, alternative 5' splice site (or alternative donor site), and alter
4. ctGene, bam.1, result.GRange, sampleName = "liver.ctr", sampleID = 1)
    > sample2 <- splicingGene(selectGene, bam.2, result.GRange, sampleName = "liver.ko", sampleID = 2)
    > sList <- list(sample1$type, sample2$type)
    > ri.1 <- combineGene(sList, splicingEvents = "ri.1", selectGenes)
    > ri.2 <- combineGene(sList, splicingEvents = "ri.2", selectGenes)
    > es.1 <- combineGene(sList, splicingEvents = "es.1", selectGenes)
    > es.2 <- combineGene(sList, splicingEvents = "es.2", selectGenes)
    > adleft.1 <- combineGene(sList, splicingEvents = "adleft.1", selectGenes)
    > adleft.2 <- combineGene(sList, splicingEvents = "adleft.2", selectGenes)
    > adright.1 <- combineGene(sList, splicingEvents = "adright.1", selectGenes)
    > adright.2 <- combineGene(sList, splicingEvents = "adright.2", selectGenes)
    > adboth.1 <- combineGene(sList, splicingEvents = "adboth.1", selectGenes)
    > adboth.2 <- combineGene(sList, splicingEvents = "adboth.2", selectGenes)

5. Finally, we can generate a user-friendly web report:

    > sampleList <- list(SampleName = c("liver.ctr", "liver.ko"), BamFiles = c(bam.1, bam.2), SampleID = c(1, 2))
    > splicingReport(inputData = sampleList, gtfFile = mm9.pnpla7.gtfFile, selectGenes = c("Pnpla7"))

6. The bed files in the report can be visualized in IGV.
5. ds mapping to one gene, Pnpla7.

8.2 Analysis pipeline

1. We first convert the gff3 file to gene, exon and intron structure:

    > options(width = 50)
    > library(SplicingTypesAnno)
    > mm9.pnpla7.gtfFile <- system.file("extdata", "mm9_Pnpla7.gtf", package = "SplicingTypesAnno")
    > result.GRange <- translateGTF(mm9.pnpla7.gtfFile, gtfTranscriptLabel = "transcript_id")
    [1] "Done"

2. Then we count the reads for exons and introns, respectively:

    > bam.1 <- system.file("extdata", "liver.ctr.sort.bam", package = "SplicingTypesAnno")
    > bam.2 <- system.file("extdata", "liver.ko.sort.bam", package = "SplicingTypesAnno")
    > selectGene <- "Pnpla7"
    > sample1.c <- splicingCount(selectGene, bam.1, result.GRange, sampleName = "liver.ctr", sampleID = 1)
    [1] GENE Pnpla7 are processed
    [1] some chrNo in bam file are NOT in gtf gff3 file The unmatched data will be overlooke
    > sample2.c <- splicingCount(selectGene, bam.2, result.GRange, sampleName = "liver.ko", sampleID = 2)
    [1] GENE Pnpla7 are processed
    [1] some chrNo in bam file are NOT in gtf gff3 file The unmatched data will be overlooke

3. To compare different samples, we combine all the results:

    > sList.c <- list(sample1.c, sample2.c)
    > ccount <- combineCount(sList.c)

4. In addition, we can quantify and annotate the major splicing types at the gene level:

    > selectGene <- c("Pnpla7")
    > sample1 <- splicingGene(sele
6. e alternative splicing events in which only one exon is skipped; type II of exon skipping describes the events with multiple exons skipped (Figure 2). Type I of intron retention defines the events in which only one intron is retained; type II defines those in which more than one intron is retained (Figure 3). Type I of the alternative donor or acceptor covers events across only one intron, while type II covers the other cases (Figure 4). We use the following abbreviations:

• ri.1: type I of the intron retention
• ri.2: type II of the intron retention
• es.1: type I of the exon skipping
• es.2: type II of the exon skipping
• adleft.1: type I of the alternative donor site
• adleft.2: type II of the alternative donor site
• adright.1: type I of the alternative acceptor site
• adright.2: type II of the alternative acceptor site
• adboth.1: type I of the alternative both sites
• adboth.2: type II of the alternative both sites

Figure 2: Two types of exon skipping: es.1 and es.2.

5 Gene level analysis and global level analysis

5.1 Gene level analysis

This section describes how to analyze data for one gene or a few genes. The package requires a gtf/gff file and an alignment bam file as input, and can extract the sequencing data for this gene or these few genes from one bam file (one sample) or multiple bam files (multiple samples). The final output can be either a list object with the major splicing types or a user-friendly w
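The type I / type II rule and the abbreviations listed above can be sketched in base R as follows. This is only an illustration of the naming scheme: the helper splicing_subtype, its event labels and the "." separator in the returned abbreviation are assumptions, and the package itself derives subtypes from reads and annotation rather than from a supplied count.

```r
# Hypothetical helper; not a function of the SplicingTypesAnno package.
# n_units is the number of exons skipped, introns retained, or introns
# spanned by the alternative site, depending on the event.
splicing_subtype <- function(event, n_units) {
  prefix <- c(intron_retention = "ri",
              exon_skipping    = "es",
              alt_donor        = "adleft",
              alt_acceptor     = "adright",
              alt_both         = "adboth")
  stopifnot(event %in% names(prefix), n_units >= 1)
  # Type I involves exactly one unit; anything larger is type II.
  paste0(prefix[[event]], ".", if (n_units == 1) 1 else 2)
}

one_exon_skipped      <- splicing_subtype("exon_skipping", 1)
three_introns_kept    <- splicing_subtype("intron_retention", 3)
donor_across_one_gap  <- splicing_subtype("alt_donor", 1)
```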
7. eb report. The analysis at this level includes two steps: 1) creating the exon-intron structure with the translateGTF function; 2) extracting the related reads from the bam file and annotating splicing types. The main functions are splicingGene, splicingReport, splicingCount, combineGene and combineCount. See details in the Function description section.

Figure 3: Two types of intron retention: ri.1 and ri.2.

5.2 Global level analysis

The analysis at this level focuses on how to globally analyze all genes for one sample or multiple samples. The package takes a gtf/gff file and an alignment bam file as input and generates a convenient web report as output in the current working directory.

6 Input and output

6.1 Annotation file: gtf/gff file

For the general format requirements, see details at UCSC: http://genome.ucsc.edu/FAQ/FAQformat. For examples, use ?translateGTF to see details. The annotation file should NOT have a header; the records should start at the first line. The file should be a tab-delimited text file.

6.2 Alignment file: bam file

Each bam file represents one sample and is generated by alignment software such as bowtie or tophat. Paired-end data are treated as single-end data.

6.3 Output

Figure 4: Two types of alternative both sites: adboth.1 and adboth.2.

The output columns are:

• mergeID: the identifier for merging this splicing type across multiple samples. The mergeID consists of chromosome number, genomic coordinates and strand
8. espectively.
• junRightStartCollection: the collection of coordinates and read numbers for the 3' site of all the splicing types, for alternative sites.
• novelSplicingLink: the novel splicing links between two exons, for alternative sites.
• novelLeft: the novel 5' site, for alternative sites.
• novelRight: the novel 3' site, for alternative sites.
• note: annotation for special cases. For example, some intron retention may occur inside an exon. This may happen when two small exons overlap with a third, bigger exon.
• note2: summary sentence for the ratio, e.g. 0.4167 100031 100220 10 14 SR813 adleft.1. Here 0.4167 is calculated from 10 / (10 + 14); 100031 and 100220 are the two alternative sites for the alternative donor site; 10 and 14 are the read numbers for the two alternative sites; SR813 is the sample name for the bam file; adleft.1 is type I of the alternative donor site.

7 Function description

The main features of this package are: 1) to translate a gtf/gff file to exon-intron structure; 2) to count the reads for the exon-intron structure; 3) to calculate and annotate the major splicing types. The first feature is implemented by translateGTF; the second feature requires two functions, splicingCount and combineCount. The last feature is achieved by two functions, splicingGene and splicingReport.

7.1 Translate gtf/gff file to exon

7.1.1 translateGTF

    translateGTF(gtfFile, gtfChrFormat = c("chr", "Chr", "Chro", "chro")[1], gtfGeneLabel = "gene_id",
9. les in a few seconds. It is a convenient tool for those who do not have access to a cluster computing environment.

7.3.2 splicingReport

    splicingReport(inputData = sampleList, gtfFile, gtfChrFormat = "chr", gtfGeneLabel = "gene_id", gtfGeneValue g, gtfTranscriptLabel = NULL, outputDir = "html", eventType = c("ri.1", "ri.2", "es.1", "es.2", "adleft.1", "adleft.2", "adright.1", "adright.2", "adboth.1", "adboth.2"), inputFile = sampleFile, minReadCounts = 10, selectGenes = NULL, ratioNorm = FALSE, novel = "both", parallel = FALSE, cpus = 2)

This function can perform either gene-level or global-level analysis, depending on the selectGenes parameter. A web report will be generated in the current working directory.

Hardware requirements: for gene-level analysis, there is no specific requirement; for global-level analysis, this function requires a large amount of memory. Users can take advantage of parallel computing to speed up the analysis.

8 Demonstration

To illustrate how to use this package, two data sets are used: SRR094623 and SRR094624. The analysis pipeline includes: 1) translating gtf/gff3 to gene, exon and intron structure; 2) counting reads at the gene level; 3) quantifying and annotating at the gene level; 4) quantifying and annotating at the global level.

8.1 Data sets

The raw data are from NCBI SRA (GSE26561). SRR094623 is the control and SRR094624 is the knockout. After aligning the raw data with tophat, we only selected the rea
10. native 3' splice site (or alternative acceptor site). SplicingTypesAnno is an R package to annotate these four major splicing types from RNA-Seq data. As a post-processing tool after read alignment, it annotates alternative splicing events with details at the intron or exon level. Specifically, it takes the alignment file (bam file) as input, analyzes the raw reads through a pipeline of searching algorithms, defines the related alternative splicing types, and finally generates a user-friendly web report. In addition, it gives users the flexibility to handle large data sets through global-level and gene-level functions. At the global level, users can make use of computing clusters with the parallel computing feature to speed up multiple-sample analysis. At the gene level, users can conveniently extract the related alternative splicing events on a single laptop. The output is stored in bed format and can easily be visualized with IGV.

2 How to cite

Sun X, Zuo F, Ru Y, Guo J, Yan X, Sablok G. SplicingTypesAnno: Annotating and quantifying alternative splicing events for RNA-Seq data. Comput Methods Programs Biomed. 2015;119(1):53-62.

3 Installation

Please install:

1. Install packages from CRAN:

    install.packages(c("hwriter", "SortableHTMLTables"))

2. Install packages from Bioconductor:

    source("http://bioconductor.org/biocLite.R")
    biocLite("Rsamtools")

4 Major splicing types

Alternative splicing includes exon skipping, int
11. rder: the start coordinate of the right junction. It is site 2 in Figure 1.
• intronStart: the start coordinate of the intron, for intron retention. It is site 3 in the intron retention panel of Figure 1.
• intronEnd: the end coordinate of the intron, for intron retention. It is site 4 in the intron retention panel of Figure 1.
• exonStart: the start coordinate of the exon, for exon skipping. It is site 3 in the exon skipping panel of Figure 1.
• exonEnd: the end coordinate of the exon, for exon skipping. It is site 4 in the exon skipping panel of Figure 1.
• count, countMax, countSum: three metrics for measuring alternative acceptor/donor/both site(s). One alternative-site event involves a few different sites inferred from a few different junction reads. To differentiate these, count is the number of reads for one specific site; countMax is the number of reads for the site that has the maximum number of junction reads; countSum is the total number of reads for all the sites.
• junLeftEndCollection: the collection of coordinates and read numbers for the 5' site of all the splicing types, for alternative sites, e.g. 100031 100220 10 14. Let's assume that the splicing event is an alternative donor site and that site 2 and site 4 match in Figure 1. Then 100031 and 100220 are site 1 and site 3 (5' site), and 10 and 14 are the read numbers for the junction read with site 1 and the junction read with site 3, r
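The three count metrics described above can be illustrated with a few lines of base R. The vector site_reads is a hypothetical stand-in that reuses the coordinates and read numbers of the example (100031 and 100220 with 10 and 14 reads); it is not an object produced by the package.

```r
# Hypothetical junction-read counts for the candidate 5' sites of one
# alternative-site event, named by site coordinate.
site_reads <- c("100031" = 10, "100220" = 14)

count    <- site_reads[["100031"]]  # reads supporting one specific site
countMax <- max(site_reads)          # reads at the best-supported site
countSum <- sum(site_reads)          # reads over all sites of the event
```

Here count depends on which site a row describes, while countMax and countSum are shared by all sites of the same event.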
12. ron retention, alternative donor, alternative acceptor, and alternative both sites. There are also some other forms that occur in a small percentage of cases; these are not discussed in this software.

We design the searching algorithm for splicing types fully based on the structural properties of junction and non-junction reads (Figure 1). As a result, we use the known splicing sites as the reference set. If the splicing sites derived from junction reads match the reference set, we consider them known sites; otherwise, novel sites. If the splicing sites are known but the link between two sites is not reported by any transcript, this link is marked as a novel splicing link.

[Figure 1 panels: Intron retention; Alternative site(s); Exon skipping]

Figure 1: Major splicing types inferred from reads versus annotation from gtf/gff. Nomenclature of splicing sites used in the study: 1, novel left exon boundary; 2, novel right exon boundary; 3, 5, known left exon boundary; 4, 6, known right exon boundary.

To simplify the analysis, we further divide these alternative splicing types into two subtypes, type I and type II, by comparing read information with the annotation files. Generally, type I consists of only one intron or exon structure; type II consists of more than one intron or exon structure. More specifically, type I of exon skipping describes th
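The known/novel rule described above (known sites, novel sites, novel splicing links) can be sketched in base R as a set-membership check. All coordinates, the "left-right" link encoding and the helper classify_junction are hypothetical; the package's own searching algorithm works on GRanges objects and bam reads and is not reproduced here.

```r
# Hypothetical reference set; coordinates are made up for illustration.
known_sites <- c(100, 200, 300, 400)    # exon boundaries from the annotation
known_links <- c("100-300", "200-400")  # junctions reported by transcripts

# Junctions inferred from junction reads (left and right splicing site).
junctions <- data.frame(left = c(100, 100, 150), right = c(300, 400, 300))

classify_junction <- function(left, right) {
  # A site missing from the reference set makes the junction novel.
  if (!(left %in% known_sites) || !(right %in% known_sites)) {
    return("novel site")
  }
  # Both sites known, but the pair is not reported by any transcript.
  if (!(paste(left, right, sep = "-") %in% known_links)) {
    return("novel splicing link")
  }
  "known"
}

junctions$status <- mapply(classify_junction, junctions$left, junctions$right)
```

The three example junctions cover the three outcomes: both sites and the link are annotated, both sites are annotated but the link is not, and one site is absent from the annotation.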
