MONSTER v1.0 User's Guide
-h, --help  Displays usage information and exits.

Chapter 4 Appendix

File format specifications
The RNA sequences chosen as reference and target in the Basic Usage section are the human lncRNAs HOTAIR (2354 nucleotides) and ANRIL (3858 nucleotides), respectively, in FASTA format. The FASTA format requires a single description line followed by lines of nucleotide sequence. The description line begins with a greater-than (>) symbol, which separates the sequence identifier from the sequence data. Blank lines are not allowed in the middle of the file. An example of the reference and target sequences in FASTA format is shown below:

>HOTAIR_human
acattctgccctgatttccggaacctggaagcctaggcaggcagtggg
...
gtgaggactgctccgtgggggtaagagagcaccaggcactgaggcctg
...
>ANRIL_human
...
gattatttaaaaatgaaaaaggaagaaaggaaagcgag

Both sequences are included in the example_data subdirectory of the archive zip file, as HOTAIR_human.fasta and ANRIL_human.fasta. They can also be downloaded from the online database lncRNAdb (http://www.lncrna.org) with identifiers Gm16258 and CDKN2B-AS1 for the HOTAIR and ANRIL lncRNAs, respectively.
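To make the FASTA rules above concrete, here is a minimal Python sketch of a reader that enforces them: one description line starting with '>' per record, sequence lines after it, and no blank lines in the middle of the file. The function name read_fasta is hypothetical and is not part of MONSTER; it only illustrates the format the tools expect.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs.

    Enforces the rules stated above: every record starts with a single
    description line beginning with '>', sequence lines follow it, and
    blank lines are rejected unless they only trail at the end of the file.
    """
    lines = text.splitlines()
    # Tolerate blank lines at the very end of the file only.
    while lines and not lines[-1].strip():
        lines.pop()

    records = []
    header = None
    chunks = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            raise ValueError(f"line {lineno}: blank line in the middle of the file")
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                if not chunks:
                    raise ValueError(f"record {header!r} has no sequence lines")
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError(f"line {lineno}: sequence data before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        if not chunks:
            raise ValueError(f"record {header!r} has no sequence lines")
        records.append((header, "".join(chunks)))
    return records
```

Such a check could be run on HOTAIR_human.fasta and ANRIL_human.fasta before starting the pipeline, so that formatting problems surface before RNALfold or afsearch are invoked.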
len 17 loop_pos 2323 2329 seq_len 2337
NNNNNNNNNNNNNNNNN

3. Run the afSearch program of the Structator package to search for the reference SSD of HOTAIR in the target ANRIL sequence, using the global modality. The software returns the matches found.
Command line example:
MONSTER_v1.0/bin/afsearch ANRIL_human.fasta -comp data/dna_rna.comp -alph data/rna.alphab -pat HOTAIR_human_RNALfold_150_pred_ssd.pat -t HOTAIRvsANRIL_matches.txt
Input files: ANRIL_human.fasta (target), HOTAIR_human_RNALfold_150_pred_ssd.pat (reference), dna_rna.comp and rna.alphab
Output file: HOTAIRvsANRIL_matches.txt
The Structator output has the following format:
matched substring | structure | seq id | matching pos | pattern id | weight | strand
[Table: example of matches found by afsearch]

4. Run RNALfold with a span L equal to 150 to obtain the secondary structure predictions of the ANRIL sequence.
Command line example:
MONSTER_v1.0/bin/RNALfold < ANRIL_human.fasta -L 150 > ANRIL_human_RNALfold_15
--RNALfold_out
MONSTER_v1.0/bin/match_filter -r ANRIL_human_RNALfold_150_pred_ssd_com.txt -m HOTAIRvsANRIL_matches.txt -o HOTAIRvsANRIL_filtered.txt --com
MONSTER_v1.0/bin/SSD_finder -s HOTAIR_human_RNALfold_150_pred_ssd.pat -m HOTAIRvsANRIL_filtered.txt -o HOTAIR_chains.txt --com

We provide the command lines to run the tutorial on Unix platforms (i.e., GNU/Linux and Mac OS). Each step is explained in detail as follows.

1. Run RNALfold from the Vienna package to obtain the secondary structure predictions for the HOTAIR sequence in dot-bracket notation.
Synopsis: RNALfold[.exe] [-L span]
Description: RNALfold reads RNA sequences from stdin and prints local structure predictions to stdout.
Options: -L span  Set the maximum allowed separation of a base pair to span, i.e., no pairs (i, j) with j - i > L will be allowed. In the present example we used L = 150.
Command line example:
MONSTER_v1.0/bin/RNALfold < HOTAIR_human.fasta -L 150 > HOTAIR_human_RNALfold_150_pred.txt
Input file: HOTAIR_human.fasta
Output file: HOTAIR_human_RNALfold_150_pred.txt
The format of the output file is as follows:
> hotair_human                  <- Sequence header
[structure] ( -2.60) 2317       <- Local predicted structure (dot-bracket notation), free energy (kcal/mol), starting position in the sequence
[structure] ( -3.10) 2300
[structure] ( -8.20) 2287
...
ACAUUCUGCCCUGAUUUCCGGAACCUGGAAGCCUAGGCAGGCAGUGGGGAACUCUGACUCGC   <- Nucleotide sequence
>, --match <string>  Input file name of matches.
• -o <string>, --fout <string>  Output file name.
• --com  Enable insertion of comments in the output.
• --ignore_rest  Ignores the rest of the labeled arguments following this flag.
• --version  Displays version information and exits.
• -h, --help  Displays usage information and exits.

Comparing non-branching structures: SSD_compare
SSD_compare generates statistics on the comparison between a list of reference SSDs and a corresponding list of target SSDs.
1. Synopsis
SSD_compare[.exe] -r <string> -t <string> -o <string> -v --version -h
2. Input
• Reference SSD
• Target SSD
3. Output
• Statistics for each RSSP of an SSD (lines beginning with <r1>), only when option -v (--verbose) is set
• Statistics for each SSD (lines beginning with <r2>)
• Comment lines (lines beginning with the comment character), containing information aimed at making the file readable
4. Options
• -r <string>, --ref <string>  (required) Input file name of reference SSDs.
• -t <string>, --target <string>  Input file name of target SSDs.
• -o <string>, --fout <string>  Output file name.
• -v, --verbose  Print complete SSD scores (default: print only global SSD scores).
• --ignore_rest  Ignores the rest of the labeled arguments following this flag.
• --version  Displays version information and exits.
0_pred.txt
Input file: ANRIL_human.fasta
Output file: ANRIL_human_RNALfold_150_pred.txt
The output format is the same as in step 1.

5. Run nbRSSP_extractor to extract the non-branching structures (NBSs) from the ANRIL_human_RNALfold_150_pred.txt file, retaining the overlapping RSSPs as well. In this way we obtain a wide array of possible structure predictions for the target.
Command line example:
MONSTER_v1.0/bin/nbRSSP_extractor -i ANRIL_human_RNALfold_150_pred.txt -o ANRIL_human_RNALfold_150_pred_ssd_com.txt --com --RNALfold_out
Input file: ANRIL_human_RNALfold_150_pred.txt
Output file: ANRIL_human_RNALfold_150_pred_ssd_com.txt
The output format is the same as in step 2; the only difference is the higher number of extracted RSSPs, because the --RNALfold_out option keeps the overlapping predictions as well.

6. Run match_filter to discard the unlikely matches obtained in step 3, based on the RSSPs predicted in step 5. The software returns the filtered matches between HOTAIR and ANRIL.
Command line example:
MONSTER_v1.0/bin/match_filter -r ANRIL_human_RNALfold_150_pred_ssd_com.txt -m HOTAIRvsANRIL_matches.txt -o HOTAIRvsANRIL_filtered.txt --com
Input files: ANRIL_human_RNALfold_150_pred_ssd_com.txt, HOTAIRvsANRIL_matches.txt
Output file: HOTAIRvsANRIL_filtered.txt
The output format is the same as in step 3, but with a lower number of matches bec
Chapter 1 Getting Started

MONSTER_v1.0
MONSTER is a procedure to extract and search for RNA non-branching structures in order to identify common structural motifs.

Pipeline overview
The pipeline is composed of three parts (Figure UG1):
1. Structure prediction and encoding of the reference into a secondary structural descriptor (SSD):
a. prediction of the reference structure through RNALfold;
b. extraction of the non-overlapping non-branching structures (NBSs) through the nbRSSP_extractor module;
c. encoding of the SSD through the nbRSSP_extractor module.
2. Match searching and filtering:
a. searching for matches between the reference SSD and the target sequence through Structator;
b. filtering out of matches through the match_filter module.
3. Building chains of matches:
a. building chains of matches through the SSD_finder module.

[Figure UG1: example of the MONSTER flowchart, with the blocks "Structure prediction & SSD encoding of HOTAIR", "Matches searching & filtering" and "Chains of matches building". Legend: orange circle, published tool; green circle, our developed software; water-blue rectangle, reference RNA I/O; yellow rectangle, target RNA I/O; pink rectangle, matches I/O.]
Figure UG1. Pipeline overview. The pipeline is composed of three parts: (1) structure prediction and SSD encoding of the reference (steps 1-4 in the manuscript); (2) match searching and filtering (steps 5-6 in the manuscript); (3) building of chains of matches (step 7 in t
8 pID 30 w 1.20 pos 3654 pID 32 w 1.00 pos 3757 pID 34 w 1 pos 3819 dist (116,103) dist (64,62)

The first line contains the number of the target sequence, in this case <seqID 0> because ANRIL is the only target sequence analyzed; it is followed by one line for each chain of matches found. Each line starts with the computed score of the chain, followed by (i) the pattern ID (pID) of the reference RSSPs found in the target sequence, (ii) the positions (pos) at which the RSSPs have been found in the target, (iii) the weight (w) of each RSSP, and (iv) the pairwise relative distances (dist). Each distance consists of two comma-separated numbers enclosed in brackets: the first is the distance between the found RSSPs in the reference, and the second is the corresponding distance in the target. The highest scores indicate the most likely structural motifs shared between the reference and the target.

Chapter 3 Advanced Information
This chapter explains the details of our implemented algorithms.

Non-branching Structure Extractor: nbRSSP_extractor
nbRSSP_extractor generates a file containing the list of non-branching RSSPs (SSD) extracted from the input. The input file is a list of (sequence, description of structure) pairs that may be provided in several formats; one SSD is generated for each pair. Non-branching RSSPs may be produced according to different algorithms. The default input format is the output of RNALfold.
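As a rough illustration of what this default input contains, the Python sketch below parses RNALfold-style text into (structure, free energy, start position) triples. It assumes the line layout shown in step 1 of the tutorial: a dot-bracket structure, the free energy in parentheses, then the 1-based starting position. It also computes the largest base-pair separation j - i of a structure, which RNALfold keeps within the -L span. All names are hypothetical; this is not nbRSSP_extractor's actual parser.

```python
import re

# Assumed layout of a local prediction line: dot-bracket structure,
# free energy in parentheses (kcal/mol), 1-based start position.
LOCAL_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s+(\d+)\s*$")


def base_pairs(structure):
    """Return the (i, j) base pairs (0-based) of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def parse_rnalfold(text):
    """Return (header, [(structure, energy, start), ...]) from RNALfold-style text."""
    header, predictions = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            continue
        match = LOCAL_LINE.match(line)
        # Lines that do not fit the layout (e.g. the final sequence + MFE line) are skipped.
        if match:
            structure, energy, start = match.groups()
            predictions.append((structure, float(energy), int(start)))
    return header, predictions


def max_pair_span(structure):
    """Largest j - i over all base pairs (0 for a fully unpaired structure)."""
    return max((j - i for i, j in base_pairs(structure)), default=0)
```

Checking max_pair_span on every parsed structure is a cheap way to confirm that a prediction file was generated with the intended -L value.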
CUGUGCUCUGGAGCUUGAUCCGAAAGCUUCCACAGUGAGGACUGCUCCGUGGGGGUAAGAGAGCACCAGGCACUGAGGCCUGGGAGUUCCACAGACCAACACCCCUGCUCCUGG...UGGUUUUAUAUGCCUUAUGGAGUAUAUACUCACAUGUAGCUAAAUAGACUCAGGACUGCACAUUCCUUGUGUAGGUUGUGUGUGUGUGGUGGUUUUAUGCAUAAAUAAAGUUUUACAUGUGGUGAAAAAA
(-691.04)   <- Minimum free energy
Dots represent unpaired nucleotides; matched brackets, opened "(" and closed ")", represent paired nucleotides.

2. Run nbRSSP_extractor to extract the NBSs using the HOTAIR_human_RNALfold_150_pred.txt file as input. The software returns the HOTAIR SSD, comprising 67 RSSPs.
Command line example:
MONSTER_v1.0/bin/nbRSSP_extractor -i HOTAIR_human_RNALfold_150_pred.txt -o HOTAIR_human_RNALfold_150_pred_ssd.pat
Input file: HOTAIR_human_RNALfold_150_pred.txt
Output file: HOTAIR_human_RNALfold_150_pred_ssd.pat
The format of the output file is as follows:
>RSSP0 startpos 6 occurrences 1 weight 3.50 cumulative_fe 0.408333   <- NBS details
len 35 loop_pos 21 25 seq_len 2337
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN   <- NBS sequence (wildcard characters)
>RSSP1 startpos 42 occurrences 1 weight 1.50 cumulative_fe 0.423944
len 15 loop_pos 47 52 seq_len 2337
NNNNNNNNNNNNNNN
...
>RSSP65 startpos 2300 occurrences 5 weight 1.80 cumulative_fe 1.413313
len 18 loop_pos 2305 2312 seq_len 2337
NNNNNNNNNNNNNNNNNN
>RSSP66 startpos 2318 occurrences 7 weight 1.70 cumulative_fe 1.359844
D extracted from the predicted structures.
4. Options
• -f <string>, --fmtIn <string>  Input file format: RNALfold|Rfold|Sfold|seq-struct|seqs-structs (default: RNALfold).
• -i <string>, --fin <string>  Input file name.
• -o <string>, --fout <string>  Output file name.
• --com  Enable insertion of comments in the output.
• --fmtOut <string>  Output file format: structator|fasta (default: structator).
• --sort <string>  Sort criterion for RSSP selection: sort_by_nfe|sort_by_cfe|sort_by_occs (default: sort_by_nfe).
• -s, --seq  Print the sequence in the RSSP descriptors (default: prints N).
• --RNALfold_lnrz  Enable RSSP extraction from a linearization of the input produced by RNALfold.
• --RNALfold_out  Generate a file containing the descriptors of all RSSPs predicted by RNALfold (default: the file is not generated).
• --ignore_rest  Ignores the rest of the labeled arguments following this flag.
• --version  Displays version information and exits.
• -h, --help  Displays usage information and exits.

Matches Filtering: match_filter
match_filter filters out matches that cannot actually fold. It writes a file of matches, following the Structator output format, that contains for each sequence the matches that have been predicted in some way. The current implementation considers a match predicted if it is a substructure of some predicted RSSP. In particular, the external loop of the match must coincide with the one of the predicted RSSP.
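The substructure test described above can be pictured with a small Python sketch. It treats a match (a dot-bracket structure plus its 1-based position in the target) as predicted when it lies inside the span of some predicted RSSP and every one of its base pairs is also a base pair of that RSSP. This is a simplified reading under our own assumptions: the external-loop condition applied by match_filter is not modelled, and all function names are hypothetical.

```python
def absolute_pairs(structure, offset):
    """Base pairs of a dot-bracket string as absolute (i, j) target positions.

    `offset` is the 1-based position of the structure's first character.
    """
    stack, pairs = [], set()
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(offset + k)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.add((stack.pop(), offset + k))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def is_substructure(match_struct, match_pos, rssp_struct, rssp_start):
    """True if the match lies within the RSSP and all its pairs are RSSP pairs.

    Simplified criterion (assumption): match_filter's additional
    external-loop condition is not reproduced here.
    """
    match_end = match_pos + len(match_struct) - 1
    rssp_end = rssp_start + len(rssp_struct) - 1
    if match_pos < rssp_start or match_end > rssp_end:
        return False
    return absolute_pairs(match_struct, match_pos) <= absolute_pairs(rssp_struct, rssp_start)


def filter_matches(matches, rssps):
    """Keep the (structure, pos) matches predicted by at least one (structure, start) RSSP."""
    return [
        (struct, pos)
        for struct, pos in matches
        if any(is_substructure(struct, pos, r_struct, r_start) for r_struct, r_start in rssps)
    ]
```

Comparing pair sets rather than raw strings keeps the check independent of how many unpaired flanking positions the match and the RSSP happen to include.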
Contents
Chapter 1 Getting Started
   MONSTER v1.0
   Pipeline overview
   System requirements
   Setting up
   Additional files
Chapter 2 Basic Usage
   Aim of the tutorial
   Preliminary steps
   Preliminary files
   Tutorial steps
Chapter 3 Advanced Information
   Non-branching extraction: nbRSSP_extractor
   Filtering: match_filter
   Chaining: SSD_finder
   Comparing NBSs: SSD_compare
Chapter 4 Appendix
   File format specifications
   Running the script MONSTER.sh
arch according to the user's operating system in MONSTER_v1.0/bin.
• Install the ViennaRNA Package: copy the executable RNALfold, according to the user's operating system (e.g., RNALfold.exe on Windows platforms), in MONSTER_v1.0/bin. On Unix platforms (e.g., Mac OS, GNU/Linux) you can find the executable path using the command which RNALfold (e.g., /usr/bin/RNALfold).
Finally, in the directory archive/MONSTER_v1.0/bin you should find the following executable files:
• afsearch
• RNALfold
• nbRSSP_extractor
• match_filter
• SSD_evaluator
• SSD_finder
To test whether the executables have been correctly built, type and run:
1. afsearch
2. RNALfold -h
3. nbRSSP_extractor -h
4. match_filter -h
5. SSD_evaluator -h
6. SSD_finder -h
The online help should be displayed for each command.

Additional files
The data subfolder of the archive contains the following additional files needed to run MONSTER:
• dna_rna.comp
• rna.alphab

afSearch is a program for matching RNA sequence-structure patterns in a precomputed index or directly in a plain FASTA file.
RNALfold is a program for calculating locally stable secondary structures of RNAs.
File specifying the Watson-Crick and wobble complementarity rules.
File specifying an alphabet to which characters are mapped, after which the sequences are alphabetically transformed; it is needed to run the afsearch program (see the user's manual v1.01 of the Struct
ator packages for details).

Chapter 2 Basic Usage
After installing the MONSTER software, you can follow a sample run by executing the following tutorial.

Aim of the tutorial
The user wishes to search for chains (groups of matches) of a reference lncRNA in a target lncRNA. For this example we considered HOTAIR as the reference and ANRIL as the target.

Preliminary step
Go to the example_data subfolder of the archive, which stores all the files needed to execute the tutorial. You can run the tutorial step by step by following the Tutorial steps section. Alternatively, you can run the script MONSTER.sh on Unix platforms to execute the whole tutorial procedure.

Preliminary files
• HOTAIR_human.fasta: a FASTA file with the RNA sequence of HOTAIR
• ANRIL_human.fasta: a FASTA file with the RNA sequence of ANRIL

Tutorial steps
MONSTER_v1.0/bin/RNALfold < HOTAIR_human.fasta -L 150 > HOTAIR_human_RNALfold_150_pred.txt
MONSTER_v1.0/bin/nbRSSP_extractor -i HOTAIR_human_RNALfold_150_pred.txt -o HOTAIR_human_RNALfold_150_pred_ssd.pat
MONSTER_v1.0/bin/afsearch ANRIL_human.fasta -comp data/dna_rna.comp -alph data/rna.alphab -pat HOTAIR_human_RNALfold_150_pred_ssd.pat -t HOTAIRvsANRIL_matches.txt
MONSTER_v1.0/bin/RNALfold < ANRIL_human.fasta -L 150 > ANRIL_human_RNALfold_150_pred.txt
MONSTER_v1.0/bin/nbRSSP_extractor -i ANRIL_human_RNALfold_150_pred.txt -o ANRIL_human_RNALfold_150_pred_ssd_com.txt --com
ause of the filtering.

7. Run SSD_finder to perform the chaining. It returns the chains of matches that represent the structural motifs shared between ANRIL and HOTAIR.
Command line example:
MONSTER_v1.0/bin/SSD_finder -s HOTAIR_human_RNALfold_150_pred_ssd.pat -m HOTAIRvsANRIL_filtered.txt -o HOTAIR_chains.txt --com
Input files: HOTAIR_human_RNALfold_150_pred_ssd.pat (reference), HOTAIRvsANRIL_filtered.txt (matches)
Output file: HOTAIR_chains.txt
The format of the output file is as follows (fields of each chain line: chain score; pattern ID in the reference (pID); weight (w); position in the target sequence (pos); relative distances between pIDs in reference and target (dist)):
<seqID 0>   <- Target sequence identifier
score 3.99 pID 30 w 1.20 pos 41 pID 32 w 1.00 pos 146 pID 34 w 1.00 pos 207 dist (116,105) dist (64,61)
score 2.80 pID 21 w 1.00 pos 182 pID 22 w 1.00 pos 208 dist (25,26)
score 2.56 pID 30 w 1.20 pos 205 pID 32 w 1.00 pos 312 dist (116,107)
score 2.73 pID 30 w 1.20 pos 205 pID 32 w 1.00 pos 326 dist (116,121)
score 4.39 pID 7 w 1.00 pos 2755 pID 11 w 1.00 pos 2827 pID 14 w 1 pos 2960 pID 17 w 1.00 pos 3045 dist (75,72) dist (141,133) dist (79,85)
score 4.11 pID 11 w 1.00 pos 3625 pID 14 w 1.00 pos 3767 pID 15 w 1 pos 3787 dist (141,142) dist (22,20)
score 2.75 pID 9 w 1.00 pos 3769 pID 11 w 1.00 pos 3819 dist (50,50)
score 3.9
e
Type:
rootdir/archive> cd MONSTER_v1.0
rootdir/archive/MONSTER_v1.0> ls
(you should find the directories bin, include, libs, src)
Type:
rootdir/archive/MONSTER_v1.0> cd src
rootdir/archive/MONSTER_v1.0/src> make

To run MONSTER, some pre-existing packages are required (listed in the Setting up section). Some of these packages are not available for Windows (e.g., afsearch), so they must be compiled on Windows before running MONSTER. The attached distribution does not contain configuration files for the Windows platform. A distribution including the configuration files for CMake is available on request; it can be used to generate the configuration files for Visual Studio.
The source code of the latest or older versions of the ViennaRNA package needs to be compiled according to the guidelines provided at http://www.tbi.univie.ac.at/RNA/INSTALL.html.
The instructions for MONSTER installation on GNU/Linux and Mac OS are also included in the INSTALL file of the archive/MONSTER_v1.0/src folder.

When the build process terminates, you should find the executables SSD_evaluator, SSD_finder, match_filter and nbRSSP_extractor in the directory rootdir/archive/MONSTER_v1.0/bin.
• Install Structator1.1: uncompress Structator1.1-linux-gnu-amd64.tar.gz, go to the bin subfolder of the unzipped Structator1.1-linux-gnu-amd64 folder, and copy the executable afse
he manuscript). More details are given in the paper.
This flowchart is specific to the case of the two RNA sequences HOTAIR and ANRIL explained in the tutorial of Chapter 2. Legend: orange circles represent published, available tools; green circles represent software developed by us; rectangles represent software input and output (I/O), colored water blue for the reference and yellow for the target.

System requirements
MONSTER version 1.1 has been tested on the following operating systems:
• Mac OS
• GNU/Linux (Ubuntu 12.04 or later, 64-bit)
• Windows 7

Setting up
Download all the necessary packages listed here:
• the archive zip file with MONSTER_v1.0, from the supplementary material of the paper;
• Structator1.1-linux-gnu-amd64.tar.gz for 64-bit Linux, or the most appropriate file for the user's operating system: http://www.zbh.uni-hamburg.de/en/research/application-oriented-bioinformatics/software/structator.html
• ViennaRNA Package: http://www.tbi.univie.ac.at/~ronny/RNA/index.html
Users can download the latest version of the ViennaRNA Package by selecting the package format for their operating system; older versions can also be downloaded.
When all packages are downloaded:
• Install MONSTER_v1.0: choose a directory and let rootdir be the path to this directory; unzip the archive zip file in rootdir, which will create a folder named archiv
[Figure: screenshot of the lncRNAdb home page]

Running the Script MONSTER.sh
MONSTER.sh searches for chains (groups of matches) of a reference RNA in a target RNA.
1. Synopsis
MONSTER -L span -h <REFERENCE FILE> <TARGET FILE>
2. Input files
• REFERENCE FILE: file in FASTA format of the reference RNA (e.g., HOTAIR_human.fasta)
• TARGET FILE: file in FASTA format of the target RNA (e.g., ANRIL_human.fasta)
3. Output file
Chains of the reference RNA in the target RNA.
4. Options
• -L  Set the maximum base-pair separation to span (default: 150).
• -h  Displays usage information and exits.
Command line example:
sh MONSTER.sh HOTAIR_human.fasta ANRIL_human.fasta
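Since the script's output is a plain-text list of chains, ranking or sanity-checking it is easy to script. The Python sketch below is a hypothetical helper, not part of MONSTER, and assumes the output follows the chain format described for SSD_finder in step 7 of the tutorial. It parses each chain line into a score, a list of (pID, w, pos) elements and a list of (reference, target) distances, and checks that each target distance equals the gap between consecutive positions, which holds for every example chain shown in step 7. Because the exact bracket style around the dist pairs is an assumption, the parser accepts both "dist (116,105)" and "dist 116 105".

```python
import re

SCORE = re.compile(r"^\s*score\s+(-?\d+(?:\.\d+)?)")
ELEMENT = re.compile(r"pID\s+(\d+)\s+w\s+(\d+(?:\.\d+)?)\s+pos\s+(\d+)")
# Assumption: a distance is written "dist (ref,target)"; brackets and comma are optional here.
DIST = re.compile(r"dist\s*[(\[]?\s*(\d+)\s*,?\s*(\d+)\s*[)\]]?")


def parse_chain(line):
    """Split one chain line into its score, (pID, w, pos) elements and distances."""
    m = SCORE.match(line)
    if m is None:
        raise ValueError(f"not a chain line: {line!r}")
    elements = [(int(p), float(w), int(pos)) for p, w, pos in ELEMENT.findall(line)]
    dists = [(int(ref), int(tgt)) for ref, tgt in DIST.findall(line)]
    return {"score": float(m.group(1)), "elements": elements, "dists": dists}


def target_gaps_match(chain):
    """Check that each target distance equals the gap between consecutive positions."""
    positions = [pos for _, _, pos in chain["elements"]]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return gaps == [tgt for _, tgt in chain["dists"]]


def read_chains(text):
    """Parse every chain line (skipping <seqID ...> lines) and sort by decreasing score."""
    chains = [parse_chain(line) for line in text.splitlines() if line.strip().startswith("score")]
    return sorted(chains, key=lambda c: c["score"], reverse=True)
```

Sorting by decreasing score puts first the chains that the guide describes as the most likely shared structural motifs.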
Since RNALfold provides many overlapping substructures for each sequence, two different strategies may be used to select the RSSPs forming the SSD:
(i) linearization (option --RNALfold_lnrz): non-overlapping substructures are first selected according to increasing free energy, and the RSSPs are then extracted from them;
(ii) weighted (default): all possible RSSPs are first extracted from the overlapping substructures and weighted with the absolute value of the mean free energy (i.e., the free energy per nucleotide, normalized by the structure length); the non-overlapping extracted RSSPs are then selected according to decreasing weight. In the weighted case, with the option --RNALfold_out all the RSSPs extracted from overlapping substructures are sent directly to the output without any selection.
1. Synopsis
nbRSSP_extractor[.exe] -f <string> -i <string> -o <string> --com --fmtOut <string> --sort <string> -s --RNALfold_lnrz --RNALfold_out --version -h
2. Input
• RNALfold (default): output of RNALfold; or
• Rfold: output of Rfold (only one sequence/description-of-structure pair at a time); or
• Sfold: output of Sfold (only one sequence/description-of-structure pair at a time); or
• seq-struct: output of RNAfold; or
• seqs-structs: general sequence/structure pairs, possibly preceded by a FASTA header.
3. Output
List of non-branching RSSP SS
1. Synopsis
match_filter[.exe] -r <string> -m <string> -o <string> --com --version -h
2. Input
• a list of matches corresponding to one or more sequences, as generated by Structator;
• a list of SSDs, one for each sequence present in the list of matches, corresponding to the non-branching RSSPs predicted by some prediction tool for those sequences.
3. Output
List of filtered matches between reference and target.
4. Options
• -r <string>, --RSSP <string>  (required) Input file name of predicted non-branching RSSPs.
• -m <string>, --match <string>  Input file name of matches.
• -o <string>, --fout <string>  Output file name.
• --com  Enable insertion of comments in the output.
• --ignore_rest  Ignores the rest of the labeled arguments following this flag.
• --version  Displays version information and exits.
• -h, --help  Displays usage information and exits.

Chaining: SSD_finder
SSD_finder finds the most significant local groups of matches in a target file that have a correspondence in a given reference SSD.
1. Synopsis
SSD_finder -s <string> -m <string> -o <string> --com --version -h
2. Input
• Reference SSD to be searched for
• List of matches found between reference and target
3. Output
Chains of matches
4. Options
• -s <string>, --SSD <string>  (required) Input file name of the reference SSD.
• -m <string