
Component Type Active Workflow User's Manual


Contents

4.6.1 Preparation
4.6.2 Node
4.6.3 Step 1 Node setting
4.6.4 Step 2 Execution
4.6.5 Step 3 Result viewing
4.7 CentroidFold Active Workflow
4.7.1 Preparation
4.7.2 Node
4.7.3 Step 1 Node setting
4.7.4 Step 2 Execution
4.7.5 Step 3 Result viewing
4.8 POODLE Active Workflow
4.8.1 Preparation
4.8.2 Node
4.8.3 Step 1 Node setting
4.8.4 Step 2 Execution
4.8.5 Step 3 Result viewing
4.9 ASIAN Active Workflow
4.9.1 Preparation
4.9.2 Node
4.9.3 Configuring running environment
4.10 AutoDock Active Workflow
4.10.1 Preparation
4.10.2 Node
4.10.3 Step 1 Node setting
4.10.4 Step 2 Execution
4.10.5 Step 3 Execution results
5 Appendix
5.1 Appendix A LSDBCrossSearch
5.2 Appendix B Last parameter
5.2.1 lastal parameter
5.2.2 lastdb parameter

1 Introduction
This manual describes the Active Workflow Component Type developed at the Computational Biology Research Center (CBRC), Advanced Industrial Science and Technology (AIST). For the installation of the Active Workflow Component Type, please refer to
Figure 3.1: Fastapl Active workflow example

3.2 Node
A node is an icon that is shown in the workflow screen, as follows.
Figure 3.2: Fasta File Reader node example
When a node is selected, its explanation is displayed in the Node Description panel at the right of the KNIME screen.

3.3 Node progress
Signals below a node indicate its progress, as shown below.
Figure 3.3: Signal of node progress (Complete; Executing, shown in blue)

3.4 Node menu
A node menu is shown when right-clicking on a node: Configure, Execute, Execute and Open Views, Cancel, Reset, Edit Node Name and Description, New Workflow Annotation, View (name of the first view), Cut, Copy, Paste, Undo, Redo, and Delete.
Figure 3.4: Node menu

Table 3.5: Node menu list
Configure: Various settings of the node. Another window is started.
Execute: Execute the node. This command cannot be used unless the node status is yellow.
Execute and Open Views: Execute the node and open the result window of a node that displays results. This command cannot be used unless the node status is yellow.
Cancel: Cancel the execution. This command cannot be used unless the node status is deep blue.
Reset: Reset the node. This command is available when the node status is green.
Edit Node Name and Description: Change the node name or description. Another window is started.
New Work
and select Execute from the menu.

4.10.5 Step 3 Execution results
(1) Node4 JmolForModeller result
The execution results of AutoDock_SOAP are displayed using the JmolForModeller node.
Figure 4.10.5.1: Node4 JmolForModeller results
JmolForModeller executes Jmol, a molecule viewer application. In the case of AutoDock_SOAP, there are several docking results for each docking site of a template protein structure (Figure 4.10.5.1). To display these results, click the Site button located under each image (Figure 4.10.5.2); a docking result menu opens.
Figure 4.10.5.2: Docking result menu (one radio button per docking result, for example Compound 102 Energy to Compound 105 Energy, and an Execute Jmol button)
Select the radio button corresponding to a docking result and click the Execute Jmol button. Jmol is launched and the selected docking result is displayed (Figure 4.10.5.3). At the same time, a pop-up window opens that displays the absolute path of the docking result file (Figure 4.10.5.4). Please visit the Jmol web site for further information.
Jmol: http://jmol.sourceforge.net
Figure 4.10.5.3: Jmol
Figure 4.10.5.4: Pop-up window displaying the absolute path of a docking result file

5 Appendix
5.1 Appendix A LSDBCrossSearch
Life
BLASTP 2.2.24 [Aug-08-2010]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005), "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Query= gi|7296213|gb|AAF51505.1| aristaless [Drosophila melanogaster] (408 letters)
Database: SWISS (sequence taken from the header); last update Oct 16, 2011; 532,146 sequences; 188,719,038 total letters

Sequences producing significant alignments (score in bits, E-value):
sp|Q06453|AL_DROME RecName: Full=Homeobox protein aristaless: 648, 0.0
sp|O42115|ARX_DANRE RecName: Full=Aristaless-related homeobox pr...: 140, 2e-32
sp|A6YP92|ARX_RAT RecName: Full=Homeobox protein ARX; AltName: F...: 135, 5e-31
Full=Homeobox protein ARX; AltName...: 135, 7e-31
Full=Homeobox protein ARX; AltName...: 135, 7e-31
Full=Aristaless homeobox protein: 134, 1e-30
Full=Retinal homeobox protein Rx2: 125, 7e-28
RX_DROME RecName: Full=Retinal homeobox protein Rx: 121, 1e-26
Full=Retinal homeobox protein Rx1: 120, 1e-26
Full=Retinal homeobox protein Rx1: 120, 1e-26
sp|O97039|RX_DUGJA
Options tab > Number of Models for Modelling: Enter the number of models to generate. The value range is 1 to 10.
Options tab > Modeller License > License Key for Modeller (required): Enter a license key for Modeller. This field is required.

4.6.4 Step 2 Execution
Figure 4.6.4.1: Modelling_SOAP (Node 1 FastaFileReader, Node 2 BlastForModeller_SOAP, Node 3 HitRegionSelector_SOAP, Node 4 TemplateSelector_SOAP, Node 5 Modeller_SOAP, Node 6 JmolForModeller, Node 7 LSDBCrossSearch, Node 8 HtmlView, Node 9 HtmlView, Node 10 PDBjMineWeb)
(1) Node1 FastaFileReader: Select Execute in the right-click menu for execution.
(2) Node2 BlastForModeller_SOAP: Select Execute in the right-click menu for execution.
(3) Node8 HtmlView: Select Execute and Open Views in the right-click menu for execution and viewing the results.
(Screenshot: HtmlView window showing the BLAST result page for the query)
High-scoring segment pair (HSP) group: Score = 238, E = 5.30726e-21, Identities = 45/69 (65.2%), Positives = 50/69 (72.5%), Length = 69
Query: RYRTTFTSFQLEELEKAFSRTHYPDVFTREELAMKIGLTEARIQVWFQNRRAKWRKQEKVGPQSHPYNP
Sbjct: RNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVW
Options tab > BLAST version 2.2.18 > Execution Type: Select BLAST or PSI-BLAST.
Options tab > BLAST version 2.2.18 > E-Value: Enter the E-value used as a threshold when BLAST or PSI-BLAST is performed. The default value is 1.0E-5.
Options tab > BLAST version 2.2.18 > Iteration: Enter the number of iterations for PSI-BLAST. The default value is 3.

(3) Node3 HitRegionSelector_SOAP
Set the conditions for selecting BLAST or PSI-BLAST hit regions.
1. Select Configure in the right-click menu.
Figure 4.6.3.2: HitRegionSelector_SOAP Configure (Options tab: Conditions to select (PSI-)BLAST hit regions, integers only; Coverage, Identity (30), Minimum Length (30))
Options tab > Conditions to select (PSI-)BLAST hit regions (only integers may be entered) > Coverage: Set the coverage, that is, the percentage of the total length of the hit protein structure that is covered by the hit region. The default value is 60. The valid range is 50 ≤ Coverage ≤ 100 (integer).
Options tab > Conditions to select (PSI-)BLAST hit regions (only integers may be entered) > Identity: Set the identity, that is, the percentage of identical amino acids between the query and the target within the hit region. The default value is 30. The valid range is 10 ≤ Identity ≤ 100 (integer).
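The Coverage, Identity, and Minimum Length conditions can be read as a simple filter over hit regions. The sketch below is a hypothetical illustration, not the HitRegionSelector_SOAP implementation: the `HitRegion` fields, the inclusive `>=` comparisons, and applying Minimum Length to the hit-region length are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class HitRegion:
    """One (PSI-)BLAST hit region; field names are hypothetical."""
    name: str              # e.g. PDB code plus chain identifier
    hit_length: int        # aligned length on the hit structure (aa)
    structure_length: int  # total length of the hit protein structure (aa)
    identity: float        # percent identical amino acids in the hit region


def coverage(hit: HitRegion) -> float:
    """Percentage of the hit structure's total length covered by the hit region."""
    return 100.0 * hit.hit_length / hit.structure_length


def select_hit_regions(hits, min_coverage=60, min_identity=30, min_length=30):
    """Keep hit regions that satisfy all three conditions (dialog defaults)."""
    # The dialog only accepts integers in these ranges.
    if not 50 <= min_coverage <= 100:
        raise ValueError("Coverage must be between 50 and 100")
    if not 10 <= min_identity <= 100:
        raise ValueError("Identity must be between 10 and 100")
    return [
        h for h in hits
        if coverage(h) >= min_coverage
        and h.identity >= min_identity
        and h.hit_length >= min_length
    ]


hits = [
    HitRegion("hit_a", 246, 396, 26.8),  # coverage about 62%, identity too low
    HitRegion("hit_b", 120, 150, 45.0),  # coverage 80%: kept
    HitRegion("hit_c", 25, 30, 80.0),    # shorter than Minimum Length
]
print([h.name for h in select_hit_regions(hits)])  # ['hit_b']
```

Raising the thresholds only ever removes hits, so a stricter setting is a safe first step when too many templates are returned.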
File Reader: The matrix file is read.
Hierarchical Clustering: Execute hierarchical clustering.
Representative Profile: Change the profile data to the representative profile.
GGM: Execute GGM.
t Test: Execute the t-test.
Column Filter: Execute the column filter.
Joiner (deprecated): Execute uniting columns.
Node 8 RunCytoscape: Execute Cytoscape.

4.9.3 Configuring running environment
(1) Node1 File Reader
Select a matrix file of gene expression data as the input in Configure in the right-click menu.
(Screenshot: File Reader configuration dialog with the data file location, Basic Settings (read row IDs, read column headers, Column delimiter: tab), and a preview of the matrix with an ORF column followed by the expression columns alpha0, alpha14, alpha21, and so on)
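The File Reader settings shown for this workflow (tab delimiter, row IDs in the first column, column headers in the first line) correspond to parsing a plain tab-separated matrix. A minimal sketch with Python's `csv` module, assuming `#` marks comment lines and empty cells are missing values; `read_matrix` and the sample gene IDs are hypothetical.

```python
import csv


def read_matrix(text, delimiter="\t", comment="#"):
    """Parse a matrix with column headers in the first line and row IDs in the first column.

    Returns (column_names, rows), where rows maps each row ID to a list of
    floats; empty cells become None. Blank lines and lines starting with
    `comment` are skipped (the comment marker is an assumption).
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith(comment)]
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    columns = header[1:]  # header[0] labels the row-ID column (e.g. "ORF")
    rows = {}
    for record in reader:
        row_id, cells = record[0], record[1:]
        rows[row_id] = [float(c) if c.strip() else None for c in cells]
    return columns, rows


sample = "ORF\talpha0\talpha14\n# hypothetical data\ngeneA\t0.17\t-0.04\ngeneB\t0.28\t\n"
columns, rows = read_matrix(sample)
print(columns)        # ['alpha0', 'alpha14']
print(rows["geneB"])  # [0.28, None]
```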
(Screenshot: Cytoscape 2.8.2 showing the resulting network)
Figure 4.9.3.11: Cytoscape
Please refer to the following site for the details of Cytoscape.
Cytoscape: http://www.cytoscape.org

4.10 AutoDock Active Workflow
AutoDock_SOAP executes AutoDock, a widely used protein-ligand docking software developed at The Scripps Research Institute (http://autodock.scripps.edu), via SOAP. The user needs to provide two files: a target protein PDB file (a single-chain protein, NOT a protein complex, without bound ligands) and a MOL2-formatted molecule file. The program automatically identifies potential binding sites and calculates binding energies.
AutoDock: http://autodock.scripps.edu
(Screenshot: KNIME workbench with the AutoDock_SOAP workflow, consisting of a PDB file reader node, MolFileReader, AutoDock_SOAP, MergeTargetAndLigand, and JmolForModeller)
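The target requirements above (one protein chain, no bound ligands) can be checked before running the workflow. A sketch, assuming standard fixed-column PDB records (residue name in columns 18-20, chain ID in column 22) and treating every non-water HETATM group as a bound ligand; `scan_pdb` and `looks_like_docking_target` are hypothetical helpers, not part of AutoDock_SOAP.

```python
def scan_pdb(pdb_text, ignore=("HOH",)):
    """Return (chain_ids, ligand_names) found in ATOM/HETATM records.

    Uses the fixed PDB columns: residue name in columns 18-20 and chain ID
    in column 22 (1-based). Residues listed in `ignore` (water by default)
    are skipped entirely.
    """
    chains, ligands = set(), set()
    for line in pdb_text.splitlines():
        record = line[:6]
        if record not in ("ATOM  ", "HETATM") or len(line) < 22:
            continue
        resname = line[17:20].strip()
        if record == "HETATM" and resname in ignore:
            continue  # water counts neither as a chain nor as a ligand
        chains.add(line[21])
        if record == "HETATM":
            ligands.add(resname)
    return sorted(chains), sorted(ligands)


def looks_like_docking_target(pdb_text):
    """True if the file has exactly one chain and no non-water HETATM groups."""
    chains, ligands = scan_pdb(pdb_text)
    return len(chains) == 1 and not ligands
```

A file that fails this check would typically need its extra chains and ligands removed before it is used as the target.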
(Text view: the substitution score matrix used by LAST, with one row and one column per residue letter)
Figure 4.4.5.2: LAST Results View Sequence Alignment Results

4.5 WolfPSORT Active Workflow
WolfPSORT Active Workflow performs protein subcellular localization prediction via SOAP. The result of WoLF PSORT can be viewed using the HtmlView node. Please refer to the following site for details of WoLF PSORT.
WoLF PSORT: http://wolfpsort.seq.cbrc.jp
Furthermore, this workflow can retrieve a variety of related information by using the LSDBCrossSearch node, which executes Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) with regard to the input sequence.
4.4.2 Node
There are 4 nodes.
Table 4.4.2.1: Last Active Workflow node list
Node 1 FastaFileReader: The FASTA format file is read.
Node 2 Last_SOAP: Execute Last.
Node 3 CBRCViewer: The Last execution result is displayed graphically.
Node 4 LSDBCrossSearch: Execute LSDB cross search.

4.4.3 Step 1 Node setting
(1) Node1 FastaFileReader
Select a FASTA file as the input using the right-click menu.
(2) Node2 Last_SOAP
Select Configure in the right-click menu.
Figure 4.4.3.1: Last_SOAP Configure
Options tab > Input type > Sequence Type: Select DNA or protein.
Options tab > Target sequence file for comparison > Selected File: Select the file to compare the input against.
Options tab > Output > Selected Directory: Select an output directory.
Options tab > ParamAL > Parameter: Enter lastal parameters if necessary. The default parameters are: -j4 -u0 -m10 -l1 -k1 -w0 -g1.0 -s2 -e30
Options tab > ParamDB > Parameter: Enter lastdb parameters if necessary. The default parameters are: -m110 -w1
Options tab > Advanced > Other options: Enter other options if necessary. Please refer to Appendix B for the details of the options of Last.
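The ParamDB and ParamAL strings are ordinary lastdb and lastal command-line options. As a sketch of how such strings could be turned into command lines (an assumption about how Last_SOAP uses them, not its actual code), the hypothetical helper below splits them with `shlex` and appends the database name and FASTA files in the usual `lastdb [options] db target.fa` and `lastal [options] db query.fa` order. Using lastdb's `-p` flag for the protein Sequence Type is also an assumption.

```python
import shlex

# Default parameter strings as transcribed from the Last_SOAP dialog above;
# check them against your own LAST installation before relying on them.
DEFAULT_PARAM_DB = "-m110 -w1"
DEFAULT_PARAM_AL = "-j4 -u0 -m10 -l1 -k1 -w0 -g1.0 -s2 -e30"


def build_last_commands(db_name, target_fasta, query_fasta,
                        param_db=DEFAULT_PARAM_DB, param_al=DEFAULT_PARAM_AL,
                        protein=False):
    """Return (lastdb_argv, lastal_argv) as argument lists (hypothetical helper)."""
    lastdb = ["lastdb", *shlex.split(param_db)]
    if protein:
        lastdb.append("-p")  # Sequence Type = protein
    lastdb += [db_name, target_fasta]
    lastal = ["lastal", *shlex.split(param_al), db_name, query_fasta]
    return lastdb, lastal


db_cmd, al_cmd = build_last_commands("targetdb", "target.fa", "query.fa")
print(db_cmd)  # ['lastdb', '-m110', '-w1', 'targetdb', 'target.fa']
```

Each list could then be passed to `subprocess.run(..., check=True)` on a machine where LAST is installed; building argument lists rather than a single shell string avoids quoting problems with file paths.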
for the details.

4.7.5 Step 3 Result
(Screenshot: the CentroidFold result file centroidfoldOut opened from the output directory. Each entry consists of a FASTA header, the nucleotide sequence, and the predicted secondary structure. The first entry is:)
>gi|334185880|ref|NM_001203122.1| Arabidopsis thaliana RIO kinase 2 (AT3G51270) mRNA, complete cds
Figure 4.9.3.1: File Reader Configure
Settings tab > Enter ASCII data file location (press Enter to update preview) > valid URL: Enter the location of the input file. Browse can be used to browse for a file. After a file is specified, its contents are displayed in the Preview section below. When a column header in the Preview section is clicked, the following screen is displayed.
Figure 4.9.3.2: Configure Column Properties (Name, Type: Double, miss. value pattern, Domain)
In this window, the properties of each column of the output table, such as the column name, are configured.
- DON'T include column in output table: Tick the check box to exclude this column from the output table.
- Name: Change the column name.
- Type: Change the data type of the column.
- miss. value pattern: Enter a value that is to be treated as a missing value and therefore excluded from the analysis.
- Domain: Enter domain values in the dialog below; they are added to the domain of the column.
Figure 4.9.3.3: Column Properties Domain Settings (New domain settings: domain values for nominal data. Possible values: values found in the table will be added automatically; enter only additional values here that you want to be in the domain.)
Settings tab > Enter ASCII data file location > Preserve user settings for new location: Tick the check box if the user
RecName: Full=Retinal homeobox protein Rax-A: 119, 3e-26
Full=Paired box protein Pax-7; Alt...: 119, 3e-26
sp|P23760|PAX3_HUMAN RecName: Full=Paired box protein Pax-3; Alt...: 119, 3e-26
Full=Paired box protein Pax-3: 119, 3e-26
sp|O42567|RXB_XENLA RecName: Full=Retinal homeobox protein Rx-B: 119, 4e-26
sp|P47239|PAX7_MOUSE RecName: Full=Paired box protein Pax-7: 119, 4e-26
sp|O42201|RXA_XENLA RecName: Full=Retinal homeobox protein Rx-A: 119, 4e-26
Full=Retinal homeobox protein Rx1: 119, 6e-26
sp|Q4LAL6|ALX4_BOVIN RecName: Full=Homeobox protein aristaless-l...: 118, 6e-26
Full=Homeobox protein aristaless-l...: 118, 7e-26
Further hits: ARX_HUMAN, sp|O35085|ARX_MOUSE, ALX_STRPU, RX2_CHICK, sp|O42356|RX1_DANRE, RX1_ASTFA, sp|P23759|PAX7_HUMAN, sp|P24610|PAX3_MOUSE, RX1_CHICK, sp|Q9H161|ALX4_HUMAN
sp|O42357|RX2_DANRE RecName: Full=Retinal homeobox protein Rx2: 118, 7e-26
sp|O35137|ALX4_MOUSE RecName: Full=Homeobox protein aristaless-l...: 117, 2e-25
Figure 4.3.5.3: Node5 HtmlView BLAST Result

4.4 Last Active Workflow
Last Active Workflow performs sequence comparison via SOAP. The result of Last_SOAP can be viewed using the CBRCViewer node. Please refer to the following site for the details of LAST.
LAST: http://last.cbrc.jp
Furthermore, this workflow can retrieve a variety of relate
(Screenshot: Hierarchical Clustering dialog with Clustering metric: Euclidean, Clustering method: UPGMA, and the VIF and Manual options)
Figure 4.9.3.5: Hierarchical Clustering Configure
Options tab: Select the columns for hierarchical clustering by adding them to the Include section. By default, all columns are processed.
Set the parameters for execution:
- Clustering metric: Select one of the following.
  Euclidean: Euclidean distance
  Pearson Correlation Coefficient: Pearson correlation coefficient
  Eisen Correlation Coefficient: Eisen's correlation coefficient
  Euclidean between Correlations: Euclidean distance between vectors of correlation coefficients
- Clustering method: Select one of the following: Single Linkage, Complete Linkage, UPGMA, WPGMA, Wards, or Big N. Big N uses the reciprocal nearest neighbor method, which requires less memory but more time; its results are the same as those of Ward's method. Ward's method and Big N can be used only with Euclidean distance.
- VIF: Enter a numerical value. The number of clusters is inferred based on the Variance Inflation Factor. The default value is 10.
- Manual: Enter the number of clusters. The default value is 3.
Press OK after specifying the settings. Select Execute in the right-click menu for execution.

(3) Node3 Representative Profile
Set the options for the representative profile in Configure using the right-click menu.
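The UPGMA option of the Hierarchical Clustering node merges, at each step, the two clusters with the smallest average pairwise distance. Below is a naive sketch on Euclidean distance; `upgma` is a hypothetical helper, and the node's own implementation, its Big N variant, and the VIF-based choice of the cluster count are not reproduced. Because this workflow clusters between variables, each point would correspond to one column's vector of values.

```python
import math


def upgma(points, n_clusters):
    """Naive UPGMA (average-linkage) clustering on Euclidean distance.

    points: list of equal-length numeric sequences.
    Returns clusters as sorted lists of point indices. O(n^3); illustration only.
    """
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        # UPGMA: mean distance over all pairs of members of the two clusters
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = min(
            (average_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(upgma(points, 3))  # [[0, 1], [2, 3], [4]]
```

Setting the cluster count by hand, as in this sketch, corresponds to the Manual option; the VIF option instead infers the count from the data.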
details of the options of Last. Press OK after entering the settings.

4.4.4 Step 2 Execution
Figure 4.4.4.1: Last node
(1) Node1 FastaFileReader: Select Execute in the right-click menu for execution.
(2) Node2 Last_SOAP: Select Execute in the right-click menu for execution.
(3) Node3 CBRCViewer: Select Execute and Open Views in the right-click menu for execution and viewing the results.
(4) Node4 LSDBCrossSearch: Select Execute and Open Views in the right-click menu for execution and viewing the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.4.5 Step 3 Result viewing
(1) Node3 CBRCViewer LAST Results
The execution result of Last_SOAP can be viewed using the CBRCViewer node. A text version of the results is shown by clicking the View Sequence Alignment Results link.
(Screenshot: CBRCViewer window showing the alignment of gi|9196549|gb|AAF71122.1| vs gi|7296213|gb|AAF51505.1|, with the View Sequence Alignment Results link and an External Browser button)
Figure 4.4.5.1: Node3 CBRCViewer LAST Result
(Text view: a header with the LAST version (58) and the parameters used, followed by the score matrix with the columns A R N D C Q E G H I L K M F P S T W Y V B J Z X)
(Screenshot: LSDB search result list for the phrase "Arabidopsis")
Figure 5.1.2: LSDB window
Please refer to the Life Science Database cross search site for the details.
Life Science DataBase Site: http://biosciencedbc.jp/dbsearch

5.2 Appendix B Last parameter
5.2.1 lastal parameter
The option descriptions for LAST have been taken from the LAST web site.
-h: Show all options and their default settings.
-v: Be verbose: write messages about what lastal is doing.
-o FILE: Write output to the specified file instead of the screen.
-f NUMBER: Choose the output format: 0 means tabular and 1 means MAF. MAF format looks like this:
    a score=15
    s chr3L        19433515 23 + 24543557 TTTGGGAGTTGAAGTTTTCGCCC
    s H04BA01F1907        2 21 +       25 TTTGGGAGTTGAAGGTT--GCCC
Lines starting with "s" contain: the sequence name, the start coordinate of the alignment, the number of sequence letters spanned by the alignment, the strand, the sequence length, and the aligned letters. The start coordinates are zero-based. If the strand is "-", the start coordinate is in the reverse strand. The same alignment in tabular format looks like this:
    15 chr3L 19433515 23 + 24543557 H04BA01F1907 2 21 + 25 17,2:0,4
The final column shows the sizes and offsets of gapless blocks in the alignment. In this case, we have a block of size 17, then an offset of size 2 in the upper sequence and 0 in the lower sequence, then a block of size 4.
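The MAF and tabular descriptions above are enough to recover coordinates programmatically. A minimal sketch; `parse_s_line` and `block_spans` are hypothetical helpers, not part of LAST. It checks that the blocks column `17,2:0,4` spans the same 23 and 21 letters as the two "s" lines.

```python
def parse_s_line(line):
    """Split a MAF 's' line into its fields."""
    _, name, start, size, strand, seq_len, text = line.split()
    return {"name": name, "start": int(start), "size": int(size),
            "strand": strand, "seq_len": int(seq_len), "text": text}


def block_spans(blocks):
    """Letters spanned in the upper and lower sequence by a tabular blocks column.

    '17,2:0,4' means: gapless block of 17, offsets of 2 (upper) and 0 (lower),
    then a gapless block of 4.
    """
    upper = lower = 0
    for item in blocks.split(","):
        if ":" in item:              # offsets between gapless blocks
            up, low = item.split(":")
            upper += int(up)
            lower += int(low)
        else:                        # gapless block: same size in both sequences
            upper += int(item)
            lower += int(item)
    return upper, lower


top = parse_s_line("s chr3L 19433515 23 + 24543557 TTTGGGAGTTGAAGTTTTCGCCC")
bottom = parse_s_line("s H04BA01F1907 2 21 + 25 TTTGGGAGTTGAAGGTT--GCCC")
print(top["start"] + top["size"])  # 19433538, the zero-based end (exclusive)
print(block_spans("17,2:0,4"))     # (23, 21)
```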
out by clicking the LSDB Cross Search button, and a web browser showing the Life Science Database cross search will appear, as shown below.
(Screenshot: Life Science Database cross search results for the phrase "Arabidopsis" at biosciencedbc.jp/dbsearch, with Search by PubMed, Search by NCBI, and Search by Google links, a Wikipedia entry for Arabidopsis thaliana, and the list of matching databases, including ASTRA)
settings made in Figure 4.9.3.2 are to be preserved.
Settings tab > Basic Settings
The basic settings are made in Basic Settings at the center of the Settings tab.
- read row IDs: Row IDs are read.
- Column delimiter: Select the delimiter used in the input file from the pull-down menu.
- read column headers: The column headers of the input file are read.
- ignore spaces and tabs: Spaces and tabs are disregarded.
- Java-style comments: Java-style comments are recognized as comments.
- Single line comment: Set the character(s) that start a single-line comment.
- Advanced: Opens the following screen for more detailed settings.
Figure 4.9.3.4: Basic Settings Advanced (tabs include Decimal Separator, Ignore spaces, and Short Lines; quote characters can be defined here, quotes can be multi-character patterns, and, if checked, the escape character is always the backslash "\")
Press OK after specifying the settings. Select Execute in the right-click menu for execution.

(2) Node2 Hierarchical Clustering
Set the parameters for hierarchical clustering between variables in Configure using the right-click menu.
(Screenshot: Hierarchical Clustering dialog with the Options and Memory Policy tabs; the columns alpha0, alpha14, alpha21, alpha28, alpha35, alpha42, and so on are in the Include list)
The "+" line may optionally be followed by a name (ignored), and the sequence and quality codes are allowed to wrap onto more than one line. For fastq-sanger, the quality scores are obtained by subtracting 33 from the ASCII values of the characters below the "+". For fastq-solexa and fastq-illumina, they are obtained by subtracting 64.
prb format stores four quality scores (A, C, G, T) per position, with one sequence per line. Since prb does not store sequence names, lastal uses the line number (starting from 1) as the name.
In fastq-sanger and fastq-illumina format, the quality scores are related to error probabilities like this: qScore = -10 log10(p). In fastq-solexa and prb, however: qScore = -10 log10[p / (1 - p)].
In lastal's MAF output, the quality scores are written on lines starting with "q". For fastq, they are written with the same encoding as the input. For prb, they are written in the fastq-solexa (ASCII-64) encoding.
Finally, PSSM means position-specific scoring matrix; the LAST documentation illustrates the format with an example matrix named myLovelyPSSM. The sequence appears in the second column, and columns 3 onwards contain the position-specific scores. Any letters not specified by any column will get the lowest score in each row. This format is a simplified version of PSI-BLAST's ASCII format.
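The quality encodings described above can be decoded in a few lines. A sketch of the two offsets (33 and 64) and the two probability formulas; the function names are hypothetical and not part of LAST.

```python
def quality_scores(qual, offset=33):
    """ASCII quality characters to scores (33: fastq-sanger; 64: fastq-solexa/-illumina)."""
    return [ord(c) - offset for c in qual]


def error_prob_phred(q):
    """fastq-sanger and fastq-illumina: q = -10 log10(p)."""
    return 10 ** (-q / 10)


def error_prob_solexa(q):
    """fastq-solexa and prb: q = -10 log10(p / (1 - p)), solved for p."""
    odds = 10 ** (-q / 10)
    return odds / (1 + odds)


print(quality_scores("I#"))     # [40, 2]
print(quality_scores("h", 64))  # [40]
print(error_prob_solexa(0))     # 0.5
```

The two scales agree closely for high scores but diverge near zero: a Solexa score of 0 means p = 0.5, whereas a Phred score of 0 means p = 1.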
The output continues with the rest of this sequence and its predicted secondary structure, which is followed by the parameters (g=4, th=0.2). The next entry is:
>gi|21406208|gb|AY087471.1| Arabidopsis thaliana clone 35785 mRNA, complete sequence
(nucleotide sequence and predicted secondary structure follow)
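CentroidFold reports each predicted secondary structure in dot-bracket notation: matching parentheses mark base pairs and dots mark unpaired bases. A sketch for turning such a structure line into base pairs, assuming the parameter annotation such as (g=4,th=0.2) is separated from the brackets by whitespace; `structure_field` and `base_pairs` are hypothetical helpers.

```python
def structure_field(line):
    """Take the bracket string from a line such as '((..)) (g=4,th=0.2)'."""
    return line.split()[0]


def base_pairs(structure):
    """Return 0-based (i, j) base pairs from a dot-bracket string.

    '(' opens a pair, ')' closes the most recent unmatched '(',
    '.' is unpaired. Raises ValueError on malformed input.
    """
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(base_pairs(structure_field("((..))..(.) (g=4,th=0.2)")))  # [(0, 5), (1, 4), (8, 10)]
```

A stack is the natural choice here because dot-bracket strings without pseudoknots are properly nested, so each ')' always closes the most recently opened '('.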
(Screenshot: the aligned nucleotide sequences in the MAFFT text view)
Figure 4.2.5.2: MAFFT Result Text View
(2) Node5 CBRCViewer ClustalW Result
The sequence identifier used for the input is displayed on the left, and the aligned sequence is shown on the right. A tex
Computational Biology Research Center, AIST
Active Workflow Component Type User Manual
CBRC
2012-11-14

Contents
1 Introduction
2 About the Active workflow Component Type
4 Use of each Active workflow
4.1 Fastapl Active Workflow
4.1.1 Preparation
4.1.2 Node
4.1.3 Step 1 Node setting
4.1.4 Step 2 Execution and result viewing
4.2 Mafft Active Workflow
4.2.1 Preparation
4.2.2 Node
4.2.3 Step 1 Node setting
4.2.4 Step 2 Execution
4.2.5 Step 3 Result viewing
4.3 Blast Active Workflow
4.3.1 Preparation
4.3.2 Node
4.3.3 Step 1 Node setting
4.3.4 Step 2 Execution
4.3.5 Step 3 Result viewing
4.4 Last Active Workflow
4.4.1 Preparation
4.4.2 Node
4.4.3 Step 1 Node setting
4.4.4 Step 2 Execution
4.4.5 Step 3 Result viewing
4.5 WolfPSORT Active Workflow
4.5.1 Preparation
4.5.2 Node
4.5.3 Step 1 Node setting
4.5.4 Step 2 Execution and result viewing
4.6 Modelling Active Workflow
(8) Node5 Modeller_SOAP: Select Execute in the right-click menu for execution.
(9) Node6 JmolForModeller: Select Execute and Open Views in the right-click menu for execution and viewing the results.
(10) Node7 LSDBCrossSearch: Select Execute and Open Views in the right-click menu for execution and viewing the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.6.5 Step 3 Result viewing
(1) Node10 PDBjMineWeb PDB Mine
The execution result of TemplateSelector_SOAP (Node4) can be viewed with the PDBjMineWeb node. This window shows a list of known 3D structures (PDB code and chain identifier) for each hit region. Select an entry from the list and click Open PDB Mine Web to show the 3D structure information stored in PDBj Mine of PDBj.
Figure 4.6.5.1: Node10 PDBjMineWeb PDB Mine
(Screenshot: PDBj Mine summary page for PDB entry 1twf. PDBj (Protein Data Bank Japan) maintains a centralized PDB archive of macromolecular structures in collaboration with the RCSB and the BMRB in the USA and the PDBe in the EU.)
POODLE has 2 types: POODLE-L, which is optimized for longer disorder regions (> 40 aa), and POODLE-S, which is optimized for shorter disorder regions. POODLE results can be viewed in line plot format.
POODLE: http://mbs.cbrc.jp/poodle
(Screenshot: KNIME v2.3.4 workbench with the Poodle_SOAP workflow open)
Figure 4.8.1: POODLE Active Workflow

4.8.1 Preparation
The file needed for execution is an amino acid sequence file in FASTA format. A multi-FASTA format file cannot be used.
File type: FASTA format file

4.8.2 Node
There are 4 nodes.
Table 4.8.2.1: Poodle Active Workflow
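Because POODLE accepts only a single-sequence FASTA file, it can help to check the input before running the workflow. A minimal sketch; `fasta_records` and `single_fasta_sequence` are hypothetical helpers, not nodes of the workflow.

```python
def fasta_records(text):
    """Split FASTA text into (header, sequence) records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line  # sequence lines may wrap
    return [tuple(r) for r in records]


def single_fasta_sequence(text):
    """Return the only sequence, rejecting multi-FASTA input."""
    records = fasta_records(text)
    if len(records) != 1:
        raise ValueError(f"expected one FASTA record, found {len(records)}")
    return records[0][1]


print(single_fasta_sequence(">q1 test\nMKV\nLLA\n"))  # MKVLLA
```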
Significance level for deviance1: Enter the significance level for deviance1. The default value is 0.5.
Significance level for deviance2: Enter the significance level for deviance2. The default value is 0.01.
Press OK after entering the values. Select Execute in the right-click menu for execution.

(5) Node5 t Test
Set the options for t Test in Configure using the right-click menu.
Figure 4.9.3.8: t Test Configure (Options tab: Include columns cluster0, cluster1, cluster3, and so on; Number of samples: 79; Correlation type: Correlation or Partial correlation; Threshold: 0.05)
Options tab: Select the columns for t Test by adding them to the Include section. By default, all columns are processed.
Set the parameters for t Test:
- Number of samples: Enter a value. The default value is 79.
- Correlation type: Select either the correlation coefficient (Correlation) or the partial correlation coefficient (Partial correlation).
- Threshold: Enter the significance level. The default value is 0.05.
Press OK after completing the settings. Select Execute in the right-click menu for execution.

(6) Node6 Column Filter
Set the column filter in Configure using the right-click menu.
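The manual does not state which statistic the t Test node computes. As an assumption only, the standard test of whether a correlation coefficient r differs from zero, computed from n samples (the Number of samples setting) and controlling for k other variables (k = 0 for Correlation, k > 0 for Partial correlation), uses:

```latex
t = r\,\sqrt{\frac{n - 2 - k}{1 - r^{2}}}, \qquad \mathrm{df} = n - 2 - k
```

Under this reading, a coefficient would be treated as significant when the two-sided p-value of t under Student's t distribution with df degrees of freedom is below the Threshold (default 0.05).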
26. ...FSNRRAKWRREEKLRNQRRQSGP

High-scoring segment pair (HSP) group
Score = 235, E = 1.16277e-20, Identities = 44/60 (73.3%), Positives = 52/60 (86.7%), Length = 60
Query: KQRRYRTTFTSEQLEELEKAFSRTHYPDVETREELAMKIGLTEARIQVWFQNRRAKWRKQ
Sbjct: KQRRSRTTFSASQLDELERAFERTQYPDIYTREELAQRTNLTEARIQVWFQNRRARLRKQ

High-scoring segment pair (HSP) group
Score = 221, E = 5.13603e-19, Identities = 39/61 (63.9%), Positives = 49/61 (80.3%), Length = 61
Query: QRR RTTFTSEQLEELEKAFSRTHYPDVETREELAMKIGL TEARIQVWF QNRRAKWRKQEK

Figure 4.6.4.2 BlastForModeller_SOAP Result View (HtmlView)

HitRegionSelector_SOAP: Select Execute in the right-click menu to run the node.
Node9 HtmlView: Select Execute and Open Views in the right-click menu to run the node and view the results.

Query Hit Length (aa) | Query Coverage (%) | Query Hit Range (aa) | PDB Hit Length (aa) | PDB Coverage (%) | PDB Hit Range (aa) | Identity (%) | E-value
246 | [illegible] | 497-744 | 246 | 62.12 | 135-380 | 26.77 | 7.10235e-19

Figure 4.6.4.3 HitRegionSelector_SOAP Result View (HtmlView)

Node4 TemplateSelector_SOAP: Select Execute in the right-click menu to run the node.
Node10 PDBjMineWeb: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to the description in 4.6.5 Step 3: Result viewing for the use of P
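The Identities value in each HSP header is the number of identical aligned residues divided by the alignment length (44/60 = 73.3% for the first HSP). A minimal Python sketch that recomputes it from the first HSP's query and subject rows; it assumes an ungapped alignment and does not reproduce Positives, which requires a substitution matrix such as BLOSUM62:

```python
def identity_and_percent(query: str, subject: str) -> tuple:
    """Count identical aligned positions and their percentage of the alignment length.

    Assumes an ungapped alignment given as two equal-length strings.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identical = sum(1 for a, b in zip(query, subject) if a == b)
    return identical, round(100.0 * identical / len(query), 1)


q = "KQRRYRTTFTSEQLEELEKAFSRTHYPDVETREELAMKIGLTEARIQVWFQNRRAKWRKQ"
s = "KQRRSRTTFSASQLDELERAFERTQYPDIYTREELAQRTNLTEARIQVWFQNRRARLRKQ"
print(identity_and_percent(q, s))  # (44, 73.3)
```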
27. GCCTTCGTTATGTTCAAGATCATCACCATTGAGGCAAATTCTGAAAGT GAATTGCTGGTCTCTCTC COCTTTTTATTGC ATTTTTAAGTT TGTGTATTGTTTTTTTCTGGTGT GCCTACTACATCTTCAGCTATATTATC TAATAAAGGATTCGATCAAAGTCGGGTAAGTTTGATTTTTGTTTGATCTC ACTTCAGCACTTGTCATGTTGTAACATTCAATCTCTGATATCACTGTYTT

Figure 4.2.5.4 ClustalW Result (TextView)

4.3 Blast Active Workflow
Blast Active Workflow performs a homologue search via REST. The result of BlastNCBI_REST can be viewed with the CBRCViewer node. This workflow can also retrieve a variety of related information by using the LSDBCrossSearch node, which executes a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.
28. Figure 4.2.1 Mafft Active Workflow

4.2.1 Preparation
The file needed for execution is a multi-FASTA format file containing nucleotide sequences or amino acid sequences.
File type: Multi-FASTA format

4.2.2 Node
There are 6 nodes.
Table 4.2.2.1 Mafft Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 Mafft_SOAP: Executes MAFFT.
Node3 CBRCViewer: The multiple alignment result is displayed.
Node4 ClustalW_SOAP: Executes ClustalW.
Node5 CBRCViewer: The multiple alignment result is displayed.
Node6 LSDBCrossSearch: Executes the LSDB cross search.

4.2.3 Step 1: Node setting
(1) Node1 FastaFileReader: Select a multi-FASTA file as the input using the right-click menu.
(2) Node2 Mafft_SOAP: Select an output directory using the right-click menu and set options if necessary.

Figure 4.2.3.1 Mafft_SOAP Configure

Options tab: Advanced Options
The options are explained below.
--op: Gap opening penalty (default 1.53)
--ep: Offset, which works like a gap extension penalty (default 0.0)
--maxiterate: Maximum number of iterative refinement cycles (default 0)
--clustalout: Output cl
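Mafft_SOAP runs MAFFT on a server, and the manual does not say how the node builds its command line. To reproduce a run locally with the same settings, a sketch that assembles the equivalent MAFFT arguments could look like this (the option names are MAFFT's own, as listed above; `build_mafft_command` is a hypothetical helper):

```python
import shlex


def build_mafft_command(input_fasta: str, op: float = 1.53, ep: float = 0.0,
                        maxiterate: int = 0, clustalout: bool = False) -> list:
    """Build a local MAFFT argument list mirroring the Advanced Options.

    Defaults follow the manual: --op 1.53, --ep 0.0, --maxiterate 0.
    """
    cmd = ["mafft", "--op", str(op), "--ep", str(ep), "--maxiterate", str(maxiterate)]
    if clustalout:
        cmd.append("--clustalout")  # Clustal-format output instead of FASTA
    cmd.append(input_fasta)
    return cmd


print(shlex.join(build_mafft_command("input.fasta", clustalout=True)))
# mafft --op 1.53 --ep 0.0 --maxiterate 0 --clustalout input.fasta
```

The list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`; MAFFT writes the alignment to standard output.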
29. POODLE Result (TextView)

4.9 ASIAN Active Workflow
ASIAN (Automatic System for Inferring A Network), developed at CBRC, is a network inference tool that combines hierarchical clustering with graphical Gaussian modeling (GGM). Please refer to the ASIAN web site for details: http://eureka.cbrc.jp/asian

Figure 4.9.1 ASIAN Active Workflow

4.9.1 Preparation
... Therefore, the vector of one variable is described in the line
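The Preparation text implies that each line of the ASIAN input holds the vector of one variable (the test data appears to list yeast ORF names such as YBR166C). Assuming a tab-delimited file whose first row is a header of sample names and whose first column is the variable name, a minimal parser might be written as below. The layout and the sample values are assumptions for illustration; check the ASIAN web site for the exact format:

```python
import csv
import io


def read_profiles(text: str, delimiter: str = "\t") -> dict:
    """Parse a matrix with one variable per line: name, value1, value2, ...

    The first row is treated as a header of sample (column) names.
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    profiles = {}
    for row in rows:
        if not row:
            continue
        name, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(header) - 1:
            raise ValueError(f"{name}: expected {len(header) - 1} values, got {len(values)}")
        profiles[name] = values
    return profiles


sample = "gene\tt0\tt7\nYBR166C\t0.1\t-0.3\nYOR357C\t0.4\t0.2\n"  # hypothetical values
print(read_profiles(sample)["YBR166C"])  # [0.1, -0.3]
```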
30. Figure 4.5.1 WoLF PSORT Active Workflow

4.5.1 Preparation
The file needed for execution is an amino acid sequence file in FASTA format. Multi-FASTA format can be used.
File type: Multi-FASTA format file

4.5.2 Node
There are 4 nodes.
Table 4.5.2.1 WolfPsort Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 WolfPsort_SOAP: Exe
31. ...SOAP: Select an output directory and the input format in the right-click menu.

Figure 4.7.3.1 CentroidFold_SOAP Configure

- Options tab, Input type (Format): Select FASTA or ClustalW as the format.
- Options tab, Weight of base pairs (Gamma): Select a value from the pull-down menu.
- Options tab, Advanced (Other options): Enter other options if necessary.
Please refer to the following site for details of CentroidFold.
CentroidFold: http://www.ncrna.org/centroidfold (software centroidfold)

4.7.4 Step 2: Execution

Figure 4.7.4.1 CentroidFold Node

Execute the nodes from left to right, starting with FastaFileReader (Node1).
(1) Node1 FastaFileReader: Select Execute in the right-click menu to run the node.
(2) Node2 CentroidFold_SOAP: Select Execute in the right-click menu to run the node.
(3) Node3 CBRCViewer: Select Execute and Open Views in the right-click menu to run the node and view the results.
(4) Node6 HtmlView: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to
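The Gamma option weights base pairs against unpaired bases. As we understand CentroidFold's gamma-centroid estimator, it chooses the pseudoknot-free structure that maximizes the sum, over the chosen pairs, of ((gamma + 1) * p_ij - 1), where p_ij is the base-pairing probability; for gamma <= 1 this reduces to keeping every pair with p_ij > 1/(gamma + 1). The sketch below illustrates that objective with a Nussinov-style dynamic program. It is not CentroidFold's code: the probabilities are taken as given (CentroidFold computes them with a partition-function algorithm, not shown), no minimum hairpin-loop length is enforced, and the toy probabilities are hypothetical:

```python
def gamma_centroid(n, bpp, gamma):
    """Return 0-based pairs (i, j) maximizing sum((gamma + 1) * p_ij - 1).

    n     -- sequence length
    bpp   -- dict {(i, j): probability} with i < j (assumed precomputed)
    gamma -- weight of base pairs (the node's Gamma option, as we read it)
    Pseudoknot-free; no minimum hairpin-loop length is enforced.
    """
    if n == 0:
        return []
    gain = {ij: (gamma + 1.0) * p - 1.0 for ij, p in bpp.items()}
    best = [[0.0] * n for _ in range(n)]  # best[i][j]: max total gain on i..j

    def inner(i, j):
        return best[i + 1][j - 1] if i + 1 <= j - 1 else 0.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            value = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            g = gain.get((i, j), 0.0)
            if g > 0:
                value = max(value, inner(i, j) + g)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into i..k and k+1..j
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value

    pairs, stack = [], [(0, n - 1)]
    while stack:  # traceback: follow a candidate that reproduces best[i][j]
        i, j = stack.pop()
        if i >= j:
            continue
        g = gain.get((i, j), 0.0)
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif g > 0 and best[i][j] == inner(i, j) + g:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


probs = {(0, 5): 0.9, (1, 4): 0.8, (2, 3): 0.1}  # toy probabilities
print(gamma_centroid(6, probs, 1.0))   # [(0, 5), (1, 4)]
print(gamma_centroid(6, probs, 10.0))  # [(0, 5), (1, 4), (2, 3)]
```

Raising gamma lowers the effective threshold 1/(gamma + 1), so the weak (2, 3) pair (p = 0.1) is only kept at gamma = 10.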
32. Life Science DataBase cross search can be executed when the node status is green, after the LSDBCrossSearch node has been executed. The Life Science DataBase cross search site was developed in the Database Integration project promoted by the Ministry of Education, Culture, Sports, Science and Technology.
If View is selected in the right-click menu of the LSDBCrossSearch node, the View window of the LSDBCrossSearch node appears.

Figure 5.1.1 LSDBCrossSearch View window

The headers of the FASTA file used by the LSDBCrossSearch node are shown in FASTA Header Lists. Enter the keyword(s) for the cross search in the text box.
For a combined search, use the following symbols:
- AND retrieval: space (e.g. network socket)
- OR retrieval: pipe | (e.g. network|socket)
- Exclusive OR retrieval: exclamation mark ! (e.g. network!socket)
- Wildcard search: asterisk * (e.g. inter*, *sphere)
OR has the highest priority. Cross search will be carried
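The symbols above can also be combined programmatically, for example when preparing many searches. A minimal sketch with a hypothetical helper `lsdb_query` that joins OR-alternatives with `|` and AND-terms with spaces; because OR has the highest priority, `a|b c` reads as (a OR b) AND c. It only builds the string and does not contact the LSDB site:

```python
def lsdb_query(*groups) -> str:
    """Join OR-groups with spaces (AND); terms inside a group use '|' (OR).

    Each group is either a single term (str) or an iterable of OR-alternatives.
    Terms may contain the '*' wildcard but not spaces or '|'.
    """
    parts = []
    for group in groups:
        terms = [group] if isinstance(group, str) else list(group)
        if not terms or any(not t or " " in t or "|" in t for t in terms):
            raise ValueError(f"invalid term group: {group!r}")
        parts.append("|".join(terms))
    return " ".join(parts)


print(lsdb_query(("kinase", "phosphatase"), "Arabidopsis"))  # kinase|phosphatase Arabidopsis
```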
33. ...TCTAATAAAGGATTCGATCAAAGTCGGGTAAGTTT GATTTTTGTTTGATCTCACTTCAGCACTTGTCATGTTGTAACATTCAATCTCTGATATCACTGTYTTTT

Figure 4.7.4.2 Node6 HtmlView CentroidFold Results

(5) Node4 FRNAdbSearch: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to 4.7.5 Step 3: Result viewing for details.
(6) Node5 LSDBCrossSearch: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.7.5 Step 3: Result viewing
(1) Node3 CBRCViewer: CentroidFold Results
The execution result of CentroidFold_SOAP (Node2) can be viewed as CentroidFold Results in CBRCViewer. Please refer to the following site for details of CentroidFold.
CentroidFold: http://www.ncrna.org/centroidfold (software centroidfold)

Figure 4.7.5.1 Node3 CBRCViewer CentroidFold Results

(2) Node4 FRNAdbSearch
FRNAdbSearch displays a search screen for fRNAdb. If the input RNA sequence file is in FASTA format, the header lines of the FASTA file are displayed in the FASTA Header Lists column. If the input RNA sequence file is in ALN format, this column is blank. Search keyword(s) for fRNAdb should
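As we understand CentroidFold's text output, the predicted secondary structure is written in dot-bracket notation: matching parentheses mark paired bases and dots mark unpaired ones. Assuming the result is in that notation, the base pairs can be recovered with a stack. A minimal sketch (hypothetical helper; round brackets only, so pseudoknot-free structures):

```python
def dot_bracket_pairs(structure: str) -> list:
    """Return 0-based (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dot_bracket_pairs("((..))."))  # [(0, 5), (1, 4)]
```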
34. ...a variety of related information by using the LSDBCrossSearch node, which executes a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.
35. Figure 4.3.3.1 BlastNCBI_REST Configure

Options tab
- BLAST Programs: Specify the program (BLASTP, BLASTN, BLASTX, TBLASTN, TBLASTX, PSI-BLAST, RPS-BLAST or MEGABLAST). Default: BLASTP.
- Databases: Database name. Default: nr.
- E-value threshold: Default: 1.0e-4.
- Advanced (other options): Default: empty.
Please check the BlastNCBI_REST node description for further information.

4.3.4 Step 2: Execution

Figure 4.3.4.1 Blast REST Node

(1) Node1 FastaFileReader: Select Execute in the right-click menu to run the node.
(2) Node2 BlastNCBI_REST: Select Execute in the right-click menu to run the node.
(3) Node3 CBRCViewer: Select Execute and Open Views in the right-click menu to run the node and view the results.
(4) Node4 LSDBCrossSearch: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.
(5) Node5 HtmlView: Select Execute and Open Views in the right-click menu for executio
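We do not know exactly what BlastNCBI_REST sends to NCBI. For readers who want to submit the same search outside KNIME, the sketch below maps the node's settings onto NCBI's public BLAST URL API, whose submit request uses `CMD=Put` with `PROGRAM`, `DATABASE`, `EXPECT` and `QUERY`. The helper `blast_put_url` is hypothetical and only builds the URL; it does not send the request or poll for results:

```python
from urllib.parse import urlencode

NCBI_BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query_fasta: str, program: str = "blastp",
                  database: str = "nr", expect: float = 1.0e-4) -> str:
    """URL that submits a search (CMD=Put) to NCBI's BLAST URL API.

    Defaults mirror the node's defaults: BLASTP, nr, E-value 1.0e-4.
    The response carries a request ID (RID) that is later polled with
    CMD=Get; that step is not shown here.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "EXPECT": format(expect, "g"),
        "QUERY": query_fasta,
    }
    return NCBI_BLAST_URL + "?" + urlencode(params)


print(blast_put_url(">q1\nMKVLAAGIVG"))
```

For long queries a POST request with the same parameters is preferable, and NCBI's usage guidelines on request frequency apply.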
36. Figure 4.3.1 Blast Active Workflow

4.3.1 Preparation
The file needed for execution is a file containing a nucleic acid or amino acid sequence in FASTA format.
Note: multi-FASTA format cannot be used.
File type: FASTA format file

4.3.2 Node
There are 5 nodes.
Table 4.3.2.1 Blast Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 BlastNCBI_REST: Executes BLAST.
Node3 CBRCViewer: The BLAST execution result is displayed graphically.
Node4 LSDBCrossSearch: Executes the LSDB cross search.
Node5 HtmlView: The BLAST execution result is displayed as text.

4.3.3 Step 1: Node setting
(1) Node1 FastaFileReader: Select a FASTA file as the input using the right-click menu.
(2) Node2 BlastNCBI_REST: Specify the absolute path of a directory to store the BLAST results, or select the directory using the Browse button.
37. ...at cannot be used.
File type: FASTA format amino acid sequence file

4.6.2 Node
There are 10 nodes.
Table 4.6.2.1 Modelling Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 BlastForModeller_SOAP: Executes BLAST or PSI-BLAST.
Node3 HitRegionSelector_SOAP: The 3D-structure hit region is extracted from the BLAST or PSI-BLAST result.
Node4 TemplateSelector_SOAP: A template for 3D structure modeling is selected.
Modeller_SOAP: Executes MODELLER.
JmolForModeller: Protein 3D structures are displayed using Jmol.
LSDBCrossSearch: Executes the LSDB cross search.
HtmlView: The execution result of BlastForModeller_SOAP is displayed.
Node9 HtmlView: The execution result of HitRegionSelector_SOAP is displayed.
Node10 PDBjMineWeb: Known 3D structure information is displayed by PDBj Mine.

4.6.3 Step 1: Node setting
(1) Node1 FastaFileReader: Select a FASTA file as the input in Configure using the right-click menu.
(2) Node2 BlastForModeller_SOAP: Select an output directory and set options in Configure using the right-click menu. The dialog shows BLAST version 2.2.18, Execution Type (BLAST or PSI-BLAST), E-value 1.0E-5, Iteration 3 and Select Output Directory.

Figure 4.6.3.1 BlastForModeller_SOAP Configure
38. ...be entered in the text box at the center of the window. Pressing the fRNAdb Keyword Search button runs the search, and the retrieval result is displayed in another window, as shown in figures 16 1 2.
Please refer to the following site for details of fRNAdb.
fRNAdb: http://www.ncrna.org/frnadb/index.html

Figure 4.7.5.2 Node4 fRNAdbSearch fRNAdb Keyword Search
The window offers AND and OR search modes and the following rules for fRNAdb search keywords:
- Separate each word with a space (required).
- Valid characters: alphanumerics, space, hyphen (-), period (.), colon (:), double quotation mark (") and slash (/).
- To search for an exact phrase, enclose it in double quotation marks, e.g. "mature micro RNA".

Figure 4.7.5.3 Node4 fRNAdbSearch Search results
Keyword Search Results for miRNA: 5856 hit entries (entries 1 to 100 are listed). To see entry IDs beyond the first 100 hits, please use the fRNAdb web site.

4.8 POODLE Active Workflow
POODLE (Prediction Of Order and Disorder by machine LEarning), developed at CBRC, predicts disordered regions from an amino acid sequence. POO
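The keyword rules above can be checked before pressing the search button. A minimal sketch (hypothetical helper) that validates the allowed characters, reading the help text's "diagonal" as the slash character, and splits the input into words while keeping double-quoted phrases together:

```python
import re
import shlex

# Allowed characters as we read the fRNAdb help: alphanumerics, space,
# hyphen, period, colon, double quotation mark and slash (assumption).
VALID_KEYWORDS = re.compile(r'[A-Za-z0-9 \-.:"/]*')


def split_frnadb_keywords(text: str) -> list:
    """Validate a keyword string and split it into words and quoted phrases."""
    if not VALID_KEYWORDS.fullmatch(text):
        raise ValueError("keywords contain characters outside the allowed set")
    return shlex.split(text)  # "mature micro RNA" stays one phrase


print(split_frnadb_keywords('"mature micro RNA" human'))  # ['mature micro RNA', 'human']
```

An unbalanced double quotation mark makes `shlex.split` raise `ValueError` as well.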
39. ...Executes WoLF PSORT.
Node3 HtmlView: The WoLF PSORT execution result is displayed.
Node4 LSDBCrossSearch: Executes the LSDB cross search.

4.5.3 Step 1: Node setting
(1) Node1 FastaFileReader: Select a FASTA file as the input using the right-click menu.
(2) Node2 WolfPsort_SOAP: Select an output directory and the kingdom using the right-click menu.

Figure 4.5.3.1 WolfPsort_SOAP Configure

Options tab, Kingdom Type: Select animal, plant or fungi.

4.5.4 Step 2: Execution and result viewing

Figure 4.5.4.1 WoLF PSORT_SOAP Node

(1) Node1 FastaFileReader: Select Execute in the right-click menu to run the node.
(2) Node2 WolfPsort_SOAP: Select Execute in the right-click menu to run the node.
(3) Node3 HtmlView: Select Execute and Open Views in the right-click menu to run the node and view the results.

Figure 4.5.4.2 Node3 HtmlView WoLF PSORT Result

(4) Node4 LSDBCrossSearch: Select Execute and Open Views in the right-click menu to run the node and view the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.6 Modelling Active Wor
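The WoLF PSORT result lists, for each query, localization sites with scores (the example result for gi|7296213|gb|AAF51505.1| shows nucl 31.5 and cyto_nucl 16.5). Assuming a summary line of the form `name site: score, site: score, ...` (the exact output layout is an assumption here, not taken from the manual), a sketch that extracts the scores and the top-ranked site:

```python
import re

# "site: score" pairs; site names may contain letters, '_' and '.' (assumed).
SITE_SCORE = re.compile(r"([A-Za-z_.]+):\s*([0-9]+(?:\.[0-9]+)?)")


def parse_wolfpsort_line(line: str) -> dict:
    """Map each localization site to its score, e.g. {'nucl': 31.5, ...}."""
    return {site: float(score) for site, score in SITE_SCORE.findall(line)}


line = "gi|7296213|gb|AAF51505.1| nucl: 31.5, cyto_nucl: 16.5"
scores = parse_wolfpsort_line(line)
print(max(scores, key=scores.get))  # nucl
```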
40. ...related information by using the LSDBCrossSearch node, which executes a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.

Figure 4.4.1 Last Active Workflow

4.4.1 Preparation
The file needed for execution is a nucleic acid or amino acid sequence file in FASTA format. Multi-FASTA format can also be used.
File type: Multi-FASTA format file
41. Figure 4.1.1 Fastapl Active Workflow

4.1.1 Preparation
The file needed for execution is a sequence file in FASTA format. Multi-FASTA format can also be used.
File type: Multi-FASTA format

4.1.2 Node
There are 4 nodes.
Table 4.1.2.1 Fastapl Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 Fastapl_SOAP: Executes fastapl/fastqpl.
Node3 HtmlView: The predic
42. ...ecute failed: Error occurred | Measure: Execute it again later.

7 Operation for specifying a file or a directory in node configuration
In many nodes, a file or a directory must be specified as an input or as an output directory. Specify it as follows.
(1) Select the icon of a node and right-click it. A menu appears.
Figure 3.7 FastaFileReader icon example
(2) Select Configure from the menu.
Figure 3.8 Right-click menu
(3) Select a file or a directory using Browse in the pop-up dialog, and press OK after selecting.
Figure 3.9 FastaFileReader Configure

4 Use of each Active workflow
The usage of each Active workflow is explained below.

4.1 Fastapl Active Workflow
Fastapl Active Workflow performs sequence processing. Please refer to the following site for the explanation and usage examples of fastapl/fastqpl.
fastapl/fastqpl: http://seq.cbrc.jp/fastapl
Furthermore, this workflow can retrieve a variety of related information by using the LSDBCrossSearch node, which performs a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.
43. Figure 4.9.3.6 Representative Profile Configure

Options tab
Select the columns for the representative profile by adding them to the Include section. By default, all columns are processed.
Set an option for the representative profile:
- Type: Select mean or median.
Press OK after selecting, then select Execute in the right-click menu to run the node.

(4) Node4 Graphical Gaussian Modeling
Set the options for Graphical Gaussian Modeling (GGM) in Configure using the right-click menu.

Figure 4.9.3.7 Graphical Gaussian Modeling Configure

Options tab
Select the columns for GGM by adding them to the Include section. By default, all columns are processed.
Set the options:
- W&S iteration: Enter the number of iterations for the Wermuth-Scheidt algorithm. The default value is 1000.
- Epsilon: Enter a value for epsilon. The default value is 1e-4.
- Significance level for deviance1:
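The Representative Profile node summarizes each cluster by the mean or the median. Assuming it does so column by column over the member variables of each cluster (our reading, not stated explicitly in the manual), a minimal sketch with a hypothetical helper and toy data:

```python
from statistics import mean, median


def representative_profiles(profiles: dict, clusters: dict, kind: str = "mean") -> dict:
    """Per-cluster representative vector, taken column by column.

    profiles: variable name -> list of values (one per sample column)
    clusters: variable name -> cluster id
    kind: "mean" or "median", like the node's Type option
    """
    summarize = {"mean": mean, "median": median}[kind]
    members = {}
    for name, cluster in clusters.items():
        members.setdefault(cluster, []).append(profiles[name])
    return {
        cluster: [summarize(column) for column in zip(*vectors)]
        for cluster, vectors in members.items()
    }


profiles = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 0.0]}  # toy data
clusters = {"g1": 0, "g2": 0, "g3": 1}
print(representative_profiles(profiles, clusters))  # {0: [2.0, 3.0], 1: [5.0, 0.0]}
```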
44. Figure 4.10.3.1 AutoDock_SOAP Configure
To specify the binding site coordinates, check "use" and enter the coordinates in the X, Y and Z coordinate text boxes.
(3) Node5 Mol2FileReader: Select a MOL2 file as the input using the right-click menu.

4.10.4 Step 2: Execution

Figure 4.10.4.1 AutoDock_SOAP workflow

The AutoDock_SOAP workflow is executed in the following steps.
(1) Node1 PdbFileReader: If the node is yellow, it is ready to be executed. Right-click the node and select Execute from the menu.
(2) Node2 AutoDock_SOAP: If the node is yellow, it is ready to be executed. Right-click the node and select Execute from the menu.
(3) Node3 MergeTargetAndLigand: If the node is yellow, it is ready to be executed. Right-click the node and select Execute from the menu.
(4) Node4 JmolForModeller: If the node is yellow, it is ready to be executed. Right-click the node and select Execute from the menu. When the status light turns green, the node has finished successfully; right-click the node and select the view item ("View: name of first view") from the menu.
(5) Node5 Mol2FileReader: If the node is yellow, it is ready to be executed. Right-click the node
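AutoDock_SOAP lets you enter a binding-site center by hand. If a ligand is already placed in the pocket (for example from a co-crystal structure), one common heuristic, our assumption rather than something this manual prescribes, is to use the ligand's geometric center. A minimal sketch (hypothetical helper) that reads the `@<TRIPOS>ATOM` section of a MOL2 file, whose columns are atom id, atom name, x, y, z, atom type and so on, and averages the coordinates:

```python
def mol2_centroid(text: str) -> tuple:
    """Geometric center (x, y, z) of the atoms in a MOL2 @<TRIPOS>ATOM section."""
    coords = []
    in_atoms = False
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"  # entering or leaving ATOM
            continue
        if in_atoms and line.strip():
            fields = line.split()
            coords.append(tuple(float(v) for v in fields[2:5]))  # x, y, z columns
    if not coords:
        raise ValueError("no atoms found in @<TRIPOS>ATOM section")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


SAMPLE_MOL2 = """@<TRIPOS>MOLECULE
lig
2 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.3     1  LIG  0.0000
      2 C2          2.0000    4.0000    6.0000 C.3     1  LIG  0.0000
@<TRIPOS>BOND
     1     1     2    1
"""
print(mol2_centroid(SAMPLE_MOL2))  # (1.0, 2.0, 3.0)
```

The ligand and the PDB target must share the same coordinate frame for the center to be meaningful.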
45. ...depth of buckets used to accelerate initial match finding. Larger values increase the memory usage of lastdb and lastal, make lastal faster, and have no effect on lastal's results. The default is to use the maximum depth that consumes at most one byte per possible match start position.
Just count sequences and letters. This is much faster, and the results are useful with lastex. Letter counting is never case-sensitive.
-v: Be verbose: write messages about what lastdb is doing.

6 Contact
Please send your queries or comments, if any, to the address below.
workflow@cbrc.jp
The Computational Biology Research Center of AIST plans to respond positively to users' requests and to keep improving the system.

Computational Biology Research Center (CBRC)
Advanced Industrial Science and Technology (AIST)
http://togo.cbrc.jp
AIST Tokyo Waterfront Bio-IT Research Building
2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
46. Figure 4.7.1 CentroidFold Active Workflow

4.7.1 Preparation
The file needed for execution is an RNA sequence file in FASTA format, or a ClustalW alignment result file (.aln) of RNA sequences. Multi-FASTA format can also be used.
File type: Multi-FASTA format file, ClustalW ALN file

4.7.2 Node
There are 6 nodes.
Table 4.7.2.1 CentroidFold Active Workflow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 CentroidFold_SOAP: Executes CentroidFold.
Node3 CBRCViewer: The CentroidFold execution result is displayed.
Node4 FRNAdbSearch: Executes the fRNAdb search.
Node5 LSDBCrossSearch: Executes the LSDB cross search.
Node6 HtmlView: The CentroidFold execution result is displayed.

4.7.3 Step 1: Node setting
(1) Node1 FastaFileReader: Select a FASTA file as the input in Configure using the right-click menu.
(2) Node2 CentroidFold
47. ...letters more liberally.
LAMA (Local Alignment Metric Accuracy) alignments minimize the ambiguity of columns (both paired letters and gap columns). When GAMMA is low, this method produces shorter alignments with more confident columns, and when GAMMA is high, it produces longer alignments including less confident columns.
In summary: to get the most accurately paired letters, use gamma-centroid; to get accurately placed gaps, use LAMA.
Note that the reported alignment score is that of the ordinary gapped alignment, before realigning with gamma-centroid or LAMA.

-j NUMBER
Output type: 0 means counts of initial matches (of all lengths); 1 means gapless alignments; 2 means gapped alignments before non-redundantization; 3 means gapped alignments after non-redundantization; 4 means alignments with ambiguity estimates; 5 means gamma-centroid alignments; 6 means LAMA alignments.
Match counts (-j0) respect the minimum length option but not the maximum multiplicity option. It is a bad idea to try -j0 when comparing a large sequence to itself.

-Q NUMBER
This option allows lastal to use sequence quality scores, or PSSMs, for the queries: 0 means read queries in fasta format (without quality scores); 1 means fastq-sanger format; 2 means fastq-solexa format; 3 means fastq-illumina format; 4 means prb format; 5 means read PSSMs.
The fastq formats look like this:
@mySequenceName
TTTTTTTTGCCTCGGGCCTGAGTTCTTAGCCGCG
+
55555555x85 55 5 5 5
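The -Q values 1 to 3 differ in how a quality character encodes a score. The offsets below are the standard FASTQ conventions (Phred+33 for fastq-sanger, +64 for fastq-solexa and fastq-illumina), general FASTQ knowledge rather than something this appendix states; Solexa scores are log-odds and are converted to the Phred scale. A minimal sketch with a hypothetical helper:

```python
import math

# lastal -Q number -> (format name, ASCII offset); offsets are the
# standard FASTQ conventions (assumption, not from this manual).
Q_FORMATS = {1: ("fastq-sanger", 33), 2: ("fastq-solexa", 64), 3: ("fastq-illumina", 64)}


def phred_qualities(quality_line: str, q_option: int) -> list:
    """Decode a FASTQ quality line to Phred scores for lastal -Q 1, 2 or 3."""
    name, offset = Q_FORMATS[q_option]
    raw = [ord(ch) - offset for ch in quality_line]
    if name == "fastq-solexa":
        # Solexa scores are log-odds, Q = 10*log10(p/(1-p)); convert to Phred.
        return [10 * math.log10(10 ** (q / 10) + 1) for q in raw]
    return raw


print(phred_qualities("II5", 1))  # [40, 40, 20]
```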
48. ...reverse shift by one nucleotide. The * indicates a stop codon.
The same alignment in tabular format looks like this:
108 myprot 422 40 649 mydna 878 117 1000 4 1 0 6 0 1 10 0 1 19
The -1 in the final column indicates the reverse frameshift.

-x DROP
Maximum score drop for gapped alignments. Gapped alignments are forbidden from having any internal region with score < -DROP. This serves two purposes: accuracy (avoid spurious internal regions in alignments) and speed (the smaller the faster).
-y DROP
Maximum score drop for gapless alignments.
-z DROP
Maximum score drop for final gapped alignments.
-d SCORE
Minimum score for gapless alignments.
-e SCORE
Minimum score for gapped alignments.

Miscellaneous Options
-s STRAND
Specify which query strand should be used: 0 means reverse only, 1 means forward only, and 2 means both.
-m MULTIPLICITY
Maximum multiplicity for initial matches. Each initial match is lengthened until it occurs at most this many times in the reference.
If the reference was split into volumes by lastdb, then lastal uses one volume at a time. The maximum multiplicity then applies to each volume, not the whole reference. This is why voluming changes the results.
LENGTH
Minimum length for initial matches. Length means the number of letters spanned by the match.
-n COUNT
Maximum number of gapless alignments per query
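To see how these options fit together, here is a sketch that assembles a lastal argument list: options first, then the lastdb database name, then the query file. The helper `build_lastal_command` is hypothetical; the option letters and value ranges are the ones documented in this appendix and may differ in newer LAST releases:

```python
import shlex


def build_lastal_command(db: str, query: str, strand: int = 2,
                         output_type=None, quality_format=None,
                         min_gapped_score=None) -> list:
    """Assemble lastal arguments for some of the options described above.

    strand (-s): 0 reverse only, 1 forward only, 2 both
    output_type (-j): 0..6; quality_format (-Q): 0..5
    min_gapped_score (-e): minimum score for gapped alignments
    """
    if strand not in (0, 1, 2):
        raise ValueError("-s must be 0, 1 or 2")
    args = ["lastal", "-s", str(strand)]
    if output_type is not None:
        if output_type not in range(7):
            raise ValueError("-j must be 0..6")
        args += ["-j", str(output_type)]
    if quality_format is not None:
        if quality_format not in range(6):
            raise ValueError("-Q must be 0..5")
        args += ["-Q", str(quality_format)]
    if min_gapped_score is not None:
        args += ["-e", str(min_gapped_score)]
    return args + [db, query]


print(shlex.join(build_lastal_command("mydb", "reads.fastq", quality_format=1)))
# lastal -s 2 -Q 1 mydb reads.fastq
```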
49. ...flow Annotation | Use to insert a comment. | The comment field is displayed.
View: (viewer name) | Use to display results. | Another window is started.
Cut | The node, comment, etc. are cut. |
Copy | The node, comment, etc. are copied. |
Paste | The copied node, comment, etc. are pasted. |
Undo | Use to undo cut, copy or paste. |
Redo | Use to redo an action that was undone. |
Delete | The node and the comment

5 Execute all executable nodes
When all nodes have been configured, they can be executed at once. Select the node that is the starting point, then click the icon at the top of the KNIME screen shown below (Execute all executable nodes, Shift+F7).
Figure 3.5 Execute all executable nodes

6 Alert messages and error messages
If an alert or an error occurs after a node is executed, a pop-up window appears along with messages in the Console of the KNIME screen. Check these messages to resolve the problem. Examples of messages and measures are shown below.
Table 3.6 Alert message samples
Message | Cause and method of settlement
Console: WARN FastaFileReader 0 2 1 failed to apply settings: Please specify a filename | Cause: The file is not specified. Method of settlement: Specify the file.
Pop-up: SOAP execution error. Please resubmit again later. | Cause: An error occurred when SOAP was executed.
Console: ERROR CentroidFold_SOAP Ex
50. ...kflow
Modelling SOAP performs 3D structure modeling of a protein via SOAP. First, BLAST/PSI-BLAST is run to search for similar regions in the PDB database (http://www.rcsb.org). If similar regions are found, a program called MODELLER (http://salilab.org/modeller) models the query protein using the similar regions as a template. A license key is required to run MODELLER.
Furthermore, this workflow can retrieve a variety of related information by using the LSDBCrossSearch node, which executes a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.

Figure 4.6.1 Modelling Active Workflow

4.6.1 Preparation
The file needed for execution is an amino acid sequence file in FASTA format.
Note: a multi-FASTA form
51. ...lForModeller Node: Modelling Results
Sequence Region (aa): 497-744
Objective Function: 1338.8324
Model 3 Objective Function: 1429.0203
Model 4 Objective Function: 1311.5658
Model 5 Objective Function: 1302.1974

Figure 4.6.5.3 Node6 JmolForModeller Modeller Results

The Modeller Results window displays the resulting protein structures with Jmol. Select a model in the list and press the Execute Jmol button; a Jmol window showing the structure appears. Please refer to the following site for details of Jmol.
Jmol: http://jmol.sourceforge.net

4.7 CentroidFold Active Workflow
CentroidFold Active Workflow predicts the RNA secondary structure of an RNA sequence via SOAP. Furthermore, this workflow can retrieve a variety of related information by using the LSDBCrossSearch node, which executes a Life Science DataBase cross search (http://lifesciencedb.jp/dbsearch) for the input sequence.
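A lower MODELLER objective function generally means a model that better satisfies the spatial restraints, so one common (but not the only) way to pick a model to inspect first in Jmol is the lowest value. A minimal sketch using the values listed for Models 3 to 5 in the Modelling Results window (the value 1338.8324 is omitted because its model number is not shown); `best_model` is a hypothetical helper:

```python
def best_model(scores: dict) -> int:
    """Return the model number with the lowest objective-function value."""
    return min(scores, key=scores.get)


# Objective-function values from the Modelling Results window.
objective = {3: 1429.0203, 4: 1311.5658, 5: 1302.1974}
print(best_model(objective))  # 5
```

Other assessment scores (for example MODELLER's DOPE score) can rank models differently, so treat this as a first filter only.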
(Figure 4.10.1: AutoDock Active Workflow)

4.10.1 Preparation
This workflow requires two files: a PDB format file and a MOL2 format file.

4.10.2 Node
There are 5 nodes.

Table 4.10.2.1: AutoDock Active Workflow nodes
PdbFileReader: Read a PDB format file.
AutoDock_SOAP: Execute AutoDock via SOAP.
MergeTargetAndLigand: Merge the PDB format file and the AutoDock results file.
JmolForModeller: Launch Jmol.
Mol2FileReader: Read a MOL2 format file.

4.10.3 Step 1: Node setting
1. Node1 PdbFileReader: Select a PDB file as input using the right-click menu.
2. Node2 AutoDock_SOAP: Specify the absolute path of a directory to store the AutoDock results, or select the directory using the Browse button.
Dialog: AutoDock_SOAP, with Options and Memory Policy tabs. The Options tab contains "Specify binding site coordinate (X, Y, Z)" with x coordinate, y coordinate and z coordinate fields, and Output s
n and viewing the results.

4.3.5 Step 3: Result viewing
1. Node3 CBRCViewer (BLAST Result)
The execution result of BlastNCBI_REST can be viewed as BLAST Result. A text version of the results is shown by pressing the TextView button.
(Figure 4.3.5.1: Node3 CBRCViewer, BLAST Result)

BLASTP 2.2.24 [Aug-08-2010]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro Schaffer, and Yi-Kuo Yu (2005), "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5108.
Query= gi|7296213|gb|AAF51505.1| aristaless [Drosophila melanogaster] (408 letters)
Database: SWISS (SWISS sequence taken from the header)
Last update: Oct 18, 2011; 532,146 sequences; 188,719,038 total letters
Sequences producing significant alignments: Score (bits), E value
sp|Q05453|AL_DROME RecName: Full=Homeobox protein aristaless   648   0.0
(Figure 4.3.5.2: BLAST Result TextView)

2. Node5 HtmlView (BLAST Result)
The execution result of BlastNCBI_REST can be viewed as follows.
n example

(Figure 4.9.1.1: ASIAN Active Workflow sample matrix file)

A file needed for execution is a matrix-format file of gene expression data. In ASIAN Active Workflow, each line (for example, alpha) is treated as a variable to be analyzed. In this example, the microarray experiment name is described as the ORF name.

4.9.2 Node
There are 8 nodes.
(Figure 4.9.2.1: ASIAN Active Workflow node list, showing the File Reader, Hierarchical Clustering, Representative Profile, Graphical Gaussian Modeling and Column Filter nodes.)
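The matrix file described in 4.9.1 can be loaded for inspection outside KNIME with a short script. This is a minimal sketch under stated assumptions: the file is tab-delimited, its first row holds the column names (for example, Yeast IDs), and every other row starts with a row label such as alpha 0 followed by numeric values. The helper names read_matrix and pearson are hypothetical, and Pearson correlation is shown only as one common similarity measure for expression profiles; this manual does not say which measure the Hierarchical Clustering node uses.

```python
import math


def read_matrix(text: str, sep: str = "\t"):
    """Parse a gene expression matrix (assumed tab-delimited).

    Returns (column_names, rows), where rows maps each row label
    (e.g. "alpha 0") to its list of float values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    column_names = lines[0].split(sep)[1:]  # skip the corner cell
    rows = {}
    for line in lines[1:]:
        fields = line.split(sep)
        rows[fields[0]] = [float(value) for value in fields[1:]]
    return column_names, rows


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Illustrative input only; the gene names are placeholders.
sample = "ID\tGENE1\tGENE2\tGENE3\nalpha 0\t0.1\t0.2\t0.3\nalpha 7\t0.2\t0.4\t0.6\n"
names, rows = read_matrix(sample)
print(names)                                      # ['GENE1', 'GENE2', 'GENE3']
print(pearson(rows["alpha 0"], rows["alpha 7"]))  # close to 1.0
```

Because a row label such as alpha 0 contains a space, the sketch splits on tabs rather than on arbitrary whitespace.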
n self-alignment. In that case, the gapped extension might "discover" the main self-alignment and extend over the entire length of the sequence. To avoid this problem, gapped alignments are not triggered from any gapless alignment that:
1. is contained, in both sequences, in the "core" of another alignment, and
2. has start coordinates offset by DISTANCE or less relative to this core.
Use -w0 to turn this off.

-G FILE: Use an alternative genetic code in the specified file. For an example of the format, see vertebrateMito.gc in the examples directory. By default, the standard genetic code is used. This option has no effect unless DNA-versus-protein alignment is selected with option -F.

-t TEMPERATURE: Parameter for converting between scores and likelihood ratios. This affects the column ambiguity estimates. A score is converted to a likelihood ratio by this formula: exp(score / TEMPERATURE). The default value is 1/lambda, where lambda is the scale factor of the scoring matrix, which is calculated by the method of Yu and Altschul (YK Yu et al. 2003, PNAS 100(26):15688-93).

-g GAMMA: This option affects gamma-centroid and LAMA alignment only. Gamma-centroid alignments minimize the ambiguity of paired letters. In fact, this method aligns letters whose column error probability is less than GAMMA/(GAMMA+1). When GAMMA is low, it aligns confidently paired letters only, so there tend to be many unaligned letters. When GAMMA is high, it aligns l
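The two formulas in the -t and -g descriptions can be checked numerically. The sketch below only evaluates them; the function names are hypothetical, and lambda = 0.5 is an arbitrary illustrative value, not the scale factor of any real scoring matrix.

```python
import math


def likelihood_ratio(score: float, temperature: float) -> float:
    """lastal -t: convert an alignment score to a likelihood ratio, exp(score / TEMPERATURE)."""
    return math.exp(score / temperature)


def default_temperature(lam: float) -> float:
    """The default TEMPERATURE is 1/lambda, lambda being the scoring matrix's scale factor."""
    return 1.0 / lam


def gamma_error_threshold(gamma: float) -> float:
    """lastal -g: gamma-centroid alignment pairs letters whose column error
    probability is below GAMMA / (GAMMA + 1)."""
    return gamma / (gamma + 1.0)


# Illustrative values only.
t = default_temperature(0.5)       # 2.0
print(likelihood_ratio(4.0, t))    # exp(2.0)
print(gamma_error_threshold(1.0))  # 0.5
```

A larger GAMMA raises the threshold toward 1, which matches the description that a high GAMMA aligns more letters.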
ow Node list
Node1 FastaFileReader: The FASTA format file is read.
Node2 Poodle_SOAP: Execute POODLE.
Node3 CBRCViewer: The POODLE execution result is displayed.
Node4 LSDBCrossSearch: Execute LSDB cross search.

4.8.3 Step 1: Node setting
1. Node1 FastaFileReader: Select a FASTA file as input in Configure using the right-click menu.
2. Node2 Poodle_SOAP: Select an output directory and program type in Configure using the right-click menu.
(Figure 4.8.3.1: Poodle_SOAP Configure. Options tab: POODLE Type (POODLE-S or POODLE-L), Directory with a Browse button.)
Options tab
- Type > POODLE Type: Select POODLE-S or POODLE-L. POODLE-S predicts shorter disorder regions; POODLE-L predicts longer disorder regions (> 40 a.a.).

4.8.4 Step 2: Execution
(Figure 4.8.4.1: Poodle_SOAP nodes: Node1 FastaFileReader, Node2 Poodle_SOAP, Node3 CBRCViewer, Node4 LSDBCrossSearch.)
1. Node1 FastaFileReader: Select Execute in the right-click menu for execution.
2. Node2 Poodle_SOAP: Select Execute in the right-click menu for execution.
3. Node3 CBRCViewer: Select Execute and Open Views in the right-click menu for execution and viewing the results.
4. Node4 LSDBCrossSearch: Select Execute and Open Views in the right-click menu for execution and viewing the results. Please refer
(Figure: Gene expression data file of matrix format. Visible rows: alpha 0, alpha 7, alpha 14, alpha 21, alpha 28, alpha 35, alpha 42. Callouts: "a column name of Yeast ID of the line" and "Gene expression data of Yeast is shown as a")
position. When lastal extends gapless alignments from initial matches that start at one query position, if it gets COUNT successful extensions, it skips any remaining initial matches starting at that position. This option has no effect unless COUNT is less than MULTIPLICITY.

-k STEP: Look for initial matches starting only at every STEP-th position in the query. This makes lastal faster but less sensitive.

-i BYTES: Search queries in batches of at most this many bytes. If a single sequence exceeds this amount, however, it is not split. You can use suffixes K, M, and G to specify KibiBytes, MebiBytes, and GibiBytes. This option has no effect on the results (apart from their order), unless -k > 1. If the reference was split into volumes by lastdb, then each volume will be read into memory once per query batch.

-u NUMBER: Specify treatment of lowercase letters when extending alignments:
0 means do not mask them,
1 means mask them for gapless extensions,
2 means mask them for gapless and gapped extensions, but not final extensions,
3 means mask them at all stages.
"Mask" means change their match/mismatch scores to min(unmasked score, 0). This option does not affect treatment of lowercase for initial matches.

-w DISTANCE: This option is a kludge to avoid catastrophic time and memory usage when self-comparing a large sequence. If the sequence contains a tandem repeat, we may get a gapless alignment that is slightly offset from the mai
ppendix A LSDBCrossSearch for the use of the result screen.

4.2.5 Step 3: Result viewing
1. Node3 CBRCViewer (MAFFT Result)
The sequence identifiers of the input are displayed on the left, and the aligned sequences are shown on the right. A text version of the results is shown by pressing the TextView button.
(Figure 4.2.5.1: Node3 CBRCViewer, MAFFT Result. The viewer shows the five input sequences, identified by their gi numbers, alongside their alignment.)
(Text view: the aligned nucleotide sequence of each input, starting with gi|334185880.)
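Mafft_SOAP runs MAFFT with the default options listed in its Configure dialog (--retree 2 --maxiterate 0 --bl 62 --op 1.53 --ep 0.0 --clustalout). To reproduce a run outside KNIME, the following minimal sketch only builds the equivalent local command line. It assumes a locally installed mafft executable, which the workflow itself does not need because it calls MAFFT via SOAP; the helper name mafft_command is hypothetical.

```python
import shlex

# Default options listed for Mafft_SOAP in its Configure dialog.
DEFAULT_MAFFT_OPTIONS = [
    "--retree", "2",
    "--maxiterate", "0",
    "--bl", "62",
    "--op", "1.53",
    "--ep", "0.0",
    "--clustalout",
]


def mafft_command(input_fasta: str, reorder: bool = False, quiet: bool = False) -> list:
    """Build (but do not run) a local mafft command line with the default options."""
    cmd = ["mafft", *DEFAULT_MAFFT_OPTIONS]
    if reorder:
        cmd.append("--reorder")  # output in aligned order instead of input order
    if quiet:
        cmd.append("--quiet")    # do not report progress
    cmd.append(input_fasta)
    return cmd


print(shlex.join(mafft_command("input.fasta")))
# mafft --retree 2 --maxiterate 0 --bl 62 --op 1.53 --ep 0.0 --clustalout input.fasta
```

To actually run it, the list can be passed to subprocess.run with capture_output=True and text=True; MAFFT writes the alignment to standard output.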
(Figure 4.6.5.2: Node10 PDBjMineWeb, PDB Mine. The PDBj Mine page for PDB entry 1twf (DNA-directed RNA polymerase II, Saccharomyces cerevisiae), with related PDB IDs 1twa, 1twc, 1twg and 1twh, functional keywords, biological source, cellular location, and links to structure navigation, structure prediction and 3D molecule viewers such as Jmol.)

2. Node6 JmolForModeller (Modeller Results)
The execution result of Modeller_SOAP (Node5) can be viewed as Modeller Results by Jmo
right-click menu.
(Figure 4.9.3.9: Column Filter Configure. Column Filter tab with Exclude and Include lists, add/remove buttons, and Enforce exclusion / Enforce inclusion options.)
Column Filter tab
- Select the columns for the t-test by adding them to the Include section.
Press OK after selecting. Select Execute in the right-click menu for execution.

7. Node7 Joiner (deprecated)
Set the column join in Configure using the right-click menu.
(Figure 4.9.3.10: Joiner (deprecated) Configure)
Standard Settings tab
- Join column from second table: Select Row ID or ids.
- Duplicate column handling: Select Filter duplicates, Don't execute, or Append suffix. Enter a suffix in the case of Append suffix.
- Join mode: Select Inner Join, Left Outer Join, Right Outer Join, or Full Outer Join.
- Multiple match row ID suffix: Enter a suffix for multiple joined Row IDs.
Press OK after completing. Select Execute in the right-click menu for execution.

8. Node8 RunCytoscape
Select Execute and Open Views in the right-click menu to execute Cytoscape.
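The four Join mode options of the Joiner behave like the usual relational joins on row IDs. The following is a minimal sketch, assuming each table is a Python dict that maps a row ID to a dict of column values. The name join_tables and its suffix default are hypothetical, and unlike KNIME, cells missing on one side are simply left out rather than filled with missing values.

```python
def join_tables(left, right, mode="inner", suffix="_right"):
    """Join two tables keyed by row ID (row ID -> {column: value}).

    mode mirrors the Join mode options: 'inner', 'left', 'right', 'full'.
    Right-hand columns whose names clash with left-hand ones get `suffix`
    appended, similar in spirit to the 'Append suffix' option.
    """
    if mode == "inner":
        keys = [k for k in left if k in right]
    elif mode == "left":
        keys = list(left)
    elif mode == "right":
        keys = list(right)
    elif mode == "full":
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown join mode: {mode!r}")

    left_columns = {col for row in left.values() for col in row}
    joined = {}
    for key in keys:
        row = dict(left.get(key, {}))
        for col, value in right.get(key, {}).items():
            row[col + suffix if col in left_columns else col] = value
        joined[key] = row
    return joined


left = {"a": {"x": 1}, "b": {"x": 2}}
right = {"b": {"x": 20, "y": 3}, "c": {"y": 4}}
print(join_tables(left, right, "inner"))  # {'b': {'x': 2, 'x_right': 20, 'y': 3}}
```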
s. Only integers can be entered.
Minimum Length: Set the Minimum Length, the minimum length (in amino acids) of the hit region. The default value is 30. The range of the value is as follows: 26 Minimum Length Input amino acid sequence length (integer).
Press OK after entering.

Node4 TemplateSelector_SOAP
Set the conditions that a template must meet for 3D structure modeling.
1. Select Configure in the right-click menu.
(Figure 4.6.3.3: TemplateSelector_SOAP Configure. Options tab: "Conditions to determine for modelling or for displaying PDBj Mine Web", Coverage (%): 90, Identity (%): 90.)
Options tab
- Condition to determine for modelling or for displaying PDBj Mine Web (Coverage, Identity): Set Coverage and Identity. Coverage is the ratio of the hit region to the total length of the hit protein structure. Identity is the amino acid matching rate in the hit region between the query and the target. The default value of both Coverage and Identity is 90. Only integers can be used.

Node5 Modeller_SOAP
Set a license key and the number of models for MODELLER to generate.
1. Select Configure in the right-click menu.
(Figure 4.6.3.4: Modeller_SOAP Configure. Options tab: Condition for Modeller Execution, Number of Models for Modelling (5); Modeller License: License Key for Modeller (required).)
Options tab: Condition for Modeller Execution
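The Minimum Length, Coverage and Identity conditions described above amount to a simple filter over hits. The following is a minimal sketch under assumptions: the Hit fields, the use of >= for each threshold, and measuring Identity over the hit region length are illustrative choices, not the actual implementation of the selector nodes, and all three conditions are combined in one hypothetical function for brevity.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    hit_start: int         # first residue of the hit region on the structure (1-based)
    hit_end: int           # last residue of the hit region (inclusive)
    structure_length: int  # total length of the hit protein structure
    matches: int           # identical amino acids in the hit region

    @property
    def hit_length(self) -> int:
        return self.hit_end - self.hit_start + 1

    @property
    def coverage(self) -> float:
        # Coverage: hit region length relative to the whole hit structure, in percent
        return 100.0 * self.hit_length / self.structure_length

    @property
    def identity(self) -> float:
        # Identity: matching rate of amino acids inside the hit region, in percent
        return 100.0 * self.matches / self.hit_length


def select_templates(hits, min_length=30, min_coverage=90, min_identity=90):
    """Keep hits meeting the Minimum Length, Coverage and Identity defaults."""
    return [
        h for h in hits
        if h.hit_length >= min_length
        and h.coverage >= min_coverage
        and h.identity >= min_identity
    ]


hits = [
    Hit("hitA", 1, 100, 105, 95),  # long, well covered, identical enough
    Hit("hitB", 1, 100, 200, 99),  # only half the structure is covered
    Hit("hitC", 1, 20, 20, 20),    # shorter than Minimum Length
]
print([h.name for h in select_templates(hits)])  # ['hitA']
```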
s 32bit | Workflow that performs homology modeling from an amino acid sequence

3 Common rules
Starting Active workflow
The common rules for all Active workflows are as follows. After KNIME starts, double-click the workflow you want to use in the Workflow Projects column. The workflow is then shown and ready to use.
(Screenshot: KNIME window with the Active workflows listed in the Workflow Projects column, the Node Repository, and the KNIME Console.)
sequence and 0 in the lower sequence, then a block of size 4

Score Options
-r SCORE: Match score.
-q COST: Mismatch cost.
-p FILE: Obtain match and mismatch scores from the specified file. Options -r and -q will be ignored. For an example of the format, see hoxd70.mat in the examples directory. Any letters that aren't in the file will get the lowest score in the file when aligned to anything. Asymmetric scores are allowed: query letters correspond to columns and reference letters correspond to rows. Other options can be specified on lines starting with "#last", but command line options override them.
-a COST: Gap existence cost.
-b COST: Gap extension cost. A gap of size k costs a + b×k.
-c COST: This option allows use of "generalized affine gap costs" (SF Altschul 1998, Proteins 32(1):88-96). Here, a "gap" may consist of unaligned regions of both sequences. If these unaligned regions have sizes j and k, where j ≤ k, the cost is a + b×(k-j) + c×j. If c ≥ a + 2b (the default), it reduces to standard affine gaps.
-F COST: Align DNA queries to protein reference sequences using the specified frameshift cost. A value of 15 seems to be reasonable. The output looks like this:
a score=108
s myprot 422 40 649 FLLQAVKLQDP STPHQ VPSP VSDL ATHTLCPRMKYQDD
s mydna 8 78 117 1000 FFLQ IKLWDPXSTPH IVSSP PSDL SAHTLCPRMKSQDN
The "\" indicates a forward shift by one nucleotide, and the "/" indicates a r
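The -a, -b and -c cost formulas above can be compared directly. This minimal sketch evaluates them with illustrative costs a = 7 and b = 1 (not LAST's defaults) and shows that at c = a + 2b a one-letter gap in each sequence costs the same under the generalized scheme as two separate standard gaps. The function names are hypothetical.

```python
def affine_gap_cost(k: int, a: int, b: int) -> int:
    """-a/-b: a gap of size k costs a + b*k."""
    return a + b * k


def generalized_gap_cost(j: int, k: int, a: int, b: int, c: int) -> int:
    """-c: a 'gap' made of unaligned regions of sizes j and k in the two
    sequences costs a + b*(k - j) + c*j, where j <= k."""
    j, k = min(j, k), max(j, k)  # the formula assumes j <= k
    return a + b * (k - j) + c * j


a, b = 7, 1      # illustrative costs, not LAST defaults
c = a + 2 * b    # boundary at which the scheme reduces to standard affine gaps
print(generalized_gap_cost(1, 1, a, b, c))                  # 16
print(affine_gap_cost(1, a, b) + affine_gap_cost(1, a, b))  # 16
```

With j = 0 the c term vanishes and the generalized cost equals the ordinary affine cost of a single gap of size k.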
simplified version is allowed too. If you use PSSMs, options -r, -q, and -p are mostly ignored, except that they determine the default value of -y.

5.2.2 lastdb parameter
The option descriptions for LAST have been taken from the LAST web site.

Main Options
-h: Show all options and their default settings.
-p: Interpret the sequences as proteins. The default is to interpret them as DNA.
-c: Soft-mask lowercase letters. This means that, when we compare these sequences to some other sequences using lastal, lowercase letters will be excluded from initial matches. This will apply to lowercase letters in both sets of sequences.

Advanced Options
-s BYTES: Limit memory usage by splitting the output files into smaller volumes if necessary. This will limit the memory usage of both lastdb and lastal, but it will make lastal slower. It is also likely to change the exact results found by lastal. BYTES should be slightly less than the amount of real memory on your computer. You can use suffixes K, M, and G to specify KibiBytes, MebiBytes, and GibiBytes. For example, -s 5G has worked well with 6G, and -s 1280M has worked well with 2G. However, the output for one sequence is never split. Since the output files are several-fold bigger than the input, this means that mammalian chromosomes cannot be processed using much less than 2G. There is a hard upper limit of about 4 billion sequence letters per volume. Together with the previo
t version of the results is shown by pressing the TextView button.
(Figure 4.2.5.3: Node5 CBRCViewer, ClustalW Result. The viewer lists the input sequence identifiers next to the aligned sequences; the text view shows the aligned nucleotide sequences.)
the installation manual available on the Life Science Database Integration Web site.
Life Science Database Integration Web: http://togo.cbrc.jp
The Active workflows run on the KNIME platform. Please refer to the KNIME site for the details of KNIME. This manual explains how the user can work with Active workflows.
KNIME: http://www.knime.org

2 About the Active workflow Component type
There are nine Active workflow component types available, which are listed in the table below.

Table 2.1: Active workflow component type list
Active workflow component type name | OS | Explanation
Fastapl Active Workflow | Windows 32bit | Workflow that performs sequence processing of FASTA format files
Mafft Active Workflow | Windows 32bit | Workflow that performs multiple alignment
Blast Active Workflow | Windows 32bit | Workflow that performs ...
Last Active Workflow | Windows 32bit | Workflow that performs ...
WolfPSORT Active Workflow | Windows 32bit | Workflow that predicts localization in the cell from an amino acid sequence
CentroidFold Active Workflow | Windows 32bit | Workflow that predicts secondary structure from an RNA sequence
POODLE Active Workflow | Windows 32bit | Workflow that predicts disordered regions from an amino acid sequence
ASIAN Active Workflow | Windows 32bit, Linux | Integrated analytical workflow using a gene network inference system
AutoDock Active Workflow | Windows 32bit | Chemical compound-protein docking workflow
Modelling Active Workflow | Window
tion result is displayed.
Node4 LSDBCrossSearch: Execute LSDB cross search.

4.1.3 Step 1: Node setting
1. Node1 FastaFileReader: Select a FASTA file as input using the right-click menu.
2. Node2 Fastapl_SOAP: Select an output directory using the right-click menu and set options if necessary.
(Figure 4.1.3.1: Fastapl_SOAP Configure, Options tab)
Options tab
- Advanced Options: The default options are -p 1 100, meaning that the sequences in the FASTA file will be adjusted to 100 characters per line.

4.1.4 Step 2: Execution
(Figure 4.1.4.1: Fastapl_SOAP, all nodes)
FastaFileReader: Select Execute in the right-click menu for execution.
Fastapl_SOAP: Select Execute in the right-click menu for execution.
HtmlView: Select Execute and Open Views in the right-click menu for execution and viewing the results.
LSDBCrossSearch: Select Execute and Open Views in the right-click menu for execution and viewing the results. Please refer to 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.2 Mafft Active Workflow
Mafft Active Workflow performs multiple alignment of nucleic acid or amino acid sequences via SOAP, using ClustalW (http://www.clustal.org) or MAFFT (http://mafft.cbrc.jp). This workflow can retrieve a v
to the following: 5.1 Appendix A LSDBCrossSearch for the use of the result screen.

4.8.5 Step 3: Result viewing
1. Node3 CBRCViewer (POODLE Result)
The execution result of Poodle_SOAP can be viewed as POODLE Result by the CBRCViewer node. This screen plots the disorder prediction results of POODLE-S or POODLE-L: the vertical axis indicates disorder probability and the horizontal axis indicates residue numbers (position from the N-terminus). Amino acids in red are predicted to be disordered. The text version of the results can be shown by pressing the TextView button.
(Figure 4.8.5.1: Node3 CBRCViewer, POODLE Result. Query: gi|11513392|pdb|1DH3|A Chain A, Crystal Structure Of A Creb Bzip-Cre Complex Reveals The Basis For Creb Family Selective Dimerization And Dna Binding.)

Text version:
PFRMAT DR
REMARK K. Shimizu, Y. Muraoka, S. Hirose and T. Noguchi
REMARK Feature Selection Based on Physicochemical Properties of
REMARK Redefined N-term Region and C-term Regions for Predicting Disorder
REMARK Proc. of IEEE CIBCB 2005, pp. 262-267
METHOD Prediction for short disorder using modified PSSM
Each following line gives a residue, a label (D for disordered, O for ordered) and the disorder probability, for example "K D 0.712" for the first residue.
Figure 4.8.5.2: Node3 POOD
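The residue lines of the text version can be turned into disordered regions with a few lines of code. This is a minimal sketch assuming each residue line has exactly three whitespace-separated fields (residue, D or O, probability) and that other lines (PFRMAT, REMARK, METHOD) can be skipped; the helper names are hypothetical and the sample lines are illustrative, not actual POODLE output.

```python
def parse_dr(lines):
    """Collect (residue, label, probability) from residue lines; skip header lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[1] in ("D", "O"):
            records.append((fields[0], fields[1], float(fields[2])))
    return records


def disorder_regions(records):
    """Return 1-based, inclusive (start, end) runs of residues labelled D."""
    regions, start = [], None
    for position, (_, label, _) in enumerate(records, start=1):
        if label == "D" and start is None:
            start = position
        elif label != "D" and start is not None:
            regions.append((start, position - 1))
            start = None
    if start is not None:
        regions.append((start, len(records)))
    return regions


sample = [
    "PFRMAT DR",
    "METHOD short disorder",  # header-like lines are ignored
    "K D 0.712",
    "R D 0.696",
    "L O 0.386",
    "M O 0.32",
    "E D 0.555",
]
records = parse_dr(sample)
print(disorder_regions(records))  # [(1, 2), (5, 5)]
```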
us point, this means that lastdb will refuse to process any single sequence longer than about 4 billion.

-m PATTERN: Specify a spaced seed pattern, for example -m 110101. In this example, mismatches will be allowed at every third and fifth position out of six in initial matches. This option does not constrain the length of initial matches: the pattern will get cyclically repeated as often as necessary to cover any length. Although the 0 positions allow mismatches, they exclude non-standard letters (e.g. non-ACGT for DNA). If option -c is used, they also exclude lowercase letters.

-u FILE: Specify a subset seed file. The -m option will then be ignored. For an example of the format, see yass.seed in the examples directory.

-w STEP: Allow initial matches to start only at every STEP-th position in each of the sequences given to lastdb. This reduces the memory usage of lastdb and lastal, and it makes lastdb faster. Its effect on the speed and sensitivity of lastal is not entirely clear. To emulate BLAT, use -w 11.

-a SYMBOLS: Specify your own alphabet, e.g. -a 0123. The default (DNA) alphabet is equivalent to -a ACGT. The protein alphabet (-p) is equivalent to -a ACDEFGHIKLMNPQRSTVWY. Non-alphabet letters are allowed in sequences, but by default they are excluded from initial matches and get the mismatch score when aligned to anything. If -a is specified, -p is ignored.

-b DEPTH: Specify the d
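The effect of a spaced seed such as -m 110101 can be illustrated by checking whether two equal-length windows would count as an initial match. This is a toy sketch that only mirrors the rules stated above: '1' positions must match, '0' positions tolerate mismatches, the pattern repeats cyclically, and non-alphabet letters (and, with -c, lowercase letters) are excluded. lastdb itself builds an index rather than comparing windows like this, and seed_matches is a hypothetical name.

```python
def seed_matches(query: str, ref: str, pattern: str = "110101",
                 alphabet: str = "ACGT", soft_mask: bool = False) -> bool:
    """Toy check of whether two equal-length windows form an initial match."""
    if len(query) != len(ref):
        raise ValueError("compare equal-length windows")
    allowed = set(alphabet)
    for i, (q, r) in enumerate(zip(query, ref)):
        if soft_mask and (q.islower() or r.islower()):
            return False  # with -c, lowercase letters are excluded from initial matches
        q, r = q.upper(), r.upper()
        if q not in allowed or r not in allowed:
            return False  # non-alphabet letters never take part in initial matches
        if pattern[i % len(pattern)] == "1" and q != r:
            return False  # '1' positions must match; '0' positions tolerate mismatches
    return True


print(seed_matches("ACGTAC", "ACATGC"))  # True: mismatches only at 3rd and 5th positions
print(seed_matches("ACGTAC", "ACGAAC"))  # False: mismatch at the 4th position, a '1'
```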
ustal format (default: fasta).
--reorder: Outorder: aligned (default: input order).
--quiet: Do not report progress.
The default options are as follows: --retree 2 --maxiterate 0 --bl 62 --op 1.53 --ep 0.0 --clustalout

Node4 ClustalW_SOAP: Specify the absolute path of a directory to store the ClustalW results, or select the output directory using the Browse button.
(Figure 4.2.3.2: Mafft_SOAP Configure. Options, Flow Variables and Memory Policy tabs; Type (DNA selected), Select Output Directory with a Browse button.)
Specify PROTEIN for protein sequences or DNA for nucleic acid sequences with the radio button.

4.2.4 Step 2: Execution
(Figure 4.2.4.1: Mafft_SOAP nodes. Either Mafft or ClustalW can be selected.)
1. Node1 FastaFileReader: Select Execute in the right-click menu for execution.
2. Node2 Mafft_SOAP: Select Execute in the right-click menu for execution.
3. Node3 CBRCViewer: Select Execute and Open Views in the right-click menu for execution and viewing the results.
4. Node4 ClustalW_SOAP: Select Execute in the right-click menu for execution.
5. Node5 CBRCViewer: Select Execute and Open Views in the right-click menu for execution and viewing the results.
6. Node6 LSDBCrossSearch: Select Execute and Open Views in the right-click menu for execution and viewing the results. Please refer to the following 5.1 A
