Unipro UGENE User Manual

Contents

1. 8.2 Browsing and Zooming Assembly: 8.2.1 Opening Assembly Browser Window; 8.2.2 Assembly Browser Window; 8.2.3 Assembly Browser Window Components; 8.2.4 Reads Area Description; 8.2.5 Assembly Overview Description; 8.2.6 Ruler and Coverage Graph Description; 8.2.7 Go to Position in Assembly; 8.2.8 Using Bookmarks for Navigation in Assembly Data
   8.3 Getting Information About Read
   8.4 Short Reads Visualization: 8.4.1 Reads Highlighting; 8.4.2 Reads Shadowing
   8.5 Associating Reference Sequence
   8.6 Consensus Sequence
   8.7 Exporting: 8.7.1 Exporting Read; 8.7.2 Exporting Visible Reads; 8.7.3 Exporting Consensus; 8.7.4 Exporting Image
   8.8 Options Panel in Assembly Browser: 8.8.1 Navigation; 8.8.2 Assembly Browser Settings; 8.8.3 Assembly Statistics
   8.9 Assembly Browser Hotkeys
2. [Screenshot: Annotations editor for murine.gb showing the Auto-annotations group and the NC_001363 features, with misc_feature and source annotations.]
   To copy the statistical information about a sequence, select it on the Options Panel and choose the Copy item in the context menu, or use the Ctrl+C shortcut.
   Unipro UGENE User Manual, Version 1.12.3
   5.8 Manipulating Sequence
   5.8.1 Going To Position
   To go to a position, use the global actions toolbar, or use the Go to position item in the context menu or in the Actions main menu. You can also use the Ctrl+G shortcut.
   5.8.2 Toggling Views
   You can toggle the visibility of the Sequence overview, the Sequence zoom view and the Sequence details view using the rightmost button in the toolbar. The sequence can be removed from the view using the same menu. Once you remove the last sequence in the view, the view is closed automatically.
   5.8.3 Capturing Screenshot
   Use the Capture screen button on the sequence toolbar to save a screenshot of the sequence. Available file formats are jpg, png and tiff.
   5.8.4 Zooming S
3. The branches have been collapsed. [Screenshot: tree with a collapsed clade.]
   To show the collapsed clade, select the Expand item in the node's context menu.
   9.6.3 Swapping Siblings
   To interchange the locations of the two branches of a clade, select the Swap Siblings item in the context menu of the root node of the clade. [Screenshot: the tree after swapping the branches.]
   9.6.4 Zooming Clade
   In addition to the other zooming options, you can use the Zoom In item in the context menu of the root node of a clade.
   9.6.5 Adjusting Clade Settings
   When a clade is selected, the branch and label formatting settings are applied to that clade only. Note that the settings are not applied to collapsed branches, if any. [Screenshot: example of changed branch settings for a clade.]
   9.7 Exporting Tree Image
   A tree image can be exported to a raster format (png, jpg, bmp, etc.) or to a vector format (svg). Select either the Export Tree Image toolbar button or the Actions > Export Tree Image item in the main menu. In the submenu that appears, select the Screen Capture item to save the tree image in a raster format. The standard Save As dialog appears, where you can select the file name and format. To export a tree image to a vector format, select the As SVG item in the Export Tree Image su
4. 5.8 Manipulating Sequence: 5.8.1 Going To Position; 5.8.2 Toggling Views; 5.8.3 Capturing Screenshot; 5.8.4 Zooming Sequence; 5.8.5 Creating New Ruler; 5.8.6 Selecting Amino Translation; 5.8.7 Showing and Hiding Translations; 5.8.8 Selecting Sequence; 5.8.9 Copying Sequence; 5.8.10 Search in Sequence; 5.8.11 Editing Sequence; 5.8.12 Exporting Selected Sequence Region; 5.8.13 Exporting Sequence of Selected Annotations; 5.8.14 Locking and Synchronize Ranges of Several Sequences
   5.9 Annotations Editor: 5.9.1 Automatic Annotations Highlighting; 5.9.2 The db_xref Qualifier
   5.10 Manipulating Annotations: 5.10.1 Creating Annotation; 5.10.2 Editing Annotation; 5.10.3 Highlighting Annotations; 5.10.4 Creating and Editing Qualifier; 5.10.5 Adding Column for Qualifier; 5.10.6 Copying Qualifier Text; 5.10.7 Deleting Annotations and Qualifiers
5. [Screenshot: Remote Machines Monitor with a table of hosts and protocols, a list of tasks, and the Add, Remove, Modify and Ping buttons.]
   To add a new remote machine, click the Add button. In the dialog that appears, select the protocol and fill in the other required fields. [Screenshot: Add remote machine dialog.]
   Modifying a remote machine is as simple as adding a new one: select the machine and click the Modify button. To remove a remote machine from the monitor, select the machine in the table and click the Remove button. You can ping a remote machine to check whether it is still alive and UGENE is still running there.
   Some network protocols, for example the direct socket protocol, can scan the local network. To search for running UGENE instances through such protocols, click the Scan button.
   You can also run your tasks on one of the public UGENE machines. To add public machines to the monitor, click the Get public machines button.
   10.2 Running Workflows on Cloud
   10.2.1 Introduction
   The Workflow Designer is a powerful extension of Unipro UGENE that lets you construct and execute computational workflows. It has an interactive visual interface and provides many capabilities. This manual section explains how to launch w
6. [Screenshot: dialog for editing the ends of a fragment, with a Preview area showing the fragment sequence.]
   Here you can select the type of each DNA end and even enter a custom overhang. The changes you have made are shown in the Preview area of the dialog. To confirm the changes and close the dialog, click the OK button.
   Reverse Complement a Fragment
   To reverse complement a fragment, check the Inverted check box for the fragment in the new molecule contents list.
   Other Construction Options
   To save the fragments of the new molecule as annotations, check the Annotate fragments in new molecule check box. To make all DNA ends blunt, check the Force blunt and omit all overhangs check box. All overhangs are cut in this case. Check the Make circular check box to make the new molecule circular.
   Output
   On the Output tab of the dialog you can select the file to save the new molecule to. By default, the molecule is opened as soon as it is created. To change this behavior, uncheck the Open view for new molecule check box on the same tab. To save the molecule file to the hard disk immediately after it is created, check the Save immediately check box. Otherwise it is kept in memory until you save or remove it.
   11.10 Secondary Structure Prediction
   The Secondary Structure Prediction pl
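
The reverse-complement operation mentioned in item 6 can be illustrated outside UGENE with a few lines of Python. This is only a sketch of the general operation on a plain DNA string; the helper name `reverse_complement` is hypothetical and nothing here reflects how UGENE implements the Inverted option internally.

```python
# Hypothetical helper (not part of UGENE): reverse-complement a DNA string.
# Each base is swapped for its complement (A<->T, C<->G, N stays N),
# then the whole string is reversed.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(fragment: str) -> str:
    """Return the reverse complement of a DNA fragment."""
    return fragment.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("CAATAATGAC"))
```

Applying the function twice returns the original string, which is a quick sanity check for any implementation of this operation.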
7. If there are several sequences in the document, selecting the Separate sequences option opens the sequences separately in a Sequence View window. Conversely, selecting the Merge sequences option merges the sequences into one sequence. The Gap length parameter specifies the length of the gaps inserted between the merged sequences. Your choice is saved as the default if you check the Save as default settings check box. Note that if you choose to merge the sequences, the annotations of the sequences, if any, are also relocated automatically.
   4.5.2 Opening Document Present in Project
   To open a document that is already present in the current project, select it in the Project View and press Enter, double-click it, or drag it to an empty space in the UGENE window.
   4.6 Creating Document
   To create a new sequence file from text, select the File > New document from text main menu item. [Screenshot: File menu with the New project, New document from text, Access remote database and Open items.]
   The Create Document dialog appears. [Screenshot: Create Document dialog with the Paste data here field, the Custom settings group (Alphabet, Skip unknown symbols, Replace unknown symbols with), the Document format and sequence name fields, and the Save file immediately check box.]
   You can input the created sequence to the Paste data here field. The follo
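
The merge behaviour described in item 7 (a gap of fixed length between sequences, with annotations relocated) comes down to simple offset arithmetic. The sketch below is an assumption-laden illustration, not UGENE code: the function name `merge_sequences`, the `(name, start, end)` annotation tuples with 0-based, end-exclusive coordinates, and the use of `N` as the gap symbol are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# Hypothetical annotation model: (name, start, end), 0-based, end exclusive.
Annotation = Tuple[str, int, int]
Record = Tuple[str, List[Annotation]]


def merge_sequences(records: List[Record], gap_length: int,
                    gap_char: str = "N") -> Tuple[str, List[Annotation]]:
    """Concatenate sequences with a gap between them and shift annotations.

    The gap symbol is an assumption made for this sketch.
    """
    parts: List[str] = []
    merged_annotations: List[Annotation] = []
    offset = 0  # position in the merged sequence where the next record starts
    for index, (sequence, annotations) in enumerate(records):
        if index > 0:
            parts.append(gap_char * gap_length)
            offset += gap_length
        parts.append(sequence)
        for name, start, end in annotations:
            # Relocate the annotation by the current offset.
            merged_annotations.append((name, start + offset, end + offset))
        offset += len(sequence)
    return "".join(parts), merged_annotations


if __name__ == "__main__":
    merged, shifted = merge_sequences(
        [("ACGT", [("a", 1, 3)]), ("GG", [("b", 0, 2)])], gap_length=3)
    print(merged, shifted)
```

Each annotation keeps its length; only its start and end move by the total length of everything placed before its sequence, gaps included.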
8. [Screenshot: Align submenu of the context menu, with the items Align sequences to profile with MUSCLE, Align profile to profile with MUSCLE, Align with Kalign, Align with ClustalW, Align with MAFFT and Align with T-Coffee.]
   BLAST, BLAST+
   The Basic Local Alignment Search Tool, BLAST (http://blast.ncbi.nlm.nih.gov), finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences, and to help identify members of gene families. BLAST+ is a new version of the BLAST package from the NCBI.
   From UGENE you can use the following tools of the old BLAST package:
   - blastall: the old program developed and distributed by the NCBI for running BLAST searches
   - formatdb: formats protein or nucleotide source databases before these databases can be searched by blastall
   And the following tools of the new BLAST+ package:
   - blastn: searches a nucleotide database using a nucleotide query
   - blastp: searches a protein database using a protein query
   - blastx: searches a protein database using a translated nucleotide query
   - tblastn: compares a protein query against a translated nucleotide database in all six reading
9. See the result below: the anaglyph view is applied to a molecule. [Screenshot: molecule in anaglyph view, with the Left, Right and Swap colors controls.]
   6.2.3 Moving, Zooming and Spinning 3D Structure
   A 3D structure can easily be spun, moved and resized:
   - To spin the 3D structure, drag the mouse over the 3D structure while holding the left mouse button.
   - To move the 3D structure, hold the Ctrl key and drag the mouse with the left button pressed.
   - To resize the 3D structure, use either the mouse wheel or the Zoom In and Zoom Out buttons on the toolbar.
   At any time you can restore the default view by pressing the Restore Default View button on the toolbar.
   You can also get an overview of the whole structure by spinning it automatically. To do so, select the Spin item either in the 3D Structure Viewer context menu or in the Display menu on the toolbar. To stop the spinning, uncheck the Spin item.
   6.2.4 Selecting Sequence Region
   When you select a region of a sequence (e.g. in the Sequence zoom view), the corresponding region of the 3D structure is highlighted, while the remaining regions of the 3D structure are shaded. To configure the color of a selected region, open the Settings dialog (select the Settings item in the 3D Structure Viewer context menu or in the Display menu on the toolbar), press the Set selection color button, and select a color in the dialog that appears.
   To adjust
10. [Screenshot: weight matrix search dialog with the Load Folder, Clear list, Clear results, Save as annotations and Add to queue buttons, a matrices list with Matrix and Strand columns, and the Results area.]
   In the search dialog you must specify a file with a PWM or PFM. You can do so by pressing the browse button (1) and selecting the file. You can also choose a JASPAR matrix through a dedicated interface by pressing the Search JASPAR database button (2). An alternative way to specify the position weight (frequency) matrix is to create one from an alignment, or from a file with several sequences, using the build a new matrix tool.
   After the profile (the matrix) is loaded, you can adjust the threshold value (3). The threshold sets the minimal score a result needs in order to pass. The higher the score of a result, the more closely it is related to the aligned region. By changing the threshold you can filter out low-scoring results.
   If the loaded matrix is a position frequency matrix, you must also specify the algorithm used to build the corresponding position weight matrix, which will represent the transcription factor. There are four algorithms available. [Screenshot: Weight algorithm list, including Berg and von Hippel, Log-odds and NLG.]
   You can also add a selected matrix, together with the specified Minimal score and Algorithm, to the matrices list. To do so, select the matrix and the other options and press the Add to queue button. The plugin will search with all
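
To make the relationship in item 10 between a frequency matrix, a weight matrix and a score threshold more concrete, here is a minimal Python sketch. It uses a generic log-odds conversion with a pseudocount and a uniform background, and it rescales window scores to a 0 to 100 range before applying the threshold. All of these choices, and the names `pfm_to_pwm`, `percent_score` and `search`, are assumptions for illustration; the sketch is not the plugin's implementation of any of its four algorithms.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
Column = Dict[str, float]


def pfm_to_pwm(pfm: Dict[str, List[int]], pseudocount: float = 1.0,
               background: float = 0.25) -> List[Column]:
    """Convert per-column base counts into log-odds weights (generic scheme)."""
    width = len(pfm["A"])
    pwm: List[Column] = []
    for col in range(width):
        total = sum(pfm[b][col] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((pfm[b][col] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def percent_score(pwm: List[Column], window: str) -> float:
    """Rescale the raw window score to 0..100 between the matrix min and max."""
    raw = sum(col[base] for col, base in zip(pwm, window))
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    if highest == lowest:  # flat matrix: every window scores the same
        return 100.0
    return 100.0 * (raw - lowest) / (highest - lowest)


def search(pwm: List[Column], sequence: str,
           threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at least the threshold.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = percent_score(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    counts = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
    print(search(pfm_to_pwm(counts), "TTACGTT", threshold=85.0))
```

Raising the threshold keeps only windows close to the best possible match for the matrix, while lowering it lets weaker matches through.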
11. [Screenshot: alignment of the two profiles with a consensus row; the rows include Loach, Tuna, Trout, Eel, Seahorse, Salamander, Frog and Panda.]
   Two gap columns are inserted into the source profile and two gap columns are inserted into the added one. The columns of both profiles are therefore kept intact, and the alignments themselves have not been changed.
   Note: Aligning a profile to the active alignment modifies the original alignment file, since it contains both profiles after the operation is completed.
   11.15.3 Aligning Sequences to Profile with MUSCLE
   Another feature provided by the plugin is aligning a set of unaligned sequences to an existing profile. To use this feature, select the Align > Align sequences to profile with MUSCLE context menu item. This option is not available in the original MUSCLE package (v3.7) and is new functionality for users of the original MUSCLE. In this mode each sequence from the input file is aligned to the active profile separately, and is merged into the resulting alignment only after all sequences have been processed. For example, the alignment in the picture above can be used as a profile again, and the added profile can be used as a set of sequences. The result of such a sequences-to-profile alignment is shown in the picture below. [Screenshot: resulting alignment with a consensus row and the rows Loach, Tuna, Trout, Eel, Seahorse, Salamander, Frog and Panda.]
12. --out=CBS.hmm
   12.2.15 Searching HMM Signals Using HMMER2
   Task name: hmm2-search
   Searches each input sequence for significantly similar sequence matches to all specified profile HMMs, using the HMMER2 tool.
   Parameters:
   - seq: semicolon-separated list of the input sequence files. String, Required.
   - hmm: semicolon-separated list of the input HMM files. String, Required.
   - out: output file with annotations. String, Required.
   - name: name of the result annotations. String, Optional, Default: hmm_signal.
   - e-val: e-value that can be used to exclude low-probability hits from the result. Number, Optional, Default: 1e-1.
   - score: score-based filtering, an alternative to e-value filtering, used to exclude low-probability hits from the result. Number, Optional, Default: -1000000000.
   Example:
   ugene hmm2-search --seq=CBS_seq.fa --hmm=CBS.hmm --out=CBS_hmm.gb
   12.2.16 Aligning with ClustalW
   Task name: align-clustalw
   Multiple sequence alignment with ClustalW.
   Warning: ClustalW is used as an external tool and must be installed on your system.
   Parameters:
   - toolpath: path to the ClustalW executable. By default, the path specified in the Application Settings is applied. String, Optional, Default: default.
   - tmpdir: directory for temporary files. String, Optional.
   - in: semicolon-separated list of input files. Str
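
When a task such as the one in item 12 is launched from a script, it helps to assemble the argument list programmatically, because several parameters take semicolon-separated lists. The sketch below assumes that parameters are passed as `--name=value` and that the task name is the first argument; the punctuation of the example command did not survive extraction, so treat both the syntax and the hypothetical helper `build_ugene_command` as assumptions to check against your UGENE version.

```python
import shlex
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float, Sequence[str]]


def build_ugene_command(task: str, params: Dict[str, Value]) -> List[str]:
    """Build an argument list for a UGENE command line task.

    Assumption: parameters use the form --name=value, and list values
    are joined with semicolons as described in the parameter lists.
    """
    args = ["ugene", task]
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ";".join(value)
        args.append(f"--{name}={value}")
    return args


command = build_ugene_command(
    "hmm2-search",
    {"seq": ["CBS_seq.fa"], "hmm": ["CBS.hmm"], "out": "CBS_hmm.gb"},
)
# shlex.join quotes arguments that need it (e.g. values containing ';').
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command)` avoids shell quoting problems with the semicolons, which a shell would otherwise interpret as command separators.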
13. [Screenshot: Secondary structure prediction dialog with the Algorithm (GOR IV), Range start and Range end fields, and a Results table with Region and Structure type columns listing two predicted regions, 10..18 and 23..29.]
   Save as annotation: select this button to save the results as annotations of the current protein sequence.
   11.11 SITECON
   SITECON is a program package for recognizing potential transcription factor binding sites, based on data about conservative conformational and physicochemical properties revealed by the analysis of sets of binding sites. To cite SITECON, use the following article:
   Oshchepkov D.Y., Vityaev E.E., Grigorovich D.A., Ignatieva E.V., Khlebodarova T.M. SITECON: a tool for detecting conservative conformational and physicochemical properties in transcription factor binding site alignments and for site recognition. Nucleic Acids Res. 2004 Jul 1; 32 (Web Server issue): W208-12.
   The UGENE version of SITECON provides a tool for recognizing potential binding sites for over 90 types of transcription factors. It also provides a tool for recognizing potential binding sites based on a site alignment supplied by the user. For a detailed description of the method, see the original SITECON site: http://wwwmgs.bionet.nsc.ru/cgi-bin/mgs/sitecon/sitecon.pl?stage=0
   Da
14. Project View: A visual component used to manage the active project.
   Task View: A visual component used to manage active tasks.
   Log View: A visual component used to show logs.
   Notifications: A visual component used to show notifications. Generally it is used to open task reports.
   Plugin Viewer: A visual component used to manage plugins.
   Sequence View: An Object View that visualizes DNA, RNA or protein sequences along with their properties, such as annotations, chromatograms, 3D models and statistical data.
   Annotation: Additional information about a sequence, identified by its name and the sequence region.
   Alignment Editor: An Object View used to visualize and edit DNA, RNA or protein multiple sequence alignments.
   Options Panel: A panel with information tabs and settings tabs for the Sequence View and the Assembly Browser.
   In the image below you can see a typical UGENE window with a Project View and a single Object View window opened. [Screenshot: UGENE main window with the menu bar (File, Actions, Settings, Tools, Window, Help), the Project View listing documents such as ty3.aln.gz, human_T1.fa, 1CF7.PDB and murine.gb, the Task View and the Log View.]
15. Rate matrix (fixed): specifies the fixed-rate amino acid model. This parameter is available for amino acid sequences. The following models are available:
   - poisson
   - jones
   - dayhoff
   - mtrev
   - mtmam
   - wag
   - rtrev
   - cprev
   - vt
   - blosum
   - equalin
   The following parameters are common to nucleotide and amino acid sequences.
   Rate: sets the model for among-site rate variation. Select one of the following:
   - equal: no rate variation across sites.
   - gamma: gamma-distributed rates across sites. The rate at a site is drawn from a gamma distribution. The gamma distribution has a single parameter that describes how much the rates vary.
   - propinv: a proportion of the sites are invariable.
   - invgamma: a proportion of the sites are invariable, while the rates for the remaining sites are drawn from a gamma distribution.
   Gamma: sets the number of rate categories for the gamma distribution.
   You can set the following parameters for the MCMC analysis.
   Chain length: sets the number of cycles for the MCMC algorithm. This should be a large number, because the chain must first reach stationarity and then remain there long enough to take many samples.
   Subsampling frequency: specifies how often the Markov chain is sampled. You can sample the chain every cycle, but this results in very large output files.
   Burn-in length: determines
16. String, Optional, Default: misc_feature.
   - type: type of the matrix. Boolean, Optional, Default: false. The following values are available: true (dinucleic type); false (mononucleic type). Dinucleic matrices are more detailed, while mononucleic ones are more useful for small input data sets.
   - algo: algorithm used to convert a PFM to a PWM. String, Optional, Default: Berg and von Hippel. The following values are available: Berg and von Hippel; Log-odds; Match; NLG.
   - score: minimum percentage score to detect TFBS. Number, Optional, Default: 85.
   - strand: strands to search in. Number, Optional, Default: 0. The following values are available: 0 (both strands); 1 (direct strand); -1 (complement strand).
   Example:
   ugene pfm-search --seq=in.fa --matrix=MA0265.1.pfm;MA0266.1.pfm --out=res.gb
   12.2.22 Building PWM
   Task name: pwm-build
   Builds a position weight matrix from a multiple sequence alignment file.
   Parameters:
   - in: semicolon-separated list of input MSA files. String, Required.
   - out: output file. String, Required.
   - type: type of the matrix. Boolean, Optional, Default: false. The following values are available: true (dinucleic type); false (mononucleic type). Dinucleic matrices are more detailed, while mononucleic ones are more useful for small input data sets.
   - algo: algorithm used to build the matrix. String
17. - RCSB PDB
   - PDBsum
   - NCBI MMDB
   Note that if you are online, you can access the Protein Data Bank directly from UGENE and load a required file by its PDB ID (see Fetching Data from Remote Database for details).
   Hint: Don't forget to select the correct database (PDB) while fetching.
   6.2.2 Changing 3D Structure Appearance
   This chapter describes how you can change the appearance of a 3D structure.
   Selecting Render Style
   The following render styles are available:
   - Ball-and-Stick
   - Space Fill
   - Tubes
   - Worms
   To change the render style, select the appropriate item in the Render Style menu. It can be found either in the 3D Structure Viewer context menu or in the Display menu on the toolbar. [Screenshots: the Ball-and-Stick, Space Fill, Tubes and Worms render styles.]
   Selecting Coloring Scheme
   You can select one of the following coloring schemes:
   - Chemical Elements
   - Molecular Chains
   - Secondary Structure
   To change the coloring scheme, open the Coloring Scheme menu, available in the context menu and in the Display menu on the toolbar. [Screenshots: the Chemical Elements, Molecular Chains and Secondary Structure coloring schemes.]
   Calculating Molecular Surface
   To calculate the molecular surface of a molecule, select the Molecular Surface item in the 3D Structure Viewer context menu or in the Display menu on the toolbar, and check one of the following items:
   - SAS: solvent accessib
18. 6.5.7 Saving Dotplot as Image; 6.5.8 Saving and Loading Dotplot; 6.5.9 Building Dotplot for Currently Opened Sequence; 6.5.10 Comparing Several Dotplots
   7 Alignment Editor
   7.1 Overview: 7.1.1 Alignment Editor Features; 7.1.2 Alignment Editor Components; 7.1.3 Navigation; 7.1.4 Coloring Schemes; 7.1.5 Zooming and Fonts; 7.1.6 Searching for Pattern; 7.1.7 Consensus
   7.2 Working with Alignment: 7.2.1 Undo/Redo Framework; 7.2.2 Selecting Subalignment; 7.2.3 Editing; 7.2.4 Aligning Sequences; 7.2.5 Working with Sequences List
   7.3 Advanced Functions: 7.3.1 Grid Profile; 7.3.2 Exporting Image; 7.3.3 Building HMM Profile
   7.4 Building Phylogenetic Tree: 7.4.1 PHYLIP Neighbour Joining; 7.4.2 (title illegible in the source)
   8 Assembly Browser
   8.1 Import BAM/SAM File
19. protection from organic solvents and antibiotics: http://biocyc.org/ECOLI/substring-search?type=NIL&object=soxS&quickSearch=Quick+Search
   TorR: response regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=TORR&quickSearch=Quick+Search
   Tryptophan (trp) transcriptional repressor: http://biocyc.org/ECOLI/substring-search?type=NIL&object=TRPR&quickSearch=Quick+Search
   Table 11.2 (continued)
   TyrR: Tyrosine repressor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=TyrR&quickSearch=Quick+Search
   11.12 Smith-Waterman Search
   The Smith-Waterman Search plugin adds a complete implementation of the Smith-Waterman algorithm (http://en.wikipedia.org/wiki/Smith-Waterman) to UGENE. To use the plugin, open a nucleotide or protein sequence in the Sequence View and select the Analyze > Find pattern [Smith-Waterman] item in the context menu. The Smith-Waterman Search dialog appears. [Screenshot: Smith-Waterman Search dialog with the Smith-Waterman parameters and Input and output tabs, and settings for the sequence or translation to search in, the algorithm version, the scoring matrix, the gap scores, the results filtering strategy and the minimal score.]
   First of all you need to specify the pattern to search for. The
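
For readers unfamiliar with the algorithm named in item 19, the following Python sketch computes the best Smith-Waterman local alignment score of a pattern against a sequence. It is a textbook version with a simple match/mismatch scheme and a single linear gap penalty, chosen for brevity; the plugin itself works with scoring matrices and separate gap scores, so the function `smith_waterman_score` and its default parameters are illustrative assumptions only.

```python
def smith_waterman_score(pattern: str, sequence: str, match: int = 1,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score (linear gap penalty).

    Dynamic programming over a (len(pattern)+1) x (len(sequence)+1) table,
    keeping only the previous row. Cell values never drop below zero,
    which is what makes the alignment local.
    """
    cols = len(sequence) + 1
    previous = [0] * cols
    best = 0
    for i in range(1, len(pattern) + 1):
        current = [0] * cols
        for j in range(1, cols):
            substitution = match if pattern[i - 1] == sequence[j - 1] else mismatch
            current[j] = max(
                0,
                previous[j - 1] + substitution,  # align the two symbols
                previous[j] + gap,               # gap in the sequence
                current[j - 1] + gap,            # gap in the pattern
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

A search tool built on this would report every region whose score reaches a chosen minimum, rather than only the single best score returned here.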
20. Optional, Default: Berg and von Hippel. The following values are available: Berg and von Hippel; Log-odds; Match; NLG.
   Example:
   ugene pwm-build --in=COI.aln --out=result.pwm
   12.2.23 Searching for TFBS with Weight Matrices
   Task name: pwm-search
   Searches for transcription factor binding sites (TFBS) with position weight matrices (PWM) and saves the regions found as annotations.
   Parameters:
   - seq: semicolon-separated list of input sequence files to search TFBS in. String, Required.
   - matrix: semicolon-separated list of the input PWM. String, Required.
   - out: output Genbank file.
   - name: name of the annotated regions. String, Optional, Default: misc_feature.
   - min-score: minimum percentage score to detect TFBS. Number, Optional, Default: 85.
   - strand: strands to search in. Number, Optional, Default: 0. The following values are available: 0 (both strands); 1 (direct strand); -1 (complement strand).
   Example:
   ugene pwm-search --seq=input.fa --matrix=Aro80.pwm;Aft1.pwm --out=res.gb
   12.2.24 Building Statistical Profile for SITECON
   Task name: sitecon-build
   Builds a statistical profile for SITECON. It can later be used to search for TFBS.
   Parameters:
   - in: semicolon-separated list of input DNA multiple sequence alignment files. An input file must not contain gaps. String, Required.
   - out: output file. If several
21. Plugins main menu item. The Plugin Viewer window will appear. [Screenshot: Plugin Viewer window with a table of plugins (Name, state and Description columns), including BioStruct3D Viewer, Bowtie, CUDA Support, Chroma view, Circular view, DNA Annotator, DNA Export, DNA Statistics, Dotplot, GOR IV, HMM, ORF Marker and Primer 3. Callouts mark the list of plugins and the description of the selected plugin.]
22. String, Required.
   - inmodel: input SITECON profile(s). If several profiles are supplied, the task searches with all profiles one by one and outputs a merged set of annotations for each input sequence. String, Required.
   - out: output Genbank file. String, Required.
   - annotation-name: name of the annotated regions. String, Optional, Default: misc_feature.
   - min-score: recognition quality threshold. The value must be between 60 and 100. Choosing too low a threshold leads to the recognition of too many TFBS with low trustworthiness. Choosing too high a threshold may result in no TFBS being recognised. Number, Optional, Default: 85.
   - min-err1: setting for filtering results: minimal value of Error type I. Number, Optional, Default: 0.
   - max-err2: setting for filtering results: maximum value of Error type II. Number, Optional, Default: 0.001.
   - strand: strands to search in. Number, Optional, Default: 0. The following values are available: 0 (both strands); 1 (direct strand); -1 (complement strand).
   Example:
   ugene sitecon-search --in=input.fa --inmodel=profile.sitecon --out=res.gb
   12.2.26 Fetching Sequence from Remote Database
   Task name: fetch-sequence
   Fetches a sequence from a remote database.
   Parameters:
   - db: database to read from. String, Required.
   - in: semicolon-separated list of resource IDs in the database. String, Required.
23. [Screenshot: Project View with the documents NC_014287.gb and NC_014267.gb, each containing a features object and a sequence object, the project file project.uprj and the Bookmarks list.]
   To load a saved project later, select File > Open and specify the path to the project file.
   4.13 Options Panel
   The Options Panel is available in the Sequence View and in the Assembly Browser. By default it is closed. To open a tab of the Options Panel, click the corresponding icon at the right side of a Sequence View or Assembly Browser window. To close the tab, click the tab icon again. Note that you can hold the Ctrl key to open several tabs at the same time. In this case the tabs are shown on the Options Panel one after another.
   More detailed information about the Options Panel tabs can be found in the following chapters:
   - Options Panel in Sequence View: Information about Sequence; Search in Sequence; Highlighting Annotations
   - Options Panel in Assembly Browser: Navigation in Assembly Browser; Assembly Browser Settings; Assembly Statistics
   4.14 Adding and Removing Plugins
   A plugin is a dynamically loaded module that adds new functionality to UGENE. To manage plugins, select the Settings
24. 8.9.1 Assembly Overview Hotkeys; 8.9.2 Reads Area Hotkeys
   9 Phylogenetic Tree Viewer
   9.1 Adjusting Tree Settings
   9.2 Adjusting Branch Settings
   9.3 Selecting Tree Layout
   9.4 Modifying Labels Appearance: 9.4.1 Showing/Hiding Labels; 9.4.2 (title illegible in the source); 9.4.3 Changing Labels Formatting
   9.5 Zooming
   9.6 Working with Clade: 9.6.1 Selecting Clade; 9.6.2 Collapsing/Expanding Branches; 9.6.3 Swapping Siblings; 9.6.4 Zooming Clade; 9.6.5 Adjusting Clade Settings
   9.7 Exporting Tree Image
   9.8 Printing Tree
   10 Distributed Computing
   10.1 Remote Machines Monitor
   10.2 Running Workflows on Cloud: 10.2.1 Introduction; 10.2.2 Cloud Computing; 10.2.3 Cloud Remote Machine; 10.2.4 Launching Workflow; 10.2.5 Useful Tips and Recommend
25. alternatively, maybe you would like to analyze a certain part of a sequence. In this case, select the required data in the web browser window. The Open selected in UGENE item should now appear in the context menu.

[Figure: Ensembl genome browser, Export Gene Data page showing a cDNA sequence]
26. applied after restart: here you can select the UGENE localization. Currently available localizations are EN, RU, CS and ZH. The default value, Autodetection, specifies that UGENE should use the operating system regional options to select the localization. This setting is applied only after UGENE is reopened.

Appearance: defines the appearance of the application. For example, here is a part of the same dialog when the Cleanlooks appearance has been applied:

[Figure: part of the Application Settings dialog with the Cleanlooks appearance]

Window Layout: controls the behavior of windows (multiple documents or tabbed documents).

Preferred Web browser: you can use either the system default browser or specify another browser executable.

Open last project at startup: if the option is checked, the last project is opened when UGENE is started.

Path to downloaded data: specifies the path where files downloaded from remote databases will be stored.

Enable statistical reports collecting: collects information about UGENE usage and sends it to the UGENE team to help improve the application.

Note: The collected information includes:
1. System info: UGENE version, OS name, Qt version, etc.
2. Counters info: number of launches of certain tasks (e.g. HMM search, MUSCLE align).
The collected information D
27. Available formats are png, jpg and bmp.

6.5.8 Saving and Loading Dotplot

To save a dotplot in a native format, right-click on the dotplot and select the Dotplot > Save/Load > Save context menu item.

[Figure: dotplot context menu with the Save/Load submenu]

The Save Dotplot dialog will appear. A dotplot is saved in a file with the .dpt extension. Later the dotplot can be loaded using the Dotplot > Save/Load > Load context menu item.

6.5.9 Building Dotplot for Currently Opened Sequence

To build a dotplot for currently opened sequences, create a multiple view containing these sequences. This can be done by dragging the corresponding sequence objects (the items starting with [s]) into the same Sequence View. Then right-click on the created view and select the Analyze > Build dotplot item in the context menu. Every sequence from the current multiple sequence view can be used to build a dotplot.

Note: If you need to compare a sequence with itself, you can activate the menu from a single Sequence View.

6.5.10 Comparing Several Dotplots

Dotplots created for the same view are shown in the same view. If the horizontal and vertical sequences of several dotplots are correspondingly the same, it is possible to lock all zooming and navigating operations for these dotplots. Press the Multiple v
28. en.wikipedia.org/wiki/NF-%CE%BAB
NFkB_hetero: The p50 (NFKB1)/p65 (RELA) heterodimer is the most abundant form of NF-kB. http://en.wikipedia.org/wiki/RELA
NFkB_homo: The c-Rel protein is a member of the NF-kB family of transcription factors and contains a Rel homology domain. [URL illegible]
NN: Nuclear transcription factor Y. http://en.wikipedia.org/wiki/NFYA
Nrf2: Nuclear factor (erythroid-derived 2)-like 2. http://en.wikipedia.org/wiki/NFE2L2
Octamer transcription factor 1. http://en.wikipedia.org/wiki/Oct-1
Oct al: Octamer transcription factors. http://en.wikipedia.org/wiki/Octamer_transcription_factor
p53: Protein 53. http://en.wikipedia.org/wiki/P53
PPRF: Paramedian pontine reticular formation. http://en.wikipedia.org/wiki/Paramedian_pontine_reticular_formation
Pu1: Is a protein that in humans is encoded by the SPI1 gene. http://en.wikipedia.org/wiki/SPI1
setCREB: cAMP response element-binding. http://en.wikipedia.org/wiki/CREB
setCREBzag: cAMP response element-binding. http://en.wikipedia.org/wiki/CREB

Table 11.1 (continued)
SRE_san: Serum response element. http://en.wikipedia.org/wiki/Serum_response_factor
SRF: Serum response factor. http://en.wikipedia.org/wiki/Serum_response_factor
STAT1: Signal Transducer and Activator of Transcription 1. http
29. quickSearch=Quick+Search
Ferric Uptake Regulation. http://biocyc.org/ECOLI/substring-search?type=NIL&object=FUR&quickSearch=Quick+Search
GALR: Galactose repressor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=GALR&quickSearch=Quick+Search
GALS: Galactose isorepressor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=GALS&quickSearch=Quick+Search
GLPR: Glycerol-3-phosphate repressor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=GLPR&quickSearch=Quick+Search
GNTP: Is a member of the GntP family of transporters. http://biocyc.org/ECOLI/substring-search?type=NIL&object=GNTP&quickSearch=Quick+Search
HNS: Histone-like nucleoid structuring protein. http://biocyc.org/ECOLI/substring-search?type=NIL&object=HNS&quickSearch=Quick+Search
Isocitrate lyase Regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=ICLR&quickSearch=Quick+Search
Integration host factor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=IHF&quickSearch=Quick+Search

Table 11.2 (continued)
Iron-sulfur cluster Regulator 1. http://biocyc.org/ECOLI/substring-search?type=NIL&object=ISCR&quickSearch=Quick+Search
Iron-sulfur cluster Regulator 3. http://biocyc.org/ECOLI/substring-sea
30. save-dir: directory to store sequence files loaded from the database [String, Optional].

Example:

    ugene fetch-sequence --db=PDB --id=3INS;1CRN

12.3 Creating Custom CLI Tasks

The predefined tasks are actually Workflow Designer schemas stored in the $UGENE/data/cmdline directory. Follow the instructions in the Workflow Designer Manual (http://ugene.unipro.ru/documentation.html) on how to create a schema and run it from the command line. You may also find the following video tutorial on creating a custom console command useful:
• Creating custom console command: MUSCLE alignment with various output format (http://www.youtube.com/watch?v=ZfxmX_20t5M)

13 APPENDIXES

13.1 Appendix A: Supported File Formats

Note: UGENE is able to read and write files compressed with the Unix/Linux gzip utility. You don't have to unpack the files.

13.1.1 Specific File Formats

ABIF (*.ab1, *.abi, *.abif): A chromatogram file format. See also: Chromatogram Viewer.
ACE: A file format for storing data about genomic contigs. See also: Alignment Editor.
Bairoch (*.bairoch): A file format to store enzymes. See also: Restriction Analysis.
BAM (*.bam): Binary compressed SAM format. See also: Assembly Browser.
[name illegible]: A multiple sequence alignment (MSA) file format. See also: Alignment Editor.
EBWT (*.ebwt): A Bowtie prebuilt index file. See also: Bowtie.
EMBL (*.em, *.emb, *.embl): A rich format for storin
31. To scroll the resized overview, drag the mouse while pressing down the mouse wheel. To learn about available hotkeys, refer to Assembly Browser Hotkeys.

8.2.6 Ruler and Coverage Graph Description

The Ruler shows the coordinates in the Reads Area. When you move the mouse cursor in the Reads Area, the coordinate of the selected location and the coverage of reads at that location are shown on the ruler in dark red. The Coverage Graph shows the exact coverage of the sequence at each position. For example, on the image below the coordinate is 9168 and the coverage of reads is 251:

[Figure: ruler and coverage graph with the cursor coordinate and coverage shown]

To show or hide the coordinates on the ruler, click the corresponding button on the toolbar. To show or hide the coverage on the ruler, click the corresponding button on the toolbar. Alternatively, you can use the Show coordinates and Show coverage under cursor check boxes located on the Assembly Browser Settings tab of the Options Panel.

8.2.7 Go to Position in Assembly

To go to the required position in an assembly, use the position field located on the Assembly Browser toolbar. Input the location and click the Go button. A similar Go field is also available on the Navigation tab of the Options Panel.

8.2.8 Using Bookmarks for Navigation in Assembly Data

Use bookmarks to save and restore visual state of an assembly for exam
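
The coverage value described in section 8.2.6 can be thought of as the number of reads that overlap a given position. The following Python sketch illustrates that idea only. It assumes reads are given as hypothetical (start, length) pairs with 1-based start coordinates, and it does not reflect how UGENE stores reads or computes the Coverage Graph.

```python
from typing import Iterable, List, Tuple


def coverage_per_position(reads: Iterable[Tuple[int, int]], assembly_length: int) -> List[int]:
    """Return a list where element i is the number of reads covering position i + 1.

    Each read is a hypothetical (start, length) pair; start is 1-based.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (assembly_length + 1)
    for start, length in reads:
        first = max(start, 1) - 1                        # 0-based index of first covered position
        last = min(start + length - 1, assembly_length)  # 0-based exclusive end
        if first < last:
            diff[first] += 1
            diff[last] -= 1

    # A running sum over the difference array gives the coverage at each position.
    coverage, running = [], 0
    for delta in diff[:assembly_length]:
        running += delta
        coverage.append(running)
    return coverage


# Three short illustrative reads on an assembly of length 8.
reads = [(1, 4), (3, 4), (4, 2)]
print(coverage_per_position(reads, 8))
```

The difference-array approach keeps the work proportional to the number of reads plus the assembly length, instead of touching every position of every read.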
32. [Figure: Assembly Browser Settings tab of the Options Panel]

To learn more about the Reads Area settings, refer to the Reads Area Settings chapter. To learn more about the consensus, see the Consensus Sequence chapter. To learn more about the ruler, see the Browsing and Zooming Assembly chapter.

8.8.3 Assembly Statistics

The Assembly Statistics tab includes the following Assembly Information:
• Name: the name of the opened assembly
• Length: the length of the assembly
• Reads: the number of reads in the assembly

The tab can also include the Reference Information, if it is available in the assembly file, for example:
• MD5
• Species
• URI

[Figure: Assembly Statistics tab with Assembly Information and Reference Information]

8.9 Assembly Browser Hotkeys

8.9.1 Assembly Overview Hotkeys

The followin
33. Note: The Circular Viewer is opened automatically when the Sequence View is opened for a plasmid.

The inner circle represents the sequence clockwise, and the scale marks show the corresponding sequence positions. The sequence annotations are represented as curved colored regions at the outer side of the circle. The Circular Viewer helps to navigate within the sequence. You can select an annotation on the circular view, and the annotation will also be focused and highlighted in all Sequence View areas (Sequence overview, Sequence zoom view, Sequence details view and Annotations editor).

You can also select a sequence region:

[Figure: sequence region selected in the Circular Viewer]

This will also affect the Sequence View.

Note that the circular view is zoomed automatically when the Circular Viewer area is resized.

[Figure: resizing the circular view area resizes the sequence circular view]
34. The 3D Structure Viewer adds 3D visualization for PDB and MMDB files.

[Figure: 3D Structure Viewer]

The Chromatogram Viewer adds support for chromatogram visualization and editing.

[Figure: Chromatogram Viewer]

The DNA Graphs Package shows various graphs for sequences.

[Figure: Informational Entropy graph (window 500, step 50)]
35. 5.10.9 Exporting Annotations

Open the Sequence View with a document that contains annotations. Select one or several annotations or annotation groups in the Annotations editor, then select the Export > Export annotations context menu item. The Export Annotations dialog will appear:

[Figure: Export Annotations dialog]

Here you can set the path to the file and choose the file format. Optionally, for the CSV format, you can save the sequence along with the annotations and save sequence names.

6 Sequence View Extensions

The functionality of the Sequence View can be significantly increased with Sequence View Extensions. Below is a demonstration of their functionality.

The Circular Viewer shows the circular view of a sequence.

[Figure: Circular Viewer]
36. If you want to see all annotation types, click the Show all annotation types link. Find below information about the annotation type properties that you can configure.

Annotations Color

To change the color of all annotations of a certain type, click the corresponding color box in the annotation types table and select the required color in the Select Color dialog that appears.

Annotations Visibility

You can show or hide annotations of a certain type by selecting the type in the annotation types table and checking or unchecking the Show annotations of this type check box.

Show on Translation

This option is available for nucleotide sequences only. It specifies that the annotation is shown on the corresponding amino acid sequence instead of the original nucleotide sequence in the Sequence details view, for example:

[Figure: annotation shown on the translation in the Sequence details view]

You can enable or disable this option by checking or unchecking the Show on translation check box.

Captions on Annotations

It is possible to show the value of a qualifier of an annotation instead of the annotation type name in the Sequence zoom view. To enable this option for an annotation
37. Ctrl+C. To copy one or several sequences, do the following:
• Select the sequences in the Sequence list area.
• Select the Copy > Copy selection context menu item in the Sequence area, or use the hot key combination.

Note that if you activate the context menu in the Sequence list area, you will lose your current selection.

[Figure: Copy submenu with the Copy consensus and Copy consensus with gaps items]

To copy the consensus sequence, use the Copy > Copy consensus item.

Sorting Sequences

To sort sequences by name in alphabetical order, choose the View > Sort sequences by name item from the Actions main menu or the context menu.

7.3 Advanced Functions

This chapter is devoted to the advanced functions of the Alignment Editor. You will learn how to build a grid profile, export a picture of an alignment and build HMM profiles.

7.3.1 Grid Profile

Using the Alignment Editor, you can create a statistical profile of a multiple sequence alignment. The alignment grid profile shows positional amino acid or nucleotide counts, highlighted according to the frequency of symbols in a row. To create a grid profile, use the Statistics > Generate grid profile item in the Actions main menu or in the context menu. To learn more about this feature, refer to the DNA Statistics plugin documentation.

7.3.2 Exporting Image

To export an alignment as image click the Export as
38. [Figure: Export Document dialog]

Here you may select the name of the output file in the Save to file field and optionally choose the format of the output file in the File format field. Use the Compress file check box to compress the file. The Add to project check box (checked by default) adds the output file to the current project. After choosing all parameters, click the Export button.

4.8 Locked Documents

The lock icon in the document element indicates that the document can't be modified.

[Figure: locked document in the Project View]

UGENE does not allow modification of some formats that were not created by UGENE. If UGENE is able only to read a document (see the Supported File Formats chapter), you can export the document objects to a file using the built-in export utilities. You can also export the objects of unlocked documents.

4.9 Using Objects and Object Views

A document always contains one or more objects. An object is structured biological data that can be visualized by different Object Views. A single Object View can visualize one or several objects of different types. For example, a single view can show a sequence, annotations for the sequence, 3D model for the part of t
39. [Figure: Tools menu with the HMMER2 tools submenu (HMM build, HMM calibrate, HMM search)]

We highly recommend reading the original HMMER2 documentation (see the Documentation section at http://hmmer.janelia.org) to learn how to use the utilities provided by the plugin.

Note: The SSE2 algorithm was implemented by Leonid Konyaev, Novosibirsk State University. Using the SSE2-optimized version of the HMM search algorithm on a quad-core CPU gives a more than 30x performance boost compared with the original single-threaded algorithm (single sequence mode).

11.13.1 Building HMM Model (HMM Build)

The HMM build tool is used to build a new HMM profile from a multiple alignment. You can use any alignment file format supported by UGENE. The output HMM profile format is compatible with the HMMER2 package.

[Figure: HMM build dialog]

Note: The HMM build tool does not automatically calibrate a profile. Use the HMM calibrate tool to calibrate the profile.

11.13.2 Calibrating HMM Model (HMM Calibrate)

The HMM
40. ECOLI/substring-search?type=NIL&object=Crp&quickSearch=Quick+Search
Cysteine B. http://biocyc.org/ECOLI/substring-search?type=NIL&object=CysB&quickSearch=Quick+Search
Cytidine Regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=CytR&quickSearch=Quick+Search

Table 11.2 (continued)
Deoxyribose Regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=DeoR&quickSearch=Quick+Search
DnaA is the linchpin element in the initiation of DNA replication in E. coli. http://biocyc.org/ECOLI/substring-search?type=NIL&object=DnaA&quickSearch=Quick+Search
Fatty acid degradation Regulon. http://biocyc.org/ECOLI/substring-search?type=NIL&object=FadR&quickSearch=Quick+Search
fis: Factor for inversion stimulation. http://biocyc.org/ECOLI/substring-search?type=NIL&object=fis&quickSearch=Quick+Search
FlhDC: Operon that encodes two transcriptional regulators. http://biocyc.org/ECOLI/substring-search?type=NIL&object=FlhDC&quickSearch=Quick+Search
Fnr: FNR is the primary transcriptional regulator that mediates the transition from aerobic to anaerobic growth through the regulation of hundreds of genes. http://biocyc.org/ECOLI/substring-search?type=NIL&object=Fnr&quickSearch=Quick+Search
Frur: Fructose repressor. http://biocyc.org/ECOLI/substring-search?type=NIL&object=Frur
41. [Figure: dotplot for the NC_014267 sequence (min length 11, identity 100)]

Note: The Dotplot plugin uses the Repeat Finder plugin to build a dotplot; make sur
42. Genbank file with the annotations [String, Required].
schema: UQL schema [String, Required].
merge: if true, merges regions of each result into a single annotation [Boolean, Optional, Default: false].
offset: if merge is set to true, specifies left and right offsets for merged annotations [Number, Optional, Default: 0].

Example:

    ugene query --in=input.fa --out=result.gb --schema=RepeatsWithORF.uql

12.2.12 Building Bowtie Index

Task Name: bowtie-build

Builds a Bowtie index using a reference sequence. The index can later be used to align short reads to the reference sequence.

Parameters:
ref: reference sequence file [String, Required].
ebwt: name of the index. The index is stored as a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt [String, Required].

Example:

    ugene bowtie-build --ref=ref.fa --ebwt=refindex

12.2.13 Aligning Short Reads with Bowtie

Task Name: bowtie

Aligns short reads to a reference sequence with Bowtie, using its pre-built index.

Parameters:
reads: semicolon-separated list of input short reads files [String, Required].
ebwt: Bowtie index file [String, Required].
out: output file [String, Required].
format: format of the output file [String, Optional].
maqerr: maximum permitted total of quality values at all mismatched read positions thro
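
Before running the bowtie task, it can be handy to check that all six index files listed for bowtie-build are present. The Python sketch below is one possible way to do that. The helper name is hypothetical, and it assumes the suffixes are appended directly to the index name and that the files sit in the same directory; it is not part of UGENE or Bowtie.

```python
from pathlib import Path
from typing import List

# The six suffixes of a Bowtie index, as listed for the ebwt parameter.
EBWT_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt")


def missing_index_files(index_base: str) -> List[str]:
    """Return the expected index files that do not exist for the given index name.

    index_base is the index name (optionally with a directory), e.g. "refindex".
    """
    return [index_base + suffix
            for suffix in EBWT_SUFFIXES
            if not Path(index_base + suffix).is_file()]


# Hypothetical usage: an empty list means the index looks complete.
print(missing_index_files("refindex"))
```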
43. [Figure: Sequence View window for the AF177870 sequence]

After the view is opened, you can see a set of new buttons in the toolbar area. The actions provided by these buttons are available for all sequences opened in the view. In the picture below, these buttons are pointed to by the Global actions arrow. Below the toolbar there is an area for a single or several sequences. For each sequence, a smaller toolbar with actions for the sequence and the following areas are available:

Sequence overview: Shows the whole sequence and provides handy navigation in the Sequence zoom view and the Sequence details view.
Sequence zoom view: Provides flexible tools for navigation in large annotated sequence regions.
Sequence details view: A supplementary component of the Sequence overview. It is used to show sequence content without zooming.
Annotations editor: Contains tools to manipulate annotations for a sequence.

[Figure: Sequence View with the Global actions toolbar, the sequence name and the Sequence overview labeled]
44. [Figure: Application Settings dialog, External Tools page, with the BLAST tool description shown]

5 Sequence View

5.1 Sequence View Components

The Sequence View is one of the major Object Views in UGENE. It is used to visualize and edit DNA, RNA or protein sequences along with their properties, such as annotations, chromatograms, 3D models and statistical data. For each file, UGENE analyzes the file content and automatically opens the most appropriate view. To activate the Sequence View, open any file with at least one sequence. For example, you can use the $UGENE/data/samples/EMBL/AF177870.emb file provided with UGENE. After opening the file in UGENE, the Sequence View window appears:

[Figure: Sequence View window for the AF177870 sequence]
45. Minimum repeat length value. Such a value will be set that about 1000 repeats will be found.

Repeats identity: specifies the percentage of repeat identity. Press the 100 button to set the identity to 100%.

After the parameters are set, press the OK button. The dotplot will appear in the Sequence View:

[Figure: dotplot shown in the Sequence View]

It is a two-dimensional plot consisting of dots. Each dot on the plot corresponds to a matched base symbol at the x position of the horizontal sequence and the y position of the vertical sequence. Visible diagonal lines indicate matches between the sequences in the given region.

See also:
• Interpreting Dotplot: Identifying Matches, Mutations, Inversions, etc.
• Building Dotplot for Currently Opened Sequence

6.5.2 Navigating in Dotplot

To zoom a dotplot in or out, you can:
• Rotate the mouse wheel.
• Press the corresponding zoom buttons located on the left.

To move the zoomed region you can
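
The idea of dots forming diagonal lines can be illustrated with a short Python sketch. It is a naive illustration that assumes exact matches (100% identity) of at least a given minimum length; the function name and parameters are hypothetical, and the sketch does not reproduce the algorithm UGENE uses to build a dotplot.

```python
from typing import List, Tuple


def dotplot_matches(horizontal: str, vertical: str, min_length: int = 3) -> List[Tuple[int, int, int]]:
    """Return (x, y, length) for each maximal exact diagonal match.

    x and y are 1-based start positions in the horizontal and vertical
    sequences; only diagonals of at least min_length symbols are reported.
    """
    matches = []
    for x in range(len(horizontal)):
        for y in range(len(vertical)):
            if horizontal[x] != vertical[y]:
                continue
            # Skip cells that only continue a diagonal started earlier.
            if x > 0 and y > 0 and horizontal[x - 1] == vertical[y - 1]:
                continue
            # Extend the diagonal while the symbols keep matching.
            length = 0
            while (x + length < len(horizontal)
                   and y + length < len(vertical)
                   and horizontal[x + length] == vertical[y + length]):
                length += 1
            if length >= min_length:
                matches.append((x + 1, y + 1, length))
    return matches


# Two short illustrative sequences sharing the motif ACGT.
for x, y, length in dotplot_matches("ACGTACGT", "TTACGTAA", min_length=4):
    print(f"diagonal at x={x}, y={y}, length={length}")
```

This brute-force comparison is quadratic in the sequence lengths, so it is only suitable for small examples.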
46. Multiple sequence alignment with MUSCLE.

Parameters:
in: semicolon-separated list of input files. An input file can be of any format containing sequences or alignments [String, Required].
out: output ClustalW file [String, Required].

Example:

    ugene align --in=14-3-3.sto --out=14-3-3_aligned.aln

12.2.20 Building PFM

Task Name: pfm-build

Builds a position frequency matrix from a multiple sequence alignment file.

Parameters:
in: semicolon-separated list of input MSA files [String, Required].
out: output file [String, Required].
type: type of the matrix [Boolean, Optional, Default: false]. The following values are available:
• true: dinucleic type
• false: mononucleic type
Dinucleic matrices are more detailed, while mononucleic ones are more useful for small input data sets.

Example:

    ugene pfm-build --in=COI.aln --out=result.pfm

12.2.21 Searching for TFBS with PFM

Task Name: pfm-search

Searches for transcription factor binding sites (TFBS) with position weight matrices (PWM) converted from input position frequency matrices (PFM) and saves the regions found as annotations.

Parameters:
seq: semicolon-separated list of input sequence files to search TFBS in [String, Required].
matrix: semicolon-separated list of the input PFM [String, Required].
out: output Genbank file.
name: name of the annotated regions
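
To make the notion of a mononucleic position frequency matrix more concrete, the Python sketch below counts, for each alignment column, how often each nucleotide occurs. It is a minimal illustration: the function name is hypothetical, ignoring gaps and ambiguous symbols is an assumption, and the sketch does not reproduce the file format written by pfm-build or the PWM conversion used by pfm-search.

```python
from typing import Dict, List


def position_frequency_matrix(alignment: List[str], alphabet: str = "ACGT") -> Dict[str, List[int]]:
    """Count the occurrences of each symbol in every column of an alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pfm = {symbol: [0] * width for symbol in alphabet}
    for row in alignment:
        for column, symbol in enumerate(row.upper()):
            # Assumption: gaps and ambiguous symbols are simply not counted.
            if symbol in pfm:
                pfm[symbol][column] += 1
    return pfm


# A tiny illustrative alignment of four sequences.
msa = ["ACGT", "ACGA", "AC-T", "TCGT"]
for symbol, counts in position_frequency_matrix(msa).items():
    print(symbol, counts)
```

A dinucleic matrix would count pairs of adjacent symbols instead of single symbols, which is why it needs more input data to be informative.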
47. [Figure: BioMart Central Portal home page]

Notice that an example Ensembl ID below the search bar is highlighted (it has a light blue background). The current version of the UGENE extension detects the following types of identifiers:
1. Ensembl Gene ID
2. Ensembl Protein ID
3. PDB ID

Right-click on the ID and select the Open in UGENE item in the context menu.

[Figure: browser context menu with the Open in UGENE item]

The sequence with the selected ID will be opened in UGENE.

14.1.6 Opening selected data in UGENE

Imagine that you have browsed for the required data (e.g. a sequence with annotations) and opened, for example, an HTML view of the data in a web browser. Now you would like to open the data in UGENE to analyze them in more detail. Or
48. 11.9 Molecular Cloning in silico
11.9.1 Digesting into Fragments
11.9.2 Creating Fragment
11.9.3 Constructing Molecule
11.10 Secondary Structure Prediction
11.11 SITECON
11.11.1 SITECON Searching Transcription Factors Binding Sites
11.11.2 Types of SITECON Models
11.12 Smith-Waterman Search
11.13 HMM2
11.13.1 Building HMM Model (HMM Build)
11.13.2 Calibrating HMM Model (HMM Calibrate)
11.13.3 Searching Sequence Using HMM Profile (HMM Search)
11.14 HMM3
11.14.1 Building HMM Model (HMM3 Build)
11.14.2 Searching Sequence Using HMM Profile (HMM3 Search)
11.14.3 Searching Sequence Against Sequence Database (Phmmer Search)
11.15 uMUSCLE
11.15.1 Aligning with MUSCLE
11.15.2 Aligning Profile to Profile with MUSCLE
11.15.3 Aligning Sequences to Profile with MUSCLE
11.16 Bowtie
49. Reference sequence: the DNA sequence to which the short reads will be aligned. This parameter is required.
Index file name: the file to save the index to. This parameter is required.
Reference fragmentation: the number of parts the reference will be divided into. A larger value is generally better, but it affects the amount of memory used during the alignment.
Total memory usage: shows the total memory usage.
System memory size: shows the total system memory size.
11.19 CAP3
CAP3 (CONTIG ASSEMBLY PROGRAM Version 3, http://genome.cshlp.org/content/9/9/868.full) is a sequence assembly program for small-scale assembly with or without quality values. Click this link (http://seq.cs.iastate.edu) to open the CAP3 homepage. CAP3 is embedded as an external tool into UGENE. Open the Tools > DNA assembly submenu of the main menu and select the Contig assembly with CAP3 item to use CAP3. The Contig Assembly With CAP3 dialog appears. You can add or remove input files using the Add and Remove buttons. To remove all files click the Remove all butt
50. See instructions below on how to install it. 3. UGENE must be launched.
14.1.2 Installing UGENE extension on Google Chrome
To install the UGENE extension on Google Chrome:
1. Open the Extensions settings in Google Chrome (you may enter chrome://extensions in the address bar to do so).
2. Open the Chrome directory from the UGENE Web Browsers Extensions Package available on the Download page (http://ugene.unipro.ru/download.html).
3. Drag ugene.crx from the Chrome directory to the Extensions settings page and click Add in the confirmation dialog.
14.1.3 Installing UGENE extension on Mozilla Firefox
To install the UGENE extension on Mozilla Firefox, open the Add-ons Manager and select the Install Add-on From File item in the settings menu. In the browse dialog select the ugene.xpi file that you can find in the Firefox directory of the
51. UGENE Web Browsers Extensions Package available on the Download page (http://ugene.unipro.ru/download.html).
14.1.4 Opening data found using BioMart in UGENE
For now there are two options to open data found using BioMart in UGENE:
1. Open data by ID, for example by an Ensembl ID.
2. Open selected data.
14.1.5 Opening BioMart data in UGENE by ID
Let's open the web site http://www.biomart.org. Click, for example, on the Proceed to Bio Portal link. The BioMart Central Portal page (central.biomart.org) will appear, with an identifier search bar and an example ID (ENSG00000148648) below it.
52. 8.2.5 Assembly Overview Description
The Assembly Overview shows a coverage overview of the assembly. The longer a line in the overview and the deeper its color, the more reads are located in this region. To open a region of the assembly in the Reads Area, click on it in the Assembly Overview. On the overview the selected region is displayed as a gray rectangle, a red cross or a red rectangle. If you hold Shift and select a region on the overview, the overview is zoomed to the selection. Note that when the Assembly Overview is in focus and you use the zoom buttons on the toolbar, the zoom items in the Actions main menu or the mouse wheel, the Reads Area is resized accordingly. The Assembly Overview can also be resized. To zoom in on the overview, select either the Zoom in or the Zoom in 100x item in the Assembly Overview context menu. You can scroll the resized overview by dragging the mouse while holding down the mouse wheel. To zoom out of the overview, select the Zoom out item in the context menu. The Restore global overview item in the context menu restores the default overview size, at which the overview of the whole contig is shown. Notice that the Assembly Overview shows the coordinates of the assembly areas visible in the Reads Area and in the Assembly Overview. Reads Area coordinates: 25,382 to 26,031 (650 bp). Assembly Overview coordinates: 23,738 to
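The coverage overview described above can be illustrated with a minimal Python sketch. It is not UGENE's implementation: the function name `coverage_profile`, the 0-based half-open read intervals and the choice of the peak depth per bin are all assumptions made for illustration.

```python
from typing import List, Tuple


def coverage_profile(reads: List[Tuple[int, int]], length: int, bins: int) -> List[int]:
    """Return the peak read depth in each of `bins` bins over a contig.

    Hypothetical helper: reads are (start, end) pairs, 0-based, end exclusive.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        s, e = max(0, start), min(length, end)
        if s < e:
            diff[s] += 1
            diff[e] -= 1

    # A running sum of the difference array gives the per-base depth.
    depth = []
    running = 0
    for i in range(length):
        running += diff[i]
        depth.append(running)

    # Summarize each bin by its peak depth (one overview line per bin).
    profile = []
    for b in range(bins):
        lo = b * length // bins
        hi = max(lo + 1, (b + 1) * length // bins)
        profile.append(max(depth[lo:hi]))
    return profile


if __name__ == "__main__":
    reads = [(0, 4), (2, 6), (8, 10)]
    print(coverage_profile(reads, length=10, bins=5))
```

A taller value in the returned list corresponds to a longer, more deeply colored line in the overview.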
53. A low complexity region is a region produced by redundancy in a particular part of the sequence. It is represented on a plot as a rectangular area filled with matches. Hint: compare a sequence with itself to easily find low complexity regions in it.
6.5.6 Editing Parameters
It is possible to edit the parameters of a built dotplot. Right-click on the dotplot and select the Dotplot > Parameters context menu item. The parameters dialog will be re-opened. See the description of the available parameters here.
6.5.7 Saving Dotplot as Image
To save a dotplot as an image, right-click on the dotplot and select the Dotplot > Save/Load > Save as image context menu item.
54. active tasks. The Export Selected Sequences dialog will appear. Here you can select the location of the result file and a sequence file format. You can choose to add the newly created document to the current project and to export the sequence with or without annotations by checking the corresponding checkboxes. Use the Conversion options to choose a strand for saving the sequence(s). You can also translate the sequence(s) to the amino alphabet. It is also possible to specify whether to merge the exported sequences into a single sequence or store them as separate sequences. If you merge the sequences, you can set the number of gap symbols between sequences. This is the length of the insertion region between sequences, which contains N symbols for nucleic or X symbols for protein sequences.
4.10.2 Exporting Sequences as Alignments
Suppose we want to interpret a FASTA file as a multiple alignment. To do this select a single or several se
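The merge option described above (an insertion region of N symbols for nucleic or X symbols for protein sequences) can be sketched in Python as follows. The function name `merge_sequences` and its parameters are hypothetical and only illustrate the rule; they are not part of UGENE.

```python
from typing import Iterable


def merge_sequences(seqs: Iterable[str], gap_length: int, is_protein: bool = False) -> str:
    """Join sequences into one, separated by `gap_length` gap symbols.

    Hypothetical helper: 'N' is used for nucleic data, 'X' for protein data.
    """
    gap_symbol = "X" if is_protein else "N"
    return (gap_symbol * gap_length).join(seqs)


if __name__ == "__main__":
    print(merge_sequences(["ACGT", "TTGA"], gap_length=3))
    print(merge_sequences(["MK", "LV"], gap_length=2, is_protein=True))
```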
55. Warning: By default UGENE does not rearrange the sequence order in an alignment, but the original MUSCLE package does. To enable sequence rearrangement, uncheck the Do not re-arrange sequences (stable) option in the dialog.
One of the improvements over the original MUSCLE package is the ability to align only a part of the model. When the Column range item is selected, only the region of the specified columns is passed to the MUSCLE alignment engine. The resulting alignment is inserted into the original one, with gaps added or removed at the region boundaries.
Note: To visually select the column range to align, make a selection in the alignment editor first. Then invoke the MUSCLE plugin. Its column range boundary values will automatically match the given selection.
11.15.2 Aligning Profile to Profile with MUSCLE
The Align > Align profile to profile with MUSCLE context menu item allows you to align an existing profile to an active alignment. During this process MUSCLE does not realign the profiles but only inserts columns of gap characters. For example, the alignment in the picture below could be used as a profile:
56. and qualifiers can be deleted using the Delete key. To remove an annotation object from the active view, select the object in the Annotations editor and press Shift+Delete. Note that the object will be removed only from the active Sequence View, not from the project. To add the object again, just drag and drop it onto the Sequence View.
5.10.8 Importing Annotations from CSV
It is possible to import annotations for a sequence from an annotations table stored in the CSV format. To import annotations from a CSV file, right-click on the Project View and select Import > Import annotations from CSV. The Import annotations from CSV dialog box will appear. Basically, you need to specify the file to read the annotations table from (required), and the format of and the path to the file to write
57. calibrate tool reads an HMM profile file, scores a large number of synthesized random sequences with it, fits an extreme value distribution (EVD) to the histogram of those scores and re-saves the HMM file including the EVD parameters. To avoid modifying the original HMM file, you can select a new location for the calibrated profile.
11.13.3 Searching Sequence Using HMM Profile (HMM Search)
The HMM search tool reads an HMM profile from a file and searches the sequence for significantly similar sequence matches. The sequence must be selected in the Project View or there must be an active Sequence View window opened. If the selected sequence is nucleic and the HMM profile is built for an amino alignment, the sequence is automatically translated and all 6 translations are searched. If an HMM profile is built for a nucleic alignment, the search is performed for both strands (direct and complement).
58. cost of speed, especially for short reads (32bp).
Quality threshold (-q): parameter for read trimming.
Barcode length (-B): length of the barcode starting from the 5' end. When the specified length is positive, the barcode of each read is trimmed before mapping and written to the BC SAM tag. For paired-end reads, the barcodes from both ends are concatenated.
Colorspace (-c): the input is read in colorspace; colors are encoded as the characters A, C, G, T (A: blue, C: green, G: orange, T: red).
Long-scaled gap penalty for long deletion (-L): long-scaled gap penalty for long deletion.
Non-iterative mode (-N): disable iterative search. All hits with no more than Max diff differences will be found. This mode is much slower than the default.
Select the required parameters and press the Start button.
11.17.2 Building Index for BWA
To build a BWA index, select the Tools > DNA Assembly > Build Index item in the main menu. The Build Index dialog appears. Set the Align short reads method parameter to BWA. There are the following parameters:
Reference sequence: the DNA sequence to which the short reads will be aligned. This parameter is required.
Index file name: the file to save the index to. This parameter is required.
Index algorithm (-a): Algorithm for constructing BWT index. Avai
59. For details see the next sections of the documentation.
6.1 Circular Viewer
The Circular Viewer plugin shows a circular view of a nucleotide sequence. Usage example: open a nucleotide sequence object in the Sequence View. The Show circular view button is available on the sequence toolbar.
[Figure: circular view of the NC_001363 sequence, with callouts for the annotations and the scale marks]
60. • Total memory usage: shows the total memory usage.
• System memory size: shows the total system memory size.
Index parameters
Reference fragmentation: the number of parts the reference will be divided into. A larger value is generally better, but it affects the amount of memory used during the alignment.
• Index memory usage size: shows the index memory usage.
• Directory for index files: a temporary directory for saving index files.
You can choose a temporary directory for saving the index files for the reference that will be built during the alignment. If you need to run this algorithm again with the same reference and the same reference fragmentation parameter, you can use the prebuilt index located in the temporary directory.
11.18.2 Building Index for UGENE Genome Aligner
You can build an index to optimize short reads alignment using UGENE Genome Aligner. To open the Build Index dialog, select the Tools > DNA assembly > Build index item in the main menu. Set the Align short reads method parameter to UGENE Genome Aligner. The parameters are the following:
61. finished you can browse the results, sort them by length, strand or start position, and save them as annotations to the original sequence in the Genbank format.
11.6 Remote BLAST
The Remote BLAST plugin provides the capability to annotate sequences with information stored in remote databases. To perform a remote database search, open the Sequence View, select the sequence region to analyze and click the Analyze > Query remote database context menu item. If a region is not selected, the whole sequence will be analyzed.
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of seque
62. format (ugenedb).
Short reads: each added short read is a small DNA sequence file. At least one read should be added.
Note: Aligning short reads with UGENE Genome Aligner has no limitation on short read length.
Common parameters
Mismatches allowed: check this box to allow mismatches between the reference sequence and a short read. Select one of the following:
• Mismatches number: sets the number of mismatched nucleotides allowed. This parameter can take the values 1, 2 and 3.
• Percentage of mismatches: sets the number of mismatches as a percentage. Note that in this case the absolute number of mismatches can vary between reads. This parameter can take values from 1 to 10.
Align options
• Use GPU optimization: use an OpenCL-enabled GPU during the alignment (the corresponding hardware must be available on your computer).
• Align reverse complement reads: use both a read and its reverse complement during the alignment.
• Use best mode: during the alignment, report only the best alignments in terms of mismatches.
• Omit reads with qualities lower than: omit all reads with qualities lower than the specified value. Reads that have no qualities are not omitted.
Advanced parameters
Maximum memory for short reads: maximum memory usage for short reads. This parameter lets you reduce the load on the computer on the one hand, or increase the speed of the task on the other.
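The note that a percentage setting yields a different absolute number of mismatches for different reads can be made concrete with a short Python sketch. The function names and the rounding rule (rounding down) are assumptions for illustration; the manual does not state how UGENE Genome Aligner rounds.

```python
def allowed_mismatches(read_length: int, percent: int) -> int:
    """Absolute mismatch budget for a read, assuming the percentage is rounded down."""
    return read_length * percent // 100


def read_matches(reference_window: str, read: str, percent: int) -> bool:
    """Check an ungapped placement of `read` against an equally long reference window."""
    if len(reference_window) != len(read):
        raise ValueError("window and read must have the same length")
    # Count positions where the read differs from the reference.
    mismatches = sum(1 for a, b in zip(reference_window, read) if a != b)
    return mismatches <= allowed_mismatches(len(read), percent)


if __name__ == "__main__":
    # The same 10% setting gives different budgets for different read lengths.
    print(allowed_mismatches(36, 10), allowed_mismatches(100, 10))
```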
63. image button on the editor toolbar. The file save dialog will appear, where you should set the name, location and format of the picture. UGENE supports export to the PNG, TIFF and JPEG image formats.
7.3.3 Building HMM Profile
The editor can build a Hidden Markov Model profile based on the multiple sequence alignment. This functionality is based on Sean Eddy's HMMER package (http://hmmer.janelia.org). To build an HMM profile, select the Advanced > Build HMMER2 profile or the Advanced > Build HMMER3 profile item in the Actions main menu or in the context menu. Learn more about the HMM tool in the documentation pages of the HMM2 and the HMM3 plugins.
7.4 Building Phylogenetic Tree
To build a tree from an alignment, either press the Build Tree button on the toolbar, or select the Tree > Build Tree item in the alignment context menu or the Actions > Tree > Build Tree item in the main menu. Two methods for building phylogenetic trees are supported:
1. The PHYLIP Neighbour Joining method. The implementation from the PHYLIP package (http://evolution.genetics.washington.edu/phylip.html) is used under the hood.
2. The MrBayes external tool. Check the MrBayes web site (http://mrbayes.sourceforge.net) for more details.
7.4.1 PHYLIP Neighbour Joining
The Building Phylogenetic Tree dialog for the PHYLIP Nei
64. input files have been supplied, then a sitecon profile is built for each input file (i.e. several output files with different indexes are generated). [String, Required]
wsize: window size. The window is a region of the alignment used to build the profile. It is picked from the center of the alignment and occupies the specified length. The edges of the alignment beyond the window are not taken into account. The recommended length is a bit less than the alignment length, but not more than 50 bp. [Number, Optional, Default: 40]
clength: length of a random synthetic sequence used to calibrate the profile. [Number, Optional, Default: 1000000]
rseed: random seed used to calibrate the profile, e.g. to generate the random synthetic sequence. Use the same value to get the same calibration results twice on the same data. By default a new random seed is generated each time a calibration occurs. [Number, Optional, Default: 0]
walg: specifies to use the Algorithm 2 weight algorithm. In most cases it is not required, but in some cases it can increase the recognition quality. [Boolean, Optional, Default: false]
Example:
ugene sitecon-build --in=COI.aln --out=result.sitecon
12.2.25 Searching for TFBS with SITECON
Task Name: sitecon-search
Searches for transcription factor binding sites (TFBS) with SITECON and saves the regions found as annotations.
Parameters:
in: semicolon-separated list of input sequence files to search TFBS in
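The wsize description above (a window taken from the center of the alignment, with the edges ignored) can be sketched in Python. The function name `center_window` and the behavior when the window is longer than the alignment (clamping to the whole alignment) are assumptions for illustration, not documented SITECON behavior.

```python
from typing import Tuple


def center_window(alignment_length: int, wsize: int = 40) -> Tuple[int, int]:
    """Return the (start, end) columns of a centered window, 0-based, end exclusive.

    Hypothetical helper: a window longer than the alignment is clamped to it.
    """
    wsize = min(wsize, alignment_length)
    start = (alignment_length - wsize) // 2
    return start, start + wsize


if __name__ == "__main__":
    start, end = center_window(100, 40)
    print(start, end)  # columns outside [start, end) would be ignored
```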
65. The original alignment is not modified: only columns of gap characters can be inserted. The second profile is considered as a set of sequences and is therefore modified. Note that if a file with another alignment is used as a source of unaligned sequences, the gap characters are removed and each input sequence is processed independently. This method is quite fast: for example, an alignment of 3000 sequences, 1000 bases each, to the existing profile takes about 5 minutes on a typical Core2Duo computer.
11.16 Bowtie
Bowtie is a popular short read aligner. Click this link (http://bowtie-bio.sourceforge.net/index.shtml) to open the Bowtie homepage. Bowtie is embedded as an external tool into UGENE. Open the Tools > DNA Assembly submenu of the main menu.
11.16.1 Aligning Short Reads with Bowtie
When you select the Tools > DNA Assembly > Align short reads item in the main menu, the Align Short Reads dialog appears. Set the Align short reads method parameter to Bowtie. The dialog looks as follows:
66. matrices specified in the list. You can use the Save list button to export the list of matrices to a CSV file. Later the list can be loaded from the file using the Load list button. The remaining options are standard sequence search options: the strand and the sequence region to search for matches. After specifying the necessary options, press the Search button. The results found will appear in the dialog table, with the corresponding identity scores in the Score column.
[Screenshot: results table with the Range, Matrix, Strand and Score columns]
The regions found by the weight matrix algorithm can be saved as annotations to the DNA sequence in the Genbank format by pressing the Save as annotations button. After saving, the file with the resulting annotations is automatically added to the current project and the annotations are added to the original sequence. Note that if a JASPAR or UNIPROBE matrix is selected, the resulting annotations will contain the properties of that matrix.
67. more but it is very fast. You can see the description of the annotation saving parameters here.
Search timeout: sometimes a database does not respond, so the request has to be repeated. This option sets the time that will be spent on repeated requests to the database. Note that for long sequences the request preparation time increases and the search takes several minutes.
There is also the Advanced options tab. The view of the Advanced options tab depends on the selected search. For the blastn search it contains the following options:
Word size: the size of the subsequence used to initiate the search.
Gap costs: costs to create and extend a gap in an alignment. Increasing the gap costs results in alignments with fewer gaps.
Match scores: reward and penalty for matching and mismatching bases.
Entrez query: a BLAST search can be limited to the result of an Entrez query against the chosen database. This restricts the search to a subset of entries from that database fitting the requirement of the Entrez que
68. other parameters can be changed on the Output tab of the dialog. Once the Search button has been pressed, annotations for the regions of high DNA flexibility are created.
11.3.2 Result Annotations
Each annotation has the following qualifiers:
• area average threshold: the average window threshold in the area (i.e. total threshold / windows number)
• total threshold: the sum of all window thresholds in the area
• windows number: the number of windows in the area
Note: Using the DNA Graphs Package you can see the flexibility graph of a DNA sequence.
11.4 DNA Statistics
The DNA Statistics plugin provides exportable statistic reports. In the current UGENE version the DNA Statistics plugin provides only the Alignment Grid Profile report. The Alignment Grid Profile shows positional amino acid or nucleotide counts highlighted according to the frequency of symbols in a row. The original idea of the MSA Grid Profile is described in the following paper: Alberto Roca, Albert Almada and Aaron C. Abajian. ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family. BMC Bioinformatics 2008, 9:554.
Usage example: Open a sequence al
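The relationship between the three qualifiers listed above can be sketched in Python. The function name `area_qualifiers` and the input (a list of per-window threshold values for one area) are hypothetical; the sketch only restates the arithmetic given in the qualifier descriptions.

```python
from typing import Dict, List


def area_qualifiers(window_thresholds: List[float]) -> Dict[str, float]:
    """Compute the three qualifiers for one high-flexibility area.

    Hypothetical helper: `window_thresholds` holds one value per window.
    """
    if not window_thresholds:
        raise ValueError("an area must contain at least one window")
    total = sum(window_thresholds)
    count = len(window_thresholds)
    return {
        "total threshold": total,
        "windows number": count,
        # average = total threshold / windows number
        "area average threshold": total / count,
    }


if __name__ == "__main__":
    print(area_qualifiers([14.0, 15.0, 16.0]))
```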
69. run it for a nucleotide sequence. The results are saved as a set of annotations to the specified file in the Genbank format. To learn more about the Query Designer, read the Query Designer Manual (follow the link on the UGENE documentation page http://ugene.unipro.ru/documentation.html).
12 UGENE Command Line Interface
The UGENE command line interface (CLI) was developed with the following principles in mind:
• To make it as easy to use as popular shell commands
• To include all significant UGENE features
• To allow users to add their own commands
To use the UGENE CLI, make sure to add the path to the UGENE executable to your PATH environment variable. The general syntax is the following:
ugene [[--task=]task_name] [--task_parameter=value ...] [--option[=value] ...]
Here:
task_name: the task to execute. It can be one of the predefined tasks or a task you have created.
task_parameter: a parameter of the specified task. Some parameters of a task are required, like the in and out parameters of some tasks.
option: one of the CLI options.
See the example below:
ugene align --in=COI.aln --out=result.aln --log-level-details
12.1 CLI Options
--help | -h [<option_name> | <task_name>]
Shows help information. For example:
ugene --help
Shows general UGENE CLI help.
ugene -h
ugene --help=<option_name>
Shows help for the <op
70. sequences as mutations, inversions, insertions, deletions and low-complexity regions. The plugin also provides advanced features: comparing multiple dotplots, navigation in a dotplot, dotplot synchronization, saving and loading a dotplot, etc. An example of a dotplot view:
[Figure: dotplot view]
71. set to true, accepts only the specified annotations; if set to false, accepts all annotations except the specified ones. [Boolean, Optional]
complement: complements the annotated regions if the corresponding annotation is located on the complement strand. [Boolean, Optional]
extend-left: extends the resulting regions to the left by the specified number of base symbols. [Number, Optional]
extend-right: extends the resulting regions to the right by the specified number of base symbols. [Number, Optional]
gap-length: inserts a gap of the specified length between the merged annotations.
transl: translates the annotated regions. [Boolean, Optional]
Example:
ugene extract-sequence --in=sars.gb --out=res.fa --annotation-names=gene
12.2.4 Finding ORFs
Task Name: find-orfs
Searches for Open Reading Frames (ORFs) in nucleotide sequences and saves the regions found as annotations.
Parameters:
in: semicolon-separated list of input files. [String, Required]
out: output file with the annotations. [String, Required]
name: name of the annotated regions. [String, Optional, Default: ORF]
min-length: ignores ORFs shorter than the specified length. [String, Optional, Default: 100]
require-stop-codon: ignores boundary ORFs that last beyond the search region (i.e. have no stop codon within the range). [Boolean, Optional, Default: false]
require-init-codon: allows ORFs starting with any codon othe
72. that have an annotated region inside. The found repeats are saved and displayed as annotations to the DNA sequence.
11.7.2 Finding Tandem Repeats
To find tandem repeats, select the Analyze > Find tandems context menu item in the Sequence View window. In the dialog that opens you can specify the tandem search parameters, the region to search in and the result parameters.
73. that the coordinate of the first visible base of the row is N, but the row contains K gaps before the position N. The starting offset value will be N - K. The same rule is true for the ending offset. You can turn off the sequence offsets by unchecking the Actions > View > Show offsets main menu item or the View > Show offsets context menu item.
Global coordinates: This component displays the coordinates of the upper left corner of the current selection. If no region is selected, it shows the starting alignment point.
Alignment lock status: As in the Sequence View, this component shows whether the alignment is locked. Locked documents cannot be modified.
7.1.3 Navigation
The Sequence area provides several ways to navigate through an alignment. The simplest way is to use the mouse and the scrollbars. Alternatively, you can use the arrow keys on the keyboard.
The list of hot keys for quick navigation:
- PageUp: to move one screen left
- PageDown: to move one screen right
- Home: to center the starting columns of the alignment
- End: to move to the trailing columns of the alignment
Hint: if you use the Shift key with the hot keys above, you will navigate through the rows. For example, Shift+PageDown will move one screen down.
Finally, you can use the Go to position dialog from the Actions menu, the context menu or the editor toolbar.
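The sequence offset rule from the beginning of this item (first visible column N, K gaps before it, offset N - K) can be written as a short Python sketch. The function name and the use of `-` as the gap character are assumptions for illustration.

```python
# Hypothetical sketch of the N - K offset rule; '-' is assumed to be the gap symbol.
def row_start_offset(row, first_visible_column):
    """Return the sequence offset for a 1-based alignment column N.

    K is the number of gaps in the row before column N, so the offset
    is N - K.
    """
    gaps_before = row[:first_visible_column - 1].count("-")
    return first_visible_column - gaps_before


if __name__ == "__main__":
    # Column 5 of "AC--GT" holds G, the third base of the ungapped sequence.
    print(row_start_offset("AC--GT", 5))
```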
74. the Consensus Area, in the context menu of the Reads Area, or on the Assembly Browser Settings tab of the Options Panel. The following algorithms are currently available:
- Default: shows the most common nucleotide at each position. When several nucleotides are equally common at a position, the consensus nucleotide is selected randomly from them.
- SAMtools: uses an algorithm from the SAMtools Text Alignment Viewer to build the consensus sequence. The algorithm takes into account the quality values of reads and nucleotides and works with the extended nucleotide alphabet.
To highlight only the differences between the reference and the consensus sequences on the consensus sequence, select the Show difference from reference item in the context menu of the Consensus Area or the Difference from reference item on the Assembly Browser Settings tab of the Options Panel.
To export a consensus sequence, right-click on it in the Consensus Area and select the Export > Export consensus item in the context menu. For more information about consensus exporting, see Exporting Consensus.
8.7 Exporting
8.7.1 Exporting Read
To export a read, right-click on it in the Reads Area and select the Export > Current read item in the context menu. The Export Reads dialog appears:
Export to
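The Default consensus algorithm described in this item (most common nucleotide per position, ties broken randomly) might be sketched in Python as follows. This is an illustration only: the function name, the column-wise input format and the optional `rng` argument are assumptions, and the SAMtools algorithm is not covered.

```python
# Hypothetical sketch of a majority-vote consensus with random tie-breaking.
import random
from collections import Counter


def default_consensus(columns, rng=None):
    """Build a consensus from non-empty strings of nucleotides, one per position."""
    if rng is None:
        rng = random.Random()
    consensus = []
    for column in columns:
        counts = Counter(column)
        top = max(counts.values())
        # All nucleotides that share the highest count are candidates.
        candidates = sorted(base for base, n in counts.items() if n == top)
        consensus.append(rng.choice(candidates))
    return "".join(consensus)


if __name__ == "__main__":
    print(default_consensus(["AAC", "GGG", "TTA"]))
```

Passing a seeded `random.Random` makes the tie-breaking reproducible, which is convenient in tests.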
75. the number of samples that will be discarded when convergence diagnostics are calculated.
Heated chains: the number of chains that will be used in Metropolis coupling. Set 1 to use the usual MCMC analysis.
Heated chain temp: the temperature parameter for heating the chains. The higher the temperature, the more likely the heated chains are to move between isolated peaks in the posterior distribution.
Random seed: a seed for the random number generator.
Save tree to: the file to save the built tree to.
Press the Build button to run the analysis with the selected parameters and build a consensus tree.
8 Assembly Browser
The UGENE Assembly Browser project, started in 2010, was inspired by the Illumina iDEA Challenge 2011 (http://www.illumina.com/landing/idea) and multiple requests from UGENE users. The main goal of the Assembly Browser is to let a user visualize and efficiently browse large next-generation sequence assemblies.
Currently supported formats are SAM (Sequence Alignment/Map) and BAM, which is a binary version of the SAM format. Both formats are produced by SAMtools and described in the following specification: SAMtools (http://samtools.sourceforge.net/SAM1.pdf). Support of other formats is also planned, so please send us a request if you are interested in a certain format.
To browse assembly data in UGENE, a BAM or SAM file should be imported to a UGENE database file. After that you can convert the
76. the shading, drag the Unselected regions shading slider in the Settings dialog.
[Screenshot: 3D Structure Viewer with a selected sequence region]
6.2.5 Selecting Models to Display
When a molecular structure contains multiple models (e.g. NMR ensembles of models), the Models item appears in the 3D Structure Viewer context menu and in the Display menu on the toolbar.
To show all the models, check the Select All item. To show only one model, check the Exclusive item and then check the model you want to display. To show several models, uncheck both the Select All and the Exclusive items and check the models you would like to display.
6.2.6 Exporting 3D Structure Image
To export a 3D structure image, select the Export Image item in the 3D Structure Viewer context menu or in the Display menu on the toolbar. The Export Image dialog will appear.
[Screenshot: Export Image dialog]
Here you can browse for the file name and select the width and height of the image as well as its format: svg, png, ps, jpg or tiff.
6.2.7 Working with Several 3D Structures Views
To add another view to the 3D Structure Viewer you can:
1. Drag a required 3D object from the Project View to the 3D Structure Viewer. Projec
77. there are no additional configuration steps required, as the ClustalW executable file is included in the UGENE distribution package. Otherwise:
1. Install the Clustal program on your system.
2. Set the path to the ClustalW executable on the External tools tab of the UGENE Application Settings dialog.
Now you are able to use Clustal from UGENE. Open a multiple sequence alignment file and select the Align with ClustalW item in the context menu or in the Actions main menu. The Align with ClustalW dialog appears (see below), where you can adjust the following parameters:
Gap opening penalty: cost of opening up a new gap in the alignment. Increasing this value will make gaps less frequent.
Gap extension penalty: cost of every item in a gap. Increasing this value will make gaps shorter.
Weight matrix: specifies a single weight matrix for nucleotide sequences or a series of matrices for protein sequences.
For nucleotide sequences, the selected weight matrix defines the scores assigned to matches and mismatches (including IUB ambiguity codes). It can take the following values:
- IUB: default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.
- CLUSTALW: previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0.
For protein
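The effect of the Gap opening penalty and Gap extension penalty parameters described in this item can be seen in a small Python sketch of an affine gap cost. The formula (opening cost plus extension cost for every item in the gap) follows the parameter descriptions above; the function name and the numeric values are illustrative assumptions, not ClustalW defaults.

```python
# Hypothetical sketch of an affine gap cost; the values are illustrative only.
def gap_cost(gap_length, gap_open, gap_extension):
    """Cost of one gap: opening cost plus extension cost per gap item."""
    if gap_length <= 0:
        return 0.0
    return gap_open + gap_extension * gap_length


if __name__ == "__main__":
    one_long_gap = gap_cost(4, gap_open=10.0, gap_extension=0.5)
    two_short_gaps = 2 * gap_cost(2, gap_open=10.0, gap_extension=0.5)
    print(one_long_gap, two_short_gaps)
```

With these values, one gap of length 4 costs less than two gaps of length 2, which is why a higher opening penalty makes gaps less frequent.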
78. [Screenshot: BLAST+ search dialog]
The dialog is very similar to the dialog described in the Remote BLAST chapter, except for the following parameters:
Select input file: this parameter is only present if the dialog has been opened from the Tools main menu. Here you must specify a query sequence file that will be used to search the BLAST database. If the dialog has been opened, for example, from the Sequence View context menu, the currently active sequence is used as the query sequence.
Search type: here you should select the tool you would like to use. If the query sequence is a nucleotide sequence, the blastn, blastx and tblastx items are available. For a protein sequence, the items are blastp and tblastn.
Select database path: path to the database files.
Base name for BLAST DB files: base name for the BLAST database files.
Number of CPUs being used: number of processors to use.
To learn about other parameters, please refer to the Remote BLAST chapter.
ClustalW
Clustal (http://www.clustal.org) is a widely used multiple sequence alignment program. It is used for both nucleotide and protein sequences. ClustalW is a command line version of the program.
If you are using Windows OS
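The Search type rule described in this item (which tools are offered for which kind of query) can be expressed as a small lookup. The sketch below is hypothetical: the function name and the crude alphabet test are assumptions, and only the tool lists come from the text above.

```python
# Hypothetical sketch: tool lists are from the text, alphabet detection is assumed.
SEARCH_TOOLS = {
    "nucleotide": ["blastn", "blastx", "tblastx"],
    "protein": ["blastp", "tblastn"],
}


def available_search_types(query):
    """Return the search tools offered for a query sequence."""
    # Assumption: a query made only of these symbols is a nucleotide sequence.
    is_nucleotide = set(query.upper()) <= set("ACGTUN")
    return SEARCH_TOOLS["nucleotide" if is_nucleotide else "protein"]


if __name__ == "__main__":
    print(available_search_types("ACGTTGCA"))
    print(available_search_types("MKVLAAGIV"))
```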
79. tool into UGENE. Open the Tools > DNA assembly submenu of the main menu.
[Screenshot: Tools > DNA assembly submenu with the Align short reads, Build index and Convert UGENE Assembly data base to SAM format items]
11.17.1 Aligning Short Reads with BWA
When you select the Tools > DNA Assembly > Align short reads item in the main menu, the Align Short Reads dialog appears. Set the value of the Align short reads method parameter to BWA. The dialog looks as follows:
[Screenshot: Align Short Reads dialog with the BWA parameters]
The following parameters are available:
Reference sequence: DNA sequence to align short reads to. This parameter is required.
Result file name: file in SAM format to write the result of the ali
80. upper/lower case annotations during the file reading process. Format options:
1. Don't use case annotations: default mode, usual sequence reading and writing.
2. Use lower case annotation: sequences are read and annotations with the name lower_case are added. When these sequences are written to a file, the case is saved as in the original file.
3. Use upper case annotation: similar behavior, but with upper case annotations.
4.16.5 Logging
[Screenshot: Logging tab of the Application Settings dialog]
On the Logging tab you can select the type of log information (ERROR, INFO, DETAILS, TRACE) for each Category that will be output to the Log View. You can select the format of each log message by checking the Show date, Show log level and Show log category options.
[Screenshot: Log View showing INFO messages about Open new Sequence view tasks being started and finished]
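The Use lower case annotation option described in this item marks the lower-case stretches of a sequence as annotations. A minimal Python sketch of that idea follows; the function name and the half-open, 0-based coordinates are assumptions for illustration and do not describe how UGENE stores the annotations.

```python
# Hypothetical sketch: find lower-case stretches of a sequence.
import re


def lower_case_regions(seq):
    """Return (start, end) half-open, 0-based regions of lower-case symbols."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


if __name__ == "__main__":
    print(lower_case_regions("ACGTacgtACgt"))
```

The upper case option would be the same scan with `[A-Z]+`.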
81. [Screenshot: UGENE main window with the Project View, a Sequence View, the Options panel, Notifications and the Active task indicator]
4.2 UGENE Window Components
This chapter describes the UGENE main window components: Project View, Task View, Log View and the Notifications popup window.
4.2.1 Project View
The Project View shows the documents and bookmarks of the current project. The documents are files added to the project, and the bookmarks are saved visual view states of the documents. Read Using Bookmarks to learn more about bookmarks.
To show or hide the Project View, click the Project button in the main UGENE window.
[Screenshot: UGENE main window with the Project View shown]
82. The Dotplot provides a tool to build dotplots for DNA or RNA sequences.
[Screenshot: dotplot of the NC_014267 sequence, min length 11, identity 100]
A number of other instruments add a graphical interface for popular sequence analysis methods.
[Screenshot: Sequence View context menu for the murine NC_001363 sequence]
83. [Screenshot: Sequence details view with the created annotation highlighted]
5.10.2 Editing Annotation
If the document is not locked, it is possible to edit an annotation or an annotation group using the F2 key.
The result of pressing the key for an annotation:
[Screenshot: Edit annotation dialog with the Annotation name and Location fields]
The result of pressing the key for an annotation group:
[Screenshot: Rename group dialog]
5.10.3 Highlighting Annotations
To configure the settings of annotation types, go to the Annotation Highlighting tab in the Options Panel. By default the tab shows the annotation types of the opened Sequence View.
[Screenshot: Annotation Highlighting tab with the Select an annotation type list (CDS, misc_feature, source), the Show all annotation types link and the Configure the annotation type options: Show on translation, Show value of qualifier]
84. 0 with Best. A backtrack is the introduction of a speculative substitution into the alignment.
Descriptors memory usage (--chunkmbs): the number of megabytes of memory a given thread is given to store path descriptors when the Best flag is set. Default: 64. This parameter is available if the Best flag is checked.
Seed (--seed): seed for the pseudo-random number generator.
Threads: launch the specified number of parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments.
The following flags are available:
Colorspace (--color): the input is read in colorspace. Colors are encoded as characters A/C/G/T (A = blue, C = green, G = orange, T = red).
No Maq rounding (--nomaqround): Maq (Mapping and Assembly with Quality) accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, Bowtie also rounds this way. No Maq rounding prevents this rounding in Bowtie.
No forward orientation (--nofw): do not attempt to align against the forward reference strand.
No reverse-complement orientation (--norc): do not attempt to align against the reverse-complement reference strand.
Try as hard (--tryhard): try as hard as possible to find valid alignments when they exist, including paired-end alignments.
Best alignments (--best): make Bowtie gu
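The Maq-style rounding mentioned under No Maq rounding (round a Phred quality to the nearest 10, with a maximum of 30) can be sketched in a few lines of Python. The function name is hypothetical, and rounding halves upward is an assumption; the text above does not say how ties such as 5 or 15 are handled.

```python
# Hypothetical sketch of Maq-style quality rounding; tie handling is assumed.
def maq_round(quality):
    """Round a non-negative Phred quality to the nearest 10, capped at 30."""
    return min(30, ((quality + 5) // 10) * 10)


if __name__ == "__main__":
    print([maq_round(q) for q in (4, 5, 14, 26, 40)])
```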
85. [Screenshot: Assembly Browser window with the Assembly Overview, Reference Area, Consensus Area, Ruler and Coverage Graph labeled]
8.2.4 Reads Area Description
The Reads Area provides a visualization of the reads of an assembly part.
To zoom in or out, rotate the mouse wheel. You can also use the Zoom In and Zoom Out buttons on the toolbar or the Actions > Zoom In and Actions > Zoom Out items in the main menu. In addition, when you double-click on a read, it is zoomed in and moved to the center of the window.
By dragging the mouse while holding the left mouse button you can navigate in the Reads Area. To navigate long distances in the Reads Area, use the Assembly Overview described below. Other ways to navigate in the assembly are:
- Use the horizontal and vertical scroll bars of the Reads Area.
- Go to a specified position in an assembly.
To learn about available hotkeys, refer to Assembly Browser Hotkeys.
By default, assembly rendering is optimized while scrolling: while you are moving across an assembly, the assembly is shown in gray, and when you stop, it is shown in different colors. To disable this option, uncheck the Optimize the rendering while scrolling item in the context menu of the Reads Area or the Optimize scrolling item on the Assembly Browser Settings tab of the Options Panel.
86. 5.10.8 Importing Annotations from CSV
5.10.9 Exporting Annotations
6 Sequence View Extensions
6.1 Circular Viewer
6.2 3D Structure Viewer
6.2.1 Opening 3D Structure Viewer
6.2.2 Changing 3D Structure Appearance
6.2.3 Moving, Zooming and Spinning 3D Structure
6.2.4 Selecting Sequence Region
6.2.5 Selecting Models to Display
6.2.6 Exporting 3D Structure Image
6.2.7 Working with Several 3D Structures Views
6.3 Chromatogram Viewer
6.3.1 Exporting Chromatogram Data
6.3.2 Viewing Two Chromatograms Simultaneously
6.4 DNA Graphs Package
6.4.1 Description of Graphs
6.4.2 Graph Settings
6.5 Dotplot
6.5.1 Creating Dotplot
6.5.2 Navigating in Dotplot
6.5.3 Zooming to Selected Region
6.5.4 Selecting Repeat
6.5.5 Interpreting Dotplot: Identifying Matches, Mutations, Inversions, etc.
6.5.6 Editing Parameters
87. [Screenshot: Sequence View context menu with the HMMER3 tools submenu]
- In the HMM3 search dialog that appears, fill in the required parameters and click the Remote run button.
[Screenshot: HMM3 search dialog with the Input and output, Reporting thresholds, Acceleration and Other tabs]
- The Remote machines monitor dialog will appear. You can also add, remove or modify remote machines here.
- Select a machine to run the task on and click the Run button. Note that only one machine can be selected in the current version of UGENE.
- That's all. After the task is finished, you will see the task report in the Task View.
10.4 Running Smith-Waterman Search Task on Remote Machine
Read the Smith-Waterman Search plugin documentation before reading this paragraph.
To run the Smith-Waterman Search task on a remote machine you need to do the following:
- Open a sequence and click the Analyze > Find pattern [Smith-Waterman] item in the Sequence View context menu.
[Screenshot: Sequence View context menu]
88. [Screenshot: Project View with the 90-JRI-07 and 90-JRI-07_copy chromatogram documents]
The result will look like this:
[Screenshot: two chromatograms, 90-JRI-07 and 90-JRI-07_copy, shown in one Sequence View]
You can also use the Lock scales and Adjust scales global actions for the chromatograms. For example, if you lock the scales, you are able to scroll the sequences simultaneously. Also, when you select a sequence region in one sequence, the same region is selected in the second sequence.
[Screenshot: two chromatograms with locked scales and the same region selected in both]
89. This will open a dialog where you can set up the annotation parameters:
[Screenshot: Create annotation dialog with the Save annotation(s) to options (Existing annotation table, Create new table) and the Annotation parameters: Group name, Annotation name, Location]
The dialog asks where to save the annotation. It can be either an existing annotation table object or a new document file.
You can also specify the name of the group and the name of the annotation. If the group name is set to <auto>, UGENE will use the annotation name as the name of the group. You can use the '/' characters in this field as a group name separator to create subgroups.
The Location field contains the annotation coordinates. The coordinates must be provided in the Genbank or EMBL file format. If you want to annotate the complement sequence strand, surround the coordinates with the complement() word, or press the last button in the Location row to do it automatically.
Note that by default the Location field contains the coordinates of the selected sequence region.
Once the Create button is pressed, the annotation is created and highlighted both in the Sequence overview and the Sequence details view areas.
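The Location field rule described in this item (Genbank-style coordinates, optionally wrapped in the complement word for the complement strand) can be illustrated with a tiny Python helper. The function name is hypothetical; the `start..end` and `complement(start..end)` forms are the standard Genbank location syntax.

```python
# Hypothetical helper that builds a Genbank-style location string.
def genbank_location(start, end, complement=False):
    """Return 'start..end', wrapped in complement() for the complement strand."""
    location = f"{start}..{end}"
    return f"complement({location})" if complement else location


if __name__ == "__main__":
    print(genbank_location(1, 20))
    print(genbank_location(1, 20, complement=True))
```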
90. [Screenshot: zoomed dotplot of the NC_014267 sequence with a minimap in the bottom right corner, min length 11, identity 100]
- Hold the middle mouse button and move the mouse cursor over the zoomed region of the dotplot.
- Click on the desired region of the minimap in the bottom right corner.
- Activate the Scroll tool, hold the left mouse button and move the mouse cursor over the zoomed region.
6.5.3 Zooming to Selected Region
To select a dotplot region, activate the Select tool, hold down the left mouse button and drag the mouse cursor over the dotplot.
When you select a region on a dotplot, the corresponding region is also selected in the other Sequence View areas (Sequence details view, Sequence zoom view, etc.). The opposite is true as well: if you select a region in a Sequence View area, the corresponding region is also selected in the dotplot view.
To zoom to the selected region, click the Zoom in button on the left.
[Screenshot: dotplot with a selected region and the zoom button]
6.5.4 Selecting Repeat
To select a repeat, activate the Select tool and click on the repeat.
To deselect the repeat, either click on another repeat or hold Ctrl and click somewhere on the dotplot.
6.5.5 Interpreting Dotplot: Identifying Matches, Mutations, Inversions, etc.
Using a dotplot graphic, you can identify the following diff
91. Custom CLI Tasks
13 Appendixes
13.1 Appendix A. Supported File Formats
13.1.1 Specific File Formats
13.1.2 UGENE Native File Formats
13.1.3 Other File Formats
14 Tutorials
14.1 Using BioMart with UGENE
14.1.1 Environment requirements
14.1.2 Installing UGENE extension on Google Chrome
14.1.3 Installing UGENE extension on Mozilla Firefox
14.1.4 Opening data found using BioMart in UGENE
14.1.5 Opening BioMart data in UGENE by ID
14.1.6 Opening selected data in UGENE
1 About Unipro
Established in 1992, the Unipro company has its headquarters in Novosibirsk Akademgorodok, the home of the Siberian Branch of the Russian Academy of Sciences. The company's primary activity is IT outsourcing solutions. To learn more about the company, please visit the company website: http://unipro.ru
1.1 Contacts
Company website: http://unipro.ru
Address: UniPro, 6/1 Lavrentiev Avenue, 630090 Novosibirsk, Russia
Marketing department: Tel: +7 383 3326061, Fax: +7 383 3302960, Email
92. [Screenshot: Annotations editor with selected CDS, gene and source annotations]
The Export Sequence of Selected Annotations dialog will appear, which is similar to the Export Selected Sequences dialog described here.
5.8.14 Locking and Synchronizing Ranges of Several Sequences
An important feature of the Sequence zoom view is the ability to synchronize and lock the visual ranges of different sequences shown in the Sequence View. This feature is available when two or more sequences are opened in the same Sequence View.
If you click the Lock scales button, the scale of the second sequence is adjusted to match the scale of the focused sequence and is locked. Now, if you move a scrollbar or use the zoom buttons for any of the sequences, the visual ranges of the other sequences are adjusted as well.
[Screenshot: toolbar with the Lock scales button and the Synchronize scales without lock button]
To unlock the scales, click the same button again.
You may use the Adjust scales button to synchronize the scales without locking them.
Note that if you have a selected sequence region or a selected annotation, the scales will be synchronized by the start position of the region or the annotation. If there is no active selection, the regions are synchronized by the first visible sequence posi
93. [Screenshot: Sequence details view with the following elements labeled: Genetic code selector, Toggle visibility, 3 direct amino translations, The original sequence, The complement strand, The selected annotation, 3 complement amino translations]
See also:
- Navigating the Sequence details view using the Sequence overview
- Selecting Amino Translation
- Showing and Hiding Translations
5.7 Information about Sequence
Context information about a sequence can be found on the Information tab in the Options Panel. All information is contextual, i.e. it shows statistics about the currently selected region of the selected sequence.
The tab includes information about:
- Sequence length
- Characters occurrence
- Dinucleotides occurrence (for sequences with the standard DNA and RNA alphabets)
[Screenshot: Information tab showing Length, Characters Occurrence and Dinucleotides for a selected region]
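The three statistics listed for the Information tab (sequence length, characters occurrence, dinucleotides occurrence) are simple to compute, as the Python sketch below shows. It is an illustration only: the function name, the returned structure, and counting overlapping dinucleotides are assumptions, not a description of UGENE's implementation.

```python
# Hypothetical sketch of the Information tab statistics.
from collections import Counter


def sequence_statistics(region):
    """Return length, per-character (count, percent) and dinucleotide counts."""
    region = region.upper()
    length = len(region)
    characters = {
        base: (count, 100.0 * count / length)
        for base, count in Counter(region).items()
    }
    # Assumption: dinucleotides are counted with overlap (AC, CG, GT, ...).
    dinucleotides = Counter(region[i:i + 2] for i in range(length - 1))
    return {
        "length": length,
        "characters": characters,
        "dinucleotides": dict(dinucleotides),
    }


if __name__ == "__main__":
    stats = sequence_statistics("ACGTAC")
    print(stats["length"], stats["characters"], stats["dinucleotides"])
```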
94. or drag the file to the UGENE window.
Warning: Documents that were not created by UGENE are locked. To be able to edit such a document, you should save a copy of the document and continue working with the copy.
Advanced Dialog Options
Open the Add existing document dialog:
[Screenshot: Add existing document dialog with the Document location and Document format fields, the Force read-only mode and Save file to disk before opening options, and the Custom settings button]
The following parameters are available:
Document location: location of the document. It can be a local file, a shared network file or a web reference, for example:
- C:\store\mydocument.gb
- \\192.168.0.3\store\mydocument.gb
- http://someaddress.com/store/mydocument.gb
Document format: specifies how to interpret the data stored in the file. As specified above, the format is detected automatically, but you can select it manually.
Force read-only mode: locks the document for editing.
Save file to disk before opening: this option becomes available if a web reference has been specified in the Document location. It saves the remote file to the local disk before opening it.
Custom settings: the button is available for the Genbank, EMBL, FASTA and FASTQ document formats. The button opens the following dialog:
[Screenshot: Custom settings dialog with the FASTA format settings: Separate sequences, Merge sequences
95. [Screenshot: Find restriction sites dialog with the list of enzymes, the REBASE Info button and the Selected enzymes area]
You can see the list of restriction enzymes that can be used to search for restriction sites. The information about the enzymes was obtained from the REBASE database (http://rebase.neb.com/rebase/rebase.html). For each enzyme in the list a brief description is available: the accession ID in the database, the recognition sequence, etc.
If you are online, you can get more detailed information about a selected enzyme by clicking the REBASE Info button.
11.8.1 Selecting Restriction Enzymes
To select an enzyme, check it in the list. Notice that the enzyme appears in the Selected enzymes area of the dialog.
You can also use the Select All button to select all the available enzymes and the Select None button to deselect all the enzymes.
To select enzymes by the length of their recognition sequence, click the Select by length button and enter the minimum length in the dialog that appears.
To invert the selection, click the Invert selection button.
As soon as the enzymes are selected, you can click the OK button to search for the corresponding restriction sites in the sequence.
11.8.2 Using Custom File with Enzymes
To load a custom file with enzymes, click the Enzymes file button and browse for the file. The file must be o
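The Select by length and Invert selection actions described in this item amount to simple set operations over the enzyme list. The Python sketch below illustrates them with a few well-known enzymes and their recognition sequences; the function names are hypothetical, and treating the entered value as an inclusive minimum length is an assumption.

```python
# Hypothetical sketch of enzyme selection; the inclusive minimum is assumed.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HaeIII": "GGCC",
    "NotI": "GCGGCCGC",
}


def select_by_length(enzymes, min_length):
    """Select enzymes whose recognition sequence has at least min_length bases."""
    return {name for name, site in enzymes.items() if len(site) >= min_length}


def invert_selection(enzymes, selected):
    """Select every enzyme that is not currently selected."""
    return set(enzymes) - set(selected)


if __name__ == "__main__":
    selected = select_by_length(ENZYMES, 6)
    print(sorted(selected), sorted(invert_selection(ENZYMES, selected)))
```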
96. [Screenshot: Log View showing INFO messages about Open new Sequence view tasks being started and finished]
4.16.6 OpenCL
If you have a video card that supports OpenCL, you can use it to speed up some calculations in UGENE. To do so, install the latest video card driver and check the corresponding check box.
[Screenshot: OpenCL tab listing the detected OpenCL-enabled GPUs, with check boxes to select the GPUs used for accelerating algorithm computations]
Now you can, for example, use OpenCL optimization for the Smith-Waterman algorithm.
4.16.7 Workflow Designer
Use this tab to configure the Workflow Designer settings.
[Screenshot: Workflow Designer tab with the Show grid, Element style, Element font, Element background color, Track running progress and Directory for custom elements with scripts settings]
97. [Screenshot: Assembly Browser start page for a 70 057 bp assembly. It shows the message "Zoom in to see the reads or choose one of the well covered regions", a table of the ten best-covered regions (Region, Approx. coverage) and the tip "Ctrl+arrow: Move one page in the corresponding direction in the Reads Area".]
Note that for large assemblies it may take some time to calculate the overview and the well-covered regions. To see the reads, either select a region from the list or zoom in, for example by clicking the link above the well-covered regions or by rotating the mouse wheel. You can also use the hotkeys. Tips about hotkeys are shown under the list of well-covered regions. To learn about the available hotkeys, refer to Assembly Browser Hotkeys.
8.2.3 Assembly Browser Window Components
An Assembly Browser window consists of:
Assembly Overview: By default shows the whole assembly overview. Can be resized to provide an overview of an assembly part.
Reference Area: Shows the reference sequence.
Consensus Area: Shows the consensus sequence.
Ruler: Shows the coordinates in the Reads Area.
Reads Area: Displays the reads.
Coverage Graph: Shows the coverage of the Reads Area.
See the example below.
98. OESN'T include any personal data.
Path for temporary files: the path where temporary files are stored.
Default settings: this option resets the settings to their defaults.
4.16.2 Resources
On the Resources tab you can set the resources that can be used by the application: Optimize for CPU count, Tasks memory limit and Threads limit.
4.16.3 Network
On the Network settings tab of the dialog you can specify the proxy server parameters, select the SSL settings and configure the Remote request timeout.
4.16.4 File Format
The Sequence Annotations settings allow you to use
99. ORFs with length lower than the Min length value will not be found.
Must terminate within region: this option ignores boundary ORFs located beyond the search region.
Must start with init codon: this item switches the ORF Marker algorithm to the mode in which any non-stop amino acid code is interpreted as a region start position.
Allow overlaps: allows alternative downstream initiators, when another start codon is located within a longer ORF, i.e. all possible ORFs will be found, not only the longest ones.
Allow alternative init codon: this option includes ORFs starting with alternative initiation codons, according to the current translation table.
Include stop codon: includes stop codons in the resulting annotations.
The other available parameters are:
DNA to Amino translation table: defines the way start, alternative start and stop codons are encoded.
Strand: where to search for the ORFs: in the direct strand, in the complement strand or in both strands.
Preview: allows you to preview the regions, strands and lengths of the found ORFs.
Clear results: becomes available when some results have been found; clears these results.
Results
When the search parameters have been selected and the OK button has been pressed in the dialog, auto-annotating becomes enabled. In the Annotations editor the ORF annotations can be found in the Auto annotations orf group. After the search has been
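The effect of a minimum length and of including the stop codon in the reported region can be illustrated with a minimal Python sketch. It is not the ORF Marker implementation: the function name `find_orfs`, the fixed ATG start codon and the standard-code stop codons are assumptions made for illustration, only the direct strand is scanned, and only ORFs that end with a stop codon are reported.

```python
# Illustrative sketch only; names and codon sets are assumptions, not UGENE code.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
START_CODON = "ATG"                  # assumed single start codon


def find_orfs(seq, min_len, include_stop=False):
    """Return (start, end) pairs of ORFs on the direct strand, 0-based, end exclusive.

    An ORF runs from the first start codon seen in a reading frame to the
    next in-frame stop codon. ORFs shorter than min_len bases are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3 if include_stop else pos
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGCC", min_len=6))  # [(2, 8)]
```

Raising `min_len` to 7 in the call above filters the only ORF out, while `include_stop=True` extends the reported region to (2, 11).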
100. 11.11.2 Types of SITECON Models
Eukaryotic
CEBP a: CCAAT-enhancer-binding protein alpha. http://en.wikipedia.org/wiki/Ccaat-enhancer-binding_proteins
CEBP_all: CCAAT-enhancer-binding proteins. http://en.wikipedia.org/wiki/Ccaat-enhancer-binding_proteins
CLOCK: Circadian Locomotor Output Cycles Kaput. http://en.wikipedia.org/wiki/CLOCK
cMyc: Myc (c-Myc) is a regulator gene that codes for a transcription factor. A mutated version [text and link garbled in the source] in many cancers.
CRE: Cyclic AMP response element. http://en.wikipedia.org/wiki/CAMP_response_element#cAMP_response_element
E2F1: Transcription factor E2F1 is a protein that in humans is encoded by the E2F1 gene. http://en.wikipedia.org/wiki/E2F1
E2F1 DP1sel1: E2F factors bind to DNA as homodimers or heterodimers in association with dimerization [text and link garbled in the source]
EGR1: Early growth response protein 1. http://en.wikipedia.org/wiki/EGR1
EKLf: Erythroid Kruppel-like Factor. http://en.wikipedia.org/wiki/KLF1
ER2: Estrogen receptor beta. http://en.wikipedia.org/wiki/Estrogen_receptor_beta
GATA all: GATA transcription factors are a family of transcription factors characterized by their [text and link garbled in the source]
GATA 1: GATA binding factor 1. http://en.wikipedia.org/wiki/GATA1
GATA 2: GATA binding protein 2. http en w
101. [Screenshot: web page with FASTA records for ENSG00000232606 / ENST00000413525 (cds and exon entries, KNOWN lincRNA) and a browser context menu with the items Copy, Print, Search Google for, and Open selection in UGENE.] The selected da
102. UGENE database file into a SAM file. The import to a UGENE database file has both advantages and disadvantages. The disadvantages are that the import may take time for a large file and that there should be enough disk space to store the database file. On the other hand, this allows one to overview the whole assembly and navigate in it rather rapidly. In addition, during the import you can select the contigs to be imported from the BAM/SAM file, so there is no need to import the whole file if you are going to work only with some contigs. Note that there are plans to support the other approach as well in the future, namely opening a BAM/SAM file directly. The Assembly Browser has been tested on different BAM/SAM files from the 1000 Genomes Project (http://www.1000genomes.org/about) and other sources. Read the documentation below to learn more about the Assembly Browser features.
8.1 Import BAM/SAM File
To start working with an assembly, import it to a UGENE database file. To do this, open the assembly file. The Import BAM/SAM File dialog appears.
[Screenshot: Import BAM/SAM file dialog with the Source URL field, the list of contigs (Contig name, Length), the Import unmapped reads option, the Destination URL field and the Add to project option.]
The Source URL field in the dialog specifies the file to import. The Info button nearby can be used to obtain additional information about the file.
103. Unipro UGENE User Manual, Version 1.12.3, April 01, 2014
Contents
1 About Unipro
1.1 Contacts
2 About UGENE
2.1 Key Features
2.2 User Interface
2.3 High Performance Computing
2.4 Cooperation
3 Installation
3.1 Installing UGENE on Windows
3.2 Installing UGENE on Linux
3.3 Installing UGENE on Mac OS X
4 Basic Functions
4.1 UGENE Terminology
4.2 UGENE Window Components
4.2.1 Project View
4.2.2 Task View
4.2.3 Log View
4.2.4 Notifications
4.3 Main Menu Overview
4.4 Creating New Project
4.5 Opening Document
4.5.1 Opening for the First Time
4.5.2 Opening Document Present in Project
4.6 Creating Document
4.7 Exporting Documents
4.8 Locked Documents
4.9 Using Objects and Object Views
4.10 Exporting Objects
104. 7.1.2 Alignment Editor Components
Here is the default layout of the editor:
[Screenshot: Alignment Editor with an opened alignment.]
The Alignment Editor components:
Sequence area: This is the main component of the editor. It displays aligned sequences. The upper part of the Sequence area is the ruler, which shows the coordinates of the currently visible row sequences.
Consensus area: This component is situated above the Sequence area. It shows the consensus sequence for the current alignment, calculated using the currently selected algorithm.
Sequence list: This component is located in the left part of the Sequence area. It shows the names of the corresponding sequences in the alignment.
Editor toolbar: The toolbar contains shortcuts for important editor actions such as Undo, Redo, Zooming and others.
Sequence offsets: These are the offsets for the first and the last visible base for each alignment row. Note that the offset value doesn't include gaps. For example, let's assume
105. al BLAST database using old version of the NCBI BLAST.
Warning: BLAST is used as an external tool and must be installed on your system.
Parameters:
toolpath: path to the blastall executable. By default, the path specified in the Application Settings is applied. [String, Optional, Default: default]
tmpdir: directory for temporary files. By default, the path specified in the Application Settings is applied. [String, Optional, Default: default]
in: semicolon-separated list of input sequence files. [String, Required]
dbpath: path to the BLAST database files. [String, Required]
dbname: base name of the BLAST database files. [String, Required]
out: output Genbank file; the results of the search are stored as annotations. [String, Required]
name: name of the annotations. [String, Optional, Default: blast result]
p: type of the BLAST search. [String, Optional, Default: blastn] The following values are available:
• blastn
• blastp
• blastx
• tblastn
• tblastx
e: expectation value threshold. [Number, Optional, Default: 10]
Example:
ugene local-blast --in=input.fa --dbpath=. --dbname=mydb --out=output.gb
12.2.9 Local BLAST+ Search
Task Name: local-blast+
Performs a search on a local BLAST database using BLAST+.
Warning: BLAST+ is used as an external tool and must be installed on your system.
Parameters:
to
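When the local BLAST task is launched from a script, it can help to assemble the command line from the documented parameter names and to validate the search type first. The following Python sketch only builds the argument list and does not run UGENE. The helper name `build_local_blast_command` is hypothetical, and both the `local-blast` task name and the `--name=value` argument syntax are assumptions reconstructed from the example in the manual, so check them against your UGENE version.

```python
# Hypothetical helper; the "--name=value" syntax is an assumption, not a verified API.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}
OPTIONAL_PARAMS = {"toolpath", "tmpdir", "name", "p", "e"}


def build_local_blast_command(inputs, dbpath, dbname, out, **optional):
    """Return an argv list for the local BLAST task.

    inputs is a list of sequence files; they are joined with semicolons,
    as required for the "in" parameter. Optional parameters are passed
    only when given, so UGENE defaults apply otherwise.
    """
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if "p" in optional and optional["p"] not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported BLAST search type: {optional['p']}")

    args = [
        "ugene", "local-blast",
        "--in=" + ";".join(inputs),
        f"--dbpath={dbpath}",
        f"--dbname={dbname}",
        f"--out={out}",
    ]
    # Sort the optional parameters so the result is deterministic.
    for key in sorted(optional):
        args.append(f"--{key}={optional[key]}")
    return args


command = build_local_blast_command(["input.fa"], ".", "mydb", "output.gb", p="blastn")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` once the syntax has been confirmed for the installed version.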
106. 9.4 Modifying Labels Appearance
From this paragraph you can learn how to show/hide taxon and distance labels, align them and change their formatting (font, color, etc.).
9.4.1 Showing/Hiding Labels
When you open a tree, all the labels are shown by default. To hide the taxon (sequence name) labels, select either the Show Labels toolbar button or the Actions > Show Labels item in the main menu, and uncheck the Show Names item in the submenu that appears. To hide the distance labels, uncheck the Show Distances item in the same submenu. To show the labels again, check the appropriate item in the submenu.
9.4.2 Aligning Labels
To align the tree labels, press the Align Labels sticky button on the toolbar or select the Actions > Align Labels item in the main menu. See the example of aligned labels below.
[Screenshot: phylogenetic tree with aligned taxon labels.]
107. and line version using the command ugene.
Note: Several native packages for specific Linux distributions are also available. Find out the details on the download page.
Note: UGENE is a part of the Ubuntu and Fedora Linux distributions.
3.3 Installing UGENE on Mac OS X
1. Download the Mac OS X disk image file using the appropriate link on the download page.
2. Launch the dmg file and accept the GNU license agreement. The following window will appear.
3. To start UGENE, click on the ugeneui icon. You can also copy UGENE to the Applications folder by dragging it.
4 Basic Functions
4.1 UGENE Terminology
Project: Storage for a set of data files and visualization options.
Document: A single file; it can be stored on a local hard drive or be a remote web page. Each document contains a set of objects.
Object: A minimal and complete model of biological data. For example: a single sequence, a set of annotations, a multiple sequence alignment.
Task: A process, usually asynchronous, that works in the background. For example: some computations, loading and writing files.
Plugin: A dynamically loaded module that adds new functionality to UGENE.
Object View: A graphical view for a single object or a set of objects.
108. apping, check the Bootstrapping and Consensus Trees group check box. The following parameters are available:
Number of replicates: number of replicate data sets.
Seed: random number seed. By default it is generated automatically. You can manually change this value in order to make the results of different tree-building runs reproducible. The seed must be an integer greater than zero and less than 32767, and of the form 4n+1, that is, it leaves a remainder of 1 when divided by 4. Any odd number can also be used, but may result in a random number sequence that repeats itself after less than the full one billion numbers. Usually this is not a problem.
Consensus type: specifies the method to build the consensus tree. Select one of the following:
• Strict: specifies that a set of species must appear in all input trees to be included in the strict consensus tree.
• Majority Rule (extended): specifies that any set of species that appears in more than 50% of the trees is included. The program then considers the other sets of species in order of the frequency with which they have appeared, adding to the consensus tree any which are compatible with it until the tree is fully resolved. This is the default setting.
• M1: includes in the consensus tree any sets of species that occur among the input trees more than a specified fraction of the time (see the Fraction parameter below). The Strict consensus and the Majority Rule cons
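The constraints on the bootstrapping seed (an integer greater than zero, less than 32767, leaving a remainder of 1 when divided by 4) are easy to check before starting a long run. The sketch below is a plain Python illustration of that rule; the function names `is_recommended_seed` and `next_recommended_seed` are hypothetical and are not part of UGENE.

```python
# Illustrative check of the seed rule quoted in the manual; not UGENE code.
SEED_UPPER_BOUND = 32767  # the seed must be strictly below this value


def is_recommended_seed(seed):
    """True if seed is an int with 0 < seed < 32767 and seed % 4 == 1."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        return False
    return 0 < seed < SEED_UPPER_BOUND and seed % 4 == 1


def next_recommended_seed(start):
    """Return the smallest recommended seed that is >= start.

    Raises ValueError when no such seed exists below the upper bound.
    """
    candidate = max(start, 1)
    while candidate % 4 != 1:
        candidate += 1
    if not is_recommended_seed(candidate):
        raise ValueError(f"no recommended seed at or above {start}")
    return candidate


print(is_recommended_seed(5))    # True: 5 = 4*1 + 1
print(next_recommended_seed(6))  # 9
```

Other odd values such as 7 are accepted according to the text above, but they are not of the 4n+1 form, so `is_recommended_seed(7)` returns False in this sketch.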
109. arantee that reported singleton alignments are best in terms of stratum (i.e. number of mismatches, or mismatches in the seed for the case of n mode) and in terms of the quality values at the mismatched position(s).
All alignments (--all): report all valid alignments per read or pair. Validity of alignments is determined by the alignment policy (combined effects of n mode, v mode, Seed length and Maq error).
Select the required parameters and press the Start button.
11.16.2 Building Index for Bowtie
To build a Bowtie index, select the Tools > DNA Assembly > Build index item in the main menu. The Build Index dialog appears. Set the Align short reads method parameter to Bowtie. The dialog looks as follows:
[Screenshot: Build Index dialog with the Reference sequence and Index file name fields, the Colorspace option and the Start and Cancel buttons.]
There are the following parameters:
Reference sequence: the DNA sequence to which short reads will be aligned. This parameter is required.
Index file name: a file to save the created index to. This parameter is required.
Colorspace (--color): the input is read in colorspace; colors are encoded as the characters A/C/G/T (A=blue, C=green, G=orange, T=red).
11.17 BWA
BWA is a fast, lightweight tool that aligns relatively short reads to a reference sequence. Click this link (http://bio-bwa.sourceforge.net) to open the BWA homepage. BWA is embedded as an external
110. ations
10.3 Running HMMER3 Search Task on Remote Machine
10.4 Running Smith-Waterman Search Task on Remote Machine
10.5 Running MUSCLE Align Task on Remote Machine
11 Plugins
11.1 Workflow Designer
11.2 DNA Annotator
11.3 DNA Flexibility
11.3.1 Configuring Dialog Settings
11.3.2 Result Annotations
11.4 DNA Statistics
11.5 ORF Marker
11.6 Remote BLAST
11.7 Repeat Finder
11.7.1 [title garbled in the source]
11.7.2 Finding Tandem Repeats
11.8 Restriction Analysis
11.8.1 Selecting Restriction Enzymes
11.8.2 Using Custom File with Enzymes
11.8.3 Filtering by Number of Hits
11.8.4 Excluding Region
11.8.5 Circular Molecule
[the titles of sections 11.9 to 11.21 are not recoverable from the source]
111. bmenu.
9.8 Printing Tree
To print a tree, select either the Print Tree toolbar button or the Actions > Print Tree item in the main menu. The standard print dialog will appear, where you can select a printer to use and specify other settings.
10 Distributed Computing
Distributed computing can notably increase the performance of computational tasks by distributing the task data among computational units. However, distributed computing requires complex solutions: specialized versions of algorithms, network communication, etc. The Unipro UGENE project provides advanced distributed computing capabilities. Despite the complexity of the internal structure, for users running a computational task on a remote machine is as easy as running it on a local machine.
Starting with version 1.7.2, UGENE supports cloud computing. For example, computational workflows can be launched on the Amazon EC2 (http://aws.amazon.com/ec2) cloud. For details, check the following documentation section:
• Running Workflows on Cloud
There are also several distributed algorithms that can be executed on a remote machine:
• HMMER3 search
• Smith-Waterman search
• Muscle3 align
10.1 Remote Machines Monitor
Remote machines to perform calculations can be set in the Remote machines monitor dialog. It can be accessed by selecting the Settings > Remote machines monitor item in the main menu.
112. ce alignment, see the Building Phylogenetic Tree paragraph. To learn what you can do with a tree using the UGENE Phylogenetic Tree Viewer, read the documentation below.
9.1 Adjusting Tree Settings
To adjust the tree settings, select either the Tree Settings toolbar button or the Actions > Tree Settings item in the main menu. The Tree Settings dialog will appear.
[Screenshot: Tree Settings dialog with the Width, Height and Tree View settings.]
In the dialog you can tune the width of the tree. If the tree layout is set to rectangular, you can also tune the height of the tree. You can also select the tree view:
• Phylogram
• Cladogram
9.2 Adjusting Branch Settings
To adjust the branch settings, select either the Branch Settings toolbar button or the Actions > Branch Settings item in the main menu. The Branch Settings dialog will appear.
[Screenshot: Branch Settings dialog with the Color and Line Weight settings.]
Here you can select the color and the line width of the tree branches. Note that when a clade has been selected, the branch settings are applied to the clade only.
9.3 Selecting Tree Layout
You can select one of the following tree layouts:
• Rectangular
• Circular
• Unrooted
To do so, press the Layout toolbar button and check the required item in the menu that appears, or check the item in the Actions > Layout submenu of the main menu. See the example of the Circular layout:
113. Enter the column number (base coordinate) and the view will be centered on the corresponding base.
7.1.4 Coloring Schemes
There are various coloring schemes available for DNA and amino alphabets. To change the scheme, activate the context menu using the right mouse button, or open the Actions main menu, and select the required scheme in the Colors submenu.
7.1.5 Zooming and Fonts
To perform zoom operations, use the corresponding buttons on the editor toolbar. By default the base characters are visible when zooming. For rather long sequences another zoom mode is available, in which the bases are not shown. This allows viewing very large sequence regions, up to 500 bp.
[Screenshot: Alignment Editor zoomed out so that the bases are not shown.]
You can zoom to the selected region by clicking the Zoom to selection button. This is a very convenient operation when the alignment is rather large. For example, you can zoom out to some percentage, select an interesting region and then zoom to the selec
114. 4.10.1 Exporting Sequences to Sequence Format
4.10.2 Exporting Sequences as Alignments
4.10.3 Exporting Alignment to Sequence Format
4.10.4 Exporting Nucleic Alignment to Amino Translation
4.11 Using Bookmarks
4.12 Working with Projects
4.13 Options Panel
4.14 Adding and Removing Plugins
4.15 Fetching Data from Remote Database
4.16 UGENE Application Settings
4.16.1 General
4.16.2 Resources
4.16.3 Network
4.16.4 File Format
4.16.5 Logging
4.16.6 OpenCL
4.16.7 Workflow Designer
4.16.8 Genome Aligner
4.16.9 CUDA
4.16.10 External Tools
5 Sequence View
5.1 Sequence View Components
5.2 Global Actions
5.3 Sequence Toolbar
5.4 Sequence Overview
5.5 Sequence Zoom View
5.5.1 Managing Zoom View Rows
5.6 Sequence Details View
5.7 Information about Sequence
115. complement sequence
• Ctrl+Shift+T: copies reverse complement amino translation
3. Using the Copy submenu of the context menu.
[Screenshot: Copy submenu with the items Copy sequence (Ctrl+C), Copy complement sequence (Ctrl+Shift+C), Copy translation (Ctrl+T), Copy complement translation (Ctrl+Shift+T), Copy annotation sequence and Copy annotation sequence translation.]
5.8.10 Search in Sequence
To search for a pattern in a sequence, go to the Search in Sequence tab of the Options Panel in the Sequence View. Input the value you want to search for in the text field and click the Search button.
By default, misc_feature annotations are created for regions that exactly match the pattern. To change these and/or other settings, click the Show more options link. Find the description of the available settings below.
Search algorithm
[Screenshot: Search algorithm group with Algorithm set to InsDel and Should match set to 100.]
This group specif
116. ct=OmpC&quickSearch=Quick+Search
OxyR: Oxidative stress regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=OxyR&quickSearch=Quick+Search
PHOB: PhoB is a dual transcription regulator that activates expression of the Pho regulon in response to environmental Pi. http://biocyc.org/ECOLI/substring-search?type=NIL&object=PHOB&quickSearch=Quick+Search
PHOP: Member of the two-component regulatory system phoQ/phoP involved in adaptation to low Mg2+ environments and the control of acid resistance genes. http://biocyc.org/ECOLI/substring-search?type=NIL&object=PHOP&quickSearch=Quick+Search
PurR: PurR dimer controls several genes involved in purine nucleotide biosynthesis and its own synthesis. http://biocyc.org/ECOLI/substring-search?type=NIL&object=PurR&quickSearch=Quick+Search
RcsB 1: Regulator capsule synthesis B. http://biocyc.org/ECOLI/substring-search?type=NIL&object=RcsB&quickSearch=Quick+Search
RcsB 2: Regulator capsule synthesis B. http://biocyc.org/ECOLI/substring-search?type=NIL&object=RcsB&quickSearch=Quick+Search
Rob2: Right origin-binding protein. http://biocyc.org/ECOLI/substring-search?type=NIL&object=ROB&quickSearch=Quick+Search
ROB: Right origin-binding protein. http://biocyc.org/ECOLI/substring-search?type=NIL&object=ROB&quickSearch=Quick+Search
soxS: SoxS is a dual transcriptional activator and participates in the removal of superoxide and http biocyc org
117. dotplot axis.
If there are several sequences in the specified first or second file and you haven't selected to join the sequences in the previous dialog, then you can select a sequence in these fields. If you have selected Join all sequences found in the file, then you can't select a separate sequence from the file; the joined sequence can be selected instead.
Search direct repeats: check this option to search for direct repeats in the specified sequences. You can also select the color with which the repeats will be displayed in the picture. The default button sets the default color.
Search inverted repeats: check this option to search for inverted repeats in the specified sequences. You can also select the color with which the repeats will be displayed in the picture. The default button sets the default color.
Custom algorithm: optionally, you can select an algorithm to calculate the repeats:
• Auto
• Suffix index
• Diagonals
Note: The specified algorithm is provided to the Repeat Finder plugin as an input parameter. In most cases the Auto value is appropriate.
Minimum repeat length: allows drawing only those matches between the sequences that are continuous and long enough. For example, if it equals 3 bp, then only repeats that contain 3 or more base symbols will be found. Press the 1k button to automatically adjust the
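To see what a minimum repeat length means for the picture, the following Python sketch scans every diagonal of the comparison matrix of two sequences and keeps only continuous exact matches of at least the given length. It is a naive quadratic illustration written for this purpose, not the Repeat Finder algorithm used by UGENE; the function name `direct_repeats` and the exact-match (100% identity) restriction are assumptions.

```python
# Naive illustration of dotplot-style direct repeats; not the UGENE implementation.
def direct_repeats(x, y, min_len):
    """Return (x_start, y_start, length) for maximal exact matches >= min_len.

    Each diagonal of the len(x) by len(y) comparison matrix is walked once,
    and runs of identical symbols are collected.
    """
    hits = []
    # A diagonal is identified by d = x_index - y_index.
    for d in range(-(len(y) - 1), len(x)):
        i, j = (d, 0) if d >= 0 else (0, -d)
        run = 0
        while i < len(x) and j < len(y):
            if x[i] == y[j]:
                run += 1
            else:
                if run >= min_len:
                    hits.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may reach the end of the diagonal.
        if run >= min_len:
            hits.append((i - run, j - run, run))
    return hits


print(direct_repeats("ACGTT", "TACGA", 3))  # [(0, 1, 3)]
```

With `min_len=3` the shared fragment ACG is reported, while single matching bases on other diagonals are dropped; raising the value to 4 leaves no repeats, which corresponds to an empty dotplot.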
118. e, for NCBI GenBank such a unique id could be the Accession Number (http://en.wikipedia.org/wiki/Accession_number_%28bioinformatics%29) or the NCBI GI number (http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html). Optionally, you can browse for a directory to save the fetched file to. After you click the OK button, UGENE downloads the biological object (DNA sequence, protein sequence, 3D model, etc.) and adds it to the current project. If something goes wrong, check the Log View: it will help you to diagnose the problem.
4.16 UGENE Application Settings
To open the UGENE Application Settings dialog, choose the Settings > Preferences item in the main menu. The following settings are available:
4.16.1 General
[Screenshot: General tab with the settings Language of User Interface (applied after restart), Window Layout (Multiple documents or Tabbed documents), Preferred Web browser (System default browser or Specified executable), Project (Open last project at startup), Path to downloaded data, Statistical reports (Enable statistical reports collecting), Path for temporary files and Default settings (Reset settings to default on the next run).]
The following settings are available on the tab: Language of User Interface
119. 12.2.1 Converting Sequences
Task Name: convert-seq
Converts a sequence from one format to another.
Parameters:
in: input sequence file. [String, Required]
out: name of the output file. [String, Required]
format: format of the output file. [String, Optional] The following values are available:
• fasta
• fastq
• genbank
• raw
Example:
ugene convert-seq --in=human_T1.fa --out=human_T1.gbk --format=genbank
12.2.2 Converting MSA
Task Name: convert-msa
Converts a multiple sequence alignment file from one format to another.
Parameters:
in: input multiple sequence alignment file. [String, Required]
out: name of the output file. [String, Required]
format: format of the output file. [String, Optional] The following values are available:
• clustal (default)
• mega
• msf
• sam
• srfasta
• stockholm
Example:
ugene convert-msa --in=CBS.sto --out=CBS --format=msf
12.2.3 Extracting Sequence
Task Name: extract-sequence
Extracts annotated regions from an input sequence.
Parameters:
in: semicolon-separated list of input files. [String, Required]
out: output file. [String, Required]
annotation-names: list of annotation names which will be accepted or filtered. [String, Required]
accept-or-filter: if
120. e with a sequence, an alignment or any other biological data, a new anonymous project is created automatically. To create a new project, select the File > New project menu item or click the New project button on the main toolbar. The dialog will appear:
[Screenshot: Create new project dialog with the Project name, Project Folder and Project File fields.]
Here you need to specify the visual name for the project, and the directory and file to store it. After you click the Create button, the Project View window is opened.
4.5 Opening Document
UGENE stores information about the documents you are working with in a project. Once a document has been opened, the information about it is saved in the current project.
4.5.1 Opening for the First Time
To open a document that is not yet present in the current project, use either the advanced Add existing document dialog or a simple open file dialog, or just drag the document to the UGENE window. UGENE automatically detects the format of the document, but if you use the advanced dialog you can choose the format manually.
To open the advanced dialog, select one of the following:
• Add > Existing document item in the Project View context menu
• File > Open As item in the main menu
To simply open the document, select one of the following:
• Open item in the main toolbar
• File > Open item in the main menu
121. e you have the Repeat Finder plugin installed. The Dotplot features are described in more detail below.

6.5.1 Creating Dotplot

To create a dotplot, select the Tools > Build dotplot main menu item. The Build dotplot from sequences dialog will appear:

[Figure: Build dotplot from sequences dialog]

Here you should specify the File with first sequence. You should also either check the Compare sequence against itself option or select the File with second sequence. Optionally, you can choose to Join all sequences found in the file for the first and/or the second file. If you join the sequences, you can also set the Gap size: a gap of the specified size will be inserted between the joined sequences.

After you press the Next button, the dialog to configure the dotplot parameters will appear:

[Figure: Dotplot parameters dialog]

The following parameters are available:
X axis sequence: the sequence for the X dotplot axis.
Y axis sequence: the sequence for the Y
122. 12.2.4 Finding ORFs
12.2.5 Finding Repeats
12.2.6 Finding Pattern Using Smith-Waterman Algorithm
12.2.7 Adding Phred Quality Scores to Sequence
12.2.8 Local BLAST Search
12.2.9 Local BLAST+ Search
12.2.10 Remote NCBI BLAST and CDD Requests
12.2.11 Annotating Sequence with UQL Schema
12.2.12 Building Bowtie Index
12.2.13 Aligning Short Reads with Bowtie
12.2.14 Building Profile HMM Using HMMER2
12.2.15 Searching HMM Signals Using HMMER2
12.2.16 Aligning with ClustalW
12.2.17 Aligning with Kalign
12.2.18 Aligning with MAFFT
12.2.19 Aligning with MUSCLE
12.2.20 Building PFM
12.2.21 Searching for TFBS with PFM
12.2.22 Building PWM
12.2.23 Searching for TFBS with Weight Matrices
12.2.24 Building Statistical Profile for SITECON
12.2.25 Searching for TFBS with SITECON
12.2.26 Fetching Sequence from Remote Database
12.3 Creating
123. ementary or Both strands.

Search in: for nucleotide sequences you can select the Translation value for this option. In this case the input pattern will be searched for in the amino acid translations.

Region: specifies the sequence range in which to search for the pattern. You can search in the whole sequence or specify a custom region.

Other settings

[Figure: Other settings group with the Remove overlapped results and Limit results number to options]

This group contains additional common settings:
Remove overlapped results: annotates only one of the overlapped results.
Limit results number to: limits the number of results to the specified value.

Annotations settings

[Figure: Save annotation(s) to and Annotation parameters groups]

In the Save annotation(s) to group you can set up a file to store the annotations. It can be either an existing annotation table object or a new document file. In the Annotation parameters group you can specify the annotation name and a group in the Annotations Editor.

5.8.11 Editing Sequence

If the document is not locked, it is possible to edit the sequence.

[Figure: Edit sequence submenu with the Insert subsequence and Remove subsequence items]

The Edit sequence submenu is available in the Actions ma
124. en wikipedia org wiki STAT1 STAT Signal Transducer and Activator of Transcription http en wikipedia org wiki STAT _ protein TIFI Thyroid transcription factor 1 http en wikipedia org wiki NK2_ homeobox 1 USF Upstream stimulatory factors http en wikipedia org wiki USF1 yyl Is a protein that in humans is encoded by the YY1 gene http en wikipedia org wiki YY1 Prokaryotic C J oson OOOO AgaR N acetylgalactosamine repressor AgaR negatively controls the expression of the aga gene http biocyc orgy ktOtdr substring search type NIL amp object AgaR amp quickSearch Quick Search AgaC AgaC is the Enzyme IIC domain of a predicted N acetylgalactosamine transporting PEP http biocyc org HlePdrdenbgthiogshotransferase system search type NILMobject AgaC amp quickSearch Quick Search ArcA ArcA transcriptional dual regulator http biocyc org ECOLI substring search type NIL amp object ArcA amp quickSearch Quick Search ArgR ArgR complexed with L arginine represses the transcription of several genes involved in http biocyc org Kidsyhfbebstiang transport of arginine transport of histidine and its own synthesis and search type NILMobgtotat doa pie amp asuick Seginaine data kai Search CpxR DNA binding response regulator in two component regulatory system with CpxA http biocyc org ECOLI substring search type NIL amp object CpxR amp quickSearch Quick Search cAMP receptor protein http biocyc org
125. So you can adjust it to an appropriate size. It is possible to rotate the circular view using the mouse wheel.

Use the Export > Save circular view as image context menu item or the Actions main menu item to save the image of the circular view.

[Figure: circular view with its context menu]

Different file formats are available, including png, bmp, jpg, svg and pdf.

Note that if a sequence file contains several sequences, it is possible to view the circular views of the sequences in the same Circular Viewer area.

[Figure: several circular views in one Circular Viewer area]

You can work with these circular views at the same time.

6.2 3D Structure Viewer

The 3D Structure Viewer is intended for visualization of 3D structures of biological molecules. Using the 3D Structure View
126. ence must be selected in the Project View or there must be an active Sequence View window opened.

If the selected sequence is nucleic and the profile HMM was built from an amino acid alignment, the sequence will be automatically translated and searched in all possible frames (6 in total). If the profile HMM was built from a nucleic alignment, the search is performed on both strands (direct and complement).

The HMM3 search accepts HMMER2 HMM profiles (amino only) as a backward compatibility feature. An interesting post about using HMMER2 models with HMMER3 is available on Sean Eddy's blog: http://selab.janelia.org/people/eddys/blog/?p=117

[Figure: HMM3 search dialog, Input and output tab]

For example, the reporting thresholds options can be configured using the dialog:

[Figure: HMM3 search dialog, Reporting thresholds tab, with the E-value and score thresholds for reported domains, the score threshold choice (the profile's GA gathering, NC noise or TC trusted cutoffs) and the number of significant sequences for domain E-value calculation]

The search results are stored a
127. ensus are extreme cases of the Ml consensus, being for fractions of 1 and 0.5 respectively.
• Majority Rule: specifies that a set of species is included in the consensus tree if it is present in more than half of the input trees.

Fraction: becomes available when the Consensus type parameter is set to Ml. Specifies the fraction.

Save tree to: file to save the built tree to.

Press the Build button to build a tree with the selected parameters.

7.4.2 MrBayes

The Building Phylogenetic Tree dialog for the MrBayes method has the following view:

[Figure: Build Phylogenetic Tree dialog for MrBayes, with the Substitution model and Rate settings and the MCMC settings: Chain length, Subsampling frequency, Burn-in length, Heated chains, Heated chain temp and Random seed]

There are two steps to a phylogenetic analysis using MrBayes:
1. Set the evolutionary model.
2. Run the Markov chain Monte Carlo (MCMC) analysis.

The evolutionary model is defined by the following parameters:

Substitution model: specifies the general structure of a DNA substitution model. This parameter is available for nucleotide sequences. It corresponds to the Nst setting of MrBayes. You may select one of the following:
• JC69 (Nst=1)
• HKY85 (Nst=2)
• GTR (Nst=6)
128. equence

To zoom a sequence in the Sequence zoom view, you can use one of the zoom buttons on the sequence toolbar:

[Figure: zoom buttons on the sequence toolbar]

There are standard Zoom In and Zoom Out buttons. Additionally, you can zoom to a selected region using the Zoom to Selection button. To restore the default view of the Sequence zoom view, in which the sequence is not zoomed, use the Zoom to Whole Sequence button.

5.8.5 Creating New Ruler

You can create any number of additional rulers by clicking the Ruler > Create new ruler context menu item.

[Figure: context menu with the Ruler submenu and a new ruler with a custom offset]

5.8.6 Selecting Amino Translation

The default value for the genetic code is read by UGENE from the sequence file when it is available. You can also select the genetic code for the sequence using the Amino translation menu button on the sequence toolbar.

Note: All analysis routines, like HMMER, ORF finding, etc., will use th
129. [Figure: context menu with the Select submenu]

And select the Select > Sequence between selected annotations item in the context menu.

The Sequence around selected annotations item selects the selected annotations and the sequences between these annotations.

[Figure: Sequence View with the sequence around the selected annotations highlighted]

Another way to select a sequence around annotations is to hold the Shift and Ctrl keys while clicking on the annotations, either in the Sequence details view or in the Sequence zoom view.

5.8.9 Copying Sequence

The selected sequence region, an annotation sequence or their amino translations can be copied to the clipboard:

1. By pressing the corresponding buttons in the global toolbar to copy the selected sequence regions.
2. Using the following shortcuts:
• Ctrl+C: copies direct sequence strand
• Ctrl+T: copies direct amino translation
• Ctrl+Shift+C: copies reverse
130. er Optional

seed: seed for the pseudo-random number generator. [Number, Optional]
seedlen: number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. Bowtie is faster for larger values of seedlen. [Number, Optional]
tryhard: finds valid alignments when they exist, including paired-end alignments. [Boolean, Optional]
chunkmbs: number of megabytes a given thread is allowed to use to store path descriptors in --best mode. [Number, Optional]
best: guarantees that reported singleton alignments are best in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s).

Example:
ugene bowtie --reads=r1.fa;r2.fa;r3.fa --ebwt=refindex --out=result.aln

12.2.14 Building Profile HMM Using HMMER2

Task Name: hmm2-build

Builds a profile HMM using the HMMER2 tools.

Parameters:
in: semicolon-separated list of input multiple sequence alignment files. [String, Required]
out: output HMM file. [String, Required]
name: name of the profile HMM. [String, Optional, Default: hmm profile]
calibrate: enables or disables calibration. [Boolean, Optional, Default: true]
seed: random seed, a non-negative integer. [Number, Optional, Default: 0]

Example:
ugene hmm2-build --in=CBS.sto
131. er you can work with data from the Protein Data Bank (PDB), a repository for the 3D structural data of large biological molecules, such as proteins and nucleic acids, maintained by the Worldwide Protein Data Bank (wwPDB, http://www.wwpdb.org).

You can also work with data from the NCBI Molecular Modeling DataBase (MMDB, http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure), also known as Entrez Structure, a database of experimentally determined structures obtained from the RCSB Protein Data Bank (http://www.pdb.org).

Find the description of the 3D Structure Viewer features below.

6.2.1 Opening 3D Structure Viewer

The 3D Structure Viewer is opened automatically when you open a PDB or MMDB file. For example, open $UGENE/data/samples/PDB/1CF7.PDB. The 3D Structure Viewer adds a view to the upper part of the Sequence View:

[Figure: 3D Structure Viewer above the Sequence View for 1CF7.PDB]

Notice the Links button on the toolbar. When you click the button, a menu appears with quick links to online resources with detailed information about the opened molecule:
• PDB Wiki
132. erences between the sequences:

1. Matches
A match between sequences looks like a diagonal line on the dotplot, representing a continuous match (or repeat).

2. Frame shifts
a. Mutations
Mutations are distinctions between sequences. On the graphic they are represented by gaps in diagonal lines. They interrupt matches.
b. Insertions
Insertions are parts of one sequence that are missing in the other, while the surrounding parts match. In other words, an insertion is a subsequence that was inserted into a sequence. Graphically, insertions are represented by gaps which lie on only one axis. A small shift towards the other axis indicates that a mutation is involved.
c. Deletions
A deletion is a subsequence that was deleted from a sequence. A deletion from sequence A relative to sequence B can equally be considered an insertion into sequence B.

[Figure: example dotplot]
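The way matches and insertions show up as diagonal lines can be illustrated with a small sketch. It is not UGENE's dotplot algorithm (which relies on the Repeat Finder plugin); the function name `dotplot_matches` and the toy sequences are assumptions made for this example only. It scans every diagonal and reports exact runs of identical bases that reach a minimum length.

```python
def dotplot_matches(seq_x, seq_y, min_len):
    """Return (x_start, y_start, length) for each maximal run of identical
    bases along a diagonal that is at least min_len long.

    Illustrative brute-force scan, not UGENE's implementation.
    """
    runs = []
    nx, ny = len(seq_x), len(seq_y)
    # Every diagonal is identified by its offset d = x - y.
    for d in range(-(ny - 1), nx):
        x, y = (d, 0) if d >= 0 else (0, -d)
        run = 0
        while x < nx and y < ny:
            if seq_x[x] == seq_y[y]:
                run += 1
            else:
                if run >= min_len:
                    runs.append((x - run, y - run, run))
                run = 0
            x += 1
            y += 1
        # A run may reach the edge of the plot without a closing mismatch.
        if run >= min_len:
            runs.append((x - run, y - run, run))
    return runs


# Toy data (hypothetical): seq_b is seq_a with "GGG" inserted after position 8.
seq_a = "ACGTACGTTTGGCCAA"
seq_b = "ACGTACGT" + "GGG" + "TTGGCCAA"
for x, y, length in sorted(dotplot_matches(seq_a, seq_b, min_len=8)):
    print(f"X {x}..{x + length - 1}  Y {y}..{y + length - 1}")
```

For the toy input the two reported runs lie on different diagonals: the second one is shifted by three positions along the Y axis only, which is the gap on a single axis that the text describes for an insertion.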
133. [Figure: Workflow Designer settings with the Directory for custom elements with command line tools field, the Directory for included schema elements field and the Run tasks in separate process check box]

4.16.8 Genome Aligner

Use this tab to configure the Genome Aligner settings.

[Figure: Genome Aligner settings tab with the Directory for built indexes field]

4.16.9 CUDA

If you have an NVIDIA video card that supports Compute Unified Device Architecture (CUDA), you can use it to speed up some calculations in UGENE. To do so, install the latest video driver and check the corresponding check box.

[Figure: CUDA settings tab listing the detected CUDA-enabled GPUs, for example GeForce GT 220]

Now you can, for example, use CUDA optimization for the Smith-Waterman algorithm.

4.16.10 External Tools

Here you can set the paths to the external tools' executable files.
134. [Figure: Sequence details view]

When the sigma button in the right part of the Sequence overview is pressed, the density of annotations in the sequence is shown. For example, in the picture below there are annotations in the parts of the sequence that are marked with dark grey color:

[Figure: Sequence overview showing annotation density]

See also:
• Sequence Zoom View
• Sequence Details View

5.5 Sequence Zoom View

The Sequence zoom view is designed to provide flexible tools for navigation in large annotated sequence regions.

Most of the Sequence zoom view space is used to visualize annotations for the sequence. The annotations are organized in rows by their names. If two annotations with the same name overlap, an extra row is created. For every row, the name and the total number of annotations in the row are shown in light grey text at the left part of the area.

[Figure: Sequence zoom view with annotation rows sorted by name, showing the name and number of annotations in each row]

Below the annotation rows there is a ruler that shows coordinates in the sequence.

5.5.1 Managing Zoom View Row
135. f the Bairoch format. For details about the format, refer to http://rebase.neb.com/rebase/rebase.f19.html

You can also save the currently selected enzymes to a file. Click the Save selection button to do that.

11.8.3 Filtering by Number of Hits

To filter the results by the number of restriction sites found for an enzyme, check the Filter by number of results check box and enter the minimum and maximum number of hits.

11.8.4 Excluding Region

To exclude a sequence region from the search, check the Exclude region check box and enter the start and end positions of the region. If a subsequence was selected before opening the dialog, you can click the Selected button to fill in the values automatically with the start and end positions of the selected subsequence.

11.8.5 Circular Molecule

To consider the sequence as circular and be able to search for restriction sites that span the end and the beginning of the sequence, check the Circular molecule option.

Example. Let's consider:
• The sequence is CTGC...CAC.
• The AarI restriction enzyme with the recognition sequence CACCTGC has been checked.

In this case, if the Circular molecule option has been checked, the restriction site will be found. If it hasn't been checked, the restriction site won't be found in this position.

11.8.6 Results

When at least one enzyme has been selected and the OK button has bee
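The Circular molecule example above can be reproduced with a minimal sketch. It is not UGENE's search code: the function `find_sites` and the toy sequence are assumptions for illustration, and only the direct strand is searched. A circular search is emulated by appending the first bases of the sequence to its end, so that a site starting near the end can continue over the origin.

```python
def find_sites(sequence, recognition, circular=False):
    """Return 0-based start positions of `recognition` on the direct strand.

    With circular=True, sites that wrap from the end of the sequence to its
    beginning are reported as well. Illustrative sketch only.
    """
    seq = sequence.upper()
    site = recognition.upper()
    n, k = len(seq), len(site)
    if k == 0 or k > n:
        return []
    # For a circular molecule, extend the text by the first k-1 bases so that
    # windows starting near the end can run over the origin.
    text = seq + seq[:k - 1] if circular else seq
    return [i for i in range(len(text) - k + 1) if text[i:i + k] == site]


# Toy sequence (hypothetical): it starts with CTGC and ends with CAC.
toy = "CTGCAAAATTTTCAC"
print(find_sites(toy, "CACCTGC"))                 # linear search
print(find_sites(toy, "CACCTGC", circular=True))  # circular search
```

With the toy sequence, the linear search finds nothing, while the circular search reports one site that starts at the final CAC and continues into the leading CTGC.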
136. fault: ncbi-blastn. The following databases are available:
• ncbi-blastn: for nucleotide sequences
• ncbi-cdd: for amino acid sequences
• ncbi-blastp: for amino acid sequences

out: output Genbank file. [String, Required]
eval: specifies the statistical significance threshold for reporting matches against database sequences. [Number, Optional, Default: 10]
hits: maximum number of hits that will be shown. [Number, Optional, Default: 10]
name: name of the result annotations. If not set, the name will be specified with the cdd result or the blast result. [String, Optional, Default: cdd or blast]
short: optimizes the search for short sequences. [Boolean, Optional, Default: false]
blast-output: path to the file with the NCBI BLAST output (only for the ncbi-blastp and ncbi-blastn databases). [String, Optional, Default: the file is not saved]

Example:
ugene remote-request --in=seq.fa --db=ncbi-blastp --out=res.gb

12.2.11 Annotating Sequence with UQL Schema

Task Name: query

Annotates a sequence in compliance with a UGENE Query Language (UQL) schema. This makes it possible to analyze a sequence using different algorithms at the same time, imposing constraints on the positional relationship of the results.

To learn more about UQL schemas, read the Query Designer Manual: http://ugene.unipro.ru/documentation.html

Parameters:
in: semicolon-separated list of input sequence files. [String, Required]
out: output
137. ference at bases of quality values q1 and q2, then the score at the difference is max(0, min(q1, q2) - b), where b is the specified value. The specified value should be more than 15.

The difference score of an overlap is the sum of the scores at each difference.

Max qscore sum at differences (-d): removes an overlap if its difference score is greater than the specified value. The specified value should be more than 20.

Similarity score of an overlap parameters

The following parameters are used to calculate the similarity score of an overlapping alignment:

Match score factor (-m): a match at bases of quality values q1 and q2 is given a score of m * min(q1, q2), where m is the specified value. The specified value should be more than 0.

Mismatch score factor (-n): a mismatch at bases of quality values q1 and q2 is given a score of n * min(q1, q2), where n is the specified value. The specified value should be less than 0.

Gap penalty factor (-g): a base of quality value q1 in a gap is given a score of -g * min(q1, q2), where g is the specified value and q2 is the quality value of the base in the other sequence right before the gap. The specified value should be more than 0.

The similarity score is calculated as the sum of the scores of each match, each mismatch and each gap. Based on this value and the following value, some overlaps are removed.

Overlap similarity score cut
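The scoring rules above can be written out as a short sketch. The function names and the sample quality values are assumptions for illustration, the parameter values are arbitrary choices that merely satisfy the stated limits (b > 15, m > 0, n < 0, g > 0), and gap bases are assumed to lower the score, in line with the name "penalty".

```python
def difference_score(q1, q2, b):
    """Score at one difference: max(0, min(q1, q2) - b)."""
    return max(0, min(q1, q2) - b)


def overlap_difference_score(differences, b):
    """Sum of the scores at each difference of an overlap.

    `differences` is a list of (q1, q2) quality pairs.
    """
    return sum(difference_score(q1, q2, b) for q1, q2 in differences)


def similarity_score(matches, mismatches, gap_bases, m, n, g):
    """Similarity score of an overlap from lists of (q1, q2) quality pairs.

    Assumption: each gap base contributes -g * min(q1, q2).
    """
    score = sum(m * min(q1, q2) for q1, q2 in matches)
    score += sum(n * min(q1, q2) for q1, q2 in mismatches)
    score -= sum(g * min(q1, q2) for q1, q2 in gap_bases)
    return score


# Illustrative values only.
diffs = [(30, 40), (10, 50), (25, 22)]
print(overlap_difference_score(diffs, b=20))  # 10 + 0 + 2

print(similarity_score(matches=[(30, 40), (20, 20)],
                       mismatches=[(10, 50)],
                       gap_bases=[(15, 30)],
                       m=2, n=-5, g=6))       # 100 - 50 - 90
```

An overlap would then be removed when its difference score exceeds the Max qscore sum at differences value; that comparison is left out of the sketch.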
138. [Figure: Export Reads dialog with the Export to file and File format fields and the Add to project check box]

Select a file to export the read to and the file format. The read can be exported to either a FASTA or a FASTQ file.

When the parameters are set, click the Export button. The read is exported to the file and, if the Add to project check box has been checked, it is added to the current project, from where you can open it.

8.7.2 Exporting Visible Reads

To export all reads visible in the Reads Area, select the Export > Visible reads item in the Reads Area context menu. The Export Reads dialog appears. The dialog is described in the Exporting Read section.

8.7.3 Exporting Consensus

To export a consensus sequence of the assembly, select either the Export consensus item in the Consensus Area context menu or the Export > Consensus item in the Reads Area context menu. The Export Consensus dialog appears:

[Figure: Export Consensus dialog with the Export to file, Sequence name, Consensus algorithm, Keep gaps, Region and Add to project settings]

Select a file and the file format. The consensus can be exported to a FASTA, FASTQ, GFF or GenBank file. Modify the exported sequence name, if required, and choose the consensus algorithm. The consensus is exported with gaps if the Keep gaps check box has been checked. You can also select the region to export. It can be either a Wh
139. frames
• tblastx: translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database
• makeblastdb: formats protein or nucleotide source databases before these databases can be searched by other BLAST+ tools

BLAST home page: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome

To make BLAST or BLAST+ tools available from UGENE:
1. Install the required version of BLAST or BLAST+ on your system.
2. Set the paths to the executables you are going to use on the External tools tab of the UGENE Application Settings dialog.

After you've finished this configuration, you can access the tools from the Tools > BLAST or Tools > BLAST+ submenu of the main menu.

Creating Database

To format a BLAST database, do the following:
• If you're using BLAST, open Tools > BLAST > FormatDB.
• If you're using BLAST+, open Tools > BLAST+ > BLAST+ make DB.

The Format database dialog appears:

[Figure: Format database dialog with the Input data settings (input files or directory, include and exclude file filters, type of files) and the Output settings (path to save the database into, base name for BLAST files, title for database)]

Here you must select the input files. If all the files you want
140. ftware. See also: Sequence View.

PFM (.pfm): A file format for a position frequency matrix. See also: Weight Matrix.
PWM (.pwm): A file format for a position weight matrix. See also: Weight Matrix.
Raw sequence (.seq): A raw sequence format. See also: Sequence View.
SAM: The Sequence Alignment Map (SAM) format is a generic alignment format for storing read alignments against reference sequences. See also: Assembly Browser, Bowtie, UGENE Genome Aligner.
SCF (.scf): The Standard Chromatogram Format. See also: Chromatogram Viewer.
SITECON (.sitecon): A file format to store a TFBS profile. See also: SITECON.
Stockholm (.sto): A multiple sequence alignment file format. See also: Alignment Editor.
Swiss-Prot (.txt, .sw): An annotated protein sequence in the format of the UniProtKB/Swiss-Prot database. See also: Sequence View.

13.1.2 UGENE Native File Formats

UGENE database file
Short FASTA
UGENE Workflow Designer schema
UGENE Query Designer schema
Workflow element for command line tool

13.1.3 Other File Formats

Image formats (bmp, jpg, png, tiff, svg, pdf, etc.)
Stores a dotplot of a sequence. See also: Dotplot.
UGENE database files s
141. g ciphh4 phderanst sagechP Qyidnd Guaoshhoenolpyruvate PEP systems MODE Molybdate responsive transcription factor http biocyc org ECOLI substring search type NIL amp object MODE amp quickSearch Quick Search NAC Nitrogen assimilation control http biocyc org ECOLI substring search type NIL amp object NAC amp quickSearch Quick Search NAGC new2 N acetylglucosamine http biocyc org ECOLI substring search type NIL amp object NAGC amp quickSearch Quick Search N acetyl neuraminic acid regulator http biocyc org ECOLI substring search type NILMobject NANR amp quickSearch Quick Search Nitrate nitrite response regulator NarL http biocyc org ECOLI substring search type NIL amp object NARL amp quickSearch Quick Search Continued on next page 11 11 SITECON Unipro UGENE User Manual Version 1 12 3 Table 11 2 continued from previous page Nitrate nitrite response regulator NarL http biocyc org ECOLI substring search type NIL4object NARL quickSearch Quick Search Nitrate nitrite response regulator NarP http biocyc org ECOLI substring search type NIL amp object NARP amp quickSearch Quick Search NirC is a nitrite transporter which is a member of the FNT family of formate and nitrite http biocyc org EEQtp stidstring search type NILMobject NIRC amp quickSearch Quick Search OmpC OmpC is a member of the GMP family http biocyc org ECOLI substring search type NIL amp obje
142. g hotkeys are available for the Assembly Overview:

Shift + mouse move: Zoom the Assembly Overview to selection
Ctrl + wheel: Zoom the Assembly Overview
Alt + click: Zoom the Assembly Overview in 100x

8.9.2 Reads Area Hotkeys

The following hotkeys are available for the Reads Area:

9 Phylogenetic Tree Viewer

The Phylogenetic Tree Viewer is intended to display a phylogenetic tree built from an alignment or loaded from a file (e.g. a Newick file).

[Figure: Phylogenetic Tree Viewer showing the COI tree]

To load a tree from a file, follow the instructions described in the Opening Document paragraph. For example, you may open the $UGENE/data/samples/Newick/COI.nwk sample file provided with the UGENE package.

To build a tree from a multiple sequen
143. g sequences and their annotations See also Sequence View FASTA fa mpfa fna One of the oldest and simplest sequence file fsa fas fasta sef segs See also Sequence View FASTQ fastq A file format used to store a sequence and its corresponding quality scores It was originally developed at the Wellcome Trust Sanger Institute See also Sequence View Genbank ob gbk gen A rich format for storing sequences and as genbank sociated annotations See also Sequence View 253 Unipro UGENE User Manual Version 1 12 3 GFF UQ h r The Gene Finding Format GFF format is used to store features and annotations See also Sequence View A file format to store HMM profiles See also HMM2 HMM3 ASN 1 format used by the Molecular Model ing Database MMDB See also 3D Structure Viewer y 3 Jo ctr Z T A multiple sequence alignments file format See also Alignment Editor Newick A multiple sequence alignments file format See also Alignment Editor A tree file format See also Building Phylogenetic Tree Phy logenetic Tree Viewer nwk newick A multiple alignment and phylogenetic trees file format See also Alignment Editor Building Phy logenetic Tree Phylogenetic Tree Viewer The Protein Data Bank PDB format allows to view the 3D structure of the sequence See also 3D Structure Viewer A sequence file format used by pDRAW32 so
144. ghbour Joining method has the following view:

[Figure: Build Phylogenetic Tree dialog for the PHYLIP Neighbor Joining method, with the Distance Matrix settings, the Bootstrapping and Consensus Tree settings (Number of replicates, Seed, Consensus type, Fraction) and the Save tree to field]

The following parameters are available:

Distance matrix model: model to compute a distance matrix. The following values are available for a nucleotide multiple sequence alignment:
• F84
• Kimura
• Jukes-Cantor
• LogDet

The following models are available for a protein alignment:
• Jones-Taylor-Thornton
• Henikoff/Tillier PMB
• Dayhoff PAM
• Kimura

Gamma distributed rates across sites: specifies to take into account unequal rates of change at different sites. It is assumed that the distribution of the rates follows the Gamma distribution.

Coefficient of variation of substitution rate among sites: becomes available if the Gamma distributed rates across sites parameter is checked. Specifies the coefficient of variation of the distribution of the rates.

Transition/transversion ratio: expected ratio of transitions to transversions.

To enable bootstr
145. gnment into. This parameter is required.
Prebuilt index: check this box to use an index file instead of a source reference sequence. You can also build the index manually.
SAM output: always save the output file in the SAM format (the option is disabled for BWA).
Short reads: each added short read is a small DNA sequence file. At least one read should be added.
You can also configure other parameters. They are the same as in the original BWA; a detailed description of the parameters is available on the BWA manual page (http://bio-bwa.sourceforge.net/).
Select one of the following parameters, which correspond to the -n option in the original BWA:
Max diff (-n): maximum edit distance. An integer value should be entered.
Missing prob (-n): the fraction of missing alignments given a 2% uniform base error rate. A float value is used.
Max gap opens (-o): maximum number of gap opens.
Index algorithm (-a): algorithm for constructing the BWT index. It implements three different algorithms:
1. is: designed for short reads up to 200bp with low error rate (<3%). It does gapped global alignment w.r.t. reads, supports paired-end reads, and is one of the fastest short read alignment algorithms to date while also visiting suboptimal hits.
2. bwtsw: designed for long reads with more errors. It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits. Algorithm implemented in BWT-SW http seqanswers c
146. gs can be configured in the UGENE Application Settings.
4.2.4 Notifications
The Notifications component shows notifications for task reports.
(Screenshot: the Notifications component next to the Bookmarks and Log tabs, with log entries from a BAM file import.)
If a task has finished without errors, the notification is blue. If an error has occurred during the task execution, the notification is red.
To open a task report, click on the corresponding notification. See an example of a task report below:
Task report [DigestSequenceTask]
status: Finished
time: 0:00:00.015
Digest into fragments murine.gb (linear)
Generated 10 fragments.
From EcoRV (138) To EcoRV (214), 77 bp
From EcoRV (214) To EcoRV (3227), 3014 bp
From EcoRV (3227) To BglII (3698), 472 bp
From BglII (3702) To HindIII (5023), 1322 bp
From HindIII (5027) To ClaI (5104), 78 bp
To remove a notification f
147. (Screenshot: the Align with ClustalW dialog, advanced options: Weight matrix, Iteration type, Max iterations, and the protein gap parameters.)
The following parameters are only available for protein sequences:
Gap separation distance: tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalized more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment.
Hydrophilic gaps off: increases the chances of a gap within a run of hydrophilic amino acids.
No end gap separation penalty: treats end gaps just like internal gaps to avoid gaps that are too close.
Residue specific gaps off: amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. For example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine.
MAFFT
Originally, MAFFT (http://mafft.cbrc.jp/alignment/software/) is a multiple sequence alignment program for Unix-like operating systems. However, it is currently available for Mac OS X, Linux and Windows. It is used for both nucleotide and protein sequences. MAFFT ho
148. he sequence or its chromatogram simultaneously. The type of an object is indicated by the symbol in the square brackets and the icon near the object.
(Screenshot: a document in the Project View containing chromatogram, sequence and annotation objects.)
Below is the list of object types supported by the current version of UGENE:
(Table: Object types.)
You can edit names of particular objects, such as sequence objects, by selecting them in the Project View and then pressing F2. To be able to do so, the document containing the target object must be unlocked.
To see the list of all available views for a given object, select the object, activate the context menu inside the Project View window and select the Open view submenu.
(Screenshot: the Project View context menu for the NC_001363 sequence object.)
The picture above illustrates an option to visualize the selected DNA sequence object using the Sequence View, a complex and extensible Object View that focuses on visualization of sequence objects in combination
149. he threshold specified.
There are several modes:
• JalView (Default): based on the JalView algorithm. Returns "+" if there are 2 characters with high frequency. Returns the symbol in lower case if the symbol content in a row is lower than the specified threshold.
• ClustalW: emulates the ClustalW program and file format behavior.
• Levitsky: this algorithm was proposed by Victor Levitsky to calculate the consensus of DNA alignments. First, it collects global alignment frequencies for every symbol using the extended (15 symbols) DNA alphabet. Then, for every column, it selects the rarest symbol in the whole alignment whose percentage in the column is greater than or equal to the threshold value.
• Strict: the algorithm returns a gap character if the symbol frequency in a column is lower than the threshold specified.
7.2 Working with Alignment
This chapter explains how to work efficiently with the Alignment Editor. You will learn how to modify an alignment: remove gaps, align sequences, copy and paste regions, add new sequences and extract subalignments as new alignments.
7.2.1 Undo/Redo Framework
The editor tracks all modifications of the aligned sequences. When a modification happens, the current state of the multiple sequence alignment object is recorded. You can apply any previous state and redo the modifications using the corresponding b
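The Strict consensus mode described in this chunk can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not UGENE's implementation: the function name is hypothetical, the threshold is taken as a percentage, and gaps are counted like any other symbol.

```python
from collections import Counter


def strict_consensus(rows, threshold=100, gap="-"):
    """Return the most frequent symbol per column, or a gap when its share is below threshold (%)."""
    if not rows:
        return ""
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("all rows of an alignment must have the same length")
    consensus = []
    for column in zip(*rows):
        symbol, count = Counter(column).most_common(1)[0]
        percent = 100.0 * count / len(column)
        # A column whose best symbol is too rare contributes a gap character.
        consensus.append(symbol if percent >= threshold else gap)
    return "".join(consensus)


alignment = ["ACGT", "ACGA", "ACTA"]
print(strict_consensus(alignment, threshold=100))
print(strict_consensus(alignment, threshold=60))
```

Lowering the threshold lets partially conserved columns contribute a symbol instead of a gap.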
150. hown at the bottom part of the dialog.
(Screenshot: the Results preview table with all columns marked as ignored, above the Raw file preview with the columns name, start, end and qual.)
The preview table headline indicates the types of the information contained in the corresponding columns. By default, the values are ignored. To specify a column role, click on the corresponding headline element.
(Screenshot: the Select the role of the column dialog with roles such as Annotation end position, Annotation length, Annotation name, Qualifier and Ignore this column; the Results preview then shows the columns name, start position, end position (inclusive) and qualifier the_qualifier.)
The annotation start and end positions must be specified. It is possible to add an offset to every read start position by checking the Add offset checkbox, and to shorten annotations by one from the end by unchecking the Inclusive checkbox.
When all the roles are specified, press Run. With the Add to project checkbox checked and a Sequence View opened, on success you will see the Sequence View with the annotations linked.
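The column roles described in this chunk can be pictured as a small parser. The sketch below is only an illustration under assumptions: the function name, the role labels and the default annotation name are hypothetical, and the offset is applied to both coordinates, which the text does not state explicitly.

```python
import csv
import io


def read_annotations(text, roles, separator=",", skip_lines=0,
                     offset=0, inclusive=True, default_name="misc_feature"):
    """Turn delimited text into annotation records using one role label per column.

    roles is a list such as ["name", "start", "end", "ignore"] (hypothetical labels).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))[skip_lines:]
    annotations = []
    for row in rows:
        if not row:
            continue
        fields = {role: value.strip() for role, value in zip(roles, row) if role != "ignore"}
        start = int(fields["start"]) + offset  # start and end roles are mandatory
        end = int(fields["end"]) + offset      # assumption: the offset shifts both ends
        if not inclusive:
            end -= 1                           # shorten the annotation by one from the end
        annotations.append({"name": fields.get("name", default_name),
                            "start": start, "end": end})
    return annotations


raw = "name,start,end,qual\na1,10,20,pp\na1,19,40,ppe\n"
print(read_annotations(raw, ["name", "start", "end", "ignore"], skip_lines=1))
```

The sample input mirrors the raw file preview shown in the dialog: one header line to skip, then a name, a start, an end and one ignored column.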
151. (Screenshot: the Sequence View with its labeled parts: the Sequence zoom view, the current sequence actions, the Sequence details view, the annotations object name and the Annotations editor.)
You can change the focus by clicking on the corresponding sequence area. All sequences that are not in focus have the sequence name and icon disabled.
The bottom area of the Sequence View is the Annotations editor. It contains a tree-like structure of all annotations available for all sequences shown in the Sequence View and can be used to perform various actions on annotations: create a new annotation, modify the existing one, gro
152. ies the algorithm that should be used to search for a pattern. The algorithm can be one of the following:
• InsDel: there could be insertions and/or deletions, i.e. a pattern and the searched region can vary in their length. You can specify the match percentage between the pattern and a searched region in the field nearby. Note that this value also depends on the pattern length and is disabled when the pattern hasn't been specified.
• Substitute: a pattern may contain characters different from the characters in the searched region. When this algorithm has been selected, you can also specify the match percentage, and additionally it is possible to take into account ambiguous bases.
• Regular expression: a regular expression may be specified instead of a pattern. For example, the "." character matches any character, and ".*" matches zero or more of any characters.
There is also the Limit result length option that specifies the maximum length of a result.
Search in
(Screenshot: the Search in group with the settings Strand: Both, Search in: Sequence, Region: Whole Sequence.)
In this group you can specify where to search for a pattern: in what region and in which strand (for nucleotide sequences). Also, for nucleotide sequences, it is possible to search for a pattern on the sequence translations.
Strand (for nucleotide sequences only): specifies on which strand to search for a pattern: Direct, Reverse compl
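The Substitute algorithm from the list above, with its match percentage, can be sketched as a sliding window that tolerates mismatches. This is a minimal illustration, not UGENE's search code: the function name is hypothetical, ambiguous bases are not handled, and reporting 1-based inclusive coordinates is an assumption.

```python
def find_with_substitutions(sequence, pattern, min_match_percent=100):
    """Report regions where at least min_match_percent of the pattern characters match."""
    if not pattern:
        raise ValueError("the pattern must not be empty")
    seq, pat = sequence.upper(), pattern.upper()
    hits = []
    for start in range(len(seq) - len(pat) + 1):
        window = seq[start:start + len(pat)]
        matches = sum(a == b for a, b in zip(window, pat))
        if 100.0 * matches / len(pat) >= min_match_percent:
            hits.append((start + 1, start + len(pat)))  # 1-based, inclusive
    return hits


print(find_with_substitutions("ACGTACGA", "ACGT", 100))
print(find_with_substitutions("ACGTACGA", "ACGT", 75))
```

With a 4-character pattern, a 75% threshold allows exactly one substitution, which is why the threshold's effect depends on the pattern length.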
153. iew synchronization lock button on the left.
(Figure: a dotplot of the NC_014267 sequence, min length 15, identity 100.)
7 Alignment Editor
7.1 Overview
This chapter gives an overview of the Alignment Editor components and explains basic concepts of browsing an alignment.
7.1.1 Alignment Editor Features
The Alignment Editor is a powerful tool for visualization and editing of DNA, RNA or protein multiple sequence alignments. The editor supports different multiple sequence alignment (MSA) formats, such as ClustalW, MSF and Stockholm. The full list of file formats supported in UGENE is here.
The editor provides interactive visual representation, which includes:
• Navigation through an alignment
• Optional coloring schemes (for example Clustal, Jalview like, etc.)
• Flexible zooming for large alignments
• Export of publication-ready images of an alignment
• Several consensus calculation algorithms
Using the Alignment Editor you can:
• Perform multiple sequence alignment using integrated MUSCLE and KAlign algorithms
• Edit an alignment: delete, copy, paste symbols, sequences and subalignments
• Build phylogenetic trees
• Generate grid profiles
• Build Hidden Markov Model profiles to use with HMM2/HMM3 tools
154. ighlights all columns of nucleotides.
• Free: highlights all reads that intersect a given column. In this mode you can lock a position: click the Lock here item in the context menu. To return to a locked position, select the Jump to locked base item in the context menu.
• Centered: highlights all reads that intersect the column in the center of the screen.
8.5 Associating Reference Sequence
To associate a reference sequence with the assembly, open the sequence (the sequence must be loaded) and drag it to the Assembly Reference Area.
(Screenshot: the Project View with the assembly and sequence objects next to the Assembly Browser.)
The sequence appears in the Reference Area.
To remove the association, select the Unassociate item in the Reference Area context menu.
8.6 Consensus Sequence
A consensus sequence can be found in the Consensus Area under a reference sequence. It refers to the most common nucleotide at a particular position.
(Screenshot: the reference sequence and the consensus sequence in the Assembly Browser.)
To choose a consensus algorithm, select the Consensus algorithm item either in the context menu of
155. ignment in the Alignment Editor and use the Statistics > Generate grid profile context menu item.
(Screenshot: the Alignment Editor context menu.)
The dialog will appear:
(Screenshot: the grid profile dialog with the Profile mode, Custom options and Save profile to file groups.)
Here is a brief description of the options that can be set in the dialog:
Profile mode (Counts/Percents): select Percents to have scores shown as percents in the report.
Show scores for gaps: check this item if you want gap character statistics to be shown in the report.
Show scores for symbols not used in alignment: if a symbol is not used in the alignment at all, it won't be shown in the report. Check this item to have all symbols of the alignment alphabet reported.
Skip gaps in consensus position increments: consensus ruler configuration. If checked, the gaps in the consensus will not lead to ruler increments.
Save profile to file: allows saving the profile to a file in the HTML or CSV format. The CSV format is convenient for further processing in worksheet editors like Excel.
The result profile in the HTML mode:
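The Counts and Percents profile modes described above amount to a per-column symbol tally. The sketch below shows one way such a grid could be computed; the function name and the returned layout are hypothetical, and the plugin's real report contains more than this.

```python
from collections import Counter


def grid_profile(rows, percents=False, show_gaps=False, gap="-"):
    """Per-column symbol scores for an alignment, as counts or as percents of the row count."""
    counts = [Counter(column) for column in zip(*rows)]
    symbols = sorted({s for c in counts for s in c if show_gaps or s != gap})
    profile = {}
    for symbol in symbols:
        scores = [c.get(symbol, 0) for c in counts]
        if percents:
            scores = [round(100.0 * value / len(rows), 1) for value in scores]
        profile[symbol] = scores
    return profile


alignment = ["ACG", "A-G", "TCG"]
print(grid_profile(alignment))
print(grid_profile(alignment, percents=True, show_gaps=True))
```

Each key is a symbol and each list holds one score per alignment column, which maps directly onto the rows and columns of a CSV export.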
156. ikipedia.org/wiki/GATA2
GATA-3: Trans-acting T-cell specific transcription factor GATA-3. http://en.wikipedia.org/wiki/GATA3
HMG-1: High mobility group protein 1. http://en.wikipedia.org/wiki/HMGB1
HNF-1: Hepatocyte nuclear factor 1. http://en.wikipedia.org/wiki/Hepatocyte_nuclear_factors#HNF1
HNF-3: Hepatocyte nuclear factor 3. http://en.wikipedia.org/wiki/Hepatocyte_nuclear_factors#HNF3
(Table 11.1, continued)
HNF-4: Hepatocyte nuclear factor 4. http://en.wikipedia.org/wiki/Hepatocyte_nuclear_factors#HNF4
IRF: Interferon regulatory factors. http://en.wikipedia.org/wiki/Interferon_regulatory_factor
isre: Interferon stimulation response element. http://en.wikipedia.org/wiki/Interferon#Downstream_signaling
MyoD: MyoD belongs to a family of proteins known as myogenic regulatory factors (MRFs). http://en.wikipedia.org/wiki/MyoD
MyOGsel3: Myogenin. http://en.wikipedia.org/wiki/Myogenin
NF-1: Neurofibromin 1. http://en.wikipedia.org/wiki/Neurofibromin_1
NF-E2: Transcription factor NF-E2 45 kDa subunit is a protein that in humans is encoded by the
NFATp: Pre-existing component of the NFAT (Nuclear factor of activated T-cells) transcription. http://en.wikipedia.org/wiki/NFAT
NFkB al: Nuclear factor kappa-light-chain-enhancer of activated B cells. http
157. in menu and in the Sequence View context menu. When you select the Insert subsequence item, the following dialog is opened:
(Screenshot: the Insert sequence dialog with the annotations region resolving mode, the position to insert, and the Save to new file options.)
Description of the dialog parameters:
Paste data here: you must input the inserted subsequence. This parameter is mandatory.
Annotated regions resolving: defines whether to Resize, Remove or Split an annotation (into two annotations) when the subsequence is inserted at a sequence position where annotations are present.
Start position: the sequence position where to insert the subsequence.
Save resulted document to a new file: the result sequence can be saved to a new file instead of modifying the current file. You must select the Document location. The FASTA and Genbank file formats are available when you do not include annotations in the result file. If you check the Merge annotations to this file item, the annotations will also be saved to the result file. Only the Genbank file format is available in this case.
If a subsequence has been selected, the first item in the Edit sequence submenu is called Replace subsequence instead of Insert subsequence. The dialog opened in this case is similar to the dialog described above, except it already contains
158. (Screenshot: the context menu with the Find pattern, Find pattern [Smith-Waterman], Find ORFs and Find annotated regions items.)
• You will see the Smith-Waterman search dialog. Fill in the required fields and click the Remote run button.
(Screenshot: the Smith-Waterman Search dialog.)
• The Remote machines monitor dialog will appear. You can also add, remove or modify remote machines here.
• Select a machine to run on and click the Run button. Note that only 1 machine can be selected in the current version of UGENE.
• That's all. After the task is finished, you will see the task report in the Task View.
10.5 Running MUSCLE Align Task on Remote Machine
Read the uMUSCLE plugin documentation before reading this paragraph. To run the uMUSCLE align task on a remote machine you
159. ing, Required]
out: output file [String, Required]
format: format of the output file [String, Optional]
Example:
ugene align-clustalw --in=COI.aln --out=COI.sto --format=stockholm
12.2.17 Aligning with Kalign
Task Name: align-kalign
Multiple sequence alignment with Kalign.
Parameters:
in: semicolon-separated list of input files [String, Required]
out: output file in the ClustalW format [String, Required]
Example:
ugene align-kalign --in=COI.aln --out=COI_aligned.aln
12.2.18 Aligning with MAFFT
Task Name: align-mafft
Multiple sequence alignment with MAFFT.
Warning: MAFFT is used as an external tool and must be installed on your system.
Parameters:
toolpath: path to the MAFFT executable. By default, the path specified in the Application Settings is applied [String, Optional, Default: default]
tmpdir: directory for temporary files [String, Optional]
in: semicolon-separated list of input files [String, Required]
out: output file [String, Required]
format: format of the output file [String, Required]
op: penalty for opening a gap [Number, Optional]
ep: penalty for extending a gap [Number, Optional]
maxiterate: maximum number of cycles of iterative refinement [Number, Optional]
Example:
ugene align-mafft --in=COI.aln --out=COI_aligned.aln
12.2.19 Aligning with MUSCLE
Task Name: align
160. ing the problem of sequence annotation. As with SITECON, the main use case of the plugin is the recognition of potential transcription factor binding sites on the basis of data about conservative conformational and physicochemical properties revealed by analysis of binding site sets.
The Weight Matrix plugin contains many position frequency matrices (PFMs) and position weight matrices (PWMs), also known as position-specific score matrices (PSSMs). The matrices come from two widely known open archives: JASPAR (http://jaspar.genereg.net/), which contains frequency matrices, and UniPROBE (http://the_brain.bwh.harvard.edu/uniprobe/), which contains weight matrices.
The Weight Matrix plugin also provides a tool for creating specific position frequency and weight matrices from an existing alignment or from a file with several sequences. The created matrix can be used as a profile for the search, just like the JASPAR and UniPROBE ones.
To search for transcription factor binding sites in a DNA sequence, select the Analyze > Search TFBS with matrices context menu item. The Weight matrix search dialog will appear:
(Screenshot: the Weight matrix search dialog with the Search JASPAR database, Build new matrix and View matrix buttons, the Weight algorithm list (Berg and von Hippel), the Strands and Range groups, the matrix list with the Matrix, Minimal score and Algorithm columns, and the Load list and Save list buttons.)
161. 11.16.1 Aligning Short Reads with Bowtie
11.16.2 Building Index for Bowtie
11.17 BWA
11.17.1 Aligning Short Reads with BWA
11.17.2 Building Index for BWA
11.18 UGENE Genome Aligner
11.18.1 Aligning Short Reads with UGENE Genome Aligner
11.18.2 Building Index for UGENE Genome Aligner
11.19 CAP3
11.20 Weight Matrix
11.20.1 Searching JASPAR Database
11.20.2 Building New Matrix
11.21 Primer3
11.22 External Tools
11.22.1 Configuring External Tool
11.23 Query Designer
12 UGENE Command Line Interface
12.1 CLI Options
12.2 CLI Predefined Tasks
12.2.1 Converting Sequences
12.2.2 Converting MSA
12.2.3 Extracting Sequence
162. is code by default.
The Vertebrate Mitochondrial Code
The Yeast Mitochondrial Code
The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma Code
The Invertebrate Mitochondrial Code
The Ciliate, Dasycladacean and Hexamita Nuclear Code
The Echinoderm and Flatworm Mitochondrial Code
The Euplotid Nuclear Code
The Bacterial and Plant Plastid Code
The Alternative Yeast Nuclear Code
The Ascidian Mitochondrial Code
The Alternative Flatworm Mitochondrial Code
Blepharisma Nuclear Code
Chlorophycean Mitochondrial Code
Trematode Mitochondrial Code
Scenedesmus obliquus Mitochondrial Code
Thraustochytrium Mitochondrial Code
The numbering of the genetic codes corresponds to the NCBI Genbank database numbering.
5.8.7 Showing and Hiding Translations
You can turn on/off the visualization of the direct and complement amino translations in the Sequence details view using the Show complement strand and the Show amino translations toolbar buttons.
(Screenshot: the Sequence details view with the Show translation buttons, the 3 direct amino translations and the 3 complement amino translations labeled.)
On the picture below, both strands are turned off.
5.8.8 Selecting Sequence
You ca
163. ituated below.
3.1 Installing UGENE on Windows
To install UGENE on Windows:
1. Download the UGENE Windows installation package.
2. Launch the downloaded exe file and follow the Unipro Setup wizard.
(Screenshot: the welcome page of the UGENE Setup Wizard.)
Alternatively, to use UGENE without installing:
1. Download the UGENE zip package.
2. Unpack it.
3. Launch the ugeneui.exe file.
Note: Be sure that you launch the installer with an administrative Windows account. If you have a problem with installation, try the following: right-click on the installer exe file and select the Run as administrator item.
3.2 Installing UGENE on Linux
Download the appropriate version of the installation package (32-bit or 64-bit):
Linux binary package i386 (32-bit) [download]
Linux binary package x86-64 (64-bit) [download]
The downloaded file has the tar.gz extension. Unpack the archive. You can use this command:
tar -xf [name of the downloaded tar.gz file]
Change the working directory to the unpacked UGENE directory:
cd [name of the unpacked directory]
Launch the UGENE GUI version using the command ugene -ui or the comm
164. 6.4.2 Graph Settings
To change the settings of a graph, select the Graph settings item in the graph context menu. The Graph Settings dialog appears:
(Screenshot: the Graph Settings dialog with the Window, Steps per window and Default color settings.)
The following parameters are available:
Window: the number of bases in a window.
Steps per window: the number of steps in a window. The Step is calculated as Window / Steps per window.
Default color: the default color of the graph line (or lines, for the GC Frame Plot).
Checking the Cutoff for minimum and maximum values checkbox enables the following settings:
Minimum: the minimum value for the cutoff.
Maximum: the maximum value for the cutoff.
Select appropriate minimum and maximum values and click the OK button to show the graph of cutoffs. The graph is divided into 2 parts. The upper part shows values greater than the specified Maximum value. The lower part of the graph shows values lower than the specified Minimum value. For example:
(Figure: a cutoff graph with the areas Values greater than Maximum and Values lower than Minimum labeled.)
6.5 Dotplot
The Dotplot plugin provides a tool to build dotplots for DNA or RNA sequences. This allows comparing these sequences graphically. Using a dotplot graphic, you can easily identify such differences between
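The relation between the Window and Steps per window settings in section 6.4.2 (Step = Window / Steps per window) can be illustrated with a small GC content graph. This is a sketch under assumptions, not the plugin's code: the function name is hypothetical, integer division is assumed for the step, and each point is reported at the 1-based start of its window.

```python
def gc_content_graph(sequence, window=100, steps_per_window=1):
    """GC percentage for each window, moving by step = window // steps_per_window."""
    step = window // steps_per_window  # assumption: integer division
    if step < 1:
        raise ValueError("steps_per_window must not exceed window")
    seq = sequence.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk)
        points.append((start + 1, 100.0 * gc / window))
    return points


print(gc_content_graph("GGCCAATT", window=4, steps_per_window=2))
```

With a window of 4 and 2 steps per window the step is 2, so consecutive windows overlap by half; more steps per window give a smoother but denser graph.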
165. lable options are. It implements three different algorithms:
1. is: designed for short reads up to 200bp with low error rate (<3%). It does gapped global alignment w.r.t. reads, supports paired-end reads, and is one of the fastest short read alignment algorithms to date while also visiting suboptimal hits.
2. bwtsw: designed for long reads with more errors. It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits. Algorithm implemented in BWT-SW (http://seqanswers.com/wiki/BWA-SW). On low-error short queries, BWA-SW is slower and less accurate than the is algorithm, but on long reads it is better.
3. div: does not work for long genomes.
Colorspace (color): the input is read in colorspace; colors are encoded as the characters A, C, G, T (A: blue, C: green, G: orange, T: red).
11.18 UGENE Genome Aligner
The UGENE Genome Aligner (http ugene unipro ru benchmarks UGENE Genome Aligner SPO 2011 pdf) is a fast short read aligner. It aligns DNA sequences of various lengths to the reference genome with a configurable mismatch rate. It is available from the Tools > DNA assembly submenu of the main menu.
(Screenshot: the DNA assembly submenu with the Align short reads and Build index items.)
166. le surface
• SES: solvent excluded surface
• vdWS: van der Waals surface
To remove a molecular surface that has already been calculated, select the Off item.
You can also select the Molecular Surface Render Style to modify the appearance of the calculated molecular surface:
• Convex Map
• Dots
(Figure panels: SAS (solvent accessible surface), SES (solvent excluded surface), vdWS (van der Waals surface), vdWS with dots.)
Selecting Background Color
To change the background color, open the Settings dialog (choose the Settings item in the 3D Structure Viewer context menu or in the Display menu on the toolbar), press the Set background color button and select a color in the dialog that appears.
Selecting Detail Level
To select the detail level of a 3D structure representation, open the Settings dialog of the 3D Structure Viewer and drag the Detail level slider.
Enabling Anaglyph View
UGENE allows you to view a molecule in the anaglyph mode. To enable the anaglyph view, open the Settings dialog of the 3D Structure Viewer and check the Anaglyph view check box. You can modify the color settings: select one of the available Glasses colors, or set custom colors and swap the colors. The offset of the color layers can be adjusted by dragging the Eyes shift slider.
(Screenshot: the Eyes shift slider and the Glasses colors list.)
167. lect a single object with a sequence alignment in the Project View window and click the Export > Export alignment to sequence format context menu item.
(Screenshot: the Project View context menu for the aln_example.aln document, followed by the Export to file dialog with the File format to use, Add document to the project and Gap characters settings.)
Here it is possible to specify the result file location, to select a sequence file format, to define whether to keep or remove gap characters in the aligned sequences, and optionally to add the created document to the current project.
4.10.4 Exporting Nucleic Alignment to Amino Translation
Select a single object with a nucleic sequence alignment in the Project View window and click the Export > Export nucleic alignment to amino translation context menu item.
(Screenshot: the Project View context menu with the Export submenu opened.)
168. lignment with optimized Smith-Waterman algorithm
• Combining various algorithms into custom workflows with UGENE Workflow Designer
• Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
User Interface
• Visual and interactive genome browsing, including circular plasmid view
• Multiple alignment editor
• Chromatograms visualization
• 3D viewer for files in PDB and MMDB formats with anaglyph stereo mode support
• Phylogenetic tree viewer
• Easy to use workflow designer for custom computational workflows
2.3 High Performance Computing
• Complete support of modern multicore processors and SSE instructions
• Out-of-the-box support of modern GPUs using NVIDIA CUDA and ATI Stream
• Integrated solutions for Cell Broadband Engine
• Supercomputers and distributed computing support
• Amazon EC2 cloud computing support
2.4 Cooperation
• Can be used for education purposes in schools and universities
• Features to be included into the next release are initiated by users
• The UGENE team is ready for collaboration in related projects, both free and commercial
3 Installation
Get the appropriate package from the UGENE download page: http://ugene.unipro.ru/download.html. Follow the installation instructions on the same page to install UGENE on your system.
Quick guides on how to install UGENE on Windows, Linux and Mac OS X are s
169. ll appear. Select the remote machine that represents the EC2 service and click the Run button.
10.2.5 Useful Tips and Recommendations
• Before launching a schema on a cloud, try launching it on a local machine with some test data. This helps to catch schema errors early.
• Always check the Log View at the bottom of the UGENE window: it contains important information and all error messages.
• When a remote workflow is executed, its progress is shown in the Task View.
• When analysing large datasets (> 10 Mb), first make sure that your schema works correctly with small datasets.
If something goes wrong, don't panic. Report your problem on our forum or contact us directly.
10.3 Running HMMER3 Search Task on Remote Machine
Read the HMM3 plugin documentation before reading this paragraph. To run the HMMER3 search task on a remote machine, you need to do the following:
• Open a sequence and select the Tools > HMMER3 tools > Search HMM signals item in the main menu.
170. lly saves the state of the view in the Auto saved bookmark when the view is closed. Now, by activating bookmarks, you can restore the original view state. For example, for the Sequence View bookmarks you can store a visual position and zoom scale for the sequence region. Use the F2 keyboard shortcut to rename a bookmark. To remove a bookmark, press the Delete key. UGENE has a limited set of built-in Object Views. Extension modules, or plugins, can be used to adjust the existing views or to add new views to the tool.
4.12 Working with Projects
All the opened documents and bookmarks, along with the corresponding view states, can be saved within a project file. To do so, select File > Export Project. It will invoke the Export project dialog, where you can select the destination folder and the project file name.
171. [Figure: reads highlighting in the Reads Area]
• Strand direction: highlights reads located on the direct strand in blue and reads on the complement strand in green.
• Paired reads: highlights all paired reads in green. Note that the information about the pair is shown in the hint.
8.4.2 Reads Shadowing
Various modes of column highlighting are available from the Reads shadowing item in the context menu of the Reads Area:
• Disabled h
172. ly data base to SAM format
11.18.1 Aligning Short Reads with UGENE Genome Aligner
When you select the Tools > DNA Assembly > Align short reads item in the main menu, the Align Short Reads dialog appears. Set the Align short reads method parameter to UGENE Genome Aligner. The following parameters are available:
Reference sequence: DNA sequence to align short reads to. This parameter is required.
Result file name: file to write the result of the alignment into, in the UGENE database format or, if the SAM output box is checked, in the SAM format. This parameter is required.
Prebuilt index: check this box to use an index file instead of a reference sequence. Also you can build it manually.
SAM output: checking this box allows one to save output files in the SAM format. The default format of output files is the UGENE database
173. marketing@unipro.ru UGENE website: http://ugene.unipro.ru UGENE technical support Email: ugene@unipro.ru
2 About UGENE
Unipro UGENE is a free cross-platform genome analysis suite. It is distributed under the terms of the GNU General Public License (http://www.gnu.org/licenses/old-licenses/gpl-2.0.html). To learn more about UGENE, visit the UGENE website: http://ugene.unipro.ru. It works on Windows, Mac OS X or Linux and requires only a few clicks to install.
2.1 Key Features
• Creating, editing and annotating nucleic acid and protein sequences
• Search through online databases: NCBI, PDB, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL
• Multiple sequence alignment: Clustal, MUSCLE, Kalign, MAFFT, T-Coffee
• Online and local BLAST search
• Restriction analysis with integrated REBASE restriction enzyme database
• Integrated Primer3 package for PCR primers design
• Search for direct, inverted and tandem repeats in DNA sequences
• Constructing dotplots for nucleic acid sequences
• Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
• Aligning short reads with Bowtie and UGENE genome aligner
• Search for ORFs
• Cloning in silico
• 3D structure viewer for files in PDB and MMDB formats, anaglyph view support
• Protein secondary structure prediction with GOR IV and PSIPRED algorithms
• HMMER2 and HMMER3 packages integration
• Building (using integrated PHYLIP package) and viewing phylogenetic trees
• Local sequence a
174. me as in the original Primer3
11.22 External Tools
The External Tools plugin allows one to launch an external tool from UGENE. The following tools are supported:
• Bowtie
• BLAST
• BLAST+
• BWA
• CAP3
• ClustalW
• MAFFT
• MrBayes
• T-Coffee
To use an external tool from UGENE, the tool needs to be installed on the system and the path to it should be properly configured. No additional configuration is needed if you have installed the UGENE Full Package, as it already contains all the tools by default. If you have installed the UGENE Standard Package, you need to configure an external tool in order to use it. Note that in this case you can download the package with all the external tools from this page: http://ugene.unipro.ru/external.html. To learn how to configure an external tool, read below.
11.22.1 Configuring External Tool
To configure an external tool:
1. Make sure the tool is installed on your system.
2. Set a path to the tool executable file in UGENE. It can be set on the External Tools tab of the Application Settings dialog.
If the path hasn't been set for a tool, UGENE menu items that launch the tool are displayed in italic. For example, on the image below a path for the ClustalW external tool has been set, and paths for MAFFT and T-Coffee have not.
175. me page: http://mafft.cbrc.jp/alignment/software/
To make MAFFT available from UGENE:
1. Install the MAFFT program on your system.
2. Set the path to the MAFFT executable on the External tools tab of the UGENE Application Settings dialog. For example, on Windows you need to specify the path to the mafft.bat file.
To use MAFFT, open a multiple sequence alignment file and select the Align with MAFFT item in the context menu or in the Actions main menu. The Align with MAFFT dialog appears. The following parameters are available:
Gap opening penalty: gap opening penalty at group-to-group alignment.
Offset (works like gap extension penalty): offset value, which works like gap extension penalty, for group-to-group alignment.
Maximum number of iterative refinement: specifies the number of cycles of iterative refinement to perform.
T-Coffee
T-Coffee (http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html) is a multiple sequence alignment package. To make T-Coffee available from UGENE, see the External Tools. To use T-Coffee, open a multiple sequence alignment file and select the Align with T-Coffee item in the context menu or in the Acti
176. molecule from the current UGENE project. Click the From Project button to do so. The Select Item dialog appears with the sequence objects available. Select a sequence and press the OK button. After that, create a fragment in the Create DNA Fragment dialog that appears, as described in the Creating Fragment paragraph. The fragment created from the sequence appears in the list of available fragments.
Fragments of the New Molecule
The next step is to add the required fragments to the new molecule contents. To add fragments, select them in the list of available fragments and click the Add button. To add all the fragments, click the Add All button.
Changing Fragments Order in the New Molecule
To change the order of fragments in the new molecule, select a fragment in the new molecule contents list and click either the Up or the Down button to move the fragment in the corresponding direction.
Removing Fragment from the New Molecule
To remove a fragment from the new molecule, select it in the new molecule contents list and click the Remove button. To remove all the fragments, click the Clear All button.
Editing Fragment Overhangs
To edit a fragment's overhangs, select the fragment in the new molecule contents list and click the Edit button. The Edit Molecule Fragment dialog appears.
177. You can also use the Alt+1 hotkey to show/hide the Project View. To create a new project, refer to Creating New Project. Note that if you have no project created when opening a file with a sequence, an alignment or any other biological data, a new anonymous project is created automatically.
4.2.2 Task View
The Task View shows active tasks, for example algorithm computations. To show/hide the Task View, click the Tasks button in the main UGENE window. The hotkey for showing/hiding the Task View is Alt+2. The Task name column of the Task View shows the tasks names. Task state description shows the stat
178. n Analysis
From this chapter you can learn how to search for restriction sites on a DNA sequence. The restriction sites found are stored as automatic annotations. This means that if the automatic annotations highlighting is enabled, the restriction sites are searched for and highlighted in each nucleotide sequence opened. Refer to Automatic Annotations Highlighting to learn more. Open a DNA sequence and click the button for restriction site search on the Sequence View toolbar. Alternatively, select either the Actions > Analyze > Find restriction sites item in the main menu or the Analyze > Find restriction sites item in the context menu. The Find restriction sites dialog appears.
179. n pressed in the dialog, the auto annotating becomes enabled. In the Annotations editor the Restriction Sites annotations can be found in the Auto annotations enzyme group. The direct and complement cut site positions are visualized as triangles on an annotation in the Sequence details view.
11.9 Molecular Cloning in silico
This chapter describes a set of tools in UGENE to perform molecular cloning experiments in silico. This allows you to digest a molecule into fragments, create a fragment from a sequence region, and ligate fragments into a new molecule.
11.9.1 Digesting into Fragments
Open a DNA molecule you want to cut into fragments. Digestion into fragments is performed using restriction enzymes, so before continuing make sure that the restriction analysis has been performed. Refer to the Restriction Analysis chapter for details. Select either the Tools > Cloning > Digest into Fragments item or the Actions > Cloning > Digest into Fragments item in the main menu, or the Cloning > Digest into Fragments item in the context menu. The Digest Sequence into Fragments dialog appears.
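The idea of digesting a molecule at restriction sites can be illustrated with a minimal Python sketch. This is not UGENE's implementation: the function name `digest_linear` is hypothetical, the sketch treats the molecule as linear, looks only at the direct strand, and ignores overhangs. The EcoRI site GAATTC with a cut after the first base is used purely as an illustrative input.

```python
def digest_linear(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site` on the direct strand.

    `cut_offset` is the number of bases from the start of the site to the cut.
    Hypothetical helper for illustration only; overhangs are not modelled.
    """
    if not site:
        raise ValueError("recognition site must not be empty")

    # Collect cut positions, scanning left to right (overlapping sites allowed).
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Slice the sequence between consecutive cut positions.
    fragments = []
    previous = 0
    for position in cut_positions:
        fragments.append(sequence[previous:position])
        previous = position
    fragments.append(sequence[previous:])
    return fragments


if __name__ == "__main__":
    # EcoRI recognizes GAATTC and cuts after the first G (G^AATTC).
    for fragment in digest_linear("AAGAATTCTTGAATTCGG", "GAATTC", 1):
        print(fragment)
```

Joining the returned fragments always gives back the input sequence, which is a quick sanity check when experimenting with other sites.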
180. n use different items from the Select submenu of the context menu to select a sequence. Selecting the Sequence region context menu item opens the Select range dialog. Here you can specify the sequence range you would like to select. You can open the same dialog using the Select sequence region button on a sequence toolbar or using the Ctrl+A key sequence. To use the Sequence between selected annotations item, select two annotations in the Annotations editor, holding the Ctrl key at the same time.
181. nce stored in a separate file. The sequence being edited is displayed right above the original one. Symbols can be changed by clicking on the value of interest; modifications are shown in bold.
6.3.1 Exporting Chromatogram Data
Open, for example, the $UGENE/data/samples/SCF/90-JRI-07.srf file. In the Project View context menu there is the Export > Export chromatogram to SCF item. After clicking on the item, the Export chromatogram file dialog will appear. Check the Reversed and Complemented options if you want to create a reverse and complement chromatogram. Press the Export button. The exported file will be opened in the Sequence View.
6.3.2 Viewing Two Chromatograms Simultaneously
To add another sequence to the Sequence View, drag the required sequence object from the Project View and drop it in the Sequence View area. Note that the dragged object is the sequence object, not the chromatogram object.
182. nces and identify library sequences that resemble the query sequence above a certain threshold.
General options are:
Select the search type: in the remote databases, the blastn search is used for nucleotide sequences; blastp and cdd searches are used for amino sequences. UGENE also provides a way to use blastp and cdd searches for nucleotide sequences. This is achieved by translating the nucleotide sequence into amino sequences. When a sequence is translated, the translation table from the active Sequence View is used. Finally, all 6 translations are used to query the remote database with the selected blastp or cdd search.
Expectation value: this option specifies the statistical significance threshold for reporting matches against database sequences. Lower expect thresholds are more stringent, leading to fewer chance matches being reported.
Max hits: the maximum number of hits that will be shown (not equal to the number of annotations).
Database: the target database.
Search for short, nearly exact matches: automatically adjusts the word size and other parameters to improve results for short queries.
Megablast: select this option to compare the query with closely related sequences. It works best if the target percent identity is 95% or
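The "all 6 translations" mentioned above correspond to the three reading frames of the direct strand plus the three reading frames of the reverse-complement strand. The Python sketch below only enumerates those six nucleotide frames; it does not translate them, since the translation table comes from the active Sequence View. The function names are hypothetical, and the sketch assumes a plain A/C/G/T alphabet.

```python
# Complement table for the plain DNA alphabet (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def six_frames(sequence):
    """Return the six reading frames, each trimmed to whole codons.

    Order: direct strand offsets 0, 1, 2, then reverse-complement
    strand offsets 0, 1, 2.
    """
    frames = []
    for strand in (sequence, reverse_complement(sequence)):
        for offset in range(3):
            usable = max(0, len(strand) - offset)
            usable -= usable % 3  # drop the trailing partial codon
            frames.append(strand[offset:offset + usable])
    return frames


if __name__ == "__main__":
    for frame in six_frames("ATGGCCTAA"):
        print(frame)
```

Each returned frame has a length divisible by three, so it can be handed directly to whatever codon-by-codon translation routine is in use.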
183. need to do the following:
• Open a multiple sequence alignment and click the Align > Align with MUSCLE item in the context menu or in the Actions main menu.
• You will see the Align with MUSCLE dialog. Fill in the required fields and click the Remote run button.
• The Remote machines monitor dialog will appear. You can also add, remove or modify remote machines here.
• Select a machine to run and click the Run button. Note that only 1 machine can be selected in the current version of UGENE.
• That's all. After the task is finished, you will see the task report in the Task View.
11 Plugins
11.1 Workflow Designer
The Workflow Designer allows a molecular biologist to create and run complex computational workflow schemas, even if he or she is not familiar with any programming language. The workflow schemas comprise reproducible, reusable and self-documented research routines with a simple and unambiguous visual representation suitable for
184. ns
Parameters:
in: input sequence file. [String, Required]
out: output file with the annotations. [String, Required]
name: name of the annotated regions. [String, Optional, Default: misc_feature]
ptrn: subsequence pattern to search for (e.g. AGGCCT). [String, Required]
score: percent identity between the pattern and a subsequence. [Number, Optional, Default: 90]
matrix: scoring matrix. [String, Optional, Default: Auto] Among others, the following values are available:
• blosum62
• dna
• rna
• dayhoff
• gonnet
• pam250
• etc.
The matrices available are stored in the $UGENE/data/weight_matrix directory.
filter: results filtering strategy. [String, Optional, Default: filter-intersections] The following values are available:
• filter-intersections
• none
Example: ugene find-sw --in=human_T1.fa --out=sw.gb --ptrn=TGCT --filter=none
12.2.7 Adding Phred Quality Scores to Sequence
Task Name: join-quality
Adds Phred quality scores to a sequence and saves the result to the output FASTQ file.
Parameters:
in: input sequence file. [String, Required]
quality: input Phred quality scores file. [String, Required]
out: output FASTQ file. [String, Required]
Example: ugene join-quality --in=e_coli.fa --quality=e_coli.qual --out=res.fastq
12.2.8 Local BLAST Search
Task Name: local-blast
Performs a search on a loc
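When the Smith-Waterman search task is launched from a script, it can help to assemble the argument list in one place and reject misspelled parameters before anything runs. The Python sketch below only builds the argument list and does not start UGENE. The function name `build_find_sw_command` is hypothetical, and the `--name=value` flag syntax and the hyphenated task name `find-sw` are assumptions, because the punctuation of the example command is lost in this copy of the manual.

```python
# Optional parameters of the task, as listed in the parameter description.
_OPTIONAL_FIND_SW = ("name", "score", "matrix", "filter")


def build_find_sw_command(in_file, out_file, pattern, **optional):
    """Build an argument list for the Smith-Waterman search task.

    Assumed syntax: ugene find-sw --in=... --out=... --ptrn=... [--key=value]
    Optional parameters are emitted only when given explicitly, so the
    tool's own defaults apply otherwise.
    """
    if not pattern:
        raise ValueError("ptrn is required and must not be empty")
    unknown = set(optional) - set(_OPTIONAL_FIND_SW)
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))
    if "score" in optional and not 0 <= optional["score"] <= 100:
        raise ValueError("score is a percent identity and must be in 0..100")

    args = ["ugene", "find-sw",
            "--in=" + in_file, "--out=" + out_file, "--ptrn=" + pattern]
    for key in _OPTIONAL_FIND_SW:  # fixed order keeps the output stable
        if key in optional:
            args.append("--{}={}".format(key, optional[key]))
    return args


if __name__ == "__main__":
    print(" ".join(build_find_sw_command(
        "human_T1.fa", "sw.gb", "TGCT", filter="none")))
```

The list form can be passed to `subprocess.run` without shell quoting concerns, assuming the `ugene` executable is on the PATH.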
185. off (s): remove overlaps with similarity scores less than the specified value. The specified value should be more than 250.
Length and percent identity of an overlap parameters:
Overlap length cutoff (o): minimum length of an overlap in base pairs. The specified value should be more than 15 base pairs.
Overlap percent identity cutoff (p): minimum percent identity of an overlap. The specified value should be more than 65.
Other parameters:
Maximum number of word matches (t): an upper limit of word matches between a read and other reads. Increasing the value would result in more accuracy; however, this could slow down the program. The specified value should be more than 0.
Band expansion size (a): a number of bases to expand a band of diagonals for an overlapping alignment between two sequence reads. The specified value should be more than 10.
Max gap length in any overlap (f): reject overlaps with a gap longer than the specified value. A small value may cause the program to remove true overlaps and to produce incorrect results. This option may be used to split reads from alternative splicing forms into separate contigs. The specified value should be more than 1.
Assembly reverse reads (r): consider reads in reverse orientation for assembly. The default value is checked.
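The lower bounds quoted above ("should be more than ...") can be checked before a run. The following Python sketch encodes only the bounds stated in this passage, keyed by the single-letter option names shown in parentheses; the function name `check_cap3_params` and the dictionary layout are hypothetical and are not part of UGENE or CAP3.

```python
# Strict lower bounds taken from the parameter descriptions above:
# each value must be greater than (not equal to) the bound.
CAP3_LOWER_BOUNDS = {
    "s": 250,  # overlap similarity score cutoff
    "o": 15,   # overlap length cutoff, base pairs
    "p": 65,   # overlap percent identity cutoff
    "t": 0,    # maximum number of word matches
    "a": 10,   # band expansion size
    "f": 1,    # max gap length in any overlap
}


def check_cap3_params(params):
    """Return a list of messages for values that violate a stated bound.

    `params` maps single-letter option names to numeric values. Options
    without a bound in CAP3_LOWER_BOUNDS are not checked.
    """
    problems = []
    for option, value in sorted(params.items()):
        bound = CAP3_LOWER_BOUNDS.get(option)
        if bound is not None and not value > bound:
            problems.append(
                "{} = {} must be more than {}".format(option, value, bound))
    return problems


if __name__ == "__main__":
    for message in check_cap3_params({"o": 15, "p": 80, "f": 20}):
        print(message)
```

An empty result means every supplied value passes the bounds listed here; it says nothing about options that this passage does not describe.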
186. ole sequence, a Visible region or a Custom region. When all the parameters are set, click the Export button. The consensus sequence is exported to the file and, if the Add to project check box has been checked, it is added to the current project and opened.
8.7.4 Exporting Image
To export the visible part of the assembly as an image, select either the Actions > Export as image item in the main menu or the corresponding button on the toolbar. The Export Image dialog appears. In the dialog you can select the image file name and its format (bmp, jpeg, png, etc.). For some file formats the Quality parameter also becomes available. When the parameters are set, click the OK button.
8.8 Options Panel in Assembly Browser
8.8.1 Navigation
The Navigation tab of the Options Panel in the Assembly Browser includes the list of well-covered regions of the assembly and the field for searching for a required position. To learn more about well-covered regions, refer to the Assembly Browser Window chapter. To learn more about searching for a required position, refer to the Go to Position in Assembly chapter.
8.8.2 Assembly Browser Settings
The Assembly Browser Settings tab includes Reads Area, Consensus Area and Ruler settings.
187. olpath: path to an appropriate BLAST executable (e.g. blastn, blastp, etc.). By default, the path specified in the Application Settings is applied. [String, Optional, Default: default]
tmpdir: directory for temporary files. By default, the path specified in the Application Settings is applied. [String, Optional, Default: default]
in: semicolon-separated list of input sequence files. [String, Required]
dbpath: path to the BLAST database files. [String, Required]
dbname: base name of the BLAST database files. [String, Required]
out: output Genbank file; the results of the search are stored as annotations. [String, Required]
name: name of the annotations. [String, Optional, Default: blast result]
p: type of the BLAST search. [String, Optional, Default: blastn] The following values are available:
• blastn
• blastp
• blastx
• tblastn
• tblastx
e: expectation value threshold. [Number, Optional, Default: 10]
Example: ugene local-blast --in=input.fa --dbpath= --dbname=mydb --out=output.gb
12.2.10 Remote NCBI BLAST and CDD Requests
Task Name: remote-request
Performs remote requests to the NCBI. Saves the results as annotations.
Parameters:
in: semicolon-separated list of input files. A file can be of any format containing sequences or alignments. [String, Required]
db: database to search in. [String, Optional, De
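Because the in parameter takes a semicolon-separated list and the p parameter accepts only a fixed set of search types, a small helper can assemble the arguments and catch mistakes early. The Python sketch below only builds the argument list and does not run UGENE. The function name `build_local_blast_command` is hypothetical, the `--name=value` syntax and the hyphenated task name `local-blast` are assumptions (the punctuation is lost in this copy of the manual), and the database path and file names in the usage example are placeholders.

```python
# Search types listed for the `p` parameter.
BLAST_SEARCH_TYPES = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def build_local_blast_command(inputs, dbpath, dbname, out_file,
                              search_type="blastn"):
    """Build an argument list for the local BLAST search task.

    `inputs` is a sequence of file names; they are joined with ';'
    as the `in` parameter description requires.
    """
    inputs = list(inputs)
    if not inputs:
        raise ValueError("at least one input file is required")
    if any(";" in name for name in inputs):
        raise ValueError("input file names must not contain ';'")
    if search_type not in BLAST_SEARCH_TYPES:
        raise ValueError("unsupported search type: " + search_type)

    return [
        "ugene", "local-blast",
        "--in=" + ";".join(inputs),
        "--dbpath=" + dbpath,
        "--dbname=" + dbname,
        "--out=" + out_file,
        "--p=" + search_type,
    ]


if __name__ == "__main__":
    # "blastdb" is a placeholder directory, not a value from the manual.
    print(" ".join(build_local_blast_command(
        ["first.fa", "second.fa"], "blastdb", "mydb", "output.gb")))
```

If the command is ever passed through a shell instead of as a list, the semicolon in the in value needs quoting, since shells treat it as a command separator.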
188. om wiki: BWA-SW. On low-error short queries BWA-SW is slower and less accurate than the is algorithm, but on long reads it is better.
3. div: does not work for long genomes.
Enable long gaps: checking this box allows one to set the Max gap extensions parameter.
Max gap extensions (e): maximum number of gap extensions.
Indel offset (i): disallow insertions and deletions within the specified number of base pairs towards the ends.
Max long deletion extensions (d): disallow a long deletion within the specified number of base pairs towards the 3' end.
Seed length: take the subsequence of the specified length as seed. If the specified length is larger than the query sequence, seeding will be disabled. For long reads, this option is typically ranged from 25 to 35.
Max seed differences (k): maximum edit distance in the seed.
Max queue entries (m): maximum queue entries.
Threads (t): number of threads.
Mismatch penalty (M): BWA will not search for suboptimal hits with a score lower than the specified value.
Gap open penalty (O): gap open penalty.
Gap extension penalty (E): gap extension penalty.
Best hits (R): proceed with suboptimal alignments if there are no more than the specified number of equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the
189. The Export Selected Sequence Region dialog will appear, which is similar to the Export Selected Sequences dialog described here.
5.8.13 Exporting Sequence of Selected Annotations
Open the Sequence View with a document that contains annotations. A good candidate here could be any file in Genbank format with both sequence and annotations. Select a single or several annotations or annotation groups in the Annotation editor, click the right mouse button to open the context menu and select the Export > Export sequence of selected annotations item.
190. You can set options of the Phmmer search by choosing the needed dialog tab. Here you can see the e-value calibration options:
• Length of sequences for MSV Gumbel mu fit
• Number of sequences for MSV Gumbel mu fit
• Length of sequences for Viterbi Gumbel mu fit
• Number of sequences for Viterbi Gumbel mu fit
• Length of sequences for Forward exp tail mu fit
• Number of sequences for Forward exp tail mu fit
• Tail mass for Forward exponential tail mu fit
The results are stored as sequence annotations in the Genbank file format.
Warning: The Phmmer search work
191. on
Input files are files with long DNA reads in FASTA or FASTQ formats. At least one input file should be added. Input a Result contig name and press the Run button. CAP3 produces assembly results in the ACE file format (.ace). The file contains one or several contigs assembled from the input reads.
Also you can change the following advanced parameters.
Clipping for poor regions parameters:
Clipping of a poor end region of a read is controlled by the parameters Base quality cutoff for clipping (c) (the specified value should be more than 5) and Clipping range (y) (the specified value should be more than 5).
Quality difference score of an overlap parameters:
Base quality cutoff for differences (b): if an overlap contains a dif
192. 11.20.1 Searching JASPAR Database
Press the Search JASPAR database button in the Weight matrix search dialog. The Search JASPAR database dialog will appear. Here the matrices are divided into categories, and you can read detailed information about a matrix, which is represented by its properties. It could help you to choose the matrix properly.
Note: The matrices provided with UGENE are located in the $UGENE/data/position_weight_matrix folder.
11.20.2 Building New Matrix
To create a position weight or frequency matrix from an alignment or a file with several sequences, press the Build new matrix button in the Weight matrix search dialog or select the Tools > Weight matrix > Build weight matrix item in the main menu. The Build weight or frequency matrix dialog will appear.
193. ons main menu. The following dialog appears:
[Screenshot: Align with T-Coffee dialog, advanced options]
The following parameters are available:
• Gap opening penalty: the penalty applied for opening a gap. The penalty must be negative.
• Gap extension penalty: the penalty applied for extending a gap.
• Number of iterations: the number of iterations.
11.23 Query Designer
The Query Designer allows a molecular biologist to analyze a nucleotide sequence using different algorithms (Repeats finder, ORF finder, Weight matrix matching, etc.) at the same time, imposing constraints on the positional relationship of the results obtained from the algorithms. A user-friendly interface is used to create a schema of the algorithms and constraints.
[Screenshot: Query Designer window with the Property Editor showing the parameters of an ORF element. Callouts: the algorithm elements; the constraints imposed on the results of the algorithms' calculations]
Alternatively, you can create or edit a schema using a text editor. When the schema has been created and all its parameters have been set, you can
194. ontext menu.
You may also find the following video tutorials on multiple sequence alignment useful:
• Making a multiple sequence alignment from FASTA file: http://www.youtube.com/watch?v=2pZszPGKnT8
• Working with large alignments in UGENE: http://www.youtube.com/watch?v=npN1mZoK4lE
• Performing profile-to-profile and profile-to-sequence MUSCLE alignments: http://www.youtube.com/watch?v=AYECTzDuibg
• Running remote MUSCLE task: http://www.youtube.com/watch?v=FmSsKqpT 9bE
7.2.5 Working with Sequences List
Adding New Sequences
You can add new sequences to an alignment using the Add submenu in the Actions main menu or the context menu. There are two ways to add a new sequence to the current alignment:
• From a file in a compatible format (FASTA, GenBank, etc.). The list of the supported data formats can be found here.
• From the current project. If you activate this item, the following dialog will appear:
[Screenshot: Select item dialog listing the sequence objects of the current project]
You will see the Project View tree filtered to show only appropriate sequences. Select the items to add and press the OK button.
Copying Sequences
To copy the current selection, click the Copy > Copy selection item in the Actions main menu or the context menu. The hotkey for this action is
195. orkflow schemas on the cloud. Make sure to read the documentation pages devoted to the Workflow Designer. There is also a video tutorial available:
• Using Workflow Designer to export sequences from PDB files into FASTA files: http://www.youtube.com/watch?v=s5zp8DZxNVI&fmt=18
10.2.2 Cloud Computing
Basically, a cloud is a cluster of virtual servers available over the Internet. One can use these servers to execute specific functions (storage, computation, etc.). UGENE lets users launch their computational tasks on the cloud. The UGENE computational service is hosted on Amazon EC2 (http://aws.amazon.com/ec2) servers and maintained by the UGENE team. Currently this service works in testing mode and is free to try.
Note: To ensure that sensitive data are not intercepted and read, all data are transmitted over a secure connection.
Although the available cloud computing functionality is based on Amazon EC2 (http://aws.amazon.com/ec2), the Unipro company can install and maintain the UGENE distributed computation service in any private local network or cluster environment. Please contact us for more details on this matter.
10.2.3 Cloud Remote Machine
Before launching a distributed workflow, make sure that the public EC2 remote machine is enabled in the remote machines monitor. The public EC2 machine settings are already provided with the default UGENE bundle, so you don't have to specify them. Once
196. otation name
[Screenshot: HMM2 search dialog, expert options: Filter results with E-value greater than, Filter results with Score lower than, Number of sequences in database, Algorithm]
The search results are stored as sequence annotations in the GenBank file format.
Warning: All HMM2 UGENE tools work only with files that contain a single HMM model.
11.14 HMM3
The HMM3 plugin is a toolkit based on Sean Eddy's HMMER3 package (http://hmmer.janelia.org). While working on this plugin, we were guided by the following principles:
• Make the HMMER3 tools accessible to a wider user audience by providing a graphical interface for all supported utilities on most platforms.
• Be compatible with the original HMMER3 package.
• Create a high-performance solution utilizing modern multi-core processors.
The current version of UGENE provides a user interface for three HMM3 tools: HMM3 build, HMM3 search, and Phmmer search. In the original package the corresponding commands are hmmbuild, hmmsearch, and phmmer. To access these tools, select the Tools > HMMER3 tools submenu of the program main menu.
[Screenshot: Tools > HMMER3 tools submenu]
We highly recommend reading the original HMMER3 documentation to learn h
197. [Screenshot: annotation context menu with the Goto position, Select sequence region, New annotation, Copy, Select, Add, Analyze, Align, and Export items]
The dialog will appear:
[Screenshot: Add new qualifier dialog with the Name and Value fields]
Here you can specify the name and the value of the qualifier.
You can use the F2 key to rename a qualifier:
[Screenshot: Rename qualifier dialog]
To edit a qualifier, select it and press the F4 key, or use the Edit qualifier context menu item.
[Screenshot: qualifier context menu in the Annotations editor]
5.10.5 Adding Column for Qualifier
It is possible to add a column with the qualifier values to the Annotations editor. To add the column, select the Add "<qualifier name>" column item in the qualifier context menu.
5.10.6 Copying Qualifier Text
Use the Copy qualifier "<qualifier name>" text item in the qualifier context menu to copy the qualifier value.
5.10.7 Deleting Annotations and Qualifiers
Selected annotations groups
198. [Screenshot: Remove columns of gaps dialog]
There are the following options:
• Remove columns with number of gaps: removes columns in which the number of gaps is greater than or equal to the specified value.
• Remove columns with percentage of gaps: removes columns in which the percentage of gaps is greater than or equal to the specified value.
• Remove all columns of gaps: this option is selected by default. It removes columns from the alignment if they consist entirely of gaps.
Select the required option and press the Remove button.
Filling Selection with Gaps
Select a region in the alignment and choose the Edit > Fill selection with gaps item in the context menu, or press the Spacebar. The region is filled with gaps, shifting the subalignment from the region to the right.
7.2.4 Aligning Sequences
The Alignment Editor integrates several popular multiple sequence alignment algorithms. Below is the list of available algorithms and links to the documentation:
• Port of the popular MUSCLE3 algorithm.
• KAlign plugin: effective work with huge alignments.
• ClustalW and MAFFT: these algorithms appeared in version 1.7.2 of UGENE with the External Tools plugin.
• T-Coffee: this alignment algorithm has been available since version 1.8.1 of UGENE with the External Tools plugin.
To align sequences, choose a preferred alignment method in the Actions main menu or in the c
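The three gap-removal options described above can be illustrated with a minimal Python sketch. It is not UGENE's implementation: the function name `remove_gap_columns`, the representation of an alignment as a list of equal-length strings, and the `-` gap character are assumptions made for the example.

```python
from typing import List, Optional


def remove_gap_columns(rows: List[str],
                       min_gaps: Optional[int] = None,
                       min_percent: Optional[float] = None,
                       gap: str = "-") -> List[str]:
    """Drop alignment columns according to one of three rules.

    min_gaps:    drop columns whose gap count is >= min_gaps
    min_percent: drop columns whose gap percentage is >= min_percent
    neither:     drop only columns that consist entirely of gaps (default)
    """
    if not rows:
        return []
    height = len(rows)
    kept_columns = []
    for col in range(len(rows[0])):
        gaps = sum(1 for row in rows if row[col] == gap)
        if min_gaps is not None:
            drop = gaps >= min_gaps
        elif min_percent is not None:
            drop = 100.0 * gaps / height >= min_percent
        else:
            drop = gaps == height
        if not drop:
            kept_columns.append(col)
    # Rebuild every row from the surviving columns only.
    return ["".join(row[c] for c in kept_columns) for row in rows]


alignment = ["A-C-", "A-CG", "AT--"]
by_count = remove_gap_columns(alignment, min_gaps=2)
by_percent = remove_gap_columns(alignment, min_percent=50)
all_gap_only = remove_gap_columns(["A-C", "A-G"])
```

In the toy alignment, columns 2 and 4 each contain two gaps out of three rows, so both the count rule (2) and the percentage rule (50) remove them.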
199. ow to use utilities provided by the plugin.
11.14.1 Building HMM Model (HMM3 Build)
The HMM3 build tool is used to build a new HMM profile from a multiple alignment. You can use any alignment file format supported by UGENE. The output HMM profile format is compatible with the HMMER3 package, but it is not compatible with HMMER2. HMM3 build automatically calibrates the target model.
[Screenshot: HMM3 build dialog, Input and output tab]
The HMM3 configuration dialog provides an easy way to set appropriate search parameters. Here you can see the effective weighting strategy options:
[Screenshot: HMM3 build dialog, Effective weighting tab. Options: Adjust effective sequence number to achieve relative entropy target (Minimum relative entropy/position, Sigma parameter); Use number of single linkage clusters as effective; Use number of sequences as effective; Effective sequence number for all models to]
11.14.2 Searching Sequence Using HMM Profile (HMM3 Search)
The HMM3 search tool reads an HMM profile from a file and searches a sequence for significantly similar sequence matches. The sequ
200. owing dialog appears:
[Screenshot: DNA Flexibility dialog with the Window size, Window step, and Threshold fields and the Remember Settings, Restore Defaults, Search, and Cancel buttons]
The calculation is made for overlapping windows along a given sequence. If two or more consecutive windows each have an average flexibility value greater than the specified Threshold parameter, the area is marked with an annotation. The average threshold in a window is calculated by the following formula:
average window threshold = (sum of flexibility angles in the window) / (window size - 1)
The following flexibility angles are used during the calculation:
[Table: flexibility angle for each dinucleotide]
A minimum value is used when an N character is present in a dinucleotide:
• CN, NC, GN, NG, NN: 7.2
• AN, NA, TN, NT: 7.6
11.3.1 Configuring Dialog Settings
In the dialog you can set up the corresponding parameters:
• Window size: the number of bases in a window. The window size should be greater than 2. The default value is 100 bp.
• Window step: the number of bases used to shift a window. The window step should be a positive integer. The default value is 1 bp.
• Threshold: the threshold value of the twist angle (see above). The default value is 13.7.
You can remember the input values or restore the default values using the Remember Settings and the Restore Defaults buttons. The annotations names and
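The windowed calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not UGENE's code: the per-dinucleotide angle table did not survive in this text, so the caller supplies it (the values in the usage lines are made up), the function names are hypothetical, and reporting a region as a 0-based half-open interval from the first window's start to the last window's end is a choice made for the example.

```python
from typing import Dict, List, Tuple

# Values for dinucleotides that contain N, as quoted in the text.
N_ANGLES = {"CN": 7.2, "NC": 7.2, "GN": 7.2, "NG": 7.2, "NN": 7.2,
            "AN": 7.6, "NA": 7.6, "TN": 7.6, "NT": 7.6}


def window_averages(seq: str, angles: Dict[str, float],
                    window: int = 100, step: int = 1) -> List[Tuple[int, float]]:
    """Return (window start, average angle) for each overlapping window."""
    table = {**angles, **N_ANGLES}
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # A window of `window` bases contains `window - 1` dinucleotides.
        total = sum(table[chunk[i:i + 2]] for i in range(window - 1))
        result.append((start, total / (window - 1)))
    return result


def flexible_regions(averages: List[Tuple[int, float]],
                     window: int, threshold: float = 13.7) -> List[Tuple[int, int]]:
    """Merge runs of two or more consecutive windows above the threshold."""
    regions, run = [], []
    # A sentinel below any threshold closes a run that reaches the end.
    for start, avg in averages + [(-1, float("-inf"))]:
        if avg > threshold:
            run.append(start)
        else:
            if len(run) >= 2:
                regions.append((run[0], run[-1] + window))
            run = []
    return regions


# Hypothetical angle table for demonstration only: every step 10.0, TA 20.0.
toy_angles = {a + b: 10.0 for a in "ACGT" for b in "ACGT"}
toy_angles["TA"] = 20.0
averages = window_averages("TATATATACCCCCCCC", toy_angles, window=4, step=1)
regions = flexible_regions(averages, window=4, threshold=13.0)
```

With the toy table, the first seven windows all average above 13.0, so they merge into a single region.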
201. ple position in the assembly, zoom scale, etc.
8.3 Getting Information About Read
A read displayed in the Reads Area consists of the bases A, C, G, and T. It may also contain the N character, which stands for an ambiguous base. Depending on the value of the Cigar parameter, the read can be shown partially, or gaps can be inserted inside the read (see below).
By default, when you hover over a read in the Reads Area, a hint appears. To disable this behaviour, click the following button on the toolbar: [button image]. Alternatively, uncheck the Show pop-up hint check box on the Assembly Browser Settings tab of the Options Panel.
The hint shows the following information about the read:
• Read name
• Location
• Length
• Cigar
• Strand
• Read sequence
The operations in the Cigar parameter are described as follows:
• M: Alignment match (can be a sequence match or mismatch).
• I: Insertion to the reference. Skipped when the read is aligned to the reference, i.e. it is not shown in the Reads Area but is present in the read sequence.
• D: Deletion from the reference. Gaps are inserted into the read when the read is aligned to the reference. For example: [Screenshot: read displayed with gaps]
• N: Skipped region from the reference. Behaves as D but has a different biological meaning: for mRNA-to-genome alignment it represen
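The display rules for the Cigar operations listed above can be sketched in a few lines of Python. The function `render_read` and the `-` gap character are assumptions for illustration, and only the four operations described in this excerpt (M, I, D, N) are handled; any other operation is rejected.

```python
import re


def render_read(sequence: str, cigar: str, gap: str = "-") -> str:
    """Return the read as it would be laid out against the reference.

    M: bases are shown as they are.
    I: bases are consumed from the read but not shown.
    D/N: gap characters are shown and no read bases are consumed.
    """
    if not re.fullmatch(r"(\d+[MIDN])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    shown = []
    pos = 0  # current position in the read sequence
    for length, op in re.findall(r"(\d+)([MIDN])", cigar):
        n = int(length)
        if op == "M":
            shown.append(sequence[pos:pos + n])
            pos += n
        elif op == "I":
            pos += n  # present in the read, hidden in the view
        else:  # "D" or "N"
            shown.append(gap * n)
    return "".join(shown)


displayed = render_read("ACGTTTGA", "3M2I1M2D2M")
```

Here the two inserted bases are hidden and the two-base deletion appears as gaps, so the eight-base read is displayed as `ACGT--GA`.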
202. [Screenshot: Plugin Viewer window]
The window shows the list of available plugins. To add or remove plugins, use the Add plugin and the Remove plugin items available in the Plugin Viewer context menu.
[Screenshot: context menu with the Add plugin and Remove plugin items]
When you select the Remove plugin item for a plugin, the plugin's status is changed to "to remove after restart". The Remove plugin item is no longer available in the context menu of the plugin. Instead, the Enable plugin item appears in the context menu.
[Screenshot: context menu with the Add plugin and Enable plugin items]
If you select this item, the plugin will be enabled again, i.e. it will not be removed after restart. Otherwise, the plugin will not be available after UGENE restarts.
4.15 Fetching Data from Remote Database
UGENE allows fetching data from remote biological databases such as NCBI GenBank, the NCBI protein sequence database, and some others. To fetch data, select the File > Access remote database item in the main menu. The dialog will appear:
[Screenshot: Fetch data from remote database dialog with the database list (NCBI GenBank (DNA sequence) selected), the Save to directory field, and the hint "Use Genbank DNA accession number. For example: NC_001363 or 011266"]
Here you need to enter the unique ID of the biological object and choose a database. Unique identifiers are different for various databases. For exampl
203. publications. The workflow schemas can be run both locally and remotely, either using the graphical interface or launched from the command line. The elements that a schema consists of correspond to the bulk of the algorithms integrated into UGENE. Additionally, you can create custom workflow elements.
[Screenshot: UGENE Workflow Designer window with a sample schema, the element palette, and the Property Editor]
To learn more about the Workflow Designer, read the Workflow Designer Manual (follow the link on the UGENE documentation page http://ugene.unipro.ru/documenta
204. quence objects in the Project View window, right-click to open the context menu, and select the Export > Export sequences as alignment item.
[Screenshot: Project View context menu]
The Export Sequences as Alignment dialog will appear, where you can specify the location of the resulting alignment file, select a multiple alignment file format, choose to use GenBank SOURCE tags as sequence names (for GenBank sequences only), and optionally add the created document to the current project.
[Screenshot: Export Sequences as Alignment dialog. Fields: File format to use; Add document to the project; Use Genbank SOURCE tags as a name of sequences (for Genbank sequences only)]
4.10.3 Exporting Alignment to Sequence Format
Se
205. quence names, pattern subsequence names, and for pattern sequence name.
[Screenshot: UGENE window with the Project View and an alignment produced by the Smith-Waterman search, showing the reference subsequence and the pattern subsequence]
11.13 HMM2
The HMM2 plugin is a toolkit based on Sean Eddy's HMMER2 package (http://hmmer.janelia.org). While working on this plugin, we were guided by the following principles:
• Make the HMMER2 tools accessible to a wider user audience by providing a graphical interface for all supported utilities on most platforms.
• Be compatible with the original HMMER2 package.
• Create a high-performance solution utilizing modern multi-core processors and SIMD instructions.
The current version of UGENE provides a user interface for three HMM2 tools: HMM build, HMM calibrate, and HMM search. In the original package the corresponding commands are hmmbuild, hmmcalibrate, and hmmsearch. To access these tools, select the Tools > HMMER2 tools submenu of the program main menu.
[Screenshot: Tools menu]
206. r. Notice that the 3D Structure Viewer can be closed from this menu.
6.3 Chromatogram Viewer
The Chromatogram Viewer plugin brings DNA chromatogram data viewing and editing capabilities into UGENE. Currently supported chromatogram file formats are ABIF and SCF.
To view a chromatogram, open the file in UGENE by standard means, e.g. drag and drop the file or press the Ctrl+O shortcut. The Chromatogram Viewer is automatically embedded into the generic Sequence View if chromatogram data are found, as on the screenshot below:
[Screenshot: Sequence View with an embedded chromatogram view; zoom in to see base calls]
After zooming in, more chromatogram details are available:
[Screenshot: zoomed-in chromatogram with base calls]
To edit sequence data, right-click on the chromatogram view and select the Edit new sequence item in the context menu that appears. The original DNA sequence cannot be changed; however, you can add and modify a new seque
207. [Screenshot: chromatogram view with an edited sequence]
6.4 DNA Graphs Package
The DNA Graphs Package draws contextual graphs for sequences. The DNA Graphs Package is available for the standard DNA alphabet: A, C, G, T, and N.
Open a sequence in the Sequence View and click the Graphs icon on the toolbar. The popup menu appears:
[Screenshot: Graphs popup menu with the items DNA Flexibility, GC Content (%), AG Content (%), GC Frame Plot, GC Deviation (G-C)/(G+C), AT Deviation (A-T)/(A+T), Karlin Signature Difference, and Informational Entropy]
To see a graph, select the corresponding graph item in the popup menu. A new area with the graph appears right above the Sequence zoom view:
[Screenshot: Sequence View of NC_001363 with graph areas; the GC Frame Plot is shown with Window 30 and Step 10]
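As an illustration of how window-based graphs such as GC Content and GC Deviation can be computed, here is a small Python sketch. It uses the usual textbook definitions, GC content as the percentage of G and C in a window and deviation as (X-Y)/(X+Y); whether UGENE computes its graphs in exactly this way is an assumption, and all function names are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def gc_content(chunk: str) -> float:
    """Percentage of G and C bases in the chunk."""
    return 100.0 * (chunk.count("G") + chunk.count("C")) / len(chunk)


def deviation(chunk: str, first: str, second: str) -> float:
    """(first - second) / (first + second), or 0.0 when neither base occurs."""
    a, b = chunk.count(first), chunk.count(second)
    return (a - b) / (a + b) if a + b else 0.0


def graph(seq: str, window: int, step: int,
          fn: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Evaluate fn on every window; the result is one point per window."""
    return [(start, fn(chunk)) for start, chunk in sliding_windows(seq.upper(), window, step)]


gc_points = graph("GGCCAATT", window=4, step=2, fn=gc_content)
gc_dev_points = graph("GGGCAAAA", window=4, step=4, fn=lambda c: deviation(c, "G", "C"))
```

Each point pairs a window start with the value for that window; the window and step play the same role as the Window and Step values shown in the graph caption.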
208. r as multiple alignment. To set the saving parameters, go to the Input and output tab of the dialog.
If you want to save the results as annotations, enter the annotation saving parameters: Annotation name, Group name, and a file to save the annotations to.
If you want to save the results as a multiple alignment, select the following parameters:
[Screenshot: Smith-Waterman Search dialog, Input and output tab, Save results as: Multiple alignment. Fields: Alignment files directory path; Set advanced options; Template for alignment files names; Template for reference subsequences names; Template for pattern subsequences names; Pattern sequence name]
Here you can select where to save the alignment (the Alignment files directory path parameter). Using the Set advanced options checkbox, you can select the saving options. You can set different templates for file names: create your own or build them from the following elements:
• E: adds a subsequence end position
• hms: adds a time
• MDY: adds a date
• S: adds a subsequence start position
• L: adds a subsequence length
• SN: adds a reference sequence name prefix
• PN: adds a pattern sequence name prefix
• C: adds a counter
You can create templates for alignment files names, reference subse
209. r than terminator. [Boolean, Optional, Default: true]
allow-alternative-codons: allows ORFs starting with alternative initiation codons, according to the current translation table. [Boolean, Optional, Default: false]
Example:
ugene find-orfs --in=human_T1.fa --out=result.gb --require-init-codon=false
12.2.5 Finding Repeats
Task Name: find-repeats
Searches for repeats in sequences and saves the regions found as annotations.
Parameters:
in: semicolon-separated list of input files. [String, Required]
out: output file with the annotations. [String, Required]
name: name of the annotated regions. [String, Optional, Default: repeat_unit]
min-length: minimum length of the repeats. [Number, Optional, Default: 5]
identity: percent identity between repeats. [Number, Optional, Default: 100]
min-distance: minimum distance between the repeats. [Number, Optional, Default: 0]
max-distance: maximum distance between the repeats. [Number, Optional, Default: 5000]
inverted: if true, searches for the inverted repeats. [Boolean, Optional, Default: false]
Example:
ugene find-repeats --in=murine.gb --out=murine_repeats.gb --identity=99
12.2.6 Finding Pattern Using Smith-Waterman Algorithm
Task Name: find-sw
Searches for a pattern in a nucleotide or protein sequence using the Smith-Waterman algorithm and saves the regions found as annotatio
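When such tasks are launched from a script, it can help to assemble the argument list programmatically. The Python sketch below does only that. It assumes a `--name=value` option syntax and an executable called `ugene` on the PATH; the punctuation of the command-line examples is not preserved in this text, so both should be checked against the installed version. The helper name `build_ugene_command` is hypothetical.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float, bool, Sequence[str]]


def build_ugene_command(task: str, params: Dict[str, Value],
                        exe: str = "ugene") -> List[str]:
    """Build an argument list for a command-line task.

    Assumption: options are written as --name=value. Booleans become
    'true'/'false' and lists are joined with ';' (the parameter
    descriptions speak of a semicolon-separated list of input files).
    """
    args = [exe, task]
    for name, value in params.items():
        if isinstance(value, bool):  # check bool before other types
            text = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            text = ";".join(value)
        else:
            text = str(value)
        args.append(f"--{name}={text}")
    return args


repeats_cmd = build_ugene_command(
    "find-repeats",
    {"in": "murine.gb", "out": "murine_repeats.gb", "identity": 99},
)
orfs_cmd = build_ugene_command(
    "find-orfs",
    {"in": ["a.fa", "b.fa"], "out": "result.gb", "require-init-codon": False},
)
```

The resulting list can be passed to `subprocess.run(repeats_cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with file names.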
210. [Screenshot: subalignment extraction dialog with the Selected sequences list, the Invert selection, Select all, and Clear selection buttons, and the Add to project check box]
Specify the name of the new MSA file in the File name field. The currently selected region is extracted by default when you press the Extract button.
You can change the columns to be extracted using the From and to fields, and change the rows to be extracted by checking or unchecking the required sequences in the Selected sequences list. Use the buttons:
• Invert selection: to invert the selection of the sequences.
• Select all: to select all sequences.
• Clear selection: to clear the selection of all sequences.
The Add to project check box specifies whether to add the MSA file created from the subalignment to the active project.
Removing All Gaps
Use the Edit > Remove all gaps item in the Actions main menu or in the context menu to remove all gaps from the alignment.
Removing Selection
To remove a subalignment, select it and choose the Edit > Remove selection item in the context menu, or press the Delete key.
Removing Columns of Gaps
To remove columns containing a certain number of gaps, select the Edit > Remove columns of gaps item in the context menu. The dialog appears:
[Screenshot: Rem
211. rch?type=NIL&object=ISCR&quickSearch=Quick+Search
LexA: represses the transcription of several genes involved in the cellular response to DNA damage or inhibition of DNA replication. http://biocyc.org/ECOLI/substring-search?type=NIL&object=LEXA&quickSearch=Quick+Search
Lrp: Leucine-responsive regulatory protein. http://biocyc.org/ECOLI/substring-search?type=NIL&object=Lrp&quickSearch=Quick+Search
MALT: Maltose regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=MALT&quickSearch=Quick+Search
MARA: Multiple antibiotic resistance. http://biocyc.org/ECOLI/substring-search?type=NIL&object=MARA&quickSearch=Quick+Search
MELR: Melibiose regulator. http://biocyc.org/ECOLI/substring-search?type=NIL&object=MELR&quickSearch=Quick+Search
MetJ: represses the expression of genes involved in biosynthesis and transport of methionine. http://biocyc.org/ECOLI/substring-search?type=NIL&object=MEtJ&quickSearch=Quick+Search
MetR1: MetR participates in controlling several genes involved in methionine biosynthesis [Weiss...] and a gene involved in protection against nitric oxide. http://biocyc.org/ECOLI/substring-search?type=NIL&object=MetR&quickSearch=Quick+Search
MLC: DgsA, better known as Mlc ("makes large colonies"), is a transcriptional dual regulator that controls the expression of a number of genes encoding enzymes of the Escherichia http://biocyc.org/ECOLI/substring-search?type=NIL&ob
212. rest parameters are optional:
• Search in: select whether to search in the sequence or in its translation.
• Strand: select the strand to search in: direct, complementary, or both strands.
• Region: specifies the region of the sequence that will be searched for the pattern. By default, if a subsequence was selected when the dialog was opened, the selected subsequence is searched for the pattern. Otherwise, the whole sequence is used. You can also enter a custom range.
• Algorithm version: the version of the algorithm implementation. Non-classic versions produce the same results as the classic one, but much faster. To use these optimizations, your system must support the corresponding capabilities. The available versions are Classic, SSE2, CUDA, and OPENCL.
• Scoring matrix: can be chosen from the set of matrices supplied with UGENE. To view the selected matrix, click the View button.
• Gap open: the penalty for opening a gap.
• Gap extension: the penalty for extending a gap.
• Report results: a simple heuristic that filters intersecting hits. If it is set to none, the algorithm may report a large set of almost identical results in the same region.
• Minimal score: another simple heuristic, which measures sequence similarity. It is more convenient than using abstract scores. If it is set to 100%, the algorithm will search for an exact substring match.
The results of the search are saved as annotations o
213. [Screenshot: Build weight or frequency matrix dialog. Matrix type: Frequency matrix, Weight matrix; Weight algorithm: Berg and von Hippel]
Note: The Alignment logo appears when:
• The input file format is pfm or aln, or it is a file with several sequences.
• The size of the input file is small enough.
To start the operation, press the Start button. The matrix will be created and saved. If the Build weight or frequency matrix dialog was invoked from the Weight matrix search dialog, the matrix will also be chosen as the current profile.
11.21 Primer3
The Primer3 plugin is a port of the Primer3 tool (http://primer3.sourceforge.net). It is intended to pick primers from a DNA sequence.
To use Primer3, open a DNA sequence and select the Analyze > Primer3 context menu item. The dialog will appear:
[Screenshot: Primer designer dialog, Main tab, with the General Settings, Internal Oligo, Penalty Weights, Sequence Quality, and Result Settings tabs, and fields including Excluded Regions, Targets, Product Size Ranges, Start Codon Position, Number To Return, Max Repeat Mispriming, Max Template Mispriming, Pick left primer, Pick hybridization probe, and Pick right primer]
All available parameters are the sa
214. rom the Notifications popup window, click the notification cross button. Note that you can click the clip button of the Notifications popup window to keep the window always on top.
4.3 Main Menu Overview
• File: a set of project-level operations. Example: create, open, etc. a project; open a document; access a remote database to download a file.
• Actions: various actions associated with the active window. Example: export, remove, edit, analyze a sequence using different plugins for the Sequence View; edit, align, change the consensus mode for the Alignment Editor.
• Settings: application, plugins, and tools settings.
• Tools: various tools independent of an active window. This menu is extended by different plugins. Example: HMMER2/HMMER3 tools, SITECON, Workflow Designer.
• Window: a list of active windows and basic manipulations with the windows. Example: close all windows, tile windows, select next window.
• Help: application help and check for updates.
The menus can be dynamically populated with new actions added by plugins. Check the Plugins documentation to learn how each plugin affects global and context menus.
4.4 Creating New Project
A project stores links to the data files, cross-file data associations, and visualization settings. Below is a description of how to create a new project manually. Note that if you have no project created when opening fil
215. ry. Examples are given below:
• protease NOT hiv1[organism]: this will limit a BLAST search to all proteases except those in HIV-1.
• 1000:2000[slen]: this limits the search to entries with lengths between 1000 and 2000 bases for nucleotide entries, or 1000 to 2000 residues for protein entries.
• Mus musculus[organism] AND biomol_mrna[properties]: this limits the search to mouse mRNA entries in the database. For common organisms, one can also select from the pulldown menu.
• 10000:100000[mlwt]: this is yet another example usage, which limits the search to protein sequences with a calculated molecular weight between 10 kD and 100 kD.
• src specimen voucher[properties]: this limits the search to entries that are annotated with a specimen_voucher qualifier on the source feature.
• all[filter] NOT environmental sample[filter] NOT metagenomes[orgn]: this excludes sequences from metagenome studies and uncultured sequences from anonymous environmental sample studies.
For help in constructing Entrez queries, see the Entrez Help document: http://www.ncbi.nlm.nih.gov/books/NBK3837
When the blastp search is selected in the general options, the view of the Advanced options tab is the following:
[Screenshot: Advanced options tab for the blastp search]
Filters: filters for regions of low compositional complexity and repeat elements of the human genome.
Masks for lookup table only: this option masks only for purpose
216. [Screenshot: Digest into fragments dialog listing the available enzymes with their cut counts and the Add, Add All, Remove, and Clear Selection buttons]
On the Restriction Sites tab of the dialog you can see the name of the molecule, the list of restriction enzymes found during the restriction analysis that can cut the molecule, and the list of enzymes selected to perform the digestion.
To digest the sequence into fragments, you should select at least one enzyme. To move an enzyme to the Selected enzymes list, click it in the Available enzymes list and press the Add button. Note that you can select several items in a list by holding the Ctrl key while clicking the items. To select all available enzymes, press the Add All button.
To remove enzymes from the Selected enzymes list, select them in the list and press the Remove button. To remove all items from the Selected enzymes list, press the Clear Selection button.
On the Conserved Annotations tab of the dialog you can select the annotations that must not be disrupted during cloning.
On the Output tab of the dialog you can select the file to save the new molecule to.
As soon as the required parameters are selected, press the OK button. The fragments will be saved as annotations. All the generated fragments are also available in the task report:
[Screenshot: Task report for DigestSequenceTask] Fini
217. s

The Zoom View contains no more than 20 rows by default. The remaining rows are available by scrolling. To change this behavior, use the Manage Rows in Zoom View menu button on a sequence toolbar. The menu contains the following items: Show All Rows, +1 Row, +5 Rows, -1 Row, -5 Rows, Reset Rows Number.

When the Show All Rows item is checked, all available annotations are always shown. You can also add rows by selecting the +5 Rows and +1 Row items and remove rows by selecting the -5 Rows and -1 Row items. To restore the default number of rows, select the Reset Rows Number item.

See also:
• Navigating Sequence zoom view using Sequence overview
• Zooming Sequence
• Creating New Ruler
• Manipulating Annotations

5.6 Sequence Details View

The Sequence details view is a supplementary component of the Sequence overview. It is used to show sequence content without zooming. Every time you double-click the sequence in the Sequence overview area or select an annotation, the corresponding sequence position is made visible in the Sequence details view.

For a DNA sequence, the Sequence details view automatically shows the complement DNA strand and 6 amino acid translation frames.
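The complement strand and the six translation frames mentioned above can be illustrated with a short Python sketch. It is not UGENE code: the function names are hypothetical, the standard genetic code is assumed, and incomplete or ambiguous codons are rendered as X by choice.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed translation table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(seq):
    """Return the complement strand, read in the same direction as seq."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the complement strand read 5' to 3'."""
    return complement(seq)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return the three direct and three complement translation frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

The frame labels (+1 to +3 and -1 to -3) are a convention of this sketch; the view itself shows the frames as rows next to the two strands.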
218. 11.5 ORF Marker

From this chapter you can learn how to search for Open Reading Frames (ORFs) in a DNA sequence.

The ORFs found are stored as automatic annotations. This means that if the automatic annotations highlighting has been enabled, ORFs are searched for and highlighted in each sequence opened. Refer to Automatic Annotations Highlighting to learn more.

To open the ORF Marker dialog, select the Analyze > Find ORFs item in the context menu.

ORF Marker dialog, Search Settings tab: Min length, bp (100); Must terminate within region; Must start with init codon; Allow overlaps; Allow alternative init codons; Include stop codon; Alternative start codons (TTG, CTG); Stop codons (TAA, TAG, TGA); Region (Whole sequence).

The following search settings are available: Min length
219. Dotplot of the NC_014267 sequence (min length 11, identity 100).

3. Inverted repeats

The Dotplot plugin also allows you to search for inverted repeats. Inverted repeats are shown contrary to the direct repeats. Use the Search direct repeats and Search inverted repeats options of the Dotplot parameters dialog to select which repeats to draw (the dialog is described here).
220. s bases (either adenine or guanine) on a DNA molecule. It is calculated by the following formula: (A + G) / (A + G + C + T) * 100

GC Frame Plot: this graph is similar to the GC content graph, but shows the GC content of the first, second and third position independently. It is most effective in organisms with GC-rich genomic sequence, but it also works on all microbial sequences.

GC Deviation (G-C)/(G+C): shows the difference between the G content of the forward strand and the reverse strand. GC Deviation is calculated by the following formula: (G - C) / (G + C)

AT Deviation (A-T)/(A+T): shows the difference between the A content of the forward strand and the reverse strand. AT Deviation is calculated by the following formula: (A - T) / (A + T)

Karlin Signature Difference: dinucleotide absolute relative abundance difference between the whole sequence and a sliding window. Let:
f(XY) = frequency of the dinucleotide XY
f(X) = frequency of the nucleotide X
p(XY) = f(XY) / (f(X) * f(Y))
p_seq(XY) = p(XY) for the whole sequence
p_win(XY) = p(XY) for a window
The Karlin Signature Difference for a window is calculated by the following formula: sum(|p_seq(XY) - p_win(XY)|) / 16

Informational Entropy: is calculated from a table of overlapping DNA triplet frequencies. The use of overlapping triplets smooths the frame effect. Informational Entropy is calculated by the following formula: triplet frequency log10 triplet frequency log
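As an illustration of the content and deviation graphs, here is a minimal Python sketch. It assumes the formulas read (A + G) / (A + G + C + T) * 100, (G - C) / (G + C) and (A - T) / (A + T), each evaluated on base counts inside a sliding window. The function names, the window handling and the choice to return 0.0 for an empty denominator are assumptions of the sketch, not UGENE behavior.

```python
from collections import Counter


def _counts(window):
    """Count A, C, G and T in a window, ignoring any other symbols."""
    counts = Counter(window.upper())
    return counts["A"], counts["C"], counts["G"], counts["T"]


def ag_content(window):
    """Percentage of purines: (A + G) / (A + G + C + T) * 100."""
    a, c, g, t = _counts(window)
    total = a + c + g + t
    return (a + g) / total * 100 if total else 0.0


def gc_deviation(window):
    """(G - C) / (G + C); 0.0 when the window has no G or C."""
    _, c, g, _ = _counts(window)
    return (g - c) / (g + c) if (g + c) else 0.0


def at_deviation(window):
    """(A - T) / (A + T); 0.0 when the window has no A or T."""
    a, _, _, t = _counts(window)
    return (a - t) / (a + t) if (a + t) else 0.0


def sliding(seq, func, window, step=1):
    """Evaluate func on every full window of the sequence."""
    return [func(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    print(sliding("GGGCCCATAT", gc_deviation, window=4, step=2))
```

Each value returned by sliding corresponds to one point of the graph; the window and step sizes play the role of the graph settings.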
221. s of constructing the lookup table used by BLAST, so that no hits are found based upon low-complexity sequence or repeats (if the repeat filter is checked).

Mask lower case letters: with this option selected, you can cut and paste a FASTA sequence in upper case characters and denote areas you would like filtered with lower case.

Filter by: filters results by accession, by definition of annotations, or by id.

Select result by: selects results by EValue or by score.

As you can see, there is no Match scores option, but there are Matrix and Service options:

Matrix: the key element in evaluating the quality of a pairwise sequence alignment is the substitution matrix, which assigns a score for aligning any possible pair of residues.

Service: the blastp service that needs to be performed: plain, psi or phi.

The Advanced options tab is not available when the cdd search is selected.

11.7 Repeat Finder

The Repeat Finder plugin provides a tool to search for direct and invert repea
222. s only with single sequence databases.

11.15 uMUSCLE

UGENE contains graphical ports of Robert C. Edgar's MUSCLE (http://www.drive5.com/muscle) tool for multiple alignment.

Note: MUSCLE4 is not supported since UGENE version 1.7.2.

The package is integrated completely, so there is no need for extra files to use it. It is possible to run several multiple alignment tasks in parallel, check the progress and cancel the running tasks safely.

Note: The k-mer clustering part of the MUSCLE algorithm was optimized for multicore systems by Timur Tleukenov (Novosibirsk State Technical University).

11.15.1 Aligning with MUSCLE

To run the classic MUSCLE, use the Align > Align with MUSCLE context menu item in the Alignment Editor.

Align with MUSCLE dialog: Mode details (the default settings are designed to give the best accuracy); Advanced options: Do not re-arrange sequences (-stable), Max iterations, Max time (minutes), Region to
223. s sequence annotations in the Genbank file format.

Warning: The HMM3 search works only with files that contain a single HMM model.

11.14.3 Searching Sequence Against Sequence Database (Phmmer Search)

The Phmmer search tool searches for query sequence matches in a sequence database, much as BLASTP or FASTA would do. The Phmmer search works essentially like the HMM3 search does, except that you provide a query sequence instead of a query profile HMM.

The database sequence must be selected in the Project View, or there must be an active Sequence View window opened. Select the query sequence in the Phmmer search dialog.

Phmmer search dialog tabs: Input and output, Reporting thresholds, Scoring system, Acceleration, E-value calibrati
224. sequences; it describes the similarity of each amino acid to each other. The following values are available:

• BLOSUM: BLOcks of Amino Acid SUbstitution Matrices (http://en.wikipedia.org/wiki/BLOSUM), first introduced in a paper by Henikoff and Henikoff. These matrices appear to be the best available for carrying out database similarity (homology) searches.
• PAM: Point Accepted Mutation matrices (http://en.wikipedia.org/wiki/Point_Accepted_Mutation), introduced by Margaret Dayhoff. These have been extremely widely used since the late '70s.
• GONNET: these matrices were derived using almost the same procedure as the Dayhoff one (above), but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series.
• ID: identity matrix, which gives a score of 1.0 to two identical amino acids and a score of zero otherwise.

Iteration type: specifies the iteration type to use. During the iteration step, each sequence is removed in turn and realigned. The new alignment is kept if it is better than the one made before. This process is repeated until the score converges or until the maximum number of iterations is reached. Available values are:

• NONE: specifies not to use iterations.
• TREE: specifies to iterate at each step of the progressive alignment.
• ALIGNMENT: specifies to iterate on the final alignment.

Max iterations: maximum number of iterations. C
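To make the identity matrix concrete, the following Python sketch builds such a matrix and scores two aligned rows with it. It is only an illustration: the 20-letter alphabet, the function names and the decision to score gaps and unknown symbols as 0.0 are assumptions, not the aligner's implementation.

```python
# Hypothetical 20-letter amino acid alphabet used only for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def identity_matrix(alphabet=AMINO_ACIDS):
    """Score 1.0 for two identical residues and 0.0 for any other pair."""
    return {(a, b): 1.0 if a == b else 0.0
            for a in alphabet for b in alphabet}


def pairwise_score(row_a, row_b, matrix):
    """Sum the matrix scores over the columns of two aligned rows.

    Pairs that are not in the matrix (gaps, unknown symbols) score 0.0,
    which is an assumption of this sketch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(matrix.get((a, b), 0.0)
               for a, b in zip(row_a.upper(), row_b.upper()))


if __name__ == "__main__":
    matrix = identity_matrix()
    print(pairwise_score("ACDE", "ACDF", matrix))
```

A BLOSUM, PAM or GONNET matrix would plug into the same pairwise_score function; only the values stored for each residue pair differ.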
225. There is a list of contigs below the Source URL. Check the contigs that you want to import to the database. You can use the Select All, Deselect All and Invert Selection buttons to manage the selection.

The Destination URL field specifies the output database file.

If you check Import unmapped reads, all unmapped reads in the assembly (i.e. reads with the unmapped flag or without CIGAR) are imported. Note, however, that they are not visualized in the current UGENE version.

To start the import, click the Import button in the dialog. You can see the progress of the import in the Task View.

To export a UGENE database file into the SAM format, select the Actions > Export assembly to SAM format item in the main menu.

8.2 Browsing and Zooming Assembly

8.2.1 Opening Assembler Browser Window

An imported assembly added to the project is shown in the Project View as a database document (for example, a .bam.ugenedb file) that contains [as] objects.

Each [as] object corresponds to an imported contig. When you double-click on an [as] object, a new Assembly Browser window with the assembly data is opened. A window for the first assembly object in the list is opened automatically after the import.

8.2.2 Assembly Browser Window

The opened window contains the list of well-covered regions of the assembly.
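The definition of an unmapped read used above (the unmapped flag is set, or the CIGAR is missing) can be checked on SAM text with a few lines of Python. This is a sketch, not the importer's code: it relies on the SAM convention that the FLAG is the second tab-separated field, that bit 0x4 marks an unmapped segment, and that a missing CIGAR is written as *. The function names are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(sam_line):
    """Return True if a SAM alignment line describes an unmapped read."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    flag = int(fields[1])   # FLAG column
    cigar = fields[5]       # CIGAR column, '*' when unavailable
    return bool(flag & UNMAPPED_FLAG) or cigar == "*"


def count_reads(lines):
    """Return (mapped, unmapped) counts, skipping headers and blank lines."""
    mapped = unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        if is_unmapped(line):
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


if __name__ == "__main__":
    sample = [
        "@HD\tVN:1.0",
        "r1\t0\tcontig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(count_reads(sample))
```

The contig and read names in the sample are made up for the example.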
226. shed 0 00 00 015

Digest into fragments murine.gb (linear). Generated 10 fragments:
From EcoRV (138) To EcoRV (214): 77 bp
From EcoRV (214) To EcoRV (3227): 3014 bp
From EcoRV (3227) To BglII (3698): 472 bp
From BglII (3702) To HindIII (5023): 1322 bp
From HindIII (5027) To ClaI (5104): 78 bp

Refer to Notifications to learn more about task reports.

11.9.2 Creating Fragment

To create a DNA fragment from a sequence region, activate the Sequence View window and select either the Actions > Cloning > Create Fragment item in the main menu or the Cloning > Create Fragment item in the context menu. The Create DNA Fragment dialog appears.

If a region has been selected, you can choose to create the fragment from this region. Otherwise, you can either choose to create the fragment from the whole sequence or choose the Custom item and input the custom region.

To add a 5' overhang to the direct strand, check the Include Left Overhang check box and input the required nucleotides. To add a 5' overhang to the reverse strand, in addition to the described steps, select the Reverse complement i
227. 2. Press the Add button on the toolbar. The Select Item dialog will appear. Select the 3D objects to add. Hint: use the Ctrl keyboard button to select several objects.

Below you can see the 3D Structure Viewer with two views.

To select an active view, click on the view area or select an appropriate value in the Active view combo box on the toolbar.

To synchronize the views, press the Synchronize 3D Structure Views sticky button on the toolbar (see the image above). When the button has been pressed, the 3D structures are moved, zoomed and spun synchronously. Press the button again to stop the views synchronization.

The views that are no longer required can be closed by selecting the Close button in the 3D Structure Viewer context menu. Also, you can hide and show views for a while. Use the menu of the green arrow button on the toolbar to do it.
228. ta

9.4.3 Changing Labels Formatting

To change the formatting of tree labels, select either the Formatting toolbar button or the Actions > Formatting item in the main menu. The Labels Formatting dialog will appear.

Here you can select the color, font, size and attributes (bold, italic, etc.) of the labels. Note that when a clade has been selected, the labels formatting settings are applied to the clade only.

9.5 Zooming Tree

To change the size of a tree, use the Zoom In and Zoom Out toolbar buttons. You can use the Restore Zooming toolbar button to set the default size. Alternatively, use the corresponding items in the Actions main menu.

See also: Zooming Clade.

9.6 Working with Clade

This paragraph describes how to select a clade and modify its appearance.

9.6.1 Selecting Clade

To select a clade, click on its root node. You can see that the corresponding branches are highlighted.

To select several clades at the same time, hold the Shift key and click on the root nodes of the clades.

9.6.2 Collapsing/Expanding Branches

You can hide branches of a clade by selecting the Collapse item in the context menu of the clade's root node.

See the result of a collapsing on the image below.
229. ta about used context-dependent conformational and physicochemical properties are available in the PROPERTY Database (http://wwwmgs.bionet.nsc.ru/mgs/gnw/bdna/).

11.11.1 SITECON Searching Transcription Factors Binding Sites

To search for transcription factor binding sites in a DNA sequence, select the Analyze > Search TFBS with SITECON context menu item. In the search dialog that appears, you must select a file with a TFBS profile. The profiles supplied with UGENE are placed in the $UGENE/data/sitecon_models folder. After the profile is loaded, the threshold filter is populated with values read from the profile. You can use the filter to remove low-scoring regions from the result.

The regions found by the SITECON algorithm can be saved as annotations to the DNA sequence in the Genbank format.

Every SITECON profile supplied with UGENE contains complete information about calibration settings provided to the UGENE team by the author of SITECON. The original TFBS alignments used to calculate profiles can be requested directly from the author of SITECON.
230. ta will be opened in UGENE.
231. The dialog parameters:

Tandem preset: specify the tandem repeats parameters with predefined values by selecting one of the available presets: Micro-satellites, Mini-satellites, Big-period tandems, Custom.

Min period, Max period: the minimum and maximum acceptable repeat length, measured in base symbols.

Region to process: specify the region to search in: the whole sequence, a custom region, or the region of the current selection (if any).

Save annotation(s) to: specify the existing or new annotations table file to save the resulting annotations into.

Annotation parameters: you can change the default group name and annotation name values of the resulting annotations.

Additional search options can be found in the Advanced tab:

Algorithm: the algorithm parameter allows you to select the search algorithm. The default, and a fast one, is the optimized suffix array algorithm.

Minimum tandem size: sets the limit on the minimum acceptable length of the tandem, i.e. the minimum total repeats length of the searched tandem.

Minimum repeat count: the minimum number of repeats of a searched
232. tandem.

Show overlapped tandems: check if the plugin should search for the overlapped tandems, otherwise keep unchecked.

Tandem Repeats Search Result

An example of the search results for the micro-satellite preset: the found repeats are saved as repeat_unit annotations with join(...) locations.

The tandem repeats annotations are located side by side.

11.8 Restrictio
233. tem in the same group box.

Similarly, to add a 3' overhang, check the Include Right Overhang check box, input the required overhang and select either the direct or the reverse complement strand.

On the Output tab of the dialog you can optionally modify the annotations output settings.

Finally, press the OK button to create the fragment. The fragment will be saved as an annotation.

11.9.3 Constructing Molecule

To construct a new molecule from fragments, select the Tools > Cloning > Construct Molecule item in the main menu. If a Sequence View window is active, you can also select either the Actions > Cloning > Construct Molecule item in the main menu or the Cloning > Construct Molecule item in the context menu.

The Construct Molecule dialog appears. On the Construction tab it contains the Available fragments list, the From Project button, the New molecule contents list and the following options: Annotate fragments in new molecule, Force blunt and omit all overhangs, Make circular.

Available Fragments

All the fragments available in the current project are shown in the Available fragments list. You can automatically create a fragment from a DNA
234. the annotations table into required.

Check Add result file to project to link the annotations to the currently opened sequence.

To use a separator to split the table, check the Column separator item and specify the separator symbols. Also, you can press Guess to try to detect the separator from the input file.

Alternatively, you can press Edit and edit the script that specifies the separator for each parsed line. It is possible to use the line number in the script. The default script parses the input line and returns an array of parsed elements as the result; the input line and the parsed line number are available to it as variables.

Using the arrows of the First lines to skip field, you can exclude the necessary number of lines at the beginning of the document from parsing. You can also skip all lines that start with the specified text.

By pressing Preview, you can bring up the view of the current annotations table, which is produced from the input file with the specified parameter values. The input file contents will also be s
235. the sequence to be edited and doesn't allow you to input the start position.

Remove subsequence

It is also possible to remove a selected subsequence from a sequence. When you select the corresponding item in the context menu or in the Actions menu, the Remove subsequence dialog appears.

Description of the parameters:

Region to remove: specifies the region of the sequence that will be removed. This parameter is mandatory.

Annotated regions resolving: specifies what to do with annotations that overlap with the region that is removed. You can select either to Resize such annotations (i.e. make them smaller) or to Remove them.

Save resulted document to a new file: similar to the same parameter in the Insert subsequence dialog described above.

5.8.12 Exporting Selected Sequence Region

Open a sequence object in the Sequence View and select a region by pressing and moving the left mouse button over the sequence.

Use the Export > Export selected sequence region context menu item to save the selection into a file of a sequence format.
236. tion. You can change the font by clicking the Change font button. To reset the zoom and font, click the Reset zoom button.

7.1.6 Searching for Pattern

You can search for a pattern inside an alignment. Enter a query string in the edit box under the Sequence area.

Press the right arrow to search in the direction "from left to right, from top to bottom". Press the left arrow to search in the direction "from right to left, from bottom to top".

If the pattern is found, the result will be focused and highlighted in the Sequence area. You can continue the search in any direction from this position.

7.1.7 Consensus

Each base of a consensus sequence is calculated as a function of the corresponding column bases. There are different methods to calculate the consensus. Each method reveals unique biological properties of the aligned sequences. The Alignment Editor allows switching between different consensus modes.

To switch the consensus mode, activate the context menu (using the right mouse button) or the Actions menu and select the Consensus mode item.

Consensus representation dialog: Select consensus representation scheme; Consensus type (Default); Threshold. Based on JalView algorithm Returns if there are 2 characters with high frequency Returns symbol in lower case if the symbol content in a row is lower than t
237. tion html

11.2 DNA Annotator

The DNA Annotator plugin provides an algorithm to search for sequence regions that contain a predefined set of annotations.

Usage example:

Open the Sequence View for a sequence that has annotations. A good candidate here could be any file in Genbank format with a rich set of annotations.

Select the Analyze > Find annotated regions item in the context menu. The Find groups of annotated regions dialog will appear.

Using this dialog, you can search for DNA sequence regions that contain every annotation from the list on the left side. The found regions are displayed on the right side of the dialog.

Use the Save regions as annotations button to store the regions as new annotations to the sequence.

11.3 DNA Flexibility

To search for regions of high DNA helix flexibility in a DNA sequence, open the sequence in the Sequence View and select the Analyze > Find high DNA flexibility regions item in the context menu.

Note that only the standard DNA alphabet is supported, i.e. the sequence should consist of the characters A, C, G, T and N.

The foll
238. tion on the screen.

5.9 Annotations Editor

The Annotations editor contains tools to manipulate annotations for a sequence. It provides a convenient way to organize, view and modify a single annotation as well as annotation groups.

An annotation for a sequence consists of:

• Name (or key): indicates the biological nature of the annotated feature.
• Location: coordinates in the sequence.
• The list of qualifiers: qualifiers are the general mechanism for supplying information about an annotation. Qualifiers are stored as pairs of (name, value) strings.

Below is the default layout of the Annotations editor with an extra column for the note qualifier added. The image labels mark the following elements: objects with annotations, groups, an annotation, a qualifier name and value, and the column with note qualifier values.

There are usually several objects with annotations in the Annotations edi
239. tion_name> option.

ugene -h <option_name>

ugene --help=<task_name>
Shows help for the <task_name> task.

ugene -h <task_name>

--task=<task_name> [<task_parameter>=value ...]
Specifies the task to run. A user-defined UGENE workflow schema can be used as a task name. For example:

ugene --task=align --in=COI.aln --out=result.aln
ugene --task=C:\myschema.uwl --in=COI.aln --out=res.aln

--log-no-task-progress
A task progress is shown by default when a task is running. This option specifies not to show the progress.

--log-level="<category1>=<level1> [, ...]" | <level>
Sets the log level per category. If a category is not specified, the log level is applied to all categories.

The following categories are available:

• Algorithms
• Console
• Core Services
• Input/Output
• Performance
• Remote Service
• Scripts
• Tasks

The following log levels are available: TRACE, DETAILS, INFO, ERROR or NONE.

By default, loglevel = "ERROR".

For example:

ugene --log-level=NONE
ugene --log-level="Tasks=DETAILS, Console=DETAILS"

--log-format="<format_string>"
Specifies the format of a log line. Use the following notations: L (level), C (category), YYYY or YY (year), MM (month), dd (day), hh (hour), mm (minutes), ss (seconds), zzz (milliseconds).

By default, logformat = "[L][hh:mm]".

--license
Shows license information.

--lang lang
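When UGENE is driven from a script, the options above can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything. It assumes the --name=value spelling of the options (the punctuation is lost in this copy of the manual), and the helper name and its parameters are hypothetical.

```python
import shlex


def build_ugene_command(task, params=None, log_level=None, show_progress=True):
    """Build an argument list for the ugene command-line tool.

    Assumes options are written as --name=value; 'task' may be a task name
    such as 'align' or a path to a user-defined workflow schema.
    """
    argv = ["ugene", f"--task={task}"]
    for name, value in (params or {}).items():
        argv.append(f"--{name}={value}")
    if log_level is not None:
        argv.append(f"--log-level={log_level}")
    if not show_progress:
        argv.append("--log-no-task-progress")
    return argv


if __name__ == "__main__":
    argv = build_ugene_command(
        "align",
        {"in": "COI.aln", "out": "result.aln"},
        log_level="NONE",
        show_progress=False,
    )
    # Print a shell-quoted version; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems with paths that contain spaces.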
240. to use are located in one directory, you can simply select the directory with the files. By default, only files with the .fa and .fasta extensions are taken into account. You can change this by specifying either the Include files filter or the Exclude files filter.

You can choose either the protein or the nucleotide type of the files.

Then you must select the path to save the database file and specify a Base name for BLAST files and a Title for database file.

Making Request to Database

To make a request to a local BLAST database, do the following:

• If you're using BLAST, open Tools > BLAST > BLAST Search.
• If you're using BLAST+, open Tools > BLAST+ > BLAST+ Search.

If there is a sequence opened, you can also initiate the request to a local BLAST database from the Sequence View:

• If you're using BLAST, select the Analyze > Query with BLAST item in the context menu or in the Actions main menu.
• If you're using BLAST+, select the Analyze > Query with BLAST+ item in the context menu or in the Actions main menu.

The Request to local BLAST database dialog appears. On the General options tab it contains the following fields: Select input file, Select search, Search for short nearly exact matches, Expectation value, Megablast, Select database path, Save annotation s
241. tomatic annotations calculations, use the Automatic Annotations Highlighting menu button on the Sequence View toolbar. The menu contains the ORFs and Restriction Sites items.

5.9.2 The db_xref Qualifier

Some files in Genbank format contain the db_xref qualifier. A value of this qualifier is a reference to a database.

When you click on the value, the web page specified in the reference is opened, or the file specified in the reference is loaded. The loaded file is added to the current project.

5.10 Manipulating Annotations

5.10.1 Creating Annotation

To create a new annotation for the active sequence, press the Ctrl+N key sequence, select the New annotation toolbar button, or use the Add > New annotation or New annotation context menu item.
242. tor. A special Auto annotations object is always present for each opened sequence. It contains annotations automatically calculated for the sequence (see below for details). An object contains groups of annotations, which UGENE uses for logical organization of the annotations. An annotation must always belong to some group. For documents not created by UGENE, annotations are grouped by their names. For annotations created in UGENE, it is possible to use arbitrary group names. Groups can contain both annotations and other groups. The numbers in the brackets after a group name in the Annotations editor are the counts of subgroups and annotations in the current group. A single annotation can be present in several groups simultaneously. An annotation is physically removed from the document when it no longer belongs to any group.
5.9.1 Automatic Annotations Highlighting
Enabling the automatic annotations highlighting lets you automatically calculate and highlight annotations on each opened nucleotide sequence. Currently, the following annotation types support automatic highlighting:
• Open reading frames
• Restriction sites
The corresponding groups of found annotations are stored in the Auto annotations object in the Annotations editor, for example:
[Screenshot: Auto annotations object for murine.gb (NC_001363) with the enzyme (8, 0) and orf (0, 27) groups]
To disable/enable the au
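The grouping rules described above (counts shown in brackets, one annotation in several groups, removal once no group holds it) can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `Group` and `AnnotationObject` classes and their methods are hypothetical names invented for this example and are not UGENE's internal data structures.

```python
class Group:
    """Hypothetical model of an annotation group."""

    def __init__(self, name):
        self.name = name
        self.subgroups = []
        self.annotations = []

    def label(self):
        # Mirrors the "name (subgroups, annotations)" counts shown in brackets.
        return f"{self.name} ({len(self.subgroups)}, {len(self.annotations)})"


class AnnotationObject:
    """Hypothetical model of an object that owns annotations via groups."""

    def __init__(self):
        self._membership = {}  # annotation name -> list of groups holding it

    def add(self, annotation, group):
        group.annotations.append(annotation)
        self._membership.setdefault(annotation, []).append(group)

    def remove(self, annotation, group):
        group.annotations.remove(annotation)
        self._membership[annotation].remove(group)
        # An annotation that belongs to no group is dropped from the object.
        if not self._membership[annotation]:
            del self._membership[annotation]

    def annotations(self):
        return set(self._membership)
```

For instance, an annotation added to two groups survives removal from the first group and disappears from the object only after it is removed from the second.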
243. tores information for imported BAM or SAM files and can be used for converting this information into a SAM file. See also: Import BAM/SAM File.
A multiple sequence alignments file format. See also: Alignment Editor.
Human-readable format to store UGENE Workflow Designer schemas. See also: Workflow Designer.
Human-readable format to store UGENE Query Designer schemas. See also: Query Designer.
Format for storing workflow elements that can launch an external command line tool. See also: Workflow Designer.
Example of usage: annotations can be exported to this format; the Weight Matrix matrices list can also be saved to this format.
For example, it is used to store reports.
These formats are used throughout the program to save screenshots, etc.
It is possible to view and modify plain text files in UGENE.
14 Tutorials
14.1 Using BioMart with UGENE
The BioMart (http://www.biomart.org) system enables scientists to perform advanced querying of a wide range of biological data sources through a single web interface, regardless of the data sources' geographical locations. This tutorial describes how data found through the BioMart web interface can be opened for further analysis in UGENE with a couple of mouse clicks.
14.1.1 Environment requirements
Please make sure that:
1. The Google Chrome or Mozilla Firefox web browser is used.
2. A special UGENE extension for the web browser is installed
244. ts an intron
• S: Soft clipping (clipped sequences are present in the read sequence, i.e. behaves as I)
• H: Hard clipping (clipped sequences are not present in the read sequence)
• P: Padding (silent deletion from padded reference)
• =: Exact match to the reference
• X: Reference sequence mismatch
To copy the information about the read to the clipboard, select the Copy read information to clipboard item in the Reads Area context menu. Now you can paste it into any text editor. To copy the current position of the read, select the Copy current position to clipboard item in the Reads Area context menu.
8.4 Short Reads Visualization
There are various modes of reads highlighting and shadowing.
8.4.1 Reads Highlighting
To apply a reads highlighting mode, select it in the Reads highlighting menu of the Reads Area context menu or on the Assembly Browser Settings tab of the Options Panel. The following modes are available:
• Nucleotide: shows all nucleotides in different colors. It is used by default.
• Difference: highlights gaps and nucleotides that differ from the reference sequence.
[Screenshot: Reads Area in a reads highlighting mode]
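To make the operation codes listed above concrete, here is a minimal sketch that splits a CIGAR string into operations and sums the bases each operation consumes on the read and on the reference. It follows the SAM convention that M, I, S, = and X consume read bases and M, D, N, = and X consume reference bases; the function names are hypothetical and this is not how UGENE itself parses reads.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_READ = set("MIS=X")        # operations that use up read bases
CONSUMES_REFERENCE = set("MDN=X")   # operations that use up reference bases


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing anything the pattern did not account for.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def read_length(cigar):
    """Number of read bases covered by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)
```

Soft-clipped bases (S) count towards the read length because they are still stored in the read sequence, while hard-clipped bases (H) count towards neither total.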
245. ts in a DNA sequence. It also allows searching for tandem repeats.
11.7.1 Finding Repeats
Usage example: open a DNA sequence in the Sequence View and select the Analyze > Find repeats context menu item.
[Screenshot: Analyze context menu with the Find repeats item]
A dialog appears that lets you specify the repeat parameters and the annotations table document to save the results into.
[Screenshot: Find repeats dialog with the Repeat finder parameters (Minimum repeat length, Repeats identity, Minimum distance between repeats, Maximum distance between repeats), Region to process (Whole sequence, Selection, Custom range), Save annotation(s) to, and Annotation parameters (Group name, Annotation name) settings, and the estimated repeats count]
The dialog's status line displays the approximate number of repeats that will be found with the current settings. The Advanced tab provides additional repeat finding options.
Advanced tab options shown in the screenshot: Do not filter nested repeats; Search for inverted repeats; Search only for repeats that lie inside of an annotated region; Search only for repeats that have an annotated region inside; Filter repeats
246. type, check the Show value of qualifier check box and input the values of the required qualifiers in the text field next to this check box. See the image below.
[Screenshot: Options Panel with the Select an annotation type list (CDS, misc_feature, source) and the Configure the annotation type settings (Show annotations of this type, Show on translation, Show value of qualifier: note)]
If you input several qualifier names separated by commas, the first qualifier found is taken into account and shown on the annotation.
5.10.4 Creating and Editing Qualifier
To add a qualifier to an annotation, select the annotation in one of the Sequence View subviews and press the Insert key, or use the Add > Qualifier item in the context menu or in the Actions main menu.
[Screenshot: annotation selected in the Annotations editor]
247. uage_code Specifies the language to use, e.g. for the log output. The following values are available:
• CS: Czech
• EN: English
• RU: Russian
--log-color-output If log output is enabled, this option makes it colored: ERROR messages are displayed in red, DETAILS messages are displayed in green, and TRACE messages are displayed in blue.
12.2 CLI Predefined Tasks
Using the current version of UGENE, you can perform the following tasks by running a simple command:
Converting Between Formats Tasks
• Converting MSA
• Converting Sequences
Basic Analysis Tasks
• Extracting Sequence
• Finding ORFs
• Finding Repeats
• Finding Pattern Using Smith-Waterman Algorithm
• Adding Phred Quality Scores to Sequence
• Local BLAST Search
• Local BLAST+ Search
• Remote NCBI BLAST and CDD Requests
• Annotating Sequence with UQL Schema
DNA Assembly Tasks
• Building Bowtie Index
• Aligning Short Reads with Bowtie
HMMER2 Tasks
• Building Profile HMM Using HMMER2
• Searching HMM Signals Using HMMER2
Multiple Sequence Alignment Tasks
• Aligning with ClustalW
• Aligning with Kalign
• Aligning with MAFFT
• Aligning with MUSCLE
Transcription Factor Tasks
• Building PFM
• Searching for TFBS with PFM
• Building PWM
• Searching for TFBS with Weight Matrices
• Building Statistical Profile for SITECON
• Searching for TFBS with SITECON
Other Tasks
• Fetching Sequenc
248. [Screenshot, partially shown: Build weight or frequency matrix dialog with the Matrix options: Matrix type (Frequency matrix or Weight matrix) and Weight algorithm (Berg and von Hippel)]
The following parameters are available:
Input file: an alignment or a file with several sequences to build the matrix from. The parameter is mandatory.
Output file: the resulting matrix will be saved in this file. The parameter is mandatory.
Statistic type: defines the way in which the statistics will be collected. The Mononucleic option generally suits small alignments, while the Dinucleic option should give more appropriate results for large alignments.
Matrix type: defines the type of the resulting matrix. If the Frequency matrix option is selected, the frequency matrix is created and saved into the resulting file. If the Weight matrix option is selected, an intermediate frequency matrix is created and then transformed into a weight matrix on the basis of the selected Weight algorithm. The weight matrix is then saved into the resulting file.
For some input files, a colored Alignment Logo appears at the bottom of the dialog. It gives a representation of the selected alignment.
[Screenshot: Build weight or frequency matrix dialog with an input file from the position_weight_matrix JASPAR insects sample data and the Statistic options and Matrix options groups]
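As an illustration of what a mononucleic frequency matrix holds, the sketch below counts how often each nucleotide occurs in each alignment column. It is a simplified model under stated assumptions: the function name is hypothetical, gaps and other symbols outside the alphabet are simply ignored, and neither the dinucleic statistics nor any weight algorithm is covered.

```python
def frequency_matrix(sequences, alphabet="ACGT"):
    """Count occurrences of each alphabet symbol in every alignment column."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    matrix = {symbol: [0] * length for symbol in alphabet}
    for seq in sequences:
        for position, symbol in enumerate(seq.upper()):
            # Assumption: gaps and unknown symbols do not contribute counts.
            if symbol in matrix:
                matrix[symbol][position] += 1
    return matrix
```

Each row of the result corresponds to one nucleotide and each column to one alignment position, so a column with a single dominant count marks a conserved position.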
249. [Screenshots: context menu with the Export nucleic alignment to amino translation item; export dialog with the Amino translation (1. The Standard Genetic Code), Add document to the project, and Export range (Whole alignment, Selected rows) settings]
Here you can specify the result file location, select a file format and an amino translation, choose to export the whole alignment or only the selected rows, and optionally add the created document to the current project.
4.11 Using Bookmarks
One of the most important features supported by most Object Views is the ability to save and restore the visual state of a view. Saving and restoring the visual state of an Object View enables rapid switching between different data regions and is similar to bookmarks in web browsers. Initially an Object View is created as transient, which means that its state is not saved. To save the current state of a view, select the item with the view name in the Bookmarks part of the Project View window and select the Add bookmark item in the context menu.
[Screenshot: Bookmarks context menu with the Activate view, Add bookmark, and Rename items]
For every persistent view UGENE automatica
250. ughout the entire alignment, not just in the seed. The default is 70. Bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with nomaqround. (Number, optional)
nomaqround: prevents rounding of quality values (see the maqerr description). (Boolean, optional)
maxbts: maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A backtrack is the introduction of a speculative substitution into the alignment. Without this limit, the default parameters will sometimes require Bowtie to try hundreds or thousands of backtracks to align a read, especially if the read has many low-quality bases and/or has no valid alignments, slowing Bowtie down significantly. However, this limit may cause some valid alignments to be missed. Higher limits yield greater sensitivity but require longer running times. (Number, optional)
n: maximum number of mismatches permitted in the seed, i.e. the first L base pairs of the read, where L is set with seedlen. (Number, optional)
nofw: specifies not to align against the forward reference strand. (Boolean, optional)
norc: specifies not to align against the reverse-complement reference strand. (Boolean, optional)
v: reports alignments with at most <specified number> mismatches. maqerr and seedlen are ignored, and quality values have no effect on what alignments are valid. v is mutually exclusive with n. Numb
251. ugin provides a set of algorithms for protein secondary structure (alpha-helix, beta-sheet) prediction from a raw sequence. Currently available algorithms are:
• GORIV: Jean Garnier, Jean-Francois Gibrat, and Barry Robson, GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence, in Methods in Enzymology, vol. 266, pp. 540-553 (1996). Improved version of the GOR method in J. Garnier, D. Osguthorpe, and B. Robson, J. Mol. Biol., vol. 120, p. 97 (1978).
• PsiPred: Bryson K, McGuffin LJ, Marsden RL, Ward JJ, Sodhi JS & Jones DT (2005) Protein structure prediction servers at University College London. Nucl. Acids Res. 33 (Web Server issue): W36-38. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195-202.
You can access these analysis capabilities for a protein sequence using the Analyze > Predict secondary structure context menu item. The dialog appears:
[Screenshot: Secondary structure prediction dialog with the Algorithm, Range start, Range end, and Results fields and the Start prediction and Save as annotation buttons]
It supports the following options:
Algorithm: you can choose the preferred algorithm. Currently, the GORIV and PsiPred algorithms are available.
Range start, Range end: select the sequence range for prediction.
Results: visual representation of the prediction results, for example
252. up, sort, etc.
5.2 Global Actions
The global actions toolbar lets you go to the specified position in all sequences at the same time. It also allows you to lock or adjust the ranges of sequences in the same Sequence View. See this paragraph for details.
5.3 Sequence Toolbar
A brief description of the sequence toolbar buttons is shown in the picture below.
[Figure: sequence toolbar with labeled elements: the sequence name, Toggle view, the Circular Viewer plugin icon, Zoom buttons, the DNA GraphPack plugin icon, Capture screen, Amino translation, Show translations, Select sequence region]
See also: Toggling Views, Capturing Screenshot, Zooming Sequence, Showing and Hiding Translations, Selecting Sequence.
5.4 Sequence Overview
The Sequence overview is an area of the Sequence View below the sequence toolbar. It shows the whole sequence and provides handy navigation in the Sequence zoom view and the Sequence details view.
[Figure: Sequence overview with labeled controls] Scrolls the Sequence zoom view. Scrolls the Sequence d
253. urceforge.net/manual.shtml. Select one of the following alignment modes.
The -n alignment mode: when the -n mode is selected, Bowtie determines which alignments are valid according to the following policy. Alignments may have no more than N mismatches (where N is a number 0-3) in the first L bases (where L is a number 5 or greater, set with Seed length) on the high-quality (left) end of the read. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with Maq error). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40.
The -v alignment mode: in the -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3. Quality values are ignored. The -v mode is mutually exclusive with the -n mode.
The following parameters are available:
Maq error (--maqerr): maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the seed. The default is 70. By default, Bowtie rounds quality values to the nearest 10 and saturates at 30. Note that the rounding can be disabled with No Maq rounding.
Seed length (--seedlen): the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28.
Maximum of backtracks (--maxbts): the maximum number of backtracks (default: 125 without Best, 80
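The -n mode policy described above can be written down as a small predicate. This is a sketch of the stated rule only, not Bowtie's code: the function names are hypothetical, the default of 2 seed mismatches is an assumption (the text only gives the 0-3 range), ties are rounded half up, and the rounding is also applied to the fallback quality of 40.

```python
def round_quality(quality):
    """Round a Phred quality to the nearest 10, saturating at 30."""
    return min(30, int(quality / 10 + 0.5) * 10)


def is_valid_n_mode(mismatch_positions, qualities=None, max_seed_mismatches=2,
                    seed_length=28, maq_error=70, maq_rounding=True,
                    default_quality=40):
    """Check an alignment against the -n mode policy.

    mismatch_positions: 0-based read positions that mismatch the reference.
    qualities: per-base Phred qualities, or None when they are unavailable.
    """
    # Rule 1: at most N mismatches within the first L bases (the seed).
    in_seed = sum(1 for pos in mismatch_positions if pos < seed_length)
    if in_seed > max_seed_mismatches:
        return False

    # Rule 2: the summed quality at all mismatched positions must not exceed E.
    total = 0
    for pos in mismatch_positions:
        quality = default_quality if qualities is None else qualities[pos]
        total += round_quality(quality) if maq_rounding else quality
    return total <= maq_error
```

With the defaults, a read with mismatches at positions 3 and 40 and qualities of 30 passes (total 60), while a third high-quality mismatch anywhere in the read pushes the total to 90 and fails the check.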
254. us of the active tasks: Started, Running, Finished, and so on. The Task progress column shows the percentage of the task's progress. If you want to cancel a task, click the red cross button in the Actions column for the task.
4.2.3 Log View
The Log View shows the program log information. To show/hide the Log View, click the Log button in the main UGENE window.
[Screenshot: Log View with INFO and DETAILS messages about converting a BAM file to a UGENE database, importing assemblies and reads, and canceling tasks]
The hotkey for this action is Alt+3. It is possible to configure the Log View settings: the level of the log to show (ERROR, INFO, DETAILS, TRACE), the category (Algorithms, Tasks, etc.), and the format of the log messages (format of the dates, etc.). This settin
255. [Screenshot, partially shown: Bowtie dialog with the Short reads list and the Parameters and Flags groups: Mode, Colorspace (--color), Maq error (--maqerr), No Maq rounding (--nomaqround), Seed length (--seedlen), Forward orientation (--nofw), Maximum of backtracks (--maxbts), Reverse complement orientation (--norc), Descriptors memory usage (--chunkmbs), Try as hard (--tryhard), Seed (--seed), Best alignments (--best), All alignments (--all)]
The following parameters are available:
Reference sequence: DNA sequence to align short reads to. This parameter is required.
Result file name: file in the SAM format to write the result of the alignment into. This parameter is required.
Prebuilt index: check this box to use an index file instead of a source reference sequence. The index is a set of 6 files with the suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. The index is created during the alignment. You can also build it manually.
SAM output: always save the output file in the SAM format (the option is disabled for Bowtie).
Short reads: each added short read is a small DNA sequence file. At least one read should be added.
Note: the length of a short read for Bowtie can't exceed 1024.
You can also configure other parameters. They are the same as in the original Bowtie; you can read a detailed description of the parameters on the Bowtie manual page http bowtie bio so
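Before checking the Prebuilt index box, it can be handy to confirm that all six index files are in place. The sketch below does this for a given index base name using the suffixes listed above; the function name and the example path are hypothetical, and the check only tests that the files exist, not that they form a valid index.

```python
from pathlib import Path

# The six suffixes of a Bowtie index, as listed in the parameter description.
BOWTIE_INDEX_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(index_base):
    """Return the index files that are absent for the given base name."""
    base = str(index_base)
    return [
        base + suffix
        for suffix in BOWTIE_INDEX_SUFFIXES
        if not Path(base + suffix).is_file()
    ]
```

An empty result for a base name such as `indexes/refseq` (a made-up example) means the complete set of files is present.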
256. uttons on the toolbar.
7.2.2 Selecting Subalignment
While in the Sequence area, hold the left mouse button and move the cursor to activate the selection mode. Moving the cursor adjusts the size of the selection, and releasing the mouse button exits the selection mode. The selection mode is also available in the Sequence list and the Consensus area. The difference from the Sequence area is that there you add whole rows or whole columns, respectively, to the selection. To cancel the selection, press the Esc key.
7.2.3 Editing Alignment
Select the Edit submenu in the Alignment Editor context menu.
[Screenshot: Edit submenu with the Extract selected as MSA, Remove all gaps, Remove selection, Remove columns of gaps, and Fill selection with gaps items]
The actions available from this menu are described below.
Extracting Selected as MSA
It is possible to extract a subalignment and save it as a new multiple sequence alignment (MSA). Select a subalignment and choose the Edit > Extract selected as MSA item in the Actions main menu or in the context menu. The following dialog appears:
[Screenshot: Extract selected as MSA dialog]
257. wing Custom settings are available:
Alphabet: here you can select the alphabet.
[Screenshot: Custom settings with the Alphabet list: Standard DNA, Standard RNA, Extended DNA, Extended RNA, Standard amino, All symbols]
Skip unknown symbols, Replace unknown symbols with: you can select either to skip unknown input symbols or to replace them with the specified symbol.
Document location: location of the created document.
Document format: format of the created document. Currently available formats are FASTA and Genbank.
Sequence name: name of the sequence in the created document.
Save file immediately: check this option if you want to save the document immediately after the Create button is pressed.
The created document will be added to the current project and opened in the Sequence View.
4.7 Exporting Documents
If a document has a format that UGENE can write (see the Supported File Formats chapter), you can export the document to a new document in the required format. To do so, use the Export document item in the context menu.
[Screenshot: document context menu with the Export document item]
The following dialog appears:
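The difference between the Skip unknown symbols and Replace unknown symbols with settings described above can be shown with a short sketch. The function is hypothetical and makes several assumptions that the manual does not state: the alphabet is passed in explicitly, the input is upper-cased, and whitespace is always dropped.

```python
def clean_sequence(text, alphabet, replace_with=None):
    """Filter raw text against an alphabet.

    With replace_with=None unknown symbols are skipped; otherwise each
    unknown symbol is replaced with the given character.
    """
    allowed = set(alphabet.upper())
    result = []
    for symbol in text.upper():
        if symbol.isspace():
            continue  # assumption: whitespace is never part of a sequence
        if symbol in allowed:
            result.append(symbol)
        elif replace_with is not None:
            result.append(replace_with)
    return "".join(result)
```

Skipping shortens the sequence and shifts all downstream coordinates, while replacing keeps every position in place, which matters if the pasted text will later be compared with annotated coordinates.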
258. with different kinds of related data: sequence, annotations, graphs, chromatograms, sequence analysis algorithms. Note that the Sequence View is described in more detail in a separate documentation section.
4.10 Exporting Objects
The document objects can be exported into a new document. For more details, see the following chapters.
4.10.1 Exporting Sequences to Sequence Format
Select a single or several sequence objects in the Project View window and click the Export > Export sequences context menu item.
[Screenshot: Project View with the sequence objects of a FASTA example document selected and the context menu opened]
259. [Screenshot, partially shown: Annotations editor with the Auto annotations and NC_001363 features objects for murine.gb]
Each point on a graph is calculated for a window of a specified size. The window is moved along the sequence by a step. See Graph Settings for instructions on how to modify these parameters. All graphs are always aligned to the range shown in the Sequence zoom view. It means that if you change the visible range in the overview, either by zooming or scrolling, the graph will also be updated. The minimum and maximum values of the visible range are shown at the lower right and upper right corners of the graph. To close a graph, uncheck its item in the popup menu.
6.4.1 Description of Graphs
Find below the detailed description of each graph. Note that the characters A, C, G and T in the formulas denote the number of the corresponding nucleotide in a window.
• DNA Flexibility: searches for regions of high DNA helix flexibility in a DNA sequence. The average Threshold in a window is calculated by the following formula: (sum of flexibility angles in the window) / (the window size - 1). For more detailed information, see the DNA Flexibility paragraph.
• GC Content: shows the percentage of nitrogenous bases (either guanine or cytosine) on a DNA molecule. It is calculated by the following formula: (G + C) / (A + G + C + T) * 100
• AG Content: shows the percentage of nitrogenou
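The window-and-step calculation described above can be sketched for the GC Content graph as follows. The function name and the default window and step values are assumptions chosen for illustration, not UGENE's defaults; symbols other than A, C, G and T are left out of the denominator, following the formula as written.

```python
def gc_content_graph(sequence, window=100, step=1):
    """Return one GC percentage per window position along the sequence."""
    seq = sequence.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        a, c = chunk.count("A"), chunk.count("C")
        g, t = chunk.count("G"), chunk.count("T")
        total = a + g + c + t
        # (G + C) / (A + G + C + T) * 100, guarding against empty counts.
        points.append((g + c) / total * 100 if total else 0.0)
    return points
```

A larger window smooths the curve at the cost of detail, and a larger step reduces the number of points that have to be computed and drawn.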
260. you've opened the Remote machines monitor, the session with the remote cloud service will be initialized and a ping task will be performed.
[Screenshot: Remote machine monitor dialog with the Protocol column and the Remove, Modify, and Ping buttons]
Check the cloud machine status in the Remote machines monitor. If the session has been initialized successfully, a green tick is highlighted in the Ping column. If there is no green tick, check the Log View for details about the problem.
10.2.4 Launching Workflow
Open the Workflow Designer using the Tools > Workflow Designer main menu item and prepare a workflow schema. Don't forget to validate it before launching. Once the schema is ready, select the Remote machine workflow run mode on the Workflow Designer toolbar.
[Screenshot: Workflow Designer with the Local host / Remote machine run mode selector and a schema that reads sequences and aligns short reads with Bowtie]
Run the schema, e.g. by clicking the corresponding Workflow Designer toolbar button. The Remote machines monitor dialog wi
