here - CLC bio

1. Figure 17.22: An example of the packed compactness setting.
   17.7.2 Editing the contig
   When editing contigs, you are typically interested in confirming or changing single bases. This can be done simply by selecting the base and typing the right base. Some users prefer to use lower-case letters in order to be able to see which bases were altered when they use the results later on. In CLC DNA Workbench, all changes are recorded in the history log (see section 8), allowing the user to quickly reconstruct the actions performed in the editing session.
   There are three shortcut keys for easily finding the positions where there are conflicts:
   • Space bar: Finds the next conflict.
   • "." (punctuation mark key): Finds the next conflict.
   • "," (comma key): Finds the previous conflict.
   In the contig view, you can use Zoom in to zoom to a greater level of detail than in other views (see figure 17.20). This is useful for discerning the trace curves.
   CHAPTER 17. SEQUENCING DATA ANALYSES AND ASSEMBLY
   If you want to replace a residue with a gap, use the Delete key. If you wish to edit a selection of more than one residue: right-click the selection | Edit Selection. This will show a warning dialog, but you can choose never to see this dialog again by clicking the checkbox at the bottom of the dialog. No
2. Figure 10.9: A table showing annotations on the sequence.
   If you only wish to see gene annotations, deselect the other annotation types so that only gene is selected. Each row in the table is an annotation, which is represented with the following information:
   • Name
   • Type
   • Region
   • Qualifiers
   The Name, Type and Region for each annotation can be edited simply by double-clicking, typing the change directly, and pressing Enter. This information corresponds to the information in the dialog when you edit and add annotations (see section 10.3.2). You can benefit from this table in several ways:
   • It provides an intelligible overview of all the annotations on the sequence.
   • You can use the filter at the top to search the annotations. Type e.g. UCP into the filter and you will find all annotations which have UCP in either the name, the type, the region or the qualifiers. Combined with showing or hiding the annotation types in the Side Panel, this makes it easy t
3. Figure 8.1: An element's history.
   ... to your locale settings (see section 5.1).
   • User: The user who performed the operation. If you import data created by another person in a CLC Workbench, that person's name will be shown.
   • Parameters: Details about the action performed. This could be the parameters that were chosen for an analysis.
   • Origins from: This information is usually shown at the bottom of an element's history. Here you can see which elements the current element originates from. If you have e.g. created an alignment of three sequences, the three sequences are shown here. Clicking the element selects it in the Navigation Area, and clicking the history link opens the element's own history.
   • Comments: By clicking Edit you can enter your own comments regarding this entry in the history. These comments are saved.
   8.1.1 Sharing data with history
   The history of an element is attached to that element, which means that exporting an element in CLC format (.clc) will export the history too. In this way, you can share folders and files with others while preserving the history. If an element's history includes source elements (i.e. if there are elements listed in Origins from), they must also be exported in order to see the full history. Otherwise the history w
4. Figure 9.1: Inputting five sequences to Find Binding Sites and Create Fragments.
   • All subfolders are treated as individual batch units. This means that if the subfolder contains several input files, they will be pooled as one batch unit. Nested subfolders (i.e. subfolders within the subfolder) are ignored. An example of a batch run is shown in figure 9.2.
   • All files that are not in subfolders are treated as individual batch units.
   Figure 9.2: The Cloning folder includes both folders and sequences.
   The Cloning folder that is found in the example data (see section 1.6.2) contains two sequences and three folders. If you click Batch, only folders can be added to the list of selected eleme
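The batch-unit rules in this excerpt can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the Workbench's own code: the function name `batch_units` and the nested-dictionary representation of the Navigation Area (a name maps to `None` for a file, or to another dictionary for a folder) are assumptions made for the example.

```python
def batch_units(folder):
    """Return (unit name, input files) pairs for one selected folder.

    `folder` is a hypothetical stand-in for a Navigation Area folder:
    a dict mapping each name to None (a file) or to a dict (a subfolder).
    """
    units = []
    for name, content in folder.items():
        if isinstance(content, dict):
            # A subfolder is one batch unit: pool its direct files and
            # ignore anything inside nested subfolders.
            pooled = [child for child, sub in content.items() if sub is None]
            units.append((name, pooled))
        else:
            # A file that is not in a subfolder is its own batch unit.
            units.append((name, [name]))
    return units


if __name__ == "__main__":
    cloning = {
        "Enzyme lists": {"list A": None, "Old": {"list B": None}},
        "Primers": {"fwd": None, "rev": None},
        "pcDNA3-atp8a1": None,
    }
    for unit, files in batch_units(cloning):
        print(unit, files)
```

In this sketch "list B" never becomes input, because it sits in a nested subfolder.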
5. Figure 18.1: Selecting one or more sequences containing the fragments you want to clone.
   Note that the vector sequence will be selected when you click Next, as shown in figure 18.2. Select the cloning vector by clicking the browse button. Once the sequence has been selected, click Finish. The CLC DNA Workbench will now create a sequence list of the fragments and vector sequences and open it in the cloning editor, as shown in figure 18.3. When you save the cloning experiment, it is saved as a Sequence list. See section 10.7 for more information about sequence lists. If you need to open the list later for cloning work, simply switch to the Cloning editor at the bottom of the view. If you later in the process need additional sequences, you can easily add more sequences to the view. Just: right-click anywhere on the empty white area | Add Sequences.
   CHAPTER 18. CLONING AND CUTTING
   Figure 18.2: Selecting a cloning vector.
   18.1.1 Introduction to the cloning editor
   In the cloning editor, most of the basic options for viewing, selecting and zooming the sequences are the same as for the standard sequence view. See section 10.1 for an explanation of these options. This means that e.g. known SNPs, exons and other annotations can be displayed on the sequences to guide the
6. Figure 18.5: HindIII and XhoI sites used to open the vector.
   Figure 18.6: Showing the insertion point of the vector.
   This dialog visualizes the details of the insertion. The vector sequence is shown on each side in a faded gray color. In the middle, the fragment is displayed. If the overhangs of the sequence and the vector do not match, you can blunt-end or fill in the overhangs using the drag handles. Click and drag with the mouse to adjust the overhangs. Whenever you drag the handles, the status of the insertion point is indicated below:
   • The overhangs match.
   • The overhangs do not match. In this case, you will not be able to click Finish. Drag the handles to make the overhangs match.
   The fragment can be reverse complemented by clicking the Reverse complement fragment button. When several fragments are used, the order of the fragments can be changed by clicking the move buttons. There is one option for the result of the cloning: Replace input sequences with result. By default, the construct will be opened in a new view and can be saved separately. By selecting this option, the constr
7. [Screenshot: License Wizard, CLC DNA Workbench License Agreement dialog showing the End User License Agreement for CLC bio software.]
   Figure 1.13: Read the license agreement carefully.
   If the Workbench succeeds in finding an existing license, the next dialog will look as shown in figure 1.14.
   [Screenshot: License Wizard, Upgrade a License dialog.]
8. Figure 17.24: The graphical view is displayed at the top. At the bottom, the conflicts are shown in a table. At the conflict at position 637, the user has entered a comment in the table. This comment is now also reflected on the tooltip of the conflict annotation in the graphical view above.
   The table has the following columns:
   • Reference position: The position of the conflict measured from the starting point of the reference sequence.
   • Consensus position: The position of the conflict measured from the starting point of the consensus sequence.
   • Consensus residue: The consensus's residue at this position. The residue can be edited in the graphical view, as described above.
   • Other residues: Lists the residues of the reads. Inside the brackets, you can see the number of reads having this residue at this position. In the example in figure 17.24, you can see that at position 637 there is a C in the top read in the graphical view. The other two reads have a T. Therefore, the table displays the following text: C (1), T (2).
   • IUPAC: The ambiguity code for this position. The ambiguity code reflects the residues in the reads, not in the consensus sequence. The IUPAC codes can be found in section l.
   • Status: The status can either b
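As an illustration of the "Other residues" and "IUPAC" columns described in this excerpt, the following Python sketch summarizes the read residues at one conflict position. The function name `summarize_conflict` and the exact output format "C (1), T (2)" are assumptions based on the excerpt. The lookup table uses the standard IUPAC nucleotide ambiguity codes, and gaps are deliberately not handled.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the sorted set of bases.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def summarize_conflict(read_residues):
    """Return (other-residues text, IUPAC code) for one alignment column.

    `read_residues` holds one base per read, e.g. ["C", "T", "T"].
    Only A, C, G and T are accepted in this sketch.
    """
    counts = Counter(base.upper() for base in read_residues)
    key = "".join(sorted(counts))
    if key not in IUPAC_CODES:
        raise ValueError(f"unsupported residues: {sorted(counts)}")
    # Counter keeps first-seen order, so the top read is listed first.
    text = ", ".join(f"{base} ({n})" for base, n in counts.items())
    return text, IUPAC_CODES[key]


if __name__ == "__main__":
    # Position 637 in the excerpt: a C in the top read, a T in the other two.
    print(summarize_conflict(["C", "T", "T"]))
```

Note that the code is derived from the reads only, which matches the statement that the ambiguity code reflects the residues in the reads and not the consensus.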
9. Figure 2.39: A forward and a reverse primer region.
   Now you can let CLC DNA Workbench calculate all the possible primer pairs based on the Primer parameters that you have defined: click the Calculate button (right-hand pane) | Modify parameters regarding the combination of the primers (for now, just leave them unchanged) | Calculate. This will open a table showing the possible combinations of primers. To the right, you can specify the information you want to display, e.g. showing Fragment length (see figure 2.40).
   Figure 2.40: A list of primers. To the right are the Side
10. Figure 7.11: Location and name for the graphics file.

    Format                      Suffix   Type
    Portable Network Graphics   .png     bitmap
    JPEG                        .jpg     bitmap
    Tagged Image File           .tif     bitmap
    PostScript                  .ps      vector graphics
    Encapsulated PostScript     .eps     vector graphics
    Portable Document Format    .pdf     vector graphics
    Scalable Vector Graphics    .svg     vector graphics

    These formats can be divided into bitmap and vector graphics. The difference between these two categories is described below.
    Bitmap images
    In a bitmap image, each dot in the image has a specified color. This implies that if you zoom in on the image there will not be enough dots, and if you zoom out there will be too many. In these cases, the image viewer has to interpolate the colors to fit what is actually looked at. A bitmap image needs to have a high resolution if you want to zoom in. This format is a good choice for storing images without large shapes (e.g. dot plots). It is also appropriate if you don't have the need for resizing and editing the image after export.
    CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS
    Vector graphics
    Vector graphics is a collection of shapes. Thus, what is stored is e.g. information about where a line starts and ends, and the color of the line and its width. This enables a given viewer to decide
11. For every base, the Workbench calculates the running sum of this value. If the sum drops below zero, it is set to zero. The part of the sequence to be retained after trimming is the region between the first positive value of the running sum and the highest value of the running sum. Everything before and after this region will be trimmed off. A read will be completely removed if the score never makes it above zero. At http://www.clcbio.com/files/usermanuals/trim.zip you find an example sequence and an Excel sheet showing the calculations done for this particular sequence to illustrate the procedure described above.
    • Trim ambiguous nucleotides. This option trims the sequence ends based on the presence of ambiguous nucleotides (typically N). Note that the automated sequencer generating the data must be set to output ambiguous nucleotides in order for this option to apply. The algorithm takes as input the maximal number of ambiguous nucleotides allowed in the sequence after trimming. If this maximum is set to e.g. 3, the algorithm finds the maximum length region containing 3 or fewer ambiguities and then trims away the ends not included in this region.
    • Trim contamination from vectors in UniVec database. If selected, the program will match the sequence reads against all vectors in the UniVec database and remove sequence ends with significant matches (the database is included when you install the CLC DNA Workbench). A list of all the vectors in the
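The two trimming rules described in this excerpt can be sketched in Python as follows. This is a minimal illustration and not the Workbench's implementation. The per-base values fed to the running sum are taken as given, because the excerpt does not show how they are derived. The function names, the half-open `(start, end)` return convention, and the choice of the earliest maximum when several positions tie are all assumptions.

```python
def trim_by_running_sum(values):
    """Return the retained region as (start, end), end exclusive, or None.

    `values` holds one number per base. The running sum is reset to zero
    whenever it drops below zero; the retained region runs from the first
    positive running sum to the highest running sum.
    """
    running = 0.0
    first_positive = None
    best_sum = 0.0
    best_index = None
    for i, value in enumerate(values):
        running = max(0.0, running + value)
        if running > 0 and first_positive is None:
            first_positive = i
        if running > best_sum:
            best_sum = running
            best_index = i
    if best_index is None:
        return None  # the score never rose above zero: drop the read
    return first_positive, best_index + 1


def longest_region_with_few_ambiguities(seq, max_ambiguous, ambiguous="N"):
    """Return (start, end) of the longest region with at most
    `max_ambiguous` ambiguous nucleotides, found with a sliding window."""
    best = (0, 0)
    left = 0
    count = 0
    for right, base in enumerate(seq):
        if base.upper() in ambiguous:
            count += 1
        while count > max_ambiguous:
            # Shrink the window from the left until it is valid again.
            if seq[left].upper() in ambiguous:
                count -= 1
            left += 1
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


if __name__ == "__main__":
    print(trim_by_running_sum([-0.5, 0.2, 0.3, -0.1, 0.4, -0.9, -0.2]))
    start, end = longest_region_with_few_ambiguities("NNACGTNACGTNNACN", 1)
    print("NNACGTNACGTNNACN"[start:end])
```

For the first call the running sum is 0, 0.2, 0.5, 0.4, 0.8, 0, 0, so the bases at indices 1 through 4 are retained. The second call keeps the nine-base stretch ACGTNACGT, the longest region with at most one N.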
12. ... "in A not B" is found in A but not in B, the spliced alignment will contain a sequence named "in A not B". The first part of this sequence will contain the characters from A, but since no sequence information is available from B, a number of gap characters will be added to the end of the sequence, corresponding to the number of residues in B. Note that the function does not require that the individual alignments contain an equal number of sequences.
    19.5 Pairwise comparison
    For a given set of aligned sequences (see chapter 19), it is possible to make a pairwise comparison in which each pair of sequences is compared to each other. This provides an overview of the diversity among the sequences in the alignment. In CLC DNA Workbench, this is done by creating a comparison table: Toolbox in the Menu Bar | Alignments and Trees | Pairwise Comparison, or right-click alignment in Navigation Area | Toolbox | Alignments and Trees | Pairwise Comparison. This opens the dialog displayed in figure 19.13. If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters.
    19.5.1 Pairwise comparison on alignment selection
    A pairwise comparison can also be performed for a selected part of an alignment: right-click on an alignment selection | Pairwise Comparison. This lead
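To make the idea of a comparison table concrete, here is a small Python sketch that compares every pair of aligned sequences. The excerpt does not say which measures the Workbench reports, so the measure used here (the number of alignment columns in which two sequences differ, with a gap against a residue counted as a difference) is an assumption, as is the function name `pairwise_differences`.

```python
from itertools import combinations


def pairwise_differences(alignment):
    """Map each pair of sequence names to the number of differing columns.

    `alignment` maps names to aligned sequences of equal length. A gap
    facing a residue counts as a difference in this sketch.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    table = {}
    for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
        table[(name1, name2)] = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return table


if __name__ == "__main__":
    aligned = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ACCTTA"}
    for pair, differences in pairwise_differences(aligned).items():
        print(pair, differences)
```

With n sequences the table has n(n-1)/2 entries, which is why such a table gives a compact overview of diversity in the alignment.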
13. ... 2001
    • Inner melting temperature: This option is only activated when the Nested PCR or TaqMan mode is selected. In Nested PCR mode, it determines the allowed melting temperature interval for the inner/nested pair of primers, and in TaqMan mode it determines the allowed temperature interval for the TaqMan probe.
    • Advanced parameters: A number of less commonly used options.
      - Buffer properties: A number of parameters concerning the reaction mixture which influence melting temperatures.
        * Primer concentration: Specifies the concentration of primers and probes in units of nanomolar (nM).
        * Salt concentration: Specifies the concentration of monovalent cations (Na+, K+ and equivalents) in units of millimolar (mM).
        * Magnesium concentration: Specifies the concentration of magnesium cations (Mg2+) in units of millimolar (mM).
        * dNTP concentration: Specifies the concentration of deoxynucleotide triphosphates in units of millimolar (mM).
        * DMSO concentration: Specifies the concentration of dimethyl sulfoxide in units of volume percent (vol.%).
      - GC content: Determines the interval of GC content (% C and G nucleotides in the primer) within which primers must lie, by setting a maximum and a minimum GC content.
      - Self annealing: Determines the maximum self annealing value of all primers and probes. This determines the amount of base pairing allowed between two copies of the same molecule.
    CHAPTER 16. PRIMERS
    The s
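The GC content parameter lends itself to a short worked example. The Python sketch below computes the fraction of G and C nucleotides in a primer and checks it against a minimum and maximum; the function names and the use of fractions rather than percentages are choices made for this illustration only.

```python
def gc_content(primer):
    """Return the fraction of G and C nucleotides in `primer` (0.0 to 1.0)."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return (primer.count("G") + primer.count("C")) / len(primer)


def within_gc_interval(primer, minimum, maximum):
    """Return True if the primer's GC fraction lies in [minimum, maximum]."""
    return minimum <= gc_content(primer) <= maximum


if __name__ == "__main__":
    primer = "GGTGGGAGGTCTATATAA"  # 7 G + 1 C out of 18 bases
    print(round(gc_content(primer), 3))
    print(within_gc_interval(primer, 0.40, 0.60))
```

The example primer has 8 G or C bases out of 18, a GC fraction of about 0.44, so it would pass an interval of 40 to 60 percent.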
14. [Screenshot: search results table with the columns Accession, Definition and Modification Date; total number of hits: 245.]
    Figure 11.1: The GenBank search view.
    As default, CLC DNA Workbench offers one text field where the search parameters can be entered. Click Add search parameters to add more parameters to your search. Note: The search is an "and" search, meaning that when adding search parameters to your search, you search for both (or all) text strings rather than "any" of the text strings. You can append a wildcard character by checking the checkbox at the bottom. This means that you only have to enter the first part of the search text, e.g. searching for "genom" will find both "genomic" and "geno
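The combination of an "and" search with an appended wildcard can be illustrated with a small Python sketch. It is a local toy model and does not reflect how the GenBank search itself is executed: the function name `matches_all_terms`, the word tokenization, and the rule that a term without wildcard must equal a whole word are all assumptions.

```python
import re


def matches_all_terms(text, terms, append_wildcard=False):
    """Return True if every search term matches some word in `text`.

    With `append_wildcard`, a term matches any word that starts with it;
    otherwise it must equal a whole word. Matching ignores case.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())

    def term_matches(term):
        term = term.lower()
        if append_wildcard:
            return any(word.startswith(term) for word in words)
        return term in words

    # "and" search: all terms must match, not just any one of them.
    return all(term_matches(term) for term in terms)


if __name__ == "__main__":
    hit = "Clostridium perfringens str 13 DNA complete genome"
    print(matches_all_terms(hit, ["genom"]))
    print(matches_all_terms(hit, ["genom"], append_wildcard=True))
    print(matches_all_terms(hit, ["genom", "danio"], append_wildcard=True))
```

The second call matches because "genome" starts with "genom"; the third fails because the "and" search also requires a word starting with "danio".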
15. Figure 18.32: Adding or removing enzymes from the Side Panel.
    At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists. Below there are two panels:
    • To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available.
    • To the right, there is a list of the enzymes that will be used.
    Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side pa
16. By clicking the Dock icon, the floating Side Panel reappears in the right side of the view. The size of the floating Side Panel can be adjusted by dragging the hatched area in the bottom right.
    Chapter 6. Printing
    Contents
    6.1 Selecting which part of the view to print
    6.2 [title illegible]
    6.2.1 Header and footer
    6.3 Print preview
    CLC DNA Workbench offers different choices of printing the result of your work. This chapter deals with printing directly from CLC DNA Workbench. Another option for using the graphical output of your work is to export graphics (see chapter 3) in a graphic format, and then import it into a document or a presentation. All the kinds of data that you can view in the View Area can be printed. The CLC DNA Workbench uses a WYSIWYG principle: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data, e.g. a sequence, looks on the screen. When you print it, it will look exactly the same way on print as on the screen. For some of the views, the layout will be slightly changed in order to be printer-friendly. It is not possible to print elements directly from the Navigation Area. They must first be opened in a view in order to be printed. To print the contents of a view: select relevant view | Print in the toolbar. This will
17. Chapter 2. Tutorials
    Contents
    2.1 Tutorial: Getting started
    2.1.1 Creating a folder
    2.1.2 Import data
    2.2 Tutorial: View sequence
    2.3 Tutorial: Side Panel Settings
    2.3.1 Saving the settings in the Side Panel
    2.3.2 Applying saved settings
    2.4 Tutorial: GenBank search and download
    2.4.1 Searching for matching objects
    2.4.2 Saving the sequence
    2.5 Tutorial: Assembly
    2.5.1 Trimming the sequences
    2.5.2 Assembling the sequencing data
    2.5.3 Getting an overview of the contig
    2.5.4 Finding and editing conflicts
    2.5.5 Including regions that have been trimmed off
    2.5.6 Inspecting the traces
    2.5.7 Synonymous substitutions
    2.5.8 Getting an overview of the conflicts
    2.5.9 Documenting your changes
    2.5.10 Using the result for further analyses
    2.6 Tutorial: In silico cloning cloning work flow
    2.6.1 Locat
18. Desired temperature difference in melting temperature between outer (primers) and inner (TaqMan) oligos: the scoring function discounts solution sets which deviate greatly from this value. Regarding this and the minimum difference option mentioned above, please note that, to ensure flexibility, there is no directionality indicated when setting parameters for melting temperature differences between probes and primers, i.e. it is not specified whether the probes should have a lower or higher Tm. Instead, this is determined by the allowed temperature intervals for inner and outer oligos that are set in the primer parameters preference group in the side panel. If a higher Tm of probes is required, choose a Tm interval for probes which has higher values than the interval for outer primers.
    The output of the design process is a table of solution sets. Each solution set contains the following: a set of primers which are general to all sequences in the alignment, a TaqMan probe which is specific to the set of included sequences (sequences where selection boxes are checked), and a TaqMan probe which is specific to the set of excluded sequences. Otherwise, the table is similar to that described above for TaqMan probe prediction on single sequences.
    16.10 Analyze primer properties
    CLC DNA Workbench can calculate and display the properties of predefined primers and probes: select a primer sequence (primers are represented as DNA sequences in the
19. Figure 3.1: The user interface consists of the Menu Bar, Toolbar, Status Bar, Navigation Area, Toolbox, and View Area.
    3.1 Navigation Area
    The Navigation Area is located in the left side of the screen, under the Toolbar (see figure 3.2). It is used for organizing and navigating data. Its behavior is similar to the way files and folders are usually displayed on your computer.
    Figure 3.2: The Navigation Area.
    CHAPTER 3. USER INTERFACE
    3.1.1 Data structure
    The data in the Navigation Area is organized into a number of Locations. When the CLC DNA Workbench is started for the first time, there is one location called CLC_Data (unless your computer administrator has configured the installation otherwise). A location represents a folder on the computer: the data shown under a location in the Navigation Area is stored on the computer in the folder which the location points to. This is explained visually in figure 3.3.
20. Figure 18.33: Selecting enzymes.
    If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), or use the view of enzyme lists (see section 18.5).
    Figure 18.34: Showing additional information about an enzyme, like recognition sequence or a list of commercial vendors.
    At the bottom of the dialog, you can select to save this list of enzymes as a new file. In this way, you can save the selection of enzymes for later use. When you click Finish, the enzymes are added to the Side Panel and the cut sites are shown on the sequence
21. The hydrophobicity is calculated by sliding a fixed-size window (of an odd number) over the protein sequence. At the central position of the window, the average hydrophobicity of the entire window is plotted (see figure 15.7).
    Hydrophobicity scales
    Several hydrophobicity scales have been published for various uses. Many of the commonly used hydrophobicity scales are described below.
    Kyte-Doolittle scale. The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. This scale can be used for identifying both surface-exposed regions and transmembrane regions, depending on the window size used. Short window sizes of 5-7 generally work well for predicting putative surface-exposed regions. Large window sizes of 19-21 are well suited for finding transmembrane domains if the values calculated are above 1.6 (Kyte and Doolittle, 1982). These values should be used as a rule of thumb, and deviations from the rule may occur.
    CHAPTER 15. PROTEIN ANALYSES
    Engelman scale. The Engelman hydrophobicity scale, also known as the GES scale, is another scale which can be used for prediction of protein hydrophobicity (Engelman et al., 1986). Like the Kyte-Doolittle scale, this scale is useful for predicting transmembrane regions in proteins.
    Eisenberg scale. The Eisenberg scale is a normalized consensus hydrophobicity scale which shares many features with the other hydrophobicity scale
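The sliding-window calculation described in this excerpt can be sketched in Python as follows. The residue values are the published Kyte-Doolittle hydropathy values; the function name `hydrophobicity_profile`, the 0-based positions, and the decision to leave the first and last half-window of the sequence without a value are assumptions made for this illustration.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=5, scale=KYTE_DOOLITTLE):
    """Return (central position, window average) pairs, 0-based positions.

    The window size must be odd so that the window has a central residue.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = window // 2
    profile = []
    for center in range(half, len(protein) - half):
        residues = protein[center - half:center + half + 1].upper()
        average = sum(scale[residue] for residue in residues) / window
        profile.append((center, average))
    return profile


if __name__ == "__main__":
    for position, value in hydrophobicity_profile("MKTLLILAVVAAALA", window=5):
        print(position, round(value, 2))
```

Following the rule of thumb in the text, a window of 19 to 21 residues with averages above 1.6 would hint at a transmembrane region, while a short window such as the 5 used here is more suited to surface-exposed regions.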
22. Whether sequences can be displayed with this information depends on their origin. Sequences that you have created yourself or imported might not include this information, and you will only be able to see them represented by their name. However, sequences downloaded from databases like GenBank will include this information. To change how sequences are displayed: right-click any element or folder in the Navigation Area | Sequence Representation | select format. This will only affect sequence elements, and the display of other types of elements, e.g. alignments, trees and external files, will not be changed. If a sequence does not have this information, there will be no text next to the sequence icon.
    Rename element
    Renaming a folder or an element in the Navigation Area can be done in three different ways:
    • select the element | Edit in the Menu Bar | Rename
    • select the element | F2
    • click the element once | wait one second | click the element again
    When you can rename the element, you can see that the text is selected and you can move the cursor back and forth in the text. When the editing of the name has finished, press Enter or select another element in the Navigation Area. If you want to discard the changes instead, press the Esc key. For renaming annotations instead of folders or elements, see section 10.3.3.
    3.1.7 Delete elements
    Deleting a folder or an element can be done in two ways: right-click the el
23. • The Primer solution submenu is used to specify requirements for the match of a PCR primer against the template sequences. These options are described further below. It contains the following options:
      - Perfect match
      - Allow degeneracy
      - Allow mismatches
    The work flow when designing alignment-based primers and probes is as follows:
    • Use selection boxes to specify groups of included and excluded sequences. To select all the sequences in the alignment, right-click one of the selection boxes and choose Mark All.
    • Mark either a single forward primer region, a single reverse primer region, or both on the sequence (and perhaps also a TaqMan region). Selections must cover all sequences in the included group. You can also specify that there should be no primers in a region (No Primers Here) or that a whole region should be amplified (Region to Amplify).
    • Adjust parameters regarding single primers in the preference panel.
    • Click the Calculate button.
    16.9.2 Alignment-based design of PCR primers
    In this mode, a single or a pair of PCR primers are designed. CLC DNA Workbench allows the user to design primers which will specifically amplify a group of included sequences but not amplify the remainder of the sequences, the excluded sequences. The selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. To design prim
24. 13.6.1 Pattern discovery search parameters
    Various parameters can be set prior to the pattern discovery. The parameters are listed below, and a screen shot of the parameter settings can be seen in figure 13.20.
    • Create and search with new model. This will create a new HMM model based on the selected sequences. The found model will be opened after the run and presented in a table view. It can be saved and used later if desired.
    • Use existing model. It is possible to use already created models to search for the same pattern in new sequences.
    • Minimum pattern length. Here, the minimum length of patterns to search for can be specified.
    CHAPTER 13. GENERAL SEQUENCE ANALYSES
    • Maximum pattern length. Here, the maximum length of patterns to search for can be specified.
    • Noise (%). Specify noise level of the model. This parameter has influence on the level of degeneracy of patterns in the sequence(s). The noise parameter can be 1, 2, 5 or 10 percent.
    • Number of different kinds of patterns to predict. Number of iterations the algorithm goes through. After the first iteration, we force predicted pattern positions in the first run to be members of the background. In that way, the algorithm finds new patterns in the second iteration. Patterns marked Pattern have the highest confidence. The maximal number of iterations to go through is 3.
    • Include background distribution. For protein sequences, it is possible to include information on the back
25. 5.2 Default view preferences
There are five groups of default View settings: Toolbar, Side Panel Location, New View, View Format, and User Defined View Settings.
In general, these are default settings for the user interface.
The Toolbar preferences let you choose the size of the toolbar icons, and you can choose whether to display names below the icons.
The Side Panel Location setting lets you choose between Dock in views and Float in window. When docked in view, view preferences will be located in the right side of the view of e.g. an alignment. When floating in window, the side panel can be placed anywhere on your screen, also outside the workspace, e.g. on a different screen. See section 5.6 for more about floating Side Panels.
The New View setting allows you to choose whether the View preferences are to be shown automatically when opening a new view. If this option is not chosen, you can press Ctrl + U (⌘ + U on Mac) to see the preferences panels of an open view.
The View Format allows you to change the way the elements appear in the Navigation Area. The following text can be used to describe the element:
• Name (this is the default information to be shown)
• Accession (sequences downloaded from databases like GenBank have an accession number)
• Latin name
• Latin name (accession)
• Common name
• Common name (accession)
The User Defined View Settings gives you an overview of the diff
26. CLC DNA Workbench User manual
Manual for CLC DNA Workbench 6.6
Windows, Mac OS X and Linux
February 23, 2012
This software is for research purposes only.
CLC bio, Finlandsgade 10-12, DK-8200 Aarhus N, Denmark
Contents
Introduction
1 Introduction to CLC DNA Workbench
1.1 Contact information
1.2 Download and installation
1.3 System requirements
1.4 Licenses
1.5 About CLC Workbenches
1.6 When the program is installed
1.7 Plugins
1.8 Network configuration
1.9 The format of the user manual
2 Tutorials
2.1 Tutorial: Getting started
2.2 Tutorial: View sequence
2.3 Tutorial: Side Panel Settings
2.4 Tutorial: GenBank search and download
2.5 Tutorial: Assembly
2.6 Tutorial: In silico cloning - cloning work flow
2.7 Tutorial: Primer design
2.8 Tutorial: BLAST search
2.9 Tutorial: Tips for specialized BLAST searches
2.10 Tutorial: Align protein sequences
2.11 Tutorial: Create and modify a phylogenetic tree
2.12 Tutorial: Find restriction sites
Core Functionalities
3 User interface
3.1 Navigation Area
3.2 View Area
3.3 Zoom and selection in View Area
3.4 Toolbox and Status Bar
3.5 Workspace
3.6 List of shortcuts
4 Searching your data
4.1 What kind of information can be searched
27. Figure 19.7: Realigning using fixpoints. In the top view, fixpoints have been added to two of the sequences. In the view below, the alignment has been realigned using the fixpoints. The three top sequences are very similar, and therefore they follow the one sequence (number two from the top) that has a fixpoint.
Advanced use of fixpoints
Fixpoints with the same names will be aligned to each other, which gives the opportunity for great control over the alignment process. It is only necessary to change any fixpoint names in very special cases.
One example would be three sequences A, B and C, where sequences A and B have one copy of a domain while sequence C has two copies of the domain. You can now force sequence A to align to the first copy and sequence B to align to the second copy of the domains in sequence C. This is done by inserting fixpoints in sequence C for each domain, and naming them fp1 and fp2, for example. Now you can insert a fixpoint in each of sequences A and B, naming them fp1 and fp2, respectively. Now, when aligning the three sequences using fixpoints, sequence A will align to the first copy of the domain in sequence C, while sequence B will align to the second copy of the domain in sequence C.
You can name fixpoints by: right-click the Fixpoint annotation | Edit Annotation
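The naming rule in the fixpoint example can be sketched in a few lines of Python. This is only an illustration of the idea that fixpoints sharing a name are anchored together. The tuple layout, the function name `group_fixpoints`, and the positions are assumptions made for the example and are not taken from the Workbench.

```python
from collections import defaultdict


def group_fixpoints(fixpoints):
    """Map each fixpoint name to the (sequence, position) pairs that share it."""
    groups = defaultdict(list)
    for sequence, name, position in fixpoints:
        groups[name].append((sequence, position))
    return dict(groups)


# Hypothetical positions: A and B carry one domain each, C carries two copies.
fixpoints = [
    ("A", "fp1", 12),
    ("B", "fp2", 15),
    ("C", "fp1", 40),
    ("C", "fp2", 95),
]

anchors = group_fixpoints(fixpoints)
for name, members in anchors.items():
    print(name, members)
```

Under these assumptions, fp1 ties sequence A to the first domain copy in C, and fp2 ties sequence B to the second copy.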
28. Figure 11.3: Open webpages with information about this sequence.
This will open your computer's default browser, searching for the sequence that you selected.
11.2.1 Google sequence
The Google search function uses the accession number of the sequence, which is used as search term on http://www.google.com. The resulting web page is equivalent to typing the accession number of the sequence into the search field on http://www.google.com.
11.2.2 NCBI
The NCBI search function searches in GenBank at NCBI (http://www.ncbi.nlm.nih.gov) using an identification number (when you view the sequence as text, it is the GI number). Therefore, the sequence file must contain this number in order to look it up at NCBI. All sequences downloaded from NCBI have this number.
11.2.3 PubMed References
The PubMed references search option lets you look up PubMed articles based on references contained in the sequence file (when you view the sequence as text, it contains a number of PUBMED lines). Not all sequences have these PubMed references; if they are missing, you will see a dialog and the browser will not open.
11.2.4 UniProt
The UniProt search function searches in the UniProt database (http://www.ebi.uniprot
29. Pressing Ctrl (⌘ on Mac) while you click will refine the existing sorting.
C.1 Filtering tables
The final concept to introduce is Filtering. The table filter has an advanced and a simple mode. The simple mode is the default and is applied simply by typing text or numbers (see an example in figure C.2).
Figure C.2: Typing neg in the filter in simple mode.
Typing neg in the filter will only show the rows where neg is part of the text in any of the columns (including the ones that are not shown). The text does not have to be at the beginning, thus ega would give the same result. This simple filter works fine for fast, textual and non-complicated filtering and searching.
However, if you wish to make use of numerical information or make more complex filters, you can switch to the advanced mode by clicking the Advanced filter button. The advanced filter is structured in a different way. First of all, you can have more than one criterion in the filter. Criteria can be added or removed by clicking the Add or Remove buttons. At the top, you can choose whether all the criteria should be fulfilled (Match all), or if just one of th
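The two filter modes can be approximated with a short Python sketch. It is not the Workbench's implementation. Case-insensitive matching, the function names, the `match_all` flag, and the example rows are all assumptions made for illustration.

```python
def simple_filter(rows, query):
    """Keep rows where `query` occurs as a substring of any column value."""
    needle = query.lower()  # assumption: matching ignores case
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


def advanced_filter(rows, criteria, match_all=True):
    """Combine several predicates: every one must hold, or at least one."""
    combine = all if match_all else any
    return [row for row in rows if combine(test(row) for test in criteria)]


# Made-up rows loosely modelled on a reading-frame table.
rows = [
    {"start": 14, "end": 306, "length": 293, "strand": "negative"},
    {"start": 405, "end": 800, "length": 396, "strand": "negative"},
    {"start": 900, "end": 1400, "length": 501, "strand": "positive"},
]

by_text = simple_filter(rows, "ega")  # same rows as typing "neg"
long_negative = advanced_filter(
    rows,
    [lambda r: r["length"] > 350, lambda r: r["strand"] == "negative"],
)
print(len(by_text), len(long_negative))
```

The simple mode needs only one piece of text, while the advanced mode keeps each criterion separate so that numerical comparisons become possible.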
30. Protein Analyses | Create Protein Charge Plot
This opens the dialog displayed in figure 15.1.
Figure 15.1: Choosing protein sequences to calculate protein charge.
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
You can perform the analysis on several protein sequences at a time. This will result in one output graph showing protein charge graphs for the individual proteins.
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.
15.1.1 Modifying the layout
Figure 15.2 shows the electrical charges for three proteins. In the Side Panel to the right, you can modify the layout of the graph.
Figure 15.2: View of the protein charge.
See section B in the appendix for information about the graph
31. To install a plug-in, click the Download Plug-ins tab. This will display an overview of the plug-ins that are available for download and installation (see figure 1.24).
Figure 1.24: The plug-ins that are available for download.
Clicking a plug in
32. org using the accession number. Furthermore, it checks whether the sequence was indeed downloaded from UniProt.
11.2.5 Additional annotation information
When sequences are downloaded from GenBank, they often link to additional information on taxonomy, conserved domains, etc. If such information is available for a sequence, it is possible to access additional accurate online information. If the db_xref identifier line is found as part of the annotation information in the downloaded GenBank file, it is possible to easily look up additional information on the NCBI web site.
To access this feature, simply right-click an annotation and see which databases are available.
Chapter 12 BLAST Search
Contents
12.1 Running BLAST searches
12.1.1 BLAST at NCBI
12.1.2 BLAST a partial sequence against NCBI
12.1.3 BLAST against local data
12.1.4 BLAST a partial sequence against a local database
12.2 Output from BLAST searches
12.2.1 Graphical overview for each query sequence
12.2.2 Overview BLAST table
12.2.3 BLAST graphics
12.2.4 BLAST table
12.3 Local BLAST databases
12.3.1 Make pre
33. type the name in the Name field.
19.2 View alignments
Since an alignment is a display of several sequences arranged in rows, the basic options for viewing alignments are the same as for viewing sequences. Therefore we refer to section 10.1 for an explanation of these basic options.
However, there are a number of alignment-specific view options in the Alignment info and the Nucleotide info in the Side Panel to the right of the view. Below is more information on these view options.
Under Translation in the Nucleotide info, there is an extra checkbox: Relative to top sequence. Checking this box will make the reading frames for the translation align with the top sequence, so that you can compare the effect of nucleotide differences on the protein level.
The options in the Alignment info relate to each column in the alignment:
• Consensus. Shows a consensus sequence at the bottom of the alignment. The consensus sequence is based on every single position in the alignment and reflects an artificial sequence which resembles the sequence information of the alignment, but only as one single sequence. If all sequences of the alignment are 100% identical, the consensus sequence will be identical to all sequences found in the alignment. If the sequences of the alignment differ, the consensus sequence will reflect the most common sequences in the alignment. Parameters for adjusting the consensus sequences are described below.
Limit. This option deter
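A column-wise majority vote gives a feel for how a consensus of this kind can be derived. The following Python sketch is a simplified stand-in, not the Workbench's algorithm. It ignores the Limit parameter and ambiguity handling, and it assumes that gap characters do not take part in the vote.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Return the most common residue in each column of an alignment."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]  # assumption: gaps do not vote
        if not residues:
            consensus.append(gap)
            continue
        # Ties go to whichever residue Counter reports first.
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


alignment = ["ACGT-A", "ACGTTA", "ACCTTA"]
print(majority_consensus(alignment))
```

When every sequence is identical, the function returns that same sequence, which matches the behaviour described for a 100% identical alignment.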
34. 11.2.1 Google sequence
11.2.2 NCBI
11.2.3 PubMed References
11.2.4 UniProt
11.2.5 Additional annotation information
CLC DNA Workbench offers different ways of searching data on the Internet. You must be online when initiating and performing the following searches.
11.1 GenBank search
This section describes searches for sequences in GenBank, the NCBI Entrez database. The NCBI search view is opened in this way (figure 11.1):
Search | Search for Sequences at NCBI
or Ctrl + B (⌘ + B on Mac)
This opens the following view:
11.1.1 GenBank search options
Conducting a search in the NCBI Database from CLC DNA Workbench corresponds to conducting the search on NCBI's website. When conducting the search from CLC DNA Workbench, the results are available and ready to work with straight away.
You can choose whether you want to search for nucleotide sequences or protein sequences.
35. 211 377 Local Database BLAST 1 8 Locale setting 105 Location search in 101 of selection on sequence 92 path to 78 Side Panel 106 Locations multiple 376 Log of batch processing 138 Logo sequence 354 3 8 LR reaction Gateway cloning 326 ma4 file format 395 Mac OS X installation 13 Manage BLAST databases 188 Manipulate sequences 377 380 Manual editing auditing 105 Manual format 34 Marker in gel view 343 Maximize size of view 89 Maximum likelihood 3 9 Melting temperature DMSO concentration 252 dNTP concentration 252 Magnesium concentration 252 410 Melting temperature 252 Cation concentration 252 270 Cation concentration 2 2 Inner 252 Primer concentration 252 270 Primer concentration 2 2 Menu Bar illustration MFold 379 mmCIF file format 395 Mode toolbar 91 Modification date 160 Modify enzyme list 345 Modules 30 Molecular weight 215 Motif list 227 Motif search 221 227 379 Mouse modes 91 Move content of a view 92 elements in Navigation Area 80 sequences in alignment 358 msf file format 395 Multiple alignments 304 3 8 Multiplexing 2 9 by name 2 9 Multiselecting 80 Name 160 Navigation Area create local BLAST database 18 7 illustration NCBI 167 search sequence in 1 1 search tutorial 43 NCBI BLAST add more databases 38 Negatively charged residues 217 Neighbor Joining algorithm 373 Neighbor joining 379 Nested PCR prime
36. Figure C.3: The advanced filter showing open reading frames larger than 400 that are placed on the negative strand.
Both for the simple and the advanced filter, there is a counter at the upper left corner which tells you the number of rows that pass the filter (91 in figure C.2 and 15 in figure C.3).
Appendix D BLAST databases
Several databases are available at NCBI, which can be selected to narrow down the possible BLAST hits.
D.1 Peptide sequence databases
• nr. Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
• refseq. Protein sequences from NCBI Reference Sequence project http://www.ncbi.nlm.nih.gov/RefSeq/.
• swissprot. Last major release of the SWISS-PROT protein sequence database (no incremental updates).
• pat. Proteins from the Patent division of GenBank.
• pdb. Sequences derived from the 3-dimensional structure records from the Protein Data Bank http://www.rcsb.org/pdb/.
• env_nr. Non-redundant CDS translations from env_nt entries.
• month. All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
D.2 Nucleotide sequence databases
• nr. All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
• refseq_rna. mRNA sequences from NCBI Reference Sequence Project.
• refseq_genomi
37. All the conflict annotations are preserved, and in the sequence's history you will find a reference to the original contig. As long as you also save the original contig, you will always be able to go back to it by choosing the Reference contig in the consensus sequence's history (see figure 2.22).
Figure 2.22: The history of the consensus sequence which has been extracted from the contig. Clicking the blue text Reference contig will find and highlight the name of the saved contig in the Navigation Area. Clicking the blue text history to the right will open the history view of the earlier contig. From there you can choose other views, such as the Read mapping view of the contig.
2.6 Tutorial: In silico cloning - cloning work flow
In this tutorial, the goal is to virtually PCR-amplify a gene using primers with restriction sites at the 5' ends, and insert the gene into a multiple cloning site of an expression vector. We start off with a set of primers, a DNA template sequence and an expression vector loaded into the Workbench. This tutorial will guide you through the following steps:
1. Adding restriction sites to the primers
2. Simulating the effect of PCR by creating the fragment to use for cloning
3. Specify
38. Assemble sequences 291 to existing contig 295 to reference sequence 293 Assembly 376 tutorial 45 variance table 303 Atomic composition 217 attB sites add 319 Audit 105 Backup 123 Base pairs required for mispriming 257 Batch edit element properties 84 Batch processing 133 log of 138 Bibliography 403 Binding site for primer 2 1 Bioinformatic data export 122 formats 117 392 bl2seq see Local BLAST BLAST 377 against a local Database 1 8 against NCBI 175 contig 301 create database from file system 187 create database from Navigation Area 187 create local database 18 database file format 395 database management 188 graphics output 182 list of databases 386 parameters 1 6 search 174 1 75 sequencing data assembled 301 specify server URL 109 table output 183 tips for specialized searches 64 tutorial 61 64 URL 109 BLAST database index 187 BLAST DNA sequence BLASTn 175 BLASTX 175 tBLASTx 175 BLAST Protein sequence 406 BLASTp 176 tBLASTn 176 BLAST result search in 185 BLAST search Bioinformatics explained 189 BLOSUM scoring matrices 208 Bootstrap values 3 4 Borrow floating license 25 BP reaction Gateway cloning 324 Broken pair coloring 298 Browser import sequence from 119 Bug reporting 28 C G content 145 CDS translate to protein 149 Chain flexibility 146 Cheap end gaps 349 ChIP Seq analysis 3 0 Chromatogram traces scale 2 8 cif file for
39. At the top there is a graphical representation of BLAST hits, with tooltips showing additional information on individual hits. Below is a tabular form of the BLAST results.
12.1 Running BLAST searches
With the CLC DNA Workbench there are two ways of performing BLAST searches: you can either have the BLAST process run on NCBI's BLAST servers (http://www.ncbi.nlm.nih.gov/) or you can perform the BLAST search on your own computer.
The advantage of running the BLAST search on NCBI servers is that you have ready access to the popular, and often very large, BLAST databases without having to download them to your own computer. The advantages of running BLAST on your own computer include that you can use your own sequence collections as BLAST databases, and that running big batch BLAST jobs can be faster and more reliable when done locally.
12.1.1 BLAST at NCBI
When running a BLAST search at the NCBI, the Workbench sends the sequences you select to the NCBI's BLAST servers. When the results are ready, they will be automatically downloaded and displayed in the Workbench. When you enter a large number of sequences for searching with BLAST, the Workbench automatically splits the sequences up into smaller subsets and sends one subset at a time to NCBI. This is to avoid exceeding any internal limits the NCBI places on the number of sequences that can be submitted to them for BLAST
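Splitting a large query set into smaller subsets is a generic batching pattern. The Python sketch below shows one way to do it. The subset size of 3 and the function name are assumptions for the example, since the actual limits used by the Workbench or by NCBI are not stated here.

```python
def split_into_subsets(sequences, subset_size):
    """Yield consecutive slices of at most `subset_size` sequences."""
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    for start in range(0, len(sequences), subset_size):
        yield sequences[start:start + subset_size]


# Hypothetical query names standing in for selected sequences.
queries = [f"seq{i}" for i in range(1, 8)]
batches = list(split_into_subsets(queries, subset_size=3))
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Each batch would then be submitted on its own, one at a time, and the results collected once all batches have finished.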
40. Color box. For Line and Bar plots, the color of the plot can be set by clicking the color box. If a Color bar is chosen, the color box is replaced by a gradient color box as described under Foreground color.
The remaining View preferences for BLAST Graphics are the same as those of alignments. See section 19.2.
Some of the information available in the tooltips is:
• Name of sequence. Here is shown some additional information of the sequence which was found. This line corresponds to the description line in GenBank (if the search was conducted on the nr database).
• Score. This shows the bit score of the local alignment generated through the BLAST search.
• Expect. Also known as the E-value. A low value indicates a homologous sequence. Higher E-values indicate that BLAST found a less homologous sequence.
• Identities. This number shows the number of identical residues or nucleotides in the obtained alignment.
• Gaps. This number shows whether the alignment has gaps or not.
• Strand. This is only valid for nucleotide sequences and shows the direction of the aligned strands. Minus indicates a complementary strand.
• Query. This is the sequence (or part of the sequence) which you have used for the BLAST search.
• Sbjct (subject). This is the sequence found in the database.
The numbers of the query and subject sequences refer to the sequence positions in the submitted and found sequences. If the subject se
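The Identities and Gaps values can be illustrated by counting over an aligned query and subject pair. This Python sketch is only a reading aid. The aligned strings are invented, and the function name and return layout are assumptions rather than part of BLAST or the Workbench.

```python
def alignment_summary(query, subject, gap="-"):
    """Count identical positions and gap positions in an aligned pair."""
    if len(query) != len(subject):
        raise ValueError("aligned query and subject must have equal length")
    pairs = list(zip(query, subject))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    return {"identities": identities, "gaps": gaps, "length": len(pairs)}


# Invented aligned pair with one gap in each sequence.
summary = alignment_summary("ACGT-ACGT", "ACGTTAC-T")
print(summary)
```

Here a position counts as identical only when both residues match and neither is a gap, and a position counts as a gap when either side holds the gap character.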
41. Figure 16.5: Detailed information mode.
The number of information line groups reflects the chosen length interval for primers and probes. One group is shown for every possible primer length. Within each group, a line is shown for every primer property that is selected from the checkboxes in the primer information preference group. Primer properties are shown at each potential primer starting position and are of two types:
Properties with numerical values are represented by bar plots. A green bar represents the starting point of a primer that meets the set requirement, and a red bar represents the starting point of a primer that fails to meet the set requirement:
• G/C content
• Melting temperature
• Self annealing score
• Self end annealing score
• Secondary structure score
Properties with Yes/No values. If a primer meets the set requirement, a green circle will be shown at its starting position, and if it fails to meet the requirement, a red dot is shown at its starting position:
• C/G at 3' end
• C/G at 5' end
Common to both sorts of properties is that mouse-clicking an information point (filled circle or bar) will cause the region covered by the associated primer to be selected on the sequence.
16.4 Output from primer design
The output generated by the primer design algorithm is a table of proposed primers or primer pairs with the accompanying information (see figure 16.6).
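Two of the listed properties, G/C content and C/G at the primer ends, are simple enough to sketch in Python. The example is a hedged illustration only. The 40% to 60% limits, the example primer, and the decision to inspect only the terminal base at each end are assumptions, and the primer is assumed to be written 5' to 3'.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return sum(base in "GC" for base in primer) / len(primer)


def gc_at_ends(primer):
    """Report whether the 5' and 3' terminal bases are G or C."""
    primer = primer.upper()
    return {"5_prime": primer[0] in "GC", "3_prime": primer[-1] in "GC"}


def within_limits(value, minimum, maximum):
    """True when the value meets the set requirement (green), else False (red)."""
    return minimum <= value <= maximum


primer = "ATGCGTACGTTAGC"  # hypothetical primer, written 5' to 3'
report = {
    "gc_content": gc_content(primer),
    "gc_ok": within_limits(gc_content(primer), 0.40, 0.60),  # assumed limits
    **gc_at_ends(primer),
}
print(report)
```

A numerical property maps to a pass or fail against its limits, while a Yes/No property is a direct boolean, which mirrors the bar and circle displays described above.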
42. If you wish to change which settings should be used per default, open the Preferences dialog (see section 5.2).
Figure 5.9: The Sequence layout is expanded.
Figure 5.10: At the top of the Side Panel you can: Expand all groups, Collapse all preferences, Dock/Undock preferences, Help, and Save/Restore preferences.
• Delete Settings. Opens a dialog to select which of the saved settings to delete.
• Apply Saved Settings. This is a submenu containing the settings that you have previously saved. By clicking one of the settings, they will be applied to the current view. You will also see a number of pre-defined view settings in this submenu. They are meant to be examples of how to use the Side Panel and provide quick ways of adjusting the view to common usages. At the bottom of the list of settings you will see CLC Standard Settings, which represent the way the program was set up when you first launched it.
43. Importing parts of the database
Instead of importing the whole database automatically, you can export parts of the database from Vector NTI Explorer and subsequently import into the Workbench. First, export a selection of files as an archive as shown in figure 7.5.
Figure 7.4: The Vector NTI Data folder containing all imported sequences of the Vector NTI Database.
Figure 7.5: Select the relevant files and ex
44. Navigation Area | Toolbox in the Menu Bar | Primers and Probes | Analyze Primer Properties
Figure 16.14: Calculation dialog shown when designing alignment-based TaqMan probes (panels: Chosen parameters, Probe parameters, Primer combination parameters).
If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove a sequence from the selected elements.
Clicking Next generates the dialog seen in figure 16.15.
45. Note that the number of mismatches is reported in the output, so you will be able to filter on this afterwards (see below).
Below the match settings, you can adjust Concentrations concerning the reaction mixture. This is used when reporting melting temperatures for the primers.
• Primer concentration. Specifies the concentration of primers and probes in nanomolar (nM).
• Salt concentration. Specifies the concentration of monovalent cations ([Na+], [K+] and equivalents) in millimolar (mM).
16.11.2 Results: binding sites and fragments
Click Next to specify the output options as shown in figure 16.18.
Figure 16.18: Output options include reporting of binding sites and fragments.
The output options are:
• Add binding site annotations. This will add annotations to the input sequences (see details below).
• Create binding site table. Creates a table of all binding sites. Described in detail below.
• Create fragment table. Shows a table of all fragments that could result from using the primers. Note that you can s
46. Figure 16.15: The parameters for analyzing primer properties.
In the Concentrations panel, a number of parameters can be specified concerning the reaction mixture and which influence melting temperatures:
• Primer concentration. Specifies the concentration of primers and probes in nanomolar (nM).
• Salt concentration. Specifies the concentration of monovalent cations ([Na+], [K+] and equivalents) in millimolar (mM).
In the Template panel, the sequences of the chosen primer and the template sequence are shown. The template sequence is by default set to the reverse complement of the primer sequence, i.e. as perfectly base-pairing. However, it is possible to edit the template to introduce mismatches which may affect the melting temperature. At each side of the template sequence, a text field is shown. Here, the dangling ends of the template sequence can be specified. These may have an important effect on the melting temperature (Bommarito et al., 2000).
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result is shown in figure 16.16.
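The default template, the reverse complement of the primer, and the effect of editing it can be sketched in Python. This is a minimal illustration that handles only the four unambiguous DNA bases. The function names and the example primer are assumptions, and no melting temperature model is attempted.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(primer, template):
    """Count positions where `template` differs from the perfect partner."""
    perfect = reverse_complement(primer).upper()
    if len(template) != len(perfect):
        raise ValueError("template must have the same length as the primer")
    return sum(a != b for a, b in zip(perfect, template.upper()))


primer = "GGTGGGAGGTCTATATAA"  # example primer, written 5' to 3'
template = reverse_complement(primer)
print(template, count_mismatches(primer, template))

# Editing one base of the template introduces a single mismatch.
edited = "A" + template[1:]
print(count_mismatches(primer, edited))
```

A perfectly base-pairing template yields zero mismatches, and every edited base adds one, which is the kind of change the text says may affect the melting temperature.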
47. Selection Mode Show hide Side Panel Sort folder Split Horizontally Split Vertically Undo User Preferences Zoom In Mode Zoom In without clicking Zoom Out Mode Zoom Out without clicking Inverse zoom mode Windows Linux Shift arrow keys Ctrl tab Ctrl W Ctrl Shift W Ctrl C Ctrl X Delete Alt F4 Ctrl E Ctrl G Space or F1 Ctrl Ctrl M Ctrl arrow keys arrow keys Ctrl Shift N Ctrl N Ctrl O Ctrl V Ctrl P Ctrl Y F2 Ctrl S Ctrl F Ctrl Shift F Ctrl B Ctrl Shift U Ctri A Ctrl 2 Ctrl U Ctrl Shift R Ctrl T Ctrl J Ctrl Z Ctrl K Ctrl plus plus Ctrl minus minus press and hold Shift 96 Mac OS X Shift arrow keys Ctrl Page Up Down ao W a6 Shift W a C a X Delete or Backspace db Q ao E a G Space or F1 ao M db arrow keys arrow keys Shift N N O V P Y S F Shift F B Shift U A 2 U Shift R 7 J Z 3 plus d 4 SE SESI LILLE NIE IEEE minus press and hold Shift Combinations of keys and mouse movements are listed below tOn Linux changing tabs is accomplished using Ctrl Page Up Page Down CHAPTER 3 USER INTERFACE 97 Action Windows Linux Mac OS X Mouse movement Maximize View Double click the tab of the View Restore View Double click the View title EL
48. The protein alignment as it looks when you open it, with background color according to the Rasmol color scheme and automatically wrapped.
Now we are going to modify how this alignment is displayed. For this, we use the settings in the Side Panel to the right. All the settings are organized into groups, which can be expanded or collapsed by clicking the name of the group. The first group is Sequence Layout, which is expanded by default.
First, select No wrap in the Sequence Layout. This means that each sequence in the alignment is kept on the same line. To see more of the alignment, you now have to scroll horizontally.
Next, expand the Annotation Layout group and select Show Annotations. Set the Offset to More offset and set the Label to Stacked.
Expand the Annotation Types group. Here you will see a list of the types of annotation that are carried by the sequences in the alignment (see figure 2.6). Check the Region annotation type, and you will see the regions as red annotations on the sequences.
Next, we will change the way the residues are colored. Click the Alignment Info group and under Conservation, check Background color. This will use a gradient as background color for the residues. You can adjust the coloring by dragging the small arrows above the color box.
2.3.1 Saving the settings in the Side Panel
Now the alignment should look similar to figure 2.7. At this point, if you just close the view, the changes made to the Side Pane
49. This opens a dialog where you can alter your choice of sequences for which you want to create statistics. You can also add sequence lists. Note! You cannot create statistics for DNA and protein sequences at the same time. When the sequences are selected, click Next. This opens the dialog displayed in figure 13.15. The dialog offers to adjust the following parameters:
• Individual statistics layout. If more than one sequence was selected in Step 1, this function generates separate statistics for each sequence.
• Comparative statistics layout. If more than one sequence was selected in Step 1, this function generates statistics with comparisons between the sequences.
CHAPTER 13 GENERAL SEQUENCE ANALYSES 213
Figure 13.15: Setting parameters for the sequence statistics.
You can also choose to include Background distribution of amino acids. If this box is ticked, an extra column with the amino acid distribution of the chosen species is included in the table output. (The distributions are calculated from UniProt, www.uniprot.org, version 6.0, dated September 13, 2005.) Click Next if you wish to adjus
50. We use the 'ATPase protein alignment' located in 'Protein orthologs' in the 'Example data'. To create a phylogenetic tree: click the 'ATPase protein alignment' in the Navigation Area | Toolbox | Alignments and Trees | Create Tree. A dialog opens where you can confirm your selection of the alignment. Click Next to move to the next step in the dialog, where you can choose between the neighbor joining and the UPGMA algorithms for making trees. You also have the option of including a bootstrap analysis of the result. Leave the parameters at their default, and click Finish to start the calculation, which can be seen in the Toolbox under the Processes tab. After a short while a tree appears in the View Area (figure 2.55).
Figure 2.55: After choosing which algorithm should be used, the tree appears in the View Area. The Side Panel in the right side of the view allows you to adjust the way the tree is displayed.
2.11.1 Tree layout
Using the Side Panel (in the right side of the view), you can change the way the tree is displayed. Click Tree Layout and open the Layout drop-down menu. Here you can choose be
51. Figure 7.2: Data stored in the Vector NTI Local Database, accessed through Vector NTI Explorer.
File | Import Vector NTI Database
Figure 7.3: Import the whole Vector NTI Database.
This will bring up a dialog letting you choose to import from the default location of the database, or you can specify another location. If the database is installed in the default folder, e.g. C:\VNTI Database, press Yes. If not, click No and specify the database folder manually. When the import has finished, the data will be listed in the Navigation Area of the Workbench as shown in figure 7.4. If something goes wrong during the import process, please report the problem to support@clcbio.com. To circumvent the problem, see the following section on how to import parts of the database. It will take a few more steps, but you will most likely be able to import this way
52. i.e. chromosome coordinates. If you export including gaps, the data points in the file no longer correspond to the reference coordinates, because each gap will shift the coordinates. Clicking Next will present a file dialog letting you specify name and location for the file. The output format of the file is a list of "Position, Value" pairs.
CHAPTER 7 IMPORT/EXPORT OF DATA AND GRAPHICS 130
7.5 Copy/paste view output
The content of tables, e.g. in reports, folder lists, and sequence lists, can be copy/pasted into different programs, where it can be edited. CLC DNA Workbench pastes the data in tabulator-separated format, which is useful if you use programs like Microsoft Word and Excel. There is a huge number of programs in which the copy/paste can be applied. For simplicity, we include one example of the copy/paste function from a Folder Content view to Microsoft Excel. The first step is to select the desired elements in the view: click a line in the Folder Content view | hold the Shift button | press the arrow down/up key. See figure 16.
[Figure: Folder Content view of the Sequences folder, with columns Name, Description, and Length.]
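Because the pasted data is tabulator-separated, it can also be read by a script instead of a spreadsheet. The following is a minimal Python sketch, not part of the Workbench: the function name `parse_pasted_table` is hypothetical, and the sample rows (names and lengths taken from the Folder Content example above) assume that the first pasted line holds the column headers.

```python
import csv
import io


def parse_pasted_table(text):
    """Parse tab-separated text (as pasted from a table view) into dicts.

    Assumption: the first non-empty line contains the column headers.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # skip empty lines
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Illustrative clipboard content with two rows.
pasted = (
    "Name\tDescription\tLength\n"
    "AY738615\tHomo sapiens hemoglobin delta-beta fusion protein\t180\n"
    "HUMDINUC\tHuman dinucleotide repeat polymorphism\t190\n"
)

records = parse_pasted_table(pasted)
total_length = sum(int(record["Length"]) for record in records)
print(records[0]["Name"], total_length)
```

The standard `csv` module is used rather than `str.split` so that quoted fields containing tabs would still be handled correctly.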
53. Score = 1567.8 bits (4050), Expect = 0E00, Identities = 779/1144 (68%), Positives = 933/1144 (82%), Gaps = 29/1144 (2%)
Figure 12.8: Default display of the output of a BLAST search for one query sequence. At the top there is a graphical representation of BLAST hits, with tool-tips showing additional information on individual hits.
[Figure: Multi BLAST table with the columns Query, Number of hits, Lowest E-value, and Accession (E-value).]
Figure 12.9: An overview BLAST table summarizing the results for a number of query sequences. In
54. sequence lists and the cloning editor and choose Digest All Sequences with Selected Enzymes and Run on Gel. Note! When using the right-click options, the sequence will be digested with the enzymes that are selected in the Side Panel. This is explained in section 10.1.2. The view of the gel is explained in section 18.4.3.
18.4.2 Separate sequences on gel
To separate sequences without restriction enzyme digestion, first create a sequence list of the sequences in question (see section 10.7). Then click the Gel button at the bottom of the view of the sequence list. For more information about the view of the gel, see the next section.
18.4.3 Gel view
In figure 18.49 you can see a simulation of a gel with its Side Panel to the right. This view will be explained in this section.
CHAPTER 18 CLONING AND CUTTING 342
Figure 18.48: A sequence list shown as a gel.
Figure 18.49: Five lanes showing fragments of five sequences cut with restriction enzymes.
Information on bands / fragments
You can get
55. the GenBank format. You can add as many qualifier/key lines as you wish by clicking the button. Redundant lines can be removed by clicking the delete icon. The information entered on these lines is shown in the annotation table (see section 10.3.1) and in the yellow box which appears when you place the mouse cursor on the annotation. If you write a hyperlink in the Key text field, e.g. www.clcbio.com, it will be recognized as a hyperlink. Clicking the link in the annotation table will open a web browser. Click OK to add the annotation. Note! The annotation will be included if you export the sequence in GenBank, Swiss-Prot, or CLC format. When exporting in other formats, annotations are not preserved in the exported file.
10.3.3 Edit annotations
To edit an existing annotation from within a sequence view: right-click the annotation | Edit Annotation. This will show the same dialog as in figure 10.10, with the exception that some of the fields are filled out depending on how much information the annotation contains. There is another way of quickly editing annotations which is particularly useful when you wish to edit several annotations.
CHAPTER 10 VIEWING AND EDITING SEQUENCES 159
To edit the information, simply double-click and you will be able to edit e.g. the name or the annotation type. If you wish to edit the qualifiers and double-click in this column, you will see the dialog for editing annotations. Advanced editing o
56. the color box is replaced by a gradient color box as described under Foreground color.
• G/C content. Calculates the G/C content of a part of the sequence and shows it as a gradient of colors or as a graph below the sequence.
  - Window length. Determines the length of the part of the sequence to calculate. A window length of 9 will calculate the G/C content for the nucleotide in question plus the 4 nucleotides to the left and the 4 nucleotides to the right. A narrow window will focus on small fluctuations in the G/C content level, whereas a wider window will show fluctuations between larger parts of the sequence.
  - Foreground color. Colors the letter using a gradient, where the left side color is used for low levels of G/C content and the right side color is used for high levels of G/C content. The sliders just above the gradient color box can be dragged to highlight relevant levels of G/C content. The colors can be changed by clicking the box. This will show a list of gradients to choose from.
  - Background color. Sets a background color of the residues using a gradient in the same way as described above.
  - Graph. The G/C content level is displayed on a graph. (Learn how to export the data behind the graph in section 4.)
    ∗ Height. Specifies the height of the graph.
    ∗ Type. The graph can be displayed as Line plot, Bar plot, or as a Color bar.
    ∗ Color box. For Line and Bar plots, the color o
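The window calculation described under Window length can be sketched in a few lines of Python. This is an illustrative sketch only, not the Workbench's implementation: the function name `gc_content_windows` is hypothetical, and shortening the window at the sequence ends (instead of padding) is an assumption.

```python
def gc_content_windows(sequence, window=9):
    """Return the G/C fraction for a window centered on each nucleotide.

    A window of 9 covers the nucleotide itself plus 4 neighbors on each
    side. Assumption: near the sequence ends the window is simply
    truncated to the residues that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    sequence = sequence.upper()
    fractions = []
    for i in range(len(sequence)):
        part = sequence[max(0, i - half): i + half + 1]
        gc = sum(1 for base in part if base in "GC")
        fractions.append(gc / len(part))
    return fractions


# A narrow window reacts to small fluctuations; a wide one smooths them.
narrow = gc_content_windows("AAGC", window=3)
wide = gc_content_windows("AAGC", window=9)
print(narrow, wide)
```

With the narrow window the values climb from 0 to 1 along the sequence, while the wide window, which here spans the entire short sequence, reports the same overall level (0.5) at every position.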
57. 13 are included in the initial seeding. After the initial finding of words (seeding), the BLAST algorithm will extend the (only 3 residues long) alignment in both directions (see figure 12.17). Each time the alignment is extended, the alignment score is increased or decreased. When the alignment score drops below a predefined threshold, the extension of the alignment stops. This ensures that the alignment is not extended to regions where only very poor alignment between the query and hit sequence is possible. If the obtained alignment receives a score above a certain threshold, it will be included in the final BLAST result. By tweaking the word size W and the neighborhood word threshold T, it is possible to limit the search space. E.g. by increasing T, the number of neighboring words will drop and thus limit the search space, as shown in figure 12.18. This will increase the speed of BLAST significantly but may result in loss of sensitivity. Increasing the word size W will also increase the speed, but again with a loss of sensitivity.
CHAPTER 12 BLAST SEARCH 192
Figure 12.17: BLAST aligning in both directions. The initial word match is marked green.
Figure 12.18: Each dot represents a word match. Increasing
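The seed extension step described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the BLAST implementation: scoring uses +1 for a match and -1 for a mismatch instead of a substitution matrix, no gaps are allowed, and the stopping rule is a drop of `x_drop` below the best score seen so far. The function names `extend_one_way` and `extend_seed` are hypothetical.

```python
def extend_one_way(query, subject, x_drop):
    """Extend an ungapped alignment from position 0 of both strings.

    Returns (best_score, best_length). Extension stops once the running
    score has fallen x_drop or more below the best score seen so far.
    """
    score = best_score = best_length = 0
    for i, (a, b) in enumerate(zip(query, subject), start=1):
        score += 1 if a == b else -1  # assumed toy scoring scheme
        if score > best_score:
            best_score, best_length = score, i
        elif best_score - score >= x_drop:
            break
    return best_score, best_length


def extend_seed(query, subject, q_pos, s_pos, word_size=3, x_drop=3):
    """Extend a word match (seed) in both directions.

    Returns (score, aligned_query, aligned_subject).
    """
    q_end, s_end = q_pos + word_size, s_pos + word_size
    seed_score = sum(
        1 if a == b else -1
        for a, b in zip(query[q_pos:q_end], subject[s_pos:s_end])
    )
    right_score, right_len = extend_one_way(query[q_end:], subject[s_end:], x_drop)
    # Leftward extension: walk the reversed prefixes.
    left_score, left_len = extend_one_way(
        query[:q_pos][::-1], subject[:s_pos][::-1], x_drop
    )
    return (
        seed_score + left_score + right_score,
        query[q_pos - left_len: q_end + right_len],
        subject[s_pos - left_len: s_end + right_len],
    )


result = extend_seed("TTTGATTACACCC", "GGGGATTACAGGG", q_pos=5, s_pos=5)
print(result)
```

In this example the seed "TTA" grows to "GATTACA" and stops on both sides where the flanking residues no longer match, which is the behavior the text describes: the alignment is not extended into regions of poor similarity.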
58. 2 8 Quality of trace 289 Quality score of trace 289 Quality scores 145 Quick start 29 Rasmol colors 144 Reading frame 234 Realign alignment 3 8 Reassemble contig 304 Rebase restriction enzyme database 343 Rebuild index 103 Recognition sequence insert 318 Recycle Bin 83 Redo alignment 350 412 Redo Undo 87 Reference sequence 3 6 References 403 Region types 149 Remove annotations 160 sequences from alignment 358 terminated processes 93 Rename element 83 Report program errors 28 Report protein 377 Request new feature 28 Residue coloring 144 Restore deleted elements 83 size of view 89 Restriction enzymes filter 330 332 336 344 from certain suppliers 330 332 336 344 Restriction enzyme list 343 Restriction enzyme star activity 343 Restriction enzymes 327 compatible ends 334 cutting selection 331 isoschizomers 334 methylation 330 332 336 344 number of cut sites 329 overhang 330 332 336 344 separate on gel 341 sorting 329 Restriction sites 327 3 8 enzyme database Rebase 343 select fragment 149 number of 337 on sequence 143 327 parameters 335 tutorial 72 Results handling 136 Reverse complement 231 3 8 Reverse complement contig 297 Reverse sequence 232 Reverse translation 243 378 Bioinformatics explained 245 Right click on Mac 34 RNA secondary structure 3 9 RNA translation 232 RNA Seq analysis 376 INDEX rnaml file format 395 Saf
59. 3.2.3 Close views
When a view is closed, the View Area remains open as long as there is at least one open view. A view is closed by: right-click the tab of the View | Close, or select the view | Ctrl + W, or hold down the Ctrl button | click the tab of the view while the button is pressed. By right-clicking a tab, the following close options exist (see figure 3.10):
Figure 3.10: By right-clicking a tab, several close options are available.
• Close. See above.
• Close Tab Area. Closes all tabs in the tab area.
• Close All Views. Closes all tabs, in all tab areas. Leaves an empty workspace.
• Close Other Tabs. Closes all other tabs, in all tab areas, except the one that is selected.
3.2.4 Save changes in a view
When changes are made in a view, the text on the tab appears bold and italic (on Mac it is indicated by an asterisk (*) before the name of the tab). This indicates that the changes are not saved. The Save function may be activated in two ways: click the tab of the view you want to save | Save in the toolbar, or click the tab of the view you want to save | Ctrl + S (⌘ + S on Mac). If you close a view containing an element that has been change
60. AND SETTINGS 109
5.4 Advanced preferences
The Advanced settings include the possibility to set up a proxy server. This is described in section 1.8.
5.4.1 Default data location
If you have more than one location in the Navigation Area, you can choose which location should be the default data location. The default location is used when you e.g. import a file without selecting a folder or element in the Navigation Area first. Then the imported element will be placed in the default location. Note! The default location cannot be removed. You have to select another location as default first.
5.4.2 NCBI BLAST
URL to use for BLAST: It is possible to specify an alternate server URL to use for BLAST searches. The standard URL for the BLAST server at NCBI is http://blast.ncbi.nlm.nih.gov/Blast.cgi. Note! Be careful to specify a valid URL, otherwise BLAST will not work.
5.5 Export/import of preferences
The user preferences of the CLC DNA Workbench can be exported to other users of the program, allowing other users to display data with the same preferences as yours. You can also use the export/import preferences function to back up your preferences. To export preferences, open the Preferences dialog (Ctrl + K; ⌘ + ; on Mac) and do the following: Export | Select the relevant preferences | Export | Choose location for the exported file | Enter name of file | Save. Note! The format of exported preferences is .cpf. This notation must be submit
61. BLOSUM
In 1992, 14 years after the PAM matrices were published, the BLOSUM matrices (BLOcks SUbstitution Matrix) were developed and published (Henikoff and Henikoff, 1992). Henikoff et al. wanted to model more divergent proteins, thus they used locally aligned sequences where none of the aligned sequences share less than 62% identity. This resulted in a scoring matrix called BLOSUM62.
Table 13.1: The BLOSUM62 matrix. A tabular view of the BLOSUM62 matrix containing all possible substitution scores (Henikoff and Henikoff, 1992).
In contrast to the PAM matrices, the BLOSUM matrices are calculated from alignments without gaps emerging from the BLOCKS database, http://blocks.fhcrc.org. Sean Eddy recently
62. Figure 10.16: A sequence list containing multiple sequences can be viewed in either a table or in a graphical sequence list. The graphical view is useful for viewing annotations and the sequence itself, while the table view provides other information like sequence lengths and the number of sequences in the list (number of Rows reported).
10.7.1 Graphical view of sequence lists
The graphical view of sequence lists is almost identical to the view of single sequences (see section 10.1). The main difference is that you now can see more than one sequence in the same view. However, you also have a few extra options for sorting, deleting, and adding sequences:
• To add extra sequences to the list, right-click an empty (white) space in the view, and select Add Sequences.
• To delete a sequence from the list, right-click the sequence's name and select Delete Sequence.
• To sort the sequences in the list, right-click the name of one of the sequences and select Sort Sequence List by Name or Sort Sequence List by Length.
• To rename a sequence, right-click the name of the sequence and select Rename Sequence.
10.7.2 Sequence list table
Each s
63. Hence, click Zoom Out in the Toolbar | click the sequence until you can see the whole sequence. This sequence is circular, which is indicated by << and >> at the beginning and the end of the sequence. In the following we will show how the same sequence can be displayed in two different views: one linear view and one circular view. First, zoom in to see the residues again by using the Zoom In or the 100% button. Then we make a split view by: press and hold the Ctrl button on the keyboard (⌘ on Mac) | click Show as Circular at the bottom of the view. This opens an additional view of the vector with a circular display, as can be seen in figure 2.4.
Figure 2.4: The resulting two views, which are split horizontally.
Make a selection on the circular sequence (remember to switch to the Selection tool in the tool bar) and note that this selection is also reflected in the linear view above.
CHAPTER 2 TUTORIALS 41
2.3 Tutorial: Side Panel Settings
This brief tutorial will show you how to use the Side Panel to change th
64. If you have specified a set of enzymes which you always use, it will probably be a good idea to save the settings in the Side Panel (see section 3.2.7) for future use.
Show enzymes cutting inside/outside selection
Section 18.3.1 describes how to add more enzymes to the list in the Side Panel based on the name of the enzyme, overhang, methylation sensitivity, etc. However, you will often find yourself in a situation where you need a more sophisticated and explorative approach. An illustrative example: you have a selection on a sequence, and you wish to find enzymes cutting within the selection but not outside. This problem often arises during design of cloning experiments. In this case, you do not know the name of the enzyme, so you want the Workbench to find the enzymes for you: right-click the selection | Show Enzymes Cutting Inside/Outside Selection. This will display the dialog shown in figure 18.35, where you can specify which enzymes should initially be considered.
65. Import button. Note that there is also another import button at the very bottom of the dialog, but this will import the other settings of the Preferences dialog (see section 5.5). The dialog asks if you wish to overwrite existing Side Panel settings, or if you wish to merge the imported settings into the existing ones (see figure 5.7).
Figure 5.7: When you import settings, you are asked if you wish to overwrite existing settings or if you wish to merge the new settings into the old ones.
Note! If you choose to overwrite the existing settings, you will lose all the Side Panel settings that you have previously saved. To avoid confusion of the different import and export options, here is an overview:
• Import and export of bioinformatics data such as sequences, alignments, etc. (described in section 7.1.1).
• Graphics export of the views, which creates image files in various formats (described in section 3).
• Import and export of Side Panel Settings as described above.
• Import and export of all the Preferences except the Side Panel settings. This is described in the previous section.
5.3 Data preferences
The data preferences contain preferences related to interpretation of data, e.g. linker sequences:
• Predefined primer additions for Gateway cloning (see section 18.2.1).
CHAPTER 5 USER PREFERENCES
66. File type | Suffix | Import | Export | Description
CLC | .clc | X | X | Rich format including all information
Clustal Alignment | .aln | X | X |
GCG Alignment | .msf | X | X |
Nexus | .nxs/.nexus | X | X |
Phylip Alignment | .phy | X | X |
Zip export | .zip | | X | Selected files in CLC format
Zip import | .zip/.gzip/.tar | X | | Contained files/folder structure
G.1.4 Tree formats
File type | Suffix | Import | Export | Description
CLC | .clc | X | X | Rich format including all information
Newick | .nwk | X | X |
Nexus | .nxs/.nexus | X | X |
Zip export | .zip | | X | Selected files in CLC format
Zip import | .zip/.gzip/.tar | X | | Contained files/folder structure
APPENDIX G FORMATS FOR IMPORT AND EXPORT 395
G.1.5 Miscellaneous formats
File type | Suffix | Description
BLAST Database | .phr/.nhr | Link to database imported
CLC | .clc | Rich format including all information
CSV | .csv | All tables
Excel | .xls/.xlsx | All tables and reports
GFF | .gff | See http://www.clcbio.com/annotate-with-gff
mmCIF | .cif | 3D structure
PDB | .pdb | 3D structure
Tab delimited | .txt | All tables
Text | .txt | All data in a textual format
Zip export | .zip | Selected files in CLC format
Zip import | .zip/.gzip/.tar | Contained files/folder structure
Note! The Workbench can import 'external' files, too. This means that all kinds of files can be imported and displayed in the Navigation Area, but the above mentioned formats are the only ones whose contents can be shown in the Workbench.
G.2 List of graphics data formats
Below is a list of form
67. In this dialog, you will be able to specify how this trimming should be performed. For this data, we wish to use a more stringent trimming, so we set the limit of the quality score trim to 0.02 (see figure 2.12).
Figure 2.12: Specifying how sequences should be trimmed. A stringent trimming of 0.02 is used in this example.
There is no vector contamination in these data, so we only trim for poor quality. If you place the mouse cursor on the parameters, you will see a brief explanation. Click Next and choose to Save the results. When the trimming is performed, the parts of the sequences that are trimmed are actually annotated, not removed (see figure 2.13). By choosing Save, the Trim annotations will be saved directly to the sequences, without opening them for you to view first.
Figure 2.13: Trimming creates annotations on the regions that will be ignored in the assembly process.
These annotated
68. Figure 2.54: The resulting alignment.
Note! The new alignment is not saved automatically. To save the alignment, drag the tab of the alignment view into the Navigation Area. Installing the 'Additional Alignments' plugin gives you access to other alignment algorithms: ClustalW (Windows/Mac/Linux), Muscle (Windows/Mac/Linux), T-Coffee (Mac/Linux), MAFFT (Mac/Linux), and Kalign (Mac/Linux). The Additional Alignments Module can be downloaded from http://www.clcbio.com/plugins. Note that you will need administrative privileges on your system to install it.
2.11 Tutorial: Create and modify a phylogenetic tree
You can make a phylogenetic tree from an existing alignment. (See how to create an alignment in the tutorial: "Align protein sequences
69. Figure 18.41: Selecting enzymes.
If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.42), or use the view of enzyme lists (see section 18.5).
Figure 18.42: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.
Number of cut sites
Clicking Next confirms the list of enzymes which will be included in the analysis, and takes you to the dialog shown in figure 18.43.
70. Figure 3.17: A database search and an alignment calculation are running. Clicking the small icon next to the process allows you to stop, pause, and resume processes.
Besides the options to stop, pause, and resume processes, there are some extra options for a selected number of the tools running from the Toolbox:
• Show results. If you have chosen to save the results (see section 9.2), you will be able to open the results directly from the process by clicking this option.
• Find results. If you have chosen to save the results (see section 9.2), you will be able to highlight the results in the Navigation Area.
• Show Log Information. This will display a log file showing progress of the process. The log file can also be shown by clicking Show Log in the "handle results" dialog where you choose between saving and opening the results.
• Show Messages. Some analyses will give you a message when processing your data. The messages are the black dialogs shown in the lower left corner of the Workbench that disappear after a few seconds. You can reiterate the messages that have been shown by clicking this option.
The terminated processes can be removed by: View | Remove Terminated Processes. If you close the program while there are running p
71. The peak is called by changing the residue to an ambiguity character and by adding an annotation at this position. To call secondary peaks: select sequence(s) | Toolbox in the Menu Bar | Sequencing Data Analyses | Call Secondary Peaks. This opens a dialog where you can alter your choice of sequences. When the sequences are selected, click Next. This opens the dialog displayed in figure 17.26. The following parameters can be adjusted in the dialog:
• Percent of max peak height for calling. Adjust this value to specify how high the secondary peak must be to be called.
• Use IUPAC code / N for ambiguous nucleotides. When a secondary peak is called, the residue at this position can either be replaced by an N or by an ambiguity character based on the IUPAC codes (see section
• Add annotations. In addition to changing the actual sequence, annotations can be added for each base which has been called.
Figure 17.26: Setting parameters for secondary peak calling.
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click
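The calling rule described by these parameters can be sketched as follows. This is an illustrative Python sketch, not the Workbench's algorithm: it assumes the trace at one position is available as a dictionary of peak heights per base, and the function name `call_position` is hypothetical. The ambiguity characters are the standard IUPAC nucleotide codes.

```python
# Standard IUPAC ambiguity codes for sets of two or more nucleotides.
IUPAC_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_position(peak_heights, percent_of_max=20.0, use_n=False):
    """Call the residue at one trace position.

    peak_heights: dict mapping 'A', 'C', 'G', 'T' to peak height
    (assumed input format). A secondary peak is called when its height
    is at least percent_of_max percent of the highest peak.
    """
    max_height = max(peak_heights.values())
    if max_height <= 0:
        return "N"  # no signal at all: treat as unknown
    threshold = max_height * percent_of_max / 100.0
    called = frozenset(
        base for base, height in peak_heights.items() if height >= threshold
    )
    if len(called) == 1:
        return next(iter(called))  # only the primary peak qualifies
    return "N" if use_n else IUPAC_CODES[called]


heights = {"A": 1000, "C": 50, "G": 400, "T": 10}
print(call_position(heights, percent_of_max=30))              # secondary G is called
print(call_position(heights, percent_of_max=30, use_n=True))  # N instead of IUPAC
print(call_position(heights, percent_of_max=50))              # G is too low
```

Raising the percentage makes calling stricter, so fewer positions are changed to ambiguity characters.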
72. This chapter first describes how to create and second how to adjust the view of the plot.
13.2.1 Create dot plots
A dot plot is a simple, yet intuitive way of comparing two sequences, either DNA or protein, and is probably the oldest way of comparing two sequences (Maizel and Lenk, 1981). A dot plot is a 2-dimensional matrix where each axis of the plot represents one sequence. By sliding a fixed-size window over the sequences and marking a sequence match by a dot in the matrix, a diagonal line will emerge if two identical (or very homologous) sequences are plotted against each other. Dot plots can also be used to visually inspect sequences for direct or inverted repeats or regions with low sequence complexity. Various smoothing algorithms can be applied to the dot plot calculation to avoid a noisy background in the plot. Moreover, various substitution matrices can be applied in order to take the evolutionary distance of the two sequences into account. To create a dot plot: Toolbox | General Sequence Analyses | Create Dot Plot, or select one or two sequences in the Navigation Area | Toolbox in the Menu Bar | General Sequence Analyses | Create Dot Plot, or select one or two sequences in the Navigation Area | right-click in the Navigation Area | Toolbox | General Sequence Analyses | Create Dot Plot. This opens the dialog shown in figure 13.3.
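The sliding-window idea behind a dot plot can be shown with a short Python sketch. It is a minimal illustration, not the Workbench's calculation: it counts identical residues within each window pair and draws a dot when the count reaches a threshold, without smoothing or substitution matrices. The names `dot_plot`, `render`, and `min_matches` are hypothetical.

```python
def dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Return (x, y) window start positions that count as a match.

    A dot is placed where at least min_matches of the window positions
    hold identical residues in both sequences.
    """
    dots = []
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_x[x:x + window], seq_y[y:y + window])
            )
            if matches >= min_matches:
                dots.append((x, y))
    return dots


def render(dots, width, height):
    """Draw the dots as text, one row per position on the y axis."""
    grid = [["."] * width for _ in range(height)]
    for x, y in dots:
        grid[y][x] = "*"
    return "\n".join("".join(row) for row in grid)


# Plotting a sequence against itself yields the main diagonal.
sequence = "GATTACA"
dots = dot_plot(sequence, sequence, window=3, min_matches=3)
size = len(sequence) - 3 + 1
print(render(dots, size, size))
```

Lowering `min_matches` below the window length tolerates mismatches, which reveals more distant similarity at the cost of a noisier background.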
73. UniVec database can be found at http://www.ncbi.nlm.nih.gov/VecScreen/replist.html. Hit limit: Specifies how strictly vector contamination is trimmed. Since vector contamination usually occurs at the beginning or end of a sequence, different criteria are applied for terminal and internal matches. A match is considered terminal if it is located within the first 25 bases at either sequence end. Three match categories are defined according to the expected frequency of an alignment with the same score occurring between random sequences. The CLC DNA Workbench uses the same settings as VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html): • Weak: Expect 1 random match in 40 queries of length 350 kb. Terminal match with Score 16 to 18. Internal match with Score 23 to 24. • Moderate: Expect 1 random match in 1,000 queries of length 350 kb. Terminal match with Score 19 to 23. Internal match with Score 25 to 29. • Strong: Expect 1 random match in 1,000,000 queries of length 350 kb. Terminal match with Score ≥ 24. Internal match with Score ≥ 30. Note that selecting e.g. Weak will also include matches in the Moderate and Strong categories. • Trim contamination from saved sequences: This option lets you select your own vector sequences that you know might be the cause of contamination. If you select this option, you will be able to select one or more sequences when you click Next
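The score ranges listed in this excerpt translate directly into a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the Strong category starts at scores of 24 (terminal) and 30 (internal), which is what the adjacent Moderate ranges imply.

```python
# Lower score bounds per category, taken from the ranges in the excerpt.
THRESHOLDS = {
    True: {"Weak": 16, "Moderate": 19, "Strong": 24},    # terminal matches
    False: {"Weak": 23, "Moderate": 25, "Strong": 30},   # internal matches
}
ORDER = ["Weak", "Moderate", "Strong"]


def classify_match(score, terminal):
    """Return 'Weak', 'Moderate', 'Strong', or None if the score is too low."""
    bounds = THRESHOLDS[terminal]
    category = None
    for name in ORDER:
        if score >= bounds[name]:
            category = name
    return category


def is_trimmed(score, terminal, hit_limit):
    """Apply the hit limit: a limit of 'Weak' also covers Moderate and Strong."""
    category = classify_match(score, terminal)
    return category is not None and ORDER.index(category) >= ORDER.index(hit_limit)


print(classify_match(24, terminal=True))    # Strong
print(classify_match(24, terminal=False))   # Weak
print(is_trimmed(17, terminal=True, hit_limit="Moderate"))  # False
```

The same score falls into different categories depending on whether the match is terminal, which reflects the looser criteria applied near sequence ends.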
74. Whereas the distance-based methods compress all sequence information into a single number, the character-based methods attempt to infer the phylogeny based on all the individual characters (nucleotides or amino acids). Parsimony: In parsimony-based methods, a number of sites are defined which are informative about the topology of the tree. Based on these, the best topology is found by minimizing the number of substitutions needed to explain the informative sites. Parsimony methods are not based on explicit evolutionary models. Maximum Likelihood: Maximum likelihood and Bayesian methods (see below) are probabilistic methods of inference. Both have the pleasing properties of using explicit models of molecular evolution and allowing for rigorous statistical inference. However, both approaches are very computer intensive. A stochastic model of molecular evolution is used to assign a probability (likelihood) to each phylogeny, given the sequence data of the OTUs. Maximum likelihood inference [Felsenstein, 1981] then consists of finding the tree which assigns the highest probability to the data [Chapter 20, Phylogenetic Trees]. Bayesian inference: The objective of Bayesian phylogenetic inference is not to infer a single "correct" phylogeny, but rather to obtain the full posterior probability distribution of all possible phylogenies. This is obtained by combining the likelihood and the prior probability distribution of evolutionary parameters. The vast
75. a list which contains the sequences present in the cloning editor. The inserted sequence remains on the list of sequences. If the two sequences do not have blunt ends, the ends' overhangs have to match each other. Otherwise a warning is displayed. • Insert sequence before this sequence: Insert another sequence before this sequence. The sequence to be inserted can be selected from a list which contains the sequences present in the cloning editor. The inserted sequence remains on the list of sequences. If the two sequences do not have blunt ends, the ends' overhangs have to match each other. Otherwise a warning is displayed. • Reverse sequence: Reverses the sequence and replaces the original sequence in the list. This is sometimes useful when working with single-stranded sequences. Note that this is not the same as creating the reverse complement (see the following item in the list). • Reverse complement sequence: Creates the reverse complement of a sequence and replaces the original sequence in the list. This is useful if the vector and the insert sequences are not oriented the same way. • Digest Sequence with Selected Enzymes and Run on Gel: See section 18.4.1 [Chapter 18, Cloning and Cutting]. • Rename sequence: Renames the sequence. • Select sequence: This will select the entire sequence. • Delete sequence: This deletes the given sequence from the cloning editor. • Open copy of sequence: This will open a c
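The difference between "Reverse sequence" and "Reverse complement sequence" noted in this excerpt is easy to see on a short example. The Python sketch below is a generic illustration for unambiguous DNA letters only; the function names are hypothetical and it is not taken from the Workbench.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Read the same strand backwards; bases are unchanged."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse("ATGC"))             # CGTA
print(reverse_complement("ATGC"))  # GCAT
```

Only the reverse complement corresponds to the other strand of a double-stranded molecule, which is why it is the operation to use when vector and insert are oriented differently.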
76. a selection on the negative strand | Open selection in New View. By doing that, the sequence will be reversed. This is only possible when the double-stranded view option is enabled. It is possible to copy the selection and paste it in a word processing program or an e-mail. To obtain a reverse complement of an entire sequence: select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Reverse Complement; or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Reverse Complement. This opens the dialog displayed in figure 14.3. Figure 14.3: Creating a reverse complement sequence. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see
77. acid. The hydrophobicity score is then calculated as the sum of the values in a window, which is a particular range of the sequence. The window length can be set from 5 to 25 residues. The wider the window, the fewer the fluctuations in the hydrophobicity scores. For more about the theory behind hydrophobicity, see 15.2.3. In the following we will focus on the different ways that CLC DNA Workbench offers to display the hydrophobicity scores. We use Kyte-Doolittle to explain the display of the scores, but the different options are the same for all the scales. Initially there are three options for displaying the hydrophobicity scores. You can choose one, two or all three options by selecting the boxes (see figure 15.6). Coloring the letters and their background: When choosing coloring of letters or coloring of their background, the color red is used to indicate high scores of hydrophobicity. A color slider allows you to amplify the scores, thereby emphasizing areas with high (or low, blue) levels of hydrophobicity. The color settings mentioned are default settings. By clicking the color bar just below the color slider you get the option of changing color settings. Graphs along sequences: When selecting graphs, you choose to display the hydrophobicity scores underneath the sequence [Chapter 15, Protein Analyses]. This can be done either by a line plot or bar plot, or by coloring
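The windowed score described in this excerpt can be sketched as follows. The per-residue values are the published Kyte-Doolittle hydropathy scale; the function name is hypothetical, and the way the Workbench assigns a window's score to a sequence position (for example to the window center) is not stated in the excerpt, so this sketch simply returns one sum per window start.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window):
    """Sum the scale values inside each window of the given length."""
    if not 5 <= window <= 25:
        raise ValueError("window length must be between 5 and 25 residues")
    values = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        round(sum(values[start:start + window]), 2)
        for start in range(len(values) - window + 1)
    ]


# A hydrophobic stretch followed by a charged stretch: scores fall along the sequence.
print(hydrophobicity_profile("IIIIIRRRRR", window=5))
```

Because each score is a sum over the window, a wider window averages out single-residue effects, which is why the profile fluctuates less.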
78. acid could translate into several different codons (only 20 amino acids but 64 different codons). Thus, the program offers a number of choices for determining which codons should be used. These choices are explained in this section. In order to make a reverse translation: Reverse Translate; or right-click a protein sequence | Toolbox | Protein Analyses | Reverse translate. This opens the dialog displayed in figure 15.8. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. You can translate several protein sequences at a time. Click Next to adjust the parameters for the translation. Figure 15.8: Choosing a protein sequence for reverse translation. 15.3.1 Reverse translation parameters. Figure 15.9 shows the choices for making the translation.
79. acid distribution • Histogram of amino acid distribution • Annotation table • Counts of di-peptides • Frequency of di-peptides. The output of nucleotide sequence statistics includes: • General statistics: Sequence type, Length, Organism, Name, Description, Modification Date, Weight. The weight is calculated like this: $\sum_{\text{units in sequence}} \mathrm{weight}(\mathrm{unit}) - \mathrm{links} \times \mathrm{weight}(\mathrm{H_2O})$, where links is the sequence length minus one for linear sequences and the sequence length for circular molecules. The units are monophosphates. The weights for both single- and double-stranded molecules are included. The atomic composition is defined the same way. • Atomic composition • Nucleotide distribution table • Nucleotide distribution histogram • Annotation table • Counts of di-nucleotides • Frequency of di-nucleotides. A short description of the different areas of the statistical output is given in section 13.4.1. 13.4.1 Bioinformatics explained: Protein statistics. Every protein holds specific and individual features which are unique to that particular protein. Features such as isoelectric point or amino acid composition can reveal important information about a novel protein. Many of the features described below are calculated in a simple way. Molecular weight: The molecular weight is the mass of a protein or molecule. The molecular weight is simply calculated as the sum of the atomic mass of
80. all the atoms in the molecule. The weight of a protein is usually represented in Daltons (Da). A calculation of the molecular weight of a protein does not usually include additional posttranslational modifications. For native and unknown proteins it tends to be difficult to assess whether posttranslational modifications such as glycosylations are present on the protein, making a calculation based solely on the amino acid sequence inaccurate. The molecular weight can be determined very accurately by mass spectrometry in a laboratory. Isoelectric point: The isoelectric point (pI) of a protein is the pH where the protein has no net charge. The pI is calculated from the pKa values for 20 different amino acids. At a pH below the pI, the protein carries a positive charge, whereas if the pH is above the pI the protein carries a negative charge. In other words, pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels. Here the proteins can be separated based on their isoelectric point. Aliphatic index: The aliphatic index of a protein is a measure of the relative volume occupied by the aliphatic side chains of the following amino acids: alanine, valine, leucine and isoleucine. An increase in the aliphatic index increases the thermostability of globular proteins. The index is calculated by the following formula: $\text{Aliphatic index} = X(\mathrm{Ala}) + a \times X(\mathrm{Val}) + b \times X(\mathrm{Leu}) + b \times X(\mathrm{Ile})$
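The aliphatic index formula at the end of this excerpt can be worked through in a short Python sketch. The excerpt is cut off before it defines the symbols, so the following are assumptions labeled as such: X is taken to be the mole percent of each amino acid, and the coefficients default to a = 2.9 and b = 3.9, the values commonly cited from Ikai (1980). The function name is hypothetical.

```python
def aliphatic_index(protein, a=2.9, b=3.9):
    """Aliphatic index = X(Ala) + a*X(Val) + b*X(Leu) + b*X(Ile).

    Assumptions (not stated in the excerpt): X is the mole percent of the
    residue in the sequence, and a and b default to 2.9 and 3.9.
    """
    protein = protein.upper()
    length = len(protein)

    def mole_percent(residue):
        return 100.0 * protein.count(residue) / length

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * mole_percent("L")
        + b * mole_percent("I")
    )


# One residue each of Ala, Val, Leu and Ile: every X is 25 percent.
print(aliphatic_index("AVLI"))
```

The larger weights on valine, leucine and isoleucine reflect the larger volume of their side chains relative to alanine.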
81. an open source program and anyone can download and change the program code. This has also given rise to a number of BLAST derivatives; WU-BLAST is probably the most commonly used [Altschul and Gish, 1996]. BLAST is highly scalable and comes in a number of different computer platform configurations, which makes usage on both small desktop computers and large computer clusters possible [Chapter 12, BLAST Search]. 12.5.1 Examples of BLAST usage. BLAST can be used for a lot of different purposes. A few of them are mentioned below. • Looking for species: If you are sequencing DNA from unknown species, BLAST may help identify the correct species or homologous species. • Looking for domains: If you BLAST a protein sequence (or a translated nucleotide sequence), BLAST will look for known domains in the query sequence. • Looking at phylogeny: You can use the BLAST web pages to generate a phylogenetic tree of the BLAST result. • Mapping DNA to a known chromosome: If you are sequencing a gene from a known species but have no idea of the chromosome location, BLAST can help you. BLAST will show you the position of the query sequence in relation to the hit sequences. • Annotations: BLAST can also be used to map annotations from one organism to another or look for common genes in two related species. 12.5.2 Searching for homology. Most research projects involving sequencing of either DNA or protein have a requirement for obtaining biological informa
82. analysis. 16.7.1 TaqMan output table. In TaqMan mode there are two primers and a probe in a given solution: forward primer (F), reverse primer (R) and a TaqMan probe (TP). The output table can show primer/probe-pair combination parameters for all three combinations of primers, and single primer parameters for both primers and the TaqMan probe (see section on Standard PCR for an explanation of the available primer-pair and single primer information) [Chapter 16, Primers]. The fragment length in this mode refers to the length of the PCR fragment generated by the primer pair, and this is also the PCR fragment which can be exported. 16.8 Sequencing primers. This mode is used to design primers for DNA sequencing. In this mode the user can define a number of Forward primer regions and Reverse primer regions where a sequencing primer can start. These are defined by making a selection on the sequence and right-clicking the selection. If areas are known where primers must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. No requirements are imposed on the relative position of the regions defined. After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm. After pressing the Calculate button, a dialog will appear (see figure 16.11).
83. any problems, please contact The CLC Support Team. Figure 1.11: Read the license agreement carefully. 1.4.3 Import a license from a file. If you are provided a license file instead of a license ID, you will be able to import the file using this option. When you have clicked Next, you will see the dialog shown in 1.12. Figure 1.12: Selecting a license file. Click the Choose License File button and browse to find the license file provided by CLC bio. When you have selected the file, click Next. Accepting the license agreement: Regardless of which option you chose above, you will now see the dialog shown in figure 1.13. Please read the License agreement carefully before clicking I accept these terms and Finish. 1.4.4 Upgrade license. If you have already used a previous version of CLC DNA Workbench, and you are entitled to upgrading to the new CLC DNA Workbench 6.6, select this option to get a license upgrade. When you click Next, the workbench will search for a previous installation of CLC DNA Workbench. It will then locate the old license [Chapter 1, Introduction to CLC DNA Workbench]
84. author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. 13.5 Join sequences. CLC DNA Workbench can join several nucleotide or protein sequences into one sequence. This feature can for example be used to construct "supergenes" for phylogenetic inference by joining several disjoint genes into one. Note that when sequences are joined, all their annotations are carried over to the new spliced sequence. Two or more sequences can be joined by: select sequences to join | Toolbox in the Menu Bar | General Sequence Analyses | Join sequences; or select sequences to join | right-click any selected sequence | Toolbox | General Sequence Analyses | Join sequences. This opens the dialog shown in figure 13.17. If you have selected some sequences before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences from the selected elements. Clicking Next opens the dialog shown in figure 13.18. In step 2 you can change the order in which the sequences will be joined. Select a sequence and use the arrows to move the selected sequence up or down.
85. be able to download your license as a file and import in the next step. Figure 1.2: Choosing between direct download or download web page. • Go to license download web page: The workbench will open a Web Browser with the License Download web page when you click Next. From there you will be able to download your license as a file and import it. This option allows you to get a license even though the Workbench does not have direct access to the CLC Licenses Service. If you select the first option, and it turns out that you do not have internet access from the Workbench (because of a firewall, proxy server etc.), you will be able to click Previous and use the other option instead. Direct download: Selecting the first option takes you to the dialog shown in figure 1.5. Figure 1.3: A license has been downloaded. A progress for getting the license is shown, and when the license is downloaded, you
86. be just above the selected node. Set root at this node: defines the root of the tree to be at the selected node. Toggle collapse: collapses or expands the branches below the node. Change label: allows you to label or to change the existing label of a node. Change branch label: allows you to change the existing label of a branch. You can also relocate leaves and branches in a tree or change the length. It is possible to modify the text on the unit measurement at the bottom of the tree view by right-clicking the text. In this way you can specify a unit, e.g. "years". Branch lengths are given in terms of expected numbers of substitutions per site. Note: To drag branches of a tree, you must first click the node one time, and then click the node again, and this time hold the mouse button. In order to change the representation: • Rearrange leaves and branches by: Select a leaf or branch | Move it up and down (Hint: The mouse turns into an arrow pointing up and down). • Change the length of a branch by: Select a leaf or branch | Press Ctrl | Move left and right (Hint: The mouse turns into an arrow pointing left and right). Alter the preferences in the Side Panel for changing the presentation of the tree. 20.2 Bioinformatics explained: phylogenetics. Phylogenetics describes the taxonomical classification of organisms based on their evolutionary history, i.e. their phylogeny. Phylogenetics is therefore an
87. but the user can change to a more detailed mode in the Primer information preference group. The number of information lines reflects the chosen length interval for primers and probes. In the compact information mode, one line is shown for every possible primer length, and each of these lines contains information regarding all possible primers of the given length. At each potential primer starting position, a circular information point is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green circle indicates a primer which fulfills all criteria, and a red circle indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool-tip will then appear on screen, displaying detailed information about the primer in relation to the set criteria. To locate the primer on the sequence, simply left-click the circle using the mouse. The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this, allowing for a high degree of interactivity in the primer design process. After having explored the potential primers, the user may have found a satisfactory primer and choose to export this directly from the view area using a mouse right-click on the primer's information point. This does not allow for any design information to e
88. choose where to export to | choose GenBank (.gbk) format | enter name of the new file | Save. Export of dependent elements: When exporting e.g. an alignment, CLC DNA Workbench can export the alignment including all the sequences that were used to create it. This way, when sending your alignment (with the dependent sequences), your colleagues can reproduce your findings with adjusted parameters, if desired. To export with dependent files: select the element in Navigation Area | File in Menu Bar | Export with Dependent Elements | enter name of the new file | choose where to export to | Save. The result is a folder containing the exported file with dependent elements, stored automatically in a folder on the desired location of your disk. Export history: To export an element's history: select the element in Navigation Area | Export | select History PDF (.pdf) | choose where to export to | Save. The entire history of the element is then exported in pdf format. The CLC format: CLC DNA Workbench keeps all bioinformatic data in the CLC format. Compared to other formats, the CLC format contains more information about the object, like its history and comments. The CLC format is also able to hold several elements of different types (e.g. an alignment, a graph and a phylogenetic tree). This means that if you are exporting your data to another CLC Workbench, you can use the CLC format to export several elements in one file, and you will preserve all t
89. click Set as Parameter Prototype. Note that the Workbench validates a lot of the input and parameters when running in normal (non-batch) mode [Chapter 9, Batching and Result Handling]. When running in batch, this validation is not performed, and this means that some analyses will fail if combinations of input data and parameters are not right. Therefore, batching should only be used when the batch units are very homogeneous in terms of the type and size of data. 9.1.4 Running the analysis and organizing the results. At the last dialog before clicking Finish, it is only possible to use the Save option. When a tool is run in batch mode, it will place the result files in the same folder as the input files. In the example shown in figure 9.3, the result of the two single sequences will be placed in the Cloning folder, whereas the results for the Cloning vector library and Processed data runs will be placed inside these folders. When the batch run is started, there will be one "master" process representing the overall batch job, and there will then be a separate process for each batch unit. The behavior of this is different between Workbench and Server: • When running the batch job in the Workbench, only one batch unit is run at a time. So when the first batch unit is done, the second will be started and so on. This is done in order to avoid many parallel analyses that would draw on the same compute resources and slow down the computer. • Wh
90. counting instances of each nucleotide and then letting the majority decide the nucleotide in the contig. In case of equality, ACGT are given priority over one another in the stated order. Unknown nucleotide (N): The contig will be assigned an 'N' character in all positions with conflicts. Ambiguity nucleotides (R, Y, etc.): The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Appendix. Note that conflicts will always be highlighted no matter which of the options you choose. Furthermore, each conflict will be marked as an annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the result of single-sequence analyses is interpreted. Read more about conflicts in section 17.7.4. • Create full contigs, including trace data: This will create a contig where all the aligned reads are displayed below the contig sequence. You can always extract the contig sequence without the reads later on. For more information on how to use the contigs that are created, see section 17.7. • Show tabular view of contigs: A contig can be shown both in a graphical as well as a tabular view. If you select this option, a tabular view of the contig will also be opened. Even if you do not select this option, you can show the tabula
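The three conflict-resolution options in this excerpt (majority vote with A, C, G, T priority on ties, unknown nucleotide N, and ambiguity nucleotides) can be compared on a single alignment column with a minimal Python sketch. The function name, the `mode` strings and the input format (a string holding the base from each read at one position) are hypothetical; the ambiguity codes are the standard IUPAC table.

```python
from collections import Counter

PRIORITY = "ACGT"  # tie-break order stated in the excerpt
# Standard IUPAC codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def resolve_column(column, mode="vote"):
    """Return the contig base for one column of aligned read bases."""
    bases = [base for base in column.upper() if base in PRIORITY]
    if not bases:
        return "N"
    observed = "".join(sorted(set(bases)))
    if len(observed) == 1:
        return observed  # all reads agree, so there is no conflict
    if mode == "unknown":
        return "N"
    if mode == "ambiguity":
        return IUPAC[observed]
    # mode == "vote": majority decides; ties go to the first base in ACGT order.
    counts = Counter(bases)
    best = max(counts.values())
    return next(base for base in PRIORITY if counts[base] == best)


for mode in ("vote", "unknown", "ambiguity"):
    print(mode, resolve_column("GGAA", mode))
```

With two reads supporting G and two supporting A, voting yields A because of the priority order, while the other two modes record that the position is uncertain.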
91. databases are available from a dedicated BLAST ftp site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/. Moreover, it is possible to download programs/scripts from the same site, enabling automatic download of changed BLAST databases. Thus it is possible to schedule a nightly update of changed databases and have the updated BLAST database stored locally or on a shared network drive at all times. Most BLAST databases on the NCBI site are updated on a daily basis to include all recent sequence submissions to GenBank. A few commercial software packages are available for searching your own data. The advantage of using a commercial program is obvious when BLAST is integrated with the existing tools of these programs. Furthermore, they let you perform BLAST searches and retain annotations on the query sequence (see figure 12.22). It is also much easier to batch download a selection of hit sequences for further inspection. Figure 12.22: Snippet of alignment view of BLAST results from CLC Main Workbench. Individual alignments are represented directly in a graphical view. The top sequence is the query sequence and is shown with a selection of annotations. 12.5.8 What you cannot get out of BLAST. Don't expect BLAST
92. • To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available. • To the right, there is a list of the enzymes that will be used. Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel. If you wish to use all the enzymes in the list: Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add. The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang, as shown in figure 18.51. If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), o
93. eas data overview; Secondary peak calling; Multiplexing based on barcode or name. [Appendix A, Comparison of Workbenches and the Viewer; feature table with product columns Viewer, Protein, DNA, RNA, Main and Genomics; the per-product check marks are not recoverable.] Next generation Sequencing Data Analysis: Import of 454, Illumina Genome Analyzer, SOLiD and Helicos data; Reference assembly of human-size genomes; De novo assembly; SNP/DIP detection; Graphical display of large contigs; Support for mixed-data assembly; Paired data support; RNA-Seq analysis; Expression profiling by tags; ChIP-Seq analysis. Expression Analysis: Import of Illumina BeadChip, Affymetrix, GEO data; Import of Gene Ontology annotation files; Import of Custom expression data table and Custom annotation files; Multigroup comparisons; Advanced plots: scatter plot, volcano plot, box plot and MA plot; Hierarchical clustering; Statistical analysis on count-based and Gaussian data; Annotation tests; Principal component analysis (PCA); Hierarchical clustering and heat maps; Analysis of RNA-Seq/Tag profiling samples. Molecular cloning: Advanced molecular cloning; Graphical display of in silico cloning; Advanced sequence manipulation. Database searches: GenBank Entrez searches; UniProt searches (Swiss-Prot/TrEMBL); Web-based sequence search using BLAST; BLAST on local database; Cre
94. existing enzyme list [screenshot of the enzyme selection dialog, with the panels "Enzymes in Popular enzymes" and "Enzymes to be used" and the columns Name, Overhang, Methylation and Popularity]. Figure 18.36: Selecting enzymes. If you need more detailed information and filtering of the enzymes, either place your mouse cursor on an enzyme for one second to display additional information (see figure 18.52), or use the view of enzyme lists (see 18.5). Figure 18.37: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors. Clic
95. facto standard scoring matrix for a wide range of alignment programs. It is the default matrix in BLAST. Calculate your own PAM matrix: http://www.bioinformatics.nl/tools/pam.html. BLOKS database: http://blocks.fhcrc.org/. NCBI help site: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html. Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. 13.3 Local complexity plot. In CLC DNA Workbench it is possible to calculate local complexity for both DNA and protein sequences. The local complexity is a measure of the diversity in the composition of amino acids within a given range (window) of the sequence. The K2 algorithm is used for calculating local complexity [Wootton and Federhen, 1993]. To conduct a complexity calculation do the following: Select sequences in Navigation Area | Toolbox in Menu Bar | General Sequence Analyses | Create Complexity Plot. This open
96. formatted BLAST databases available
12.3.2 Download NCBI pre-formatted BLAST databases
12.3.3 Create local BLAST databases
12.4 Manage BLAST databases
12.4.1 Migrating from a previous version of the Workbench
12.5 Bioinformatics explained: BLAST
12.5.1 Examples of BLAST usage
12.5.2 Searching for homology
12.5.3 How does BLAST work?
12.5.4 Which BLAST program should I use?
12.5.5 Which BLAST options should I change?
12.5.6 Explanation of the BLAST output
12.5.7 I want to BLAST against my own sequence database, is this possible?
12.5.8 What you cannot get out of BLAST
12.5.9 Other useful resources
CLC DNA Workbench offers to conduct BLAST searches on protein and DNA sequences. In short, a BLAST search identifies homologous sequences between your input (query) sequence and a database of sequences [McGinnis and Madden, 2004]. BLAST (Basic Local Alignment Search Tool) identifies homologous sequences using a heuristic method which finds short matches between two sequences. After initial match BLAST attempts to s
97. have selected to export the whole view. If you have chosen to export the visible area only, the graphics file will be on one page with no headers or footers. 7.3.4 Exporting protein reports. It is possible to export a protein report using the normal Export function, which will generate a pdf file with a table of contents: Click the report in the Navigation Area | Export in the Toolbar | select pdf. You can also choose to export a protein report using the Export graphics function, but in this way you will not get the table of contents. 7.4 Export graph data points to a file. Data points for graphs displayed along the sequence or along an alignment, mapping or BLAST result can be exported to a semicolon-separated text file (csv format). An example of such a graph is shown in figure 7.14. This graph shows the coverage of reads of a read mapping produced with CLC Genomics Workbench. To export the data points for the graph, right-click the graph and choose Export Graph to Comma-separated File. Depending on what kind of graph you have selected, different options
98. in the Menu Bar | Paste. If there is already an element of that name, the pasted element will be renamed by appending a number at the end of the name. Elements can also be moved instead of copied. This is done with the cut/paste function: select the files to cut | right-click one of the selected files | Cut | right-click the location to insert files into | Paste; or select the files to cut | Ctrl + X (⌘ + X on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac). When you have cut the element, it is greyed out until you activate the paste function. If you change your mind, you can revert the cut command by copying another element. Note that if you move data between locations, the original data is kept. This means that you are essentially doing a copy instead of a move operation. Move using drag and drop. Using drag and drop in the Navigation Area, as well as in general, is a four-step process: click the element | click on the element again and hold left mouse button | drag the element to the desired location | let go of mouse button. This allows you to: • Move elements between different folders in the Navigation Area. • Drag from the Navigation Area to the View Area: A new view is opened in an existing View Area if the element is dragged from the Navigation Area and dropped next to the tab(s) in that View Area. • Drag from the View Area to the Navigation Area: The element, e.g. a sequence, alignment, sea
99. in the Vector NTI Local Database, which can be accessed through Vector NTI Explorer. This is described in the first section below. • Your data is stored as single files on your computer (just like Word documents etc). This is described in the second section below. Import from the Vector NTI Local Database. If your Vector NTI data are stored in a Vector NTI Local Database (as the one shown in figure 7.2), you can import all the data in one step, or you can import selected parts of it. Importing the entire database in one step. From the Workbench, there is a direct import of the whole database (see figure 3).
100. information about the individual bands by hovering the mouse cursor on the band of interest. This will display a tool tip with the following information: • Fragment length. • Fragment region on the original sequence. • Enzymes cutting at the left and right ends, respectively. For gels comparing whole sequences, you will see the sequence name and the length of the sequence. Note: You have to be in Selection or Pan mode in order to get this information. It can be useful to add markers to the gel, which enables you to compare the sizes of the bands. This is done by clicking Show marker ladder in the Side Panel. Markers can be entered into the text field, separated by commas. Modifying the layout. The background of the lane and the colors of the bands can be changed in the Side Panel. Click the colored box to display a dialog for picking a color. The slider Scale band spread can be used to adjust the effective time of separation on the gel, i.e. how much the bands will be spread over the lane. In a real electrophoresis experiment, this property will be determined by several factors including time of separation, voltage and gel density. You can also choose how many lanes should be displayed: • Sequences in separate lanes: This simulates that a gel is run for each sequence. • All sequences in one lane: This simulates that one gel is run for all sequences. You can also modify the layout of the vi
101. integral part of the science of systematics that aims to establish the phylogeny of organisms based on their characteristics. Furthermore, phylogenetics is central to evolutionary biology as a whole, as it is the condensation of the overall paradigm of how life arose and developed on earth. 20.2.1 The phylogenetic tree. The evolutionary hypothesis of a phylogeny can be graphically represented by a phylogenetic tree. Figure 20.5 shows a proposed phylogeny for the great apes, Hominidae, taken in part from Purvis [Purvis, 1995]. The tree consists of a number of nodes (also termed vertices) and branches (also termed edges). These nodes can represent either an individual, a species, or a higher grouping, and are thus broadly termed taxonomical units. In this case, the terminal nodes (also called leaves or tips of the tree) represent extant species of Hominidae and are the operational taxonomical units (OTUs). The internal nodes, which here represent extinct common ancestors of the great apes, are termed hypothetical taxonomical units, since they are not directly observable. Figure 20.5: A proposed phylogeny of the great apes (Hominidae). Different components of the tree are marked, see text for description. The ordering of the
102. is relative to the overall number of sequence reads. Foreground color: Colors the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage. Background color: Colors the background of the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage. Graph: The coverage is displayed as a graph. Learn how to export the data behind the graph in section 7.4. Height: Specifies the height of the graph. Type: The graph can be displayed as Line plot, Bar plot or as a Color bar. Color box: For Line and Bar plots, the color of the plot can be set by clicking the color box. If a Color bar is chosen, the color box is replaced by a gradient color box as described under Foreground color. • Residue coloring. There is one additional parameter: Sequence colors. This option lets you use different colors for the reads. Main: The color of the consensus and reference sequence. Black per default. Forward: The color of forward reads (single reads). Green per default. Reverse: The color of reverse reads (single reads). Red per default. Paired: The color of paired reads. Blue per default. Note that reads from broken pairs are colored according to their Forward/Reverse orientation or as a Non-specific match, but with a darker nuance than ordinary single reads. Non-specific matches: When a read would have ma
103. license server in the dialog. When you restart CLC DNA Workbench, you will be asked for a license as described in section 1.4. 1.4.6 Limited mode. We have created the limited mode to prevent a situation where you are unable to access your data because you do not have a license. When you run in limited mode, a lot of the tools in the Workbench are not available, but you still have access to your data (also when stored in a CLC Bioinformatics Database). When running in limited mode, the functionality is equivalent to the CLC Sequence Viewer (see section A). To get out of the limited mode and run the Workbench normally, restart the Workbench. When you restart, the Workbench will try to find a proper license, and if it does, it will start up normally. If it can't find a license, you will again have the option of running in limited mode. 1.5 About CLC Workbenches. In November 2005, CLC bio released two Workbenches: CLC Free Workbench and CLC Protein Workbench. CLC Protein Workbench is developed from the free version, giving it the well-tested user friendliness and look & feel. However, the CLC Protein Workbench includes a range of more advanced analyses. In March 2006, CLC DNA Workbench (formerly CLC Gene Workbench) and CLC Main Workbench were added to the product portfolio of CLC bio. Like CLC Protein Workbench, CLC DNA Workbench builds on CLC Free Workbench. It shares some of the advanced produc
104. matter if it is zoomed in or out, displays minimum 10 nucleotides on each line. Fixed wrap: Makes it possible to specify when the sequence should be wrapped. In the text field below, you can choose the number of residues to display on each line. • Double stranded. Shows both strands of a sequence (only applies to DNA sequences). • Numbers on sequences. Shows residue positions along the sequence. The starting point can be changed by setting the number in the field below. If you set it to e.g. 101, the first residue will have the position of -100. This can also be done by right-clicking an annotation and choosing Set Numbers Relative to This Annotation. • Numbers on plus strand. Whether to set the numbers relative to the positive or the negative strand in a nucleotide sequence (only applies to DNA sequences). • Follow selection. When viewing the same sequence in two separate views, Follow selection will automatically scroll the view in order to follow a selection made in the other view. • Lock numbers. When you scroll vertically, the position numbers remain visible. (Only possible when the sequence is not wrapped.) • Lock labels. When you scroll horizontally, the label of the sequence remains visible. • Sequence label. Defines the label to the left of the sequence. Name (this is the default information to be shown). Accession (sequences downloaded from databases like GenBank have an accession number). Latin name. Lati
105. Figure 17.1: A tooltip displaying information about the quality of the chromatogram. The qualities are based on the phred scoring system, with scores below 19 counted as low quality, scores between 20 and 39 counted as medium quality, and those 40 and above counted as high quality. If the trace file does not contain information about quality, only the sequence length will be shown. To view the trace data, open the sequence read in a standard sequence view. 17.1.1 Scaling traces. The traces can be scaled by dragging the trace vertically as shown in figure 17.2. The Workbench automatically adjusts the height of the traces to be readable, but if the trace height varies a lot, this manual scaling is very useful. The height of the area available for showing traces can be adjusted in the Side Panel as described in section 17.1.2. Figure 17.2: Grab the traces to scale. 17.1.2 Trace settings in the Side Panel. In the Nucleotide info preference group the display of trace data can be selected and unselected. When selected, the trace data information is shown as a plot beneath the sequence. The appearance of the plot can be adjusted using the following options (see figure 17.3): • Nucleotide trace. For eac
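The quality counts shown in such a tooltip can be sketched as a simple binning of per-base phred scores. This is a minimal illustration, not the Workbench's code: the function names are hypothetical, and because the text leaves a score of exactly 19 unassigned, the sketch assumes it counts as low quality. The conversion from a phred score Q to an error probability, 10^(-Q/10), is the standard phred definition.

```python
def error_probability(phred_score: float) -> float:
    """Standard phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-phred_score / 10)


def quality_summary(scores: list[int]) -> dict[str, int]:
    """Count bases per quality class using the thresholds quoted above.

    Assumption: scores below 20 (including 19) are low, 20 to 39 are medium,
    and 40 or above are high.
    """
    summary = {"length": len(scores), "low": 0, "medium": 0, "high": 0}
    for score in scores:
        if score < 20:
            summary["low"] += 1
        elif score < 40:
            summary["medium"] += 1
        else:
            summary["high"] += 1
    return summary
```

For example, `quality_summary([5, 19, 20, 39, 40, 60])` puts two bases in each class.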
106. new enzyme list Figure 18.50: Choosing enzymes for the new enzyme list. At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists. Below there are two panels
107. nodes determines the tree topology and describes how lineages have diverged over the course of evolution. The branches of the tree represent the amount of evolutionary divergence between two nodes in the tree and can be based on different measurements. A tree is completely specified by its topology and the set of all edge lengths. The phylogenetic tree in figure 20.5 is rooted at the most recent common ancestor of all Hominidae species, and therefore represents a hypothesis of the direction of evolution, e.g. that the common ancestor of gorilla, chimpanzee and man existed before the common ancestor of chimpanzee and man. In contrast, an unrooted tree would represent relationships without assumptions about ancestry. 20.2.2 Modern usage of phylogenies. Besides evolutionary biology and systematics, the inference of phylogenies is central to other areas of research. As more and more genetic diversity is being revealed through the completion of multiple genomes, an active area of research within bioinformatics is the development of comparative machine learning algorithms that can simultaneously process data from multiple species [Siepel and Haussler, 2004]. Through the comparative approach, valuable evolutionary information can be obtained about which amino acid substitutions are functionally tolerated by the organism and which are not. This information can be used to identify substitutions that affect protein function
108. number of possible trees means that Bayesian phylogenetics must be performed by approximative Monte Carlo based methods [Larget and Simon, 1999], [Yang and Rannala, 1997]. 20.2.4 Interpreting phylogenies. Bootstrap values. A popular way of evaluating the reliability of an inferred phylogenetic tree is bootstrap analysis. The first step in a bootstrap analysis is to re-sample the alignment columns with replacement, i.e. in the re-sampled alignment, a given column in the original alignment may occur two or more times, while some columns may not be represented in the new alignment at all. The re-sampled alignment represents an estimate of how a different set of sequences from the same genes and the same species may have evolved on the same tree. If a new tree reconstruction on the re-sampled alignment results in a tree similar to the original one, this increases the confidence in the original tree. If, on the other hand, the new tree looks very different, it means that the inferred tree is unreliable. By re-sampling a number of times, it is possible to put reliability weights on each internal branch of the inferred tree. If the data was bootstrapped 100 times, a bootstrap score of 100 means that the corresponding branch occurs in all 100 trees made from re-sampled alignments. Thus, a high bootstrap score is a sign of greater reliability. Other useful resources. The Tree of Life web project: http://tolweb.org Joseph Felsenstein's list of phylogeny
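The re-sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the alignment is a list of equal-length strings, and `find_groups` is a hypothetical caller-supplied function that reconstructs a tree from an alignment and returns the set of groupings (for example frozensets of sequence names) that the tree supports. The tree reconstruction itself is not shown.

```python
import random


def resample_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Draw alignment columns with replacement, keeping the original width."""
    width = len(alignment[0])
    picks = [rng.randrange(width) for _ in range(width)]
    return ["".join(row[i] for i in picks) for row in alignment]


def bootstrap_support(alignment, find_groups, replicates=100, seed=0):
    """Percentage of replicates in which each original grouping reappears."""
    rng = random.Random(seed)
    original = find_groups(alignment)
    hits = dict.fromkeys(original, 0)
    for _ in range(replicates):
        groups = find_groups(resample_columns(alignment, rng))
        for group in original:
            if group in groups:
                hits[group] += 1
    return {group: 100 * count / replicates for group, count in hits.items()}
```

With 100 replicates, a grouping that is recovered in every re-sampled tree gets a support of 100, matching the interpretation of the bootstrap score given in the text.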
109. Figure 19.11: Selecting order of concatenation. To adjust the order of concatenation, click the name of one of the alignments, and move it up or down using the arrow buttons. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result is seen in figure 19.12. Figure 19.12: The joining of the alignments results in one alignment containing rows of sequences corresponding to the number of uniquely named sequences in the joined alignments. 19.4.1 How alignments are joined. Alignments are joined by considering the sequence names in the individual alignments. If two sequences from different alignments have identical names, they are considered to have the same origin and are thus joined. Consider the joining of alignments A and B. If a sequence named "in-A-and-B" is found in both A and B, the spliced alignment will contain a sequence named "in-A-and-B" which represents the characters from A and B joined in direct extension of each other. If a sequence with the name
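Joining by sequence name can be sketched with two dictionaries that map sequence names to aligned rows. The concatenation of identically named rows follows the description above. How a sequence present in only one alignment is handled is cut off in this excerpt, so padding the missing part with gap characters is an assumption of this sketch, as is the function name `join_alignments`.

```python
def join_alignments(first: dict[str, str], second: dict[str, str],
                    gap: str = "-") -> dict[str, str]:
    """Concatenate rows that share a name; pad rows missing from one side.

    Assumption: a sequence found in only one alignment is filled with gap
    characters for the width of the other alignment.
    """
    width_first = len(next(iter(first.values()), ""))
    width_second = len(next(iter(second.values()), ""))
    names = list(first) + [name for name in second if name not in first]
    return {
        name: first.get(name, gap * width_first)
        + second.get(name, gap * width_second)
        for name in names
    }
```

The result has one row per uniquely named sequence, and every row has the combined width of the two input alignments.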
110. of the residues using a gradient in the same way as described above. Graph: Displays the conservation level as a graph at the bottom of the alignment. The bar (default view) shows the conservation of all sequence positions. The height of the graph reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the graph will be shown in full height. Learn how to export the data behind the graph in section 7.4. Height: Specifies the height of the graph. Type: The type of the graph. Line plot: Displays the graph as a line plot. Bar plot: Displays the graph as a bar plot. Colors: Displays the graph as a color bar using a gradient like the foreground and background colors. Color box: Specifies the color of the graph for line and bar plots, and specifies a gradient for colors. • Gap fraction. Which fraction of the sequences in the alignment have gaps. The gap fraction is only relevant if there are gaps in the alignment. Foreground color: Colors the letter using a gradient, where the left side color is used if there are relatively few gaps, and the right side color is used if there are relatively many gaps. Background color: Sets a background color of the residues using a gradient in the same way as described above. Graph: Displays the gap fraction as a graph at the bottom of the alignment. Learn how to export the data behind the graph in section 7.4. Height: Specifies the he
111. on how the data was created. Also, if you have performed an analysis and you want to reproduce the analysis on another element, you can check the history of the analysis, which will give you all parameters you set. This chapter will describe how to use the History functionality of CLC DNA Workbench. 8.1 Element history. You can view the history of all elements in the Navigation Area, except files that are opened in other programs (e.g. Word and pdf files). The history starts when the element appears for the first time in CLC DNA Workbench. To view the history of an element: Select the element in the Navigation Area | Show in the Toolbar | History; or, if the element is already open | History at the bottom left part of the view. This opens a view that looks like the one in figure 8.1. When an element's history is opened, the newest change is shown at the top of the view. The following information is available: • Title. The action that the user performed. • Date and time. Date and time for the operation. The date and time are displayed according
112. only the search parameters. This means that you can easily conduct the same search later on when your data has changed. 4.4 Search index. This section has a technical focus and is not relevant if your search works fine. However, if you experience problems with your search results, i.e. you do not get the hits you expect, it might be because of an index error. The CLC DNA Workbench automatically maintains an index of all data in all locations in the Navigation Area. If this index becomes out of sync with the data, you will experience problems with strange results. In this case, you can rebuild the index: Right-click the relevant location | Location | Rebuild Index. This will take a while depending on the size of your data. At any time, the process can be stopped in the process area, see section 3.4.1. Chapter 5 User preferences and settings. Contents
5.1 General preferences
5.2 Default view preferences
5.2.1 Number formatting in tables
5.2.2 Import and export Side Panel settings
5.3 Data preferences
5.4 Advanced preferences
5.4.1 Default data location
5.4.2 NCBI BLAST
5.5 Export/import of preferences
5.5.1 The different options for export and importing
113. Figure 18.3: Cloning editor. There are essentially three ways of performing cloning in the CLC DNA Workbench. The first is the most straightforward approach, which is based on a simple model of selecting restriction sites for cutting out one or more fragments and defining how to open the vector to insert the fragments. This is described as the cloning work flow below. The second approach is unguided and more flexible, and allows you to manually cut, copy, insert and replace parts of the sequences. This approach is described under manual cloning below. Finally, the CLC DNA Workbench also supports Gateway cloning (see section 18.2). 18.1.2 The cloning work flow. The cloning work flow is designed to support restriction cloning work flows through the following steps: 1. Define one or more fragments. 2. Define how the vector should be opened. 3. Specify orientation and order of the fragment. Defining fragments. First, select the sequence containing the cloning fragment in the list at the top of the view. Next, make sure the restriction enzyme you wish to use is listed in the Side Panel (see section 18.3.1). To specify which part of the sequence should be tre
114. pages. If you set the value to e.g. 2, the printed content will be broken up vertically and split across 2 pages. Note: It is a good idea to consider adjusting view settings (e.g. Wrap for sequences) in the Side Panel before printing. As explained in the beginning of this chapter, the printed material will look like the view on the screen, and therefore these settings should also be considered when adjusting Page Setup. Figure 6.6: An example where Fit to pages horizontally is set to 2, and Fit to pages vertically is set to 3. 6.2.1 Header and footer. Click the Header/Footer tab to edit the header and footer text. By clicking in the text field for either Custom header text or Custom footer text, you can access the auto formats for header/footer text in Insert a caret position. Click either Date, View name, or User name to include the auto format in the header/footer text. Click OK when you have adjusted the Page Setup. The settings are saved so that you do not have to adjust them again next time you print. You can also change the Page Setup from the File menu. 6.3 Print preview. The preview is shown in figure 6.7. Figure 6.7: Print preview. The Print preview window lets you see the layout of the pages that are printed. Use the arrows in the toolbar to navigate between the pages. Click Print to sho
115. pairwise distances are the UPGMA and the Neighbor Joining algorithms. Thus, the first step in these analyses is to compute a matrix of pairwise distances between OTUs from their sequence differences. To correct for multiple substitutions, it is common to use distances corrected by a model of molecular evolution such as the Jukes-Cantor model [Jukes and Cantor, 1969]. UPGMA. A simple but popular clustering algorithm for distance data is Unweighted Pair Group Method using Arithmetic averages (UPGMA) [Michener and Sokal, 1957], [Sneath and Sokal, 1973]. This method works by initially having all sequences in separate clusters and continuously joining these. The tree is constructed by considering all initial clusters as leaf nodes in the tree, and each time two clusters are joined, a node is added to the tree as the parent of the two chosen nodes. The clusters to be joined are chosen as those with minimal pairwise distance. The branch lengths are set corresponding to the distance between clusters, which is calculated as the average distance between pairs of sequences in each cluster. The algorithm assumes that the distance data has the so-called molecular clock property, i.e. the divergence of sequences occurs at the same constant rate at all parts of the tree. This means that the leaves of UPGMA trees all line up at the extant sequences and that a root is estimated as part of the procedure.
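The clustering procedure described above can be sketched in a few lines. This is a textbook-style illustration rather than the Workbench's implementation: the function name `upgma`, the input format (a dictionary from name pairs to distances) and the output format (nested tuples of left subtree, right subtree and node height) are all choices made for this example. Each new node is placed at half the distance between the two joined clusters, which is what makes all leaves line up under the molecular clock assumption.

```python
from itertools import combinations


def upgma(names, distances):
    """Build a rooted tree from pairwise distances by average linkage.

    distances maps a pair of names, e.g. ("A", "B"), to their distance.
    Returns nested tuples (left, right, height); leaves are plain names.
    """
    dist = {frozenset(pair): d for pair, d in distances.items()}
    # Cluster label -> (subtree, number of leaves in the cluster).
    clusters = {name: (name, 1) for name in names}
    while len(clusters) > 1:
        # Join the two clusters with minimal pairwise distance.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted mean equals the average over all sequence pairs.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b, height), size_a + size_b)
    (tree, _size), = clusters.values()
    return tree
```

A branch length can be read off as the height of a parent node minus the height of its child, with leaves at height zero.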
116. pertaining to oligo pairs such as e.g. the oligo pair-annealing score. The ideal score for a solution is 100, and solutions are thus ranked in descending order. Each parameter is assigned an ideal value and a tolerance. Consider for example oligo self-annealing: here the ideal value of the annealing score is 0, and the tolerance corresponds to the maximum value specified in the side panel. The contribution to the final score is determined by how much the parameter deviates from the ideal value and is scaled by the specified tolerance. Hence, a large deviation from the ideal and a small tolerance will give a large deduction in the final score, and a small deviation from the ideal and a high tolerance will give a small deduction in the final score. 16.2 Setting parameters for primers and probes. The primer-specific view options and settings are found in the Primer parameters preference group in the Side Panel to the right of the view (see figure 16.3). Figure 16.3: The two groups of primer parameters (in the program, the Primer information group is listed below the other group). 16.2.1 Primer Parameters. In this preference group a number of criteria can be set, which the selected primers must meet. All the crit
117. primer which fulfils all criteria, and a red primer indicates a primer which fails to meet one or more of the set criteria. For more detailed information, place the mouse cursor over the circle representing the primer of interest. A tool tip will then appear on screen, displaying detailed information about the primer in relation to the set criteria. To locate the primer on the sequence, simply left-click the circle using the mouse. The various primer parameters can now be varied to explore their effect, and the view area will dynamically update to reflect this. If e.g. the allowed melting temperature interval is widened, more green circles will appear, indicating that more primers now fulfill the set requirements, and if e.g. a requirement for 3' G/C content is selected, red circles will appear at the starting points of the primers which fail to meet this requirement. 16.3.2 Detailed information mode. In this mode a very detailed account is given of the properties of all the available primers. When a region is chosen, primer information will appear in groups of lines beneath it (see figure 16.5).
118. primers as a criterion in the design process (see above). The central part of the dialog contains parameters pertaining to primer pairs. Here, the following parameters can be set: • Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45% and 49% G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45% and 51% G/C nucleotides, respectively, will not be included. • Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. • Max hydrogen bonds between pairs: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. • Max hydrogen bonds between pair ends: the maximum number of hydrogen bonds allowed in the consecutive ends of the forward and the reverse primer in a primer pair. • Maximum length of amplicon: determines the maximum length of the PCR fragment. 16.5.2 Standard PCR output table. If only a single region is selected, the following columns of information are available: • Sequence: the primer's sequence. • Score: measures how much the properties of the primer (or primer pair) deviate from the optimal solution in terms of the chosen parameters and tolerances. The higher the score, the better the solution. The scale is from 0 to 100. • Region: the interval of the template sequence covered by the pri
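Three of the primer pair criteria listed above (G/C difference, melting temperature difference and amplicon length) can be sketched as a simple filter. This is only an illustration: the function names and argument order are hypothetical, melting temperatures are taken as given rather than computed, the hydrogen bond criteria are left out, and treating each limit as inclusive is an assumption.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C nucleotides in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def pair_allowed(gc_fwd: float, gc_rev: float,
                 tm_fwd: float, tm_rev: float,
                 amplicon_length: int,
                 max_gc_diff: float, max_tm_diff: float,
                 max_amplicon: int) -> bool:
    """True if a forward/reverse pair passes all three limits (inclusive)."""
    return (
        abs(gc_fwd - gc_rev) <= max_gc_diff
        and abs(tm_fwd - tm_rev) <= max_tm_diff
        and amplicon_length <= max_amplicon
    )
```

With a G/C limit of 5 percentage points, a pair at 45% and 49% passes and a pair at 45% and 51% is rejected, as in the example from the text.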
119. Figure 16.6: Proposed primers. In the preference panel of the table, it is possible to customize which columns are shown in the table. See the sections below on the different reaction types for a description of the available information. The columns in the output table can be sorted by the present information. For example, the user can choose to sort the available primers by their score (default) or by their self-annealing score, simply by right-clicking the column header. The output table interacts with the accompanying primer editor such that when a proposed combination of primers and probes is selected in the table, the primers and probe
120. regions can be defined. It is required that the Forward primer region is located upstream of the Forward inner primer region, that the Forward inner primer region is located upstream of the Reverse inner primer region, and that the Reverse inner primer region is located upstream of the Reverse primer region. In Nested PCR mode, the Inner melting temperature menu in the Primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the inner and outer primer pairs. After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer parameters preference group, the Calculate button will activate the primer design algorithm. After pressing the Calculate button, a dialog will appear (see figure 16.9). The top and bottom parts of this dialog are identical to the Standard PCR dialog for designing primer pairs described above. The central part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer and the inner pair. Here five options can be set: • Maximum percentage point difference in G/C content (described above under Standard PCR): this criterion is applied to both primer pairs independently.
121. restriction enzyme analysis and functionalities for managing lists of restriction enzymes. First, after a brief introduction, restriction cloning and general vector design are explained. Next, we describe how to do Gateway Cloning. (Gateway is a registered trademark of Invitrogen Corporation.) Finally, the general restriction site analyses are described. 18.1 Molecular cloning. Molecular cloning is a very important tool in the quest to understand gene function and regulation. Through molecular cloning it is possible to study individual genes in a controlled environment. Using molecular cloning it is possible to build complete libraries of fragments of DNA inserted into appropriate cloning vectors. The in silico cloning process in CLC DNA Workbench begins with the selection of sequences to be used: Toolbox | Cloning and Restriction Sites | Cloning. This will open a dialog where you can select the sequences containing the fragments you want to clone (figure 18.1).
122. searching. The size of the subset created in the CLC software depends both on the number and size of the sequences. To start a BLAST job to search your sequences against databases held at the NCBI: Toolbox | BLAST | NCBI BLAST. Alternatively, use the keyboard shortcut Ctrl + Shift + B on Windows and ⌘ + Shift + B on Mac OS. This opens the dialog seen in figure 12.2. [Figure 12.2: Choose one or more sequences to conduct a BLAST search with.] Select one or more sequences of the same type (either DNA or protein) and click Next. In this dialog, you choose which type of BLAST search to conduct and which database to search against (see figure 12.3). The databases at the NCBI listed in the dropdown box will correspond to the query sequence type you have (DNA or protein) and the type of BLAST search you have chosen to run. A complete list of these databases can be found in Appendix D. Here you can also read how to add additional databases available at the NCBI to the list provided in the dropdown menu.
123. second challenge is to find the optimal alignment given a scoring function. For pairs of sequences this can be done by dynamic programming algorithms, but for more than three sequences this approach demands too much computer time and memory to be feasible. A commonly used approach is therefore to do progressive alignment [Feng and Doolittle, 1987], where multiple alignments are built through the successive construction of pairwise alignments. These algorithms provide a good compromise between time spent and the quality of the resulting alignment. Presently, the most exciting development in multiple alignment methodology is the construction of statistical alignment algorithms [Hein, 2001; Hein et al., 2000]. These algorithms employ a scoring function which incorporates the underlying phylogeny and use an explicit stochastic model of molecular evolution, which makes it possible to compare different solutions in a statistically rigorous way. The optimization step, however, still relies on dynamic programming, and practical use of these algorithms thus awaits further developments. Creative Commons License. All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of t
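To make the dynamic programming step for a pair of sequences concrete, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score. It is not the Workbench's algorithm; the scoring values (match +1, mismatch -1, linear gap cost -1) and the function name are assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Optimal global alignment score of a and b by dynamic programming.

    Only the previous row of the score table is kept, so memory use is
    proportional to len(b) while time is proportional to len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # Column 0: prefix of a against the empty prefix of b.
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + substitution,  # align the two residues
                           prev[j] + gap,               # gap in b
                           cur[j - 1] + gap))           # gap in a
        prev = cur
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2: three matches, one gap
```

The table has one cell per pair of positions, which is why the same idea becomes infeasible when extended to many sequences at once and progressive methods are used instead.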
124. section 9.2. If not, click Finish. This will open a new view in the View Area displaying the reverse complement of the selected sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. 14.4 Reverse sequence. CLC DNA Workbench is able to create the reverse of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed, since they now occupy the opposite strand of their previous location. Note: This is not the same as a reverse complement. If you wish to create the reverse complement, please refer to section 14.3. Select a sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Reverse Sequence. This opens the dialog displayed in figure 14.4. [Figure 14.4: Reversing a sequence.] If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elem
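The difference between reversing a sequence and taking its reverse complement is easy to see in a small Python sketch. The helper names are hypothetical, and only the unambiguous DNA letters are handled.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    """Read the same strand backwards; bases are not changed."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in its own 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse("ATGC"))             # CGTA
print(reverse_complement("ATGC"))  # GCAT
```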
125. selected. Treat ambiguous characters as wildcards in sequence: If you search for e.g. ATG, you will find both ATG and ATN. If you have large regions of Ns, this option should not be selected. Note that if you enter a position instead of a sequence, it will automatically switch to position search. • Annotation search: Searches the annotations on the sequence. The search is performed both on the labels of the annotations and on the text appearing in the tooltip that you see when you keep the mouse cursor fixed. If the search term is found, the part of the sequence corresponding to the matching annotation is selected. Below this option you can choose to search for translations as well. Sequences annotated with coding regions often have the translation specified, which can lead to undesired results. • Position search: Finds a specific position on the sequence. In order to find an interval, e.g. from position 500 to 570, enter "500..570" in the search field. This will make a selection from position 500 to 570 (both included). Notice the two periods (..) between the start and end number (see section 10.3.2). If you enter positions including thousands separators like 123,345, the comma will just be ignored and it would be equivalent to entering 123345. • Include negative strand: When searching the sequence for nucleotides or amino acids, you can search on both strands. • Name search: Searches fo
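The position search rules above (two periods for an interval, both ends included, commas ignored) can be sketched as follows. The function names are hypothetical and the parsing is an assumption based on the description, not the Workbench's own code.

```python
def parse_position_query(query: str) -> tuple[int, int]:
    """Turn '500..570' or '123,345' into a 1-based inclusive (start, end)."""
    text = query.replace(",", "").strip()  # thousands separators are ignored
    if ".." in text:
        start, end = text.split("..", 1)
        return int(start), int(end)
    position = int(text)
    return position, position


def select(sequence: str, start: int, end: int) -> str:
    """Return the residues from start to end, both included (1-based)."""
    return sequence[start - 1:end]


print(parse_position_query("500..570"))  # (500, 570)
print(parse_position_query("123,345"))   # (123345, 123345)
print(select("ABCDEFGH", 2, 4))          # BCD
```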
126. sequence, they will be shown in the contig view as well. This can be very convenient, e.g. for primer design. If you wish to BLAST the consensus sequence, simply select the whole contig for your BLAST search. It will automatically extract the consensus sequence and perform the BLAST search. In order to preserve the history of the changes you have made to the contig, the contig itself should be saved from the contig view, using either the save button or by dragging it to the Navigation Area. 17.7.6 Extract parts of a contig. Sometimes it is useful to extract part of a contig for in-depth analysis. This could be the case if you have performed an assembly of several genes and you want to look at a particular gene or region in isolation. This is possible through the right-click menu of the reference or consensus sequence: Select on the reference or consensus sequence the part of the contig to extract | Right-click | Extract from Selection. This will present the dialog shown in figure 17.23. The purpose of this dialog is to let you specify what kind of reads you want to include. By default, all reads are included. The options are: Paired status: Include intact paired reads: When paired reads are placed within the paired distance specified, they will fall into this category. By default, these reads are colored in blue.
127. show a print dialog (see figure 6.1). In this dialog, you can: • Select which part of the view you want to print • Adjust Page Setup • See a print Preview window. These three options are described in the three following sections. [Figure 6.1: The Print dialog.] 6.1 Selecting which part of the view to print. In the print dialog you can choose to: • Print visible area, or • Print whole view. These options are available for all views that can be zoomed in and out. Figure 6.2 shows a view of a circular sequence which is zoomed in so that you can only see a part of it. [Figure 6.2: A circular sequence as it looks on the screen.] When selecting Print visible area, your print will reflect the part of the sequence that is visible in the view. The result from printing the view from figure 6.2 and choosing Print visible area can be seen in figure 6.3. [Figure 6.3: A print of the sequence selecting Print visible area.] On the other hand i
128. software (http://evolution.genetics.washington.edu/phylip/software.html). Creative Commons License. All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. Part IV Appendix. Appendix A Comparison of workbenches and the viewer. Below we list a number of functionalities that differ between CLC Workbenches and the CLC Sequence Viewer: • CLC Sequence Viewer • CLC Protein Workbench • CLC DNA Workbench • CLC RNA Workbench • CLC Main Workbench • CLC Genomics Workbench. [Comparison table; the per-product marks did not survive extraction. Rows under "Data handling": Add multiple locations to Navigation Area; Share data on network drive; Search all your data. Rows under "Assembly of sequencing data": Advanced contig assembly; Importing and viewing trace data; Trim sequences; Assemble without use of reference sequence; Map to reference sequence; Assemble to existing contig; Viewing and edit contigs; Tabular view of an assembled contig.]
129. [Figure 3.3: In this example the location called CLC_Data points to the folder at C:\Documents and settings\clcuser\CLC_Data.] Adding locations. By default, there is one location in the Navigation Area called CLC_Data. It points to the following folder: • On Windows: C:\Documents and settings\<username>\CLC_Data • On Mac: ~/CLC_Data • On Linux: /homefolder/CLC_Data. You can easily add more locations to the Navigation Area: File | New | Location. This will bring up a dialog where you can navigate to the folder you wish to use as your new location (see figure 3.4). When you click Open, the new location is added to the Navigation Area as shown in figure 3.5. The name of the new location will be the name of the folder selected for the location. To see where the folder is located on your computer, place your mouse cursor on the location icon for a second. This will show the path to the location. Sharing da
130. the N-terminal amino acid, thus overall protein stability [Bachmair et al., 1986; Gonda et al., 1989; Tobias et al., 1991]. The importance of the N-terminal residues is generally known as the 'N-end rule'. The N-end rule, and consequently the N-terminal amino acid, simply determines the half-life of proteins. The estimated half-life of proteins has been investigated in mammals, yeast and E. coli (see Table 13.2). If leucine is found N-terminally in mammalian proteins, the estimated half-life is 5.5 hours. Extinction coefficient: This measure indicates how much light is absorbed by a protein at a particular wavelength. The extinction coefficient is measured by UV spectrophotometry, but can also be calculated. The amino acid composition is important when calculating the extinction coefficient. The extinction coefficient is calculated from the absorbance of cysteine, tyrosine and tryptophan using the following equation: $\mathrm{Ext}(\mathrm{Protein}) = \mathrm{count}(\mathrm{Cystine}) \cdot \mathrm{Ext}(\mathrm{Cystine}) + \mathrm{count}(\mathrm{Tyr}) \cdot \mathrm{Ext}(\mathrm{Tyr}) + \mathrm{count}(\mathrm{Trp}) \cdot \mathrm{Ext}(\mathrm{Trp})$, where Ext is the extinction coefficient of the amino acid in question. At 280 nm the extinction coefficients are: Cys = 120, Tyr = 1280 and Trp = 5690. This equation is only valid under the following conditions: • pH 6.5 • 6.0 M guanidium hydrochloride • 0.02 M phosphate buffer. The extinction coefficient values of the three important amino acids at different wavelengths are found in [Gill and von Hippel
131. the algorithm will find a match if AC occurs in the beginning of the sequence. The $ symbol restricts the search to the end of your sequence. For example, if you search through a sequence with the regular expression GT$, the algorithm will find a match if GT occurs in the end of the sequence. Examples: The expression [ACG][^AC]G{2} matches all strings of length 4, where the first character is A, C or G, the second is any character except A and C, and the third and fourth characters are G. The expression G.[^A]$ matches all strings of length 3 in the end of your sequence, where the first character is G, the second any character and the third any character except A. 13.7.4 Create motif list. CLC DNA Workbench offers advanced and versatile options to create lists of sequence patterns or known motifs, represented either by a literal string or a regular expression. A motif list is created from the Toolbox: Toolbox | General Sequence Analyses | Create Motif List. This will open an empty list where you can add motifs by clicking the Add button at the bottom of the view. This will open a dialog shown in figure 13.28. [Figure 13.28: Entering a new motif in the list.] In this dialog, you can enter the following information: • Nam
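The anchoring and character-class rules described above can be tried out with Python's `re` module, whose syntax agrees with Java-style regular expressions for these simple cases. The patterns are reconstructed from the prose descriptions in the excerpt (the extracted text lost the brackets and symbols), and the test sequences are made up for illustration.

```python
import re

# ^ anchors the pattern to the beginning, $ to the end of the sequence.
print(bool(re.search(r"^AC", "ACGT")))  # True: AC at the beginning
print(bool(re.search(r"^AC", "GACT")))  # False: AC is not at the beginning
print(bool(re.search(r"GT$", "AAGT")))  # True: GT at the end

# First character A, C or G; second anything except A or C; then two Gs.
print(re.findall(r"[ACG][^AC]G{2}", "ATGGCCGG"))  # ['ATGG']

# At the end of the sequence: G, any character, then anything except A.
print(re.search(r"G.[^A]$", "TTGCT").group())  # GCT
print(re.search(r"G.[^A]$", "TTGCA"))          # None
```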
132. the locations defined in the BLAST database manager (see section 12.4). • Add the location where your BLAST databases are stored using the BLAST database manager (see section 12.4). See figure 12.14. 12.3.2 Download NCBI pre-formatted BLAST databases. Many popular pre-formatted databases are available for download from the NCBI. You can download any of the databases available from the list at ftp://ftp.ncbi.nih.gov/blast/db/ from within your CLC DNA Workbench. You must be connected to the internet to use this tool. If you choose Toolbox | BLAST | Download BLAST Databases, a window like the one in figure 12.11 pops up, showing you the list of databases available for download. [Figure 12.11: Choose from pre-formatted BLAST databases at the NCBI available for download.] In this window, you can see the names of the databases, the date they were made available for download on the NCBI site, the size of the files associated with that database, and a brief description of each database. You can also see whether the database has any dependencies. This aspect is described below. You can also specify which of your database locations you would like to store the files in. Please see the Manage BLAST Databases section for more on this (section 12.4). There are two very important things to note if you wish to tak
133. the option of choosing a translation table, the start codons to use, minimum ORF length as well as a few other parameters. These choices are explained in this section. To find open reading frames: select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Find Open Reading Frames, or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Find Open Reading Frames. This opens the dialog displayed in figure 14.7. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. If you want to adjust the parameters for finding open reading frames, click Next. 14.6.1 Open reading frame parameters. This opens the dialog displayed in figure 14.8. The adjustable parameters for the search are: • Start codon [Figure 14.7: Create Reading Frame dialog.]
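To show how the start codon and minimum length parameters interact, here is a minimal open reading frame scan in Python. It is a sketch under stated assumptions, not the Workbench's algorithm: it searches only the three frames of the positive strand, uses ATG as the default start codon and the stop codons of the standard genetic code, and counts the stop codon towards the length. The function name and parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, start_codons=("ATG",), min_codons: int = 2):
    """Return 1-based inclusive (start, end) positions of ORFs.

    Each ORF runs from the first start codon seen in a frame to the next
    in-frame stop codon, with the stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# ATG AAA TAG starts at position 3 and ends at position 11.
print(find_orfs("CCATGAAATAGCC"))                # [(3, 11)]
print(find_orfs("CCATGAAATAGCC", min_codons=4))  # []
```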
134. the plot. Dot plots are one of the oldest methods for comparing two sequences [Maizel and Lenk, 1981]. The scores that are drawn on the plot are affected by several issues: • Scoring matrix for distance correction: Scoring matrices (BLOSUM and PAM) contain substitution scores for every combination of two amino acids. Thus, these matrices can only be used for dot plots of protein sequences. • Window size: The single residue comparison (bit by bit comparison, window size = 1) in dot plots will undoubtedly result in a noisy background of the plot. You can imagine that there are many successes in the comparison if you only have four possible residues, as in nucleotide sequences. Therefore you can set a window size which smooths the dot plot. Instead of comparing single residues, it compares subsequences of the length set as window size. The score is now calculated with respect to aligning the subsequences. • Threshold: The dot plot shows the calculated scores with colored threshold. Hence you can better recognize the most important similarities. Examples and interpretations of dot plots. Contrary to simple sequence alignments, dot plots can be a very useful tool for spotting various evolutionary events which may have happened to the sequences of interest. Below are some examples of dot plots where sequence insertions, low complexity regions, inverted repeats etc. can be identified visually. Simil
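The effect of the window size and threshold can be sketched with a small identity-based dot plot. This is an illustration only: the score here is simply the number of identical residues in two windows, whereas the excerpt also mentions substitution matrices for proteins. The function name and defaults are assumptions.

```python
def dot_plot(a: str, b: str, window: int = 3, threshold: int = 3):
    """Return (i, j) pairs where the windows a[i:i+window] and b[j:j+window]
    share at least `threshold` identical positions (0-based indices)."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            score = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if score >= threshold:
                dots.append((i, j))
    return dots


# With window size 1 every identical residue pair gives a dot (noisy);
# a larger window keeps only the diagonal of the self-comparison.
print(len(dot_plot("ACGTACGT", "ACGTACGT", window=1, threshold=1)))  # 16
print(dot_plot("ACGT", "ACGT", window=2, threshold=2))  # [(0, 0), (1, 1), (2, 2)]
```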
135. the processes of character substitution, insertion and deletion. The input to multiple alignment algorithms is a number of homologous sequences, i.e. sequences that share a common ancestor and most often also share molecular function. The generated alignment is a table (see figure 19.16) where each row corresponds to an input sequence and each column corresponds to a position in the alignment. An individual column in this table represents residues that have all diverged from a common ancestral residue. Gaps in the table (commonly represented by a '-') represent positions where residues have been inserted or deleted and thus do not have ancestral counterparts in all sequences. 19.6.1 Use of multiple alignments. Once a multiple alignment is constructed, it can form the basis for a number of analyses: • The phylogenetic relationship of the sequences can be investigated by tree-building methods based on the alignment. • Annotation of functional domains, which may only be known for a subset of the sequences, can be transferred to aligned positions in other un-annotated sequences. • Conserved regions in the alignment can be found which are prime candidates for holding functionally important sites. • Comparative bioinformatical analysis can be performed to identify functionally important regions. 19.6.2 Constructing multiple alignments. Whereas the optimal solution to the pairwise alignment problem can be found in reasonable time, the problem of cons
136. the server, you can borrow a license. Borrowing a license means that you take one of the floating licenses available on the server and borrow it for a specified amount of time. During this time period, there will be one less floating license available on the server. At the point where you wish to borrow a license, you have to be connected to the license server. The procedure for borrowing is this: 1. Click Help | License Manager to display the dialog shown in figure 1.22. 2. Use the checkboxes to select the license(s) that you wish to borrow. 3. Select how long you wish to borrow the license, and click Borrow Licenses. 4. You can now go offline and work with CLC DNA Workbench. 5. When the borrow time period has elapsed, you have to connect to the license server again to use CLC DNA Workbench. 6. When the borrow time period has elapsed, the license server will make the floating license available for other users. Note that the time period is not the period of time that you actually use the Workbench. Note! When your organization's license server is installed, license borrowing can be turned off. In that case, you will not be able to borrow licenses. No license available. If all the licenses on the server are in use, you will see a dialog as shown in figure 1.20 when you start the Workbench.
137. the threshold of T limits the search space significantly. 12.5.4 Which BLAST program should I use? Depending on the nature of the sequence, it is possible to use different BLAST programs for the database search. There are five versions of the BLAST program: blastn, blastp, blastx, tblastn and tblastx. Table (option: query type / database type / comparison; note): blastn: nucleotide / nucleotide / nucleotide-nucleotide. blastp: protein / protein / protein-protein. tblastn: protein / nucleotide / protein-protein; the database is translated into protein. blastx: nucleotide / protein / protein-protein; the queries are translated into protein. tblastx: nucleotide / nucleotide / protein-protein; the queries and database are translated into protein. The most commonly used method is to BLAST a nucleotide sequence against a nucleotide database (blastn) or a protein sequence against a protein database (blastp). But often another BLAST program will produce more interesting hits. E.g. if a nucleotide sequence is translated before the search, it is more likely to find better and more accurate hits than just a blastn search. One of the reasons for this is that protein sequences are evolutionarily more conserved than nucleotide sequences. Another good reason for translating the query sequence before the search is that you get protein hits which are likely to be annotated. Thus you can directly see the protein function of the sequenced gene
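The choice between the five programs follows directly from the query type and the database type. The small lookup below restates the table from the excerpt as a Python dictionary; the function name and the type labels are hypothetical.

```python
# (query type, database type) -> BLAST programs that accept that combination.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],  # database translated into protein
    ("nucleotide", "protein"): ["blastx"],   # queries translated into protein
}


def programs_for(query_type: str, database_type: str) -> list[str]:
    """List the BLAST programs usable for a query/database combination."""
    return BLAST_PROGRAMS.get((query_type, database_type), [])


print(programs_for("nucleotide", "protein"))     # ['blastx']
print(programs_for("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
```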
138. titles can be edited simply by clicking with the mouse. These changes will be saved when you Save the graph, whereas the changes in the Side Panel need to be saved explicitly (see section 5.6). For more information about the graph view, please see section B. Appendix C Working with tables. Tables are used in a lot of places in the CLC DNA Workbench. The contents of the tables are of course different depending on the context, but there are some general features for all tables that will be explained in the following. Figure C.1 shows an example of a typical table. This is the table result of Find Open Reading Frames. We will use this table as an example in the following to illustrate the concepts that are relevant for all kinds of tables. [Figure C.1: A table showing open reading frames.] First of all, the columns of the table are listed in the Side Panel to the right of the table. By clicking the checkboxes you can hide/show the columns in the table. Furthermore, you can sort the table by clicking on the column headers
139. to a license server: Check this option if you wish to use the license server. • Automatically detect license server: By checking this option, you do not have to enter more information to connect to the server. • Manually specify license server: There can be technical limitations which mean that the license server cannot be detected automatically, and in this case you need to specify more options manually: Host name: Enter the address for the license server. Port: Specify which port to use. [Figure 1.19: Connecting to a license server.] • Disable license borrowing on this computer: If you do not want users of the computer to borrow a license (see section 1.4.5), you can check this option. Borrow a license. A floating license can only be used when you are connected to the license server. If you wish to use the CLC DNA Workbench when you are not connected to
140. [Figure 1.1: The license assistant showing you the options for getting started. Options in the dialog: Request an Evaluation License; Download a License; Import a License from a File; Upgrade a license from an older Workbench; Configure License Server Connection.] • Configure license server connection: If your organization has a license server, select this option to connect to the server. Select an appropriate option and click Next. If for some reason you don't have access to getting a license, you can click the Limited Mode button (see section 1.4.6). 1.4.1 Request an evaluation license. We offer a fully functional demo version of CLC DN
141. to use EcoRV and BamHI, select these two enzymes and add them to the right side panel. If you wish to use all the enzymes in the list: Click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add. The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang as shown in figure 18.51.
142. treated as gap extensions and any gaps past 10 are free. • End gaps as any other: Gaps at the ends of sequences are treated like gaps in any other place in the sequences. When aligning a long sequence with a short partial sequence, it is ideal to use free end gaps, since this will be the best approximation to the situation. The many gaps inserted at the ends are not due to evolutionary events, but rather to partial data. Many homologous proteins have quite different ends, often with large insertions or deletions. This confuses alignment algorithms, but using the Cheap end gaps option, large gaps will generally be tolerated at the sequence ends, improving the overall alignment. This is the default setting of the algorithm. Finally, treating end gaps like any other gaps is the best option when you know that there are no biologically distinct effects at the ends of the sequences. Figures 19.3 and 19.4 illustrate the differences between the different gap scores at the sequence ends. 19.1.2 Fast or accurate alignment algorithm. CLC DNA Workbench has two algorithms for calculating alignments: • Fast (less accurate): This allows for use of an optimized alignment algorithm which is very fast. The fast option is particularly useful for data sets with very long sequences. • Slow (very accurate): This is the recommended choice unless you find the processing time too long.
143. two options: • Open: This will open the result of the analysis in a view. This is the default setting. • Save: This means that the result will not be opened but saved to a folder in the Navigation Area. If you select this option, click Next and you will see one more step where you can specify where to save the results (see figure 9.6). In this step, you also have the option of creating a new folder or adding a location by clicking the buttons at the top of the dialog. [Figure 9.6: Specify a folder for the results of the analysis.] 9.2.1 Table outputs. Some analyses also generate a table with results, and for these analyses the last step looks like figure 9.7. Figure 9.7: Analyses which al
144. types. Figure 2.15: An overview of the contig with the coverage graph. This overview can be an aid in determining whether coverage is satisfactory and, if not, which regions a new sequencing effort should focus on. Next we go into the details of the contig. 2.5.4 Finding and editing conflicts. Click Zoom to 100% to zoom in on the residues at the beginning of the contig. Click the Find Conflict button at the top of the Side Panel, or press the Space key, to find the first position where there is disagreement between the reads (see figure 2.16). In this example, the first read has a T marked with a light pink background color, whereas the second line has a gap. In order to determine which of the reads we should trust, we assess the quality of the read at this position. A quick look at the regularity of the peaks of read Rev2 compared to Rev3 indicates that we should trust the Rev2 read. In addition
145. view. 15.2 Hydrophobicity. CLC DNA Workbench can calculate the hydrophobicity of protein sequences in different ways, using different algorithms (see section 15.2.3). Furthermore, hydrophobicity of sequences can be displayed as hydrophobicity plots and as graphs along sequences. In addition, CLC DNA Workbench can calculate hydrophobicity for several sequences at the same time, and for alignments. 15.2.1 Hydrophobicity plot. Displaying the hydrophobicity for a protein sequence in a plot is done in the following way: select a protein sequence in Navigation Area | Toolbox in the Menu Bar | Protein Analyses | Create Hydrophobicity Plot. This opens a dialog. The first step allows you to add or remove sequences. Clicking Next takes you through to Step 2, which is displayed in figure 15.3. Figure 15.3: Step two in the Hydrophobicity Plot allows you to choose hydrophobicity scale and the window size. The dialog lists the scales Kyte-Doolittle, Eisenberg, Engelman, Hopp-Woods, Janin, Rose and Cornette, and a window size given as a number of residues, which must be odd (11 in the figure). The Window size is the width of the window where the hydrophobicity is calculated. The wider the window, the less volatile the graph. You can choose from a numbe
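The windowed calculation described in this excerpt can be sketched in Python. This is only an illustration under stated assumptions, not the Workbench's own implementation: the function name `hydrophobicity_profile` is hypothetical, the table holds the published Kyte-Doolittle values, and each score is assumed to be the plain average over one window position.

```python
# Kyte-Doolittle hydropathy values (one of the scales named in the dialog).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=11, scale=KYTE_DOOLITTLE):
    """Return the mean hydrophobicity of every full window along the protein.

    The window must be odd, as the dialog requires, so that each window
    has a single central residue. (Hypothetical helper, for illustration.)
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    values = [scale[residue] for residue in protein.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MKTLLILAVVAAALA"  # made-up example sequence
    for size in (3, 11):
        profile = hydrophobicity_profile(peptide, window=size)
        print(size, [round(score, 2) for score in profile])
```

Running it with a small and a large window shows the effect the text mentions: the wider window averages over more residues, so neighbouring scores differ less.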
146. will display additional information at the right side of the dialog. This will also display a button: Download and Install. Click the plug-in and press Download and Install. A dialog displaying progress is now shown, and the plug-in is downloaded and installed. If the plug-in is not shown on the server and you have it on your computer (e.g. if you have downloaded it from our web site), you can install it by clicking the Install from File button at the bottom of the dialog. This will open a dialog where you can browse for the plug-in. The plug-in file should be a file of the type ".cpa". When you close the dialog, you will be asked whether you wish to restart the CLC DNA Workbench. The plug-in will not be ready for use before you have restarted. 1.7.2 Uninstalling plug-ins. Plug-ins are uninstalled using the plug-in manager: Help in the Menu Bar | Plug-ins and Resources, or Plug-ins in the Toolbar. This will open the dialog shown in figure 1.25. The installed plug-ins are shown in this dialog. To uninstall. Dialog content (Manage Plug-ins and Resources): Additional Alignments, CLC bio, version 1.02. Perform alignments with many different programs from within the workbench: ClustalW (Windows/Mac/Linux), Muscle (Windows/Mac/Linux), T-Coffee (Mac/Linux), MAFFT (Mac/Linux), Kalign (Mac/Linux). Ann
147. Figure 2.35: Five lines of dots representing primer suggestions. There is a line for each primer length, 18 bp through to 22 bp. 2.1.2 Examining the primer suggestions. Each line consists of a number of dots, each representing the starting point of a possible primer. E.g. the first dot on the first line (primers of length 18) represents a primer starting at the dot's position and with a length of 18 nucleotides (shown as the white area in figure 2.36). Figure 2.36: The first dot on line one represents the starting point of a primer that will anneal to the highlighted region. Position the mouse cursor over a dot. A box will appear, providing data about this primer. Clicking the dot will select the region where that primer would anneal (see figure 2.37). Figure 2.37: Clicking the dot will select the corresponding primer region. Hovering the cursor over the dot will bring up an information box containing details about that primer (in the figure: forward primer region, primer covering positions 612 to 629, GC content 0.5, melt. temp. 58.55 °C, self annealing 10, self end annealing 3, secondary structure 9). Note that some of the dots are colored red. This indicates that the primer represented by this dot d
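The idea of one dot per possible start position and one line per primer length can be sketched as a simple enumeration. This is a hedged illustration only: `candidate_primers` and `gc_content` are hypothetical helper names, and the Workbench's scoring (melting temperature, self annealing, secondary structure) is not reproduced here.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def candidate_primers(sequence, min_len=18, max_len=22):
    """Yield (start, end, primer) for every possible primer.

    Positions are 1-based and inclusive, so a primer of length 18
    starting at position 612 covers positions 612 to 629, as in the
    information box described in the text.
    """
    for length in range(min_len, max_len + 1):      # one "line" per length
        for offset in range(len(sequence) - length + 1):  # one "dot" per start
            yield offset + 1, offset + length, sequence[offset:offset + length]


if __name__ == "__main__":
    template = "CTATTACCATGGTGATGCGGTTTTGGCAGTAC"  # example region only
    for start, end, primer in candidate_primers(template):
        print(start, end, primer, round(gc_content(primer), 2))
```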
148. 4.3 Advanced search. 4.4 Search index. 5 User preferences and settings: 5.1 General preferences; 5.2 Default view preferences; 5.3 Data preferences; 5.4 Advanced preferences; 5.5 Export/import of preferences; 5.6 View settings for the Side Panel. 6 Printing: 6.1 Selecting which part of the view to print; 6.2 Page setup; 6.3 Print preview. 7 Import/export of data and graphics: 7.1 Bioinformatic data formats; 7.2 External files; 7.3 Export graphics to files; 7.4 Export graph data points to a file; 7.5 Copy/paste view output. 8 History log: 8.1 (title illegible). 9 Batching and result handling: 9.1 (title illegible); 9.2 How to handle results of analyses. III Bioinformatics. 10 Viewing and editing sequences: 10.1 View sequence
149. 1 pixels, 696 MB memory usage. Figure 7.12: Parameters for bitmap formats: size of the graphics file. You can adjust the size (the resolution) of the file to four standard sizes: Screen resolution, Low resolution, Medium resolution and High resolution. The actual size in pixels is displayed in parentheses. An estimate of the memory usage for exporting the file is also shown. If the image is to be used on computer screens only, a low resolution is sufficient. If the image is going to be used on printed material, a higher resolution is necessary to produce a good result. Parameters for vector formats. For pdf format, clicking Next will display the dialog shown in figure 7.13 (this is only the case if the graphics is using more than one page). Figure 7.13: Page setup parameters for vector formats (in the figure: Orientation Portrait; Paper Size A4; Horizontal Pagecount Not Applicable; Vertical Pagecount Not Applicable; Header Text; Footer Text; Show Pagenumber Yes). The settings for the page setup are shown, and clicking the Page Setup button will display a dialog where these settings can be adjusted. This dialog is described in section 6.2. The page setup is only available if you
150. Dialog settings: Gap open cost 10; Gap extension cost 1; End gap cost As any other; Alignment: Fast (less accurate) or Slow (very accurate); Redo alignments; Use fixpoints. Figure 19.2: Adjusting alignment algorithm parameters. 19.1.1 Gap costs. The alignment algorithm has three parameters concerning gap costs: Gap open cost, Gap extension cost and End gap cost. The precision of these parameters is to one place of decimal. Gap open cost: The price for introducing gaps in an alignment. Gap extension cost: The price for every extension past the initial gap. If you expect a lot of small gaps in your alignment, the Gap open cost should equal the Gap extension cost. On the other hand, if you expect few but large gaps, the Gap open cost should be set significantly higher than the Gap extension cost. However, for most alignments it is a good idea to make the Gap open cost quite a bit higher than the Gap extension cost. The default values are 10.0 and 1.0 for the two parameters, respectively. End gap cost: The price of gaps at the beginning or the end of the alignment. One of the advantages of the CLC DNA Workbench alignment method is that it provides flexibility in the treatment of gaps at the ends of the sequences. There are three possibilities. Free end gaps: Any number of gaps can be inserted in the ends of the sequences without any cost. Cheap end gaps: All end gaps are
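The gap cost parameters can be made concrete with a small sketch. It rests on assumptions that the manual text only implies: a gap of length n is taken to cost the open cost plus the extension cost for each of the n - 1 positions past the first, and "cheap" end gaps are read (from the excerpt that continues this sentence) as costing one extension per position up to 10 positions, with the rest free. The function name `gap_cost` and the `end_gap` argument are hypothetical.

```python
def gap_cost(length, gap_open=10.0, gap_extension=1.0, end_gap=None):
    """Cost of a single gap of `length` positions.

    end_gap=None     : internal gap, or end gap treated "as any other"
    end_gap="free"   : end gap that costs nothing
    end_gap="cheap"  : end gap charged as extensions only, first 10 positions
    (Interpretation of the manual text, not the Workbench's actual code.)
    """
    if length <= 0:
        return 0.0
    if end_gap == "free":
        return 0.0
    if end_gap == "cheap":
        return gap_extension * min(length, 10)
    return gap_open + gap_extension * (length - 1)


if __name__ == "__main__":
    # With the defaults, three gaps of length 1 cost more than one gap of
    # length 3, which is why few large gaps are favoured when the open
    # cost is much higher than the extension cost.
    print(3 * gap_cost(1), gap_cost(3))
    print(gap_cost(25, end_gap="cheap"), gap_cost(25, end_gap="free"))
```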
151. 12.5.5 Which BLAST options should I change? The NCBI BLAST web pages and the BLAST command line tool offer a number of different options which can be changed in order to obtain the best possible result. Changing these parameters can have a great impact on the search result. It is not the scope of this document to comment on all of the options available, but merely the options which can be changed with a direct impact on the search result. The E-value: The expect value (E-value) can be changed in order to limit the number of hits to the most significant ones. The lower the E-value, the better the hit. The E-value is very dependent on the length of the query sequence and the size of the database. For example, an alignment obtaining an E-value of 0.05 means that there is a 5 in 100 chance of such a hit occurring by chance alone. Short identical sequences may have a high E-value and may be regarded as "false positive" hits. This is often seen if one searches for short primer regions, small domain regions etc. The default threshold for the E-value on the BLAST web page is 10. Increasing this value will most likely generate more hits. Below are some rules of thumb which can be used as a guide but should be considered with common sense. E-value < 10e-100: Identical sequences. You will get long alignments across the entire query and hit sequence. 10e-100 < E-value < 10e-50: Almost id
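Why the E-value depends on query length and database size can be seen from the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n the database length and S' the bit score. The sketch below is an approximation for illustration: the function name `expect_value` is hypothetical, and the effective-length corrections that BLAST applies in practice are ignored.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Ignores BLAST's effective-length (edge-effect) corrections, so real
    BLAST output will differ somewhat from these numbers.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # The same alignment (same bit score) becomes less significant as the
    # database grows, and a short query cannot reach a high bit score at all.
    for database_length in (1_000_000, 1_000_000_000):
        print(database_length, expect_value(40.0, 20, database_length))
```

This is consistent with the remark about short identical sequences: a perfect match of a short primer yields only a modest bit score, so against a large database its E-value stays high.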
152. 13: A horizontal split screen. The two views split the View Area. Maximize/Restore size of view. The Maximize/Restore View function allows you to see a view in maximized mode, meaning a mode where no other views nor the Navigation Area is shown. Maximizing a view can be done in the following ways: select view | Ctrl + M; or select view | View | Maximize/restore View; or select view | right-click the tab | View | Maximize/restore View; or double-click the tab of view. The following restores the size of the view: Ctrl + M; or View | Maximize/restore View; or double-click title of view. Figure 3.14: A vertical split screen.
153. Figure 2.46: Placement of translated nucleotide sequence hits on the Human beta globin. Figure 2.47: Human beta globin exon view. Use BLASTx. Use the protein sequence AAA16334 as database. Using the genomic sequence as query, the mapping of the protein sequence to the exons is visually very clear, as shown in figure 2.48. In theory you could use the chromosome sequence as query, but the performance would not be optimal: it would take a long time, and the computer might run out of memory. In this example you have used well-annotated sequences where you could have searched for the name of the gene instead of using BLAST. However, there are other situations where you
154. 1989). Knowing the extinction coefficient, the absorbance (optical density) can be calculated using the following formula: Absorbance(Protein) = Ext(Protein) / Molecular weight. Two values are reported. The first value is computed assuming that all cysteine residues appear as half cystines, meaning they form di-sulfide bridges to other cysteines. The second number assumes that no di-sulfide bonds are formed. Atomic composition: Amino acids are indeed very simple compounds. All 20 amino acids consist of combinations of only five different atoms. The atoms which can be found in these simple structures are: Carbon, Nitrogen, Hydrogen, Sulfur, Oxygen. The atomic composition of a protein can, for example, be used to calculate the precise molecular weight of the entire protein. Total number of negatively charged residues (Asp + Glu): At neutral pH, the fraction of negatively charged residues provides information about the location of the protein. Intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins. Total number of positively charged residues (Arg + Lys): At neutral pH, nuclear proteins have a high relative percentage of positively charged amino acids. Nuclear proteins often bind to the negatively charged DNA, which may regulate gene expression or help to fold the DNA. Nuclear proteins often have a low percentage of aromatic residues (Andrade et al., 1998). Amino acid distribution: Am
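The two reported values and the absorbance formula can be sketched as follows. The per-residue coefficients are an assumption: they are the values commonly used for 280 nm in water (for example by ExPASy ProtParam), and the coefficients the manual itself uses are not visible in this excerpt. The function names are hypothetical.

```python
# ASSUMPTION: commonly used molar extinction coefficients at 280 nm
# (M^-1 cm^-1); the manual's own values are not shown in this excerpt.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125  # per disulfide-bonded pair of cysteines


def extinction_coefficient(protein, cystines_formed=True):
    """Extinction coefficient with or without di-sulfide bridges."""
    protein = protein.upper()
    ext = protein.count("Y") * EXT_TYR + protein.count("W") * EXT_TRP
    if cystines_formed:
        # All cysteines are assumed to pair up as half cystines.
        ext += (protein.count("C") // 2) * EXT_CYSTINE
    return ext


def absorbance(ext_protein, molecular_weight):
    """Absorbance(Protein) = Ext(Protein) / Molecular weight."""
    return ext_protein / molecular_weight


if __name__ == "__main__":
    peptide = "MCWYKCA"            # made-up example
    weight = 890.0                 # made-up molecular weight in Da
    for bridged in (True, False):
        ext = extinction_coefficient(peptide, cystines_formed=bridged)
        print(bridged, ext, round(absorbance(ext, weight), 3))
```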
155. 2.1.1 Creating a folder. When CLC DNA Workbench is started, there is one element in the Navigation Area called CLC_Data. This element is a Location. A location points to a folder on your computer where your data for use with CLC DNA Workbench is stored. If you have downloaded the example data, this will be placed as a folder in CLC_Data. Figure 2.1: The user interface as it looks when you start the program for the first time (Windows version of CLC DNA Workbench). The interface is similar for Mac and Linux. The data in the location can be organized into folders. Create a folder: File | New | Folder, or Ctrl + Shift + N (⌘ + Shift + N on Mac). Name the folder "My folder" and press Enter. 2.1.2 Import data. Next we want to import a sequence called HU
156. At the top, select a motif list by clicking the Browse button. When the motif list is selected, its motifs are listed in the panel in the left-hand side of the dialog. The right-hand side panel contains the motifs that will be listed in the Side Panel when you click Finish. 13.7.2 Motif search from the Toolbox. The dynamic motifs described in section 13.7.1 provide a quick way of routinely scanning a sequence for commonly used motifs, but in some cases a more systematic approach is needed. The motif search in the Toolbox provides an option to search for motifs with a user-specified similarity to the target sequence, and furthermore the motifs found can be displayed in an overview table. This is particularly useful when searching for motifs on many sequences. To start the Toolbox motif search: Toolbox | General Sequence Analyses | Motif Search. Use the arrows to add or remove sequences or sequence lists from the selected elements. You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and create an overview table of the motifs found in all sequences. Click Next to adjust parameters (see figure 13.26). Dialog: Motif parameters: Simple motif, Java regular expressi
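The two kinds of motif named in the dialog, a simple motif matched with a user-specified similarity and a regular expression, can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, similarity is taken to be the percentage of identical positions in an ungapped window, and Python's `re` module stands in for Java regular expressions, whose syntax differs in some details.

```python
import re


def find_simple_motif(sequence, motif, min_accuracy=80.0):
    """Return (position, match, accuracy) for windows similar to the motif.

    Position is 1-based. Accuracy is the percentage of identical
    positions between the motif and an ungapped window of the sequence.
    """
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        identical = sum(a == b for a, b in zip(window, motif))
        accuracy = 100.0 * identical / len(motif)
        if accuracy >= min_accuracy:
            hits.append((start + 1, window, accuracy))
    return hits


def find_regex_motif(sequence, pattern):
    """Return (position, match) for non-overlapping regex matches."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


if __name__ == "__main__":
    print(find_simple_motif("TTATGGTT", "ATGC", min_accuracy=75.0))
    # N-glycosylation-style pattern: N, not P, S or T, not P
    print(find_regex_motif("AANGSAAA", r"N[^P][ST][^P]"))
```

Collecting the hits from several sequences into one list of rows would give the kind of overview table the text describes.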
157. 4. Some enzymes cut the sequence twice for each recognition site, and in this case the two cut positions are surrounded by parentheses. Table of restriction fragments. The restriction map can be shown as a table of fragments produced by cutting the sequence with the enzymes: click the Fragments button at the bottom of the view. The table is shown in figure 18.47. Each row in the table represents a fragment. If more than one enzyme cuts in the same region, or if an enzyme's recognition site is cut by another enzyme, there will be a fragment for each of the possible cut combinations. The following information is available for each fragment. Sequence: The name of the sequence, which is relevant if you have performed restriction map analysis on more than one sequence. Length: The length of the fragment. If there are overhangs of the fragment, these are included in the length (both 3' and 5' overhangs). Region: The fragment's region on the original sequence. Furthermore, if there are such conflicting cuts, you will see the names of the other enzymes in the Conflicting Enzymes column. Figure 18.47: The result of the restriction ana
158. 41, 409; Chain Flexibility, 242; Cornette, 146, 242; Eisenberg, 146, 242; Emini, 146; Engelman (GES), 146, 241; Hopp-Woods, 146, 242; Janin, 146, 242; Karplus and Schulz, 146; Kolaskar-Tongaonkar, 146, 242; Kyte-Doolittle, 146, 241; Rose, 242; Surface Probability, 242; Welling, 146, 242. ID, license, 19. Illumina Genome Analyzer, 376. Import: bioinformatic data, 118, 119; existing data, 38; FASTA data, 38; from a web page, 119; list of formats, 392; preferences, 109; raw sequence, 119; Side Panel Settings, 107; using copy paste, 119. In silico PCR, 2 1. Index for searching, 103. Infer Phylogenetic Tree, 366. Information point, primer design, 250. Insert gaps, 357. Insert restriction site, 318. Installation, 12. Invert sequence, 232. Isoelectric point, 215. Isoschizomers, 334. IUPAC codes, nucleotides, 398. Join: alignments, 359; sequences, 218. Jpg format, export, 126. Keywords, 160. Label of sequence, 142. Landscape, Print orientation, 115. Lasergene sequence file format, 393. Latin name, batch edit, 84. Length, 160. License, 15: ID, 19; starting without a license, 27. License server, 24. License server access offline, 25. Limited mode, 27. Links from annotations, 158. Linux: installation, 14; installation with RPM package, 15. List of restriction enzymes, 343. List of sequences, 163. Load enzyme list, 330. Local BLAST, 178. Local BLAST Database, 187. Local BLAST database management, 188. Local BLAST Databases, 185. Local complexity plot
159. 10.2 (title illegible). 10.3 Working with annotations. 10.4 Element information. 10.5 View as text. 10.6 Creating a new sequence. 10.7 Sequence Lists. 11 Online database search: 11.1 GenBank search; 11.2 Sequence web info. 12 BLAST Search: 12.1 (title illegible); 12.2 Output from BLAST searches; 12.3 Local BLAST databases; 12.4 Manage BLAST databases; 12.5 Bioinformatics explained: BLAST. 13 General sequence analyses: 13.1 Shuffle sequence; 13.2 (title illegible); 13.3 Local complexity plot; 13.4 Sequence statistics; 13.5 Join sequences; 13.6 Pattern Discovery; 13.7 Motif Search. 14 Nucleotide analyses: 14.1 Convert DNA to RNA; 14.2 Convert RNA to DNA; 14.3 Reverse complements of sequences; 14.4 Reverse sequence; 14.5 Translat
160. 5.6 View settings for the Side Panel. 5.6.1 Floating Side Panel. The first three sections in this chapter deal with the general preferences that can be set for CLC DNA Workbench using the Preferences dialog. The next section explains how the settings in the Side Panel can be saved and applied to other views. Finally, you can learn how to import and export the preferences. The Preferences dialog offers opportunities for changing the default settings for different features of the program. The Preferences dialog is opened in one of the following ways and can be seen in figure 5.1: Edit | Preferences, or Ctrl + K (the ⌘ equivalent on Mac). 5.1 General preferences. The General preferences include the settings shown in the dialog: Undo limit (500); Audit Support (Enable audit of manual sequence modifications); Number of hits, normal search (50); Number of hits, NCBI/Uniprot (50); Style (English, United States); Show all dialogs with "Never show this dialog again". Figure 5.1: Preferences include General preferences, View preferences, Colors preferences, and Advanced settings. Undo Limit: As default, the undo limit is set to 500. By writing a higher number in this field, more actions can be undone. Undo applies to all changes made
161. 6. [Feng and Doolittle, 1987] Feng, D. F. and Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol, 25(4):351-360. [Forsberg et al., 2001] Forsberg, R., Oleksiewicz, M. B., Petersen, A. M., Hein, J., Botner, A., and Storgaard, T. (2001). A molecular clock dates the common ancestor of European-type porcine reproductive and respiratory syndrome virus at more than 10 years before the emergence of disease. Virology, 289(2):174-179. [Gill and von Hippel, 1989] Gill, S. C. and von Hippel, P. H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem, 182(2):319-326. [Gonda et al., 1989] Gonda, D. K., Bachmair, A., Wunning, I., Tobias, J. W., Lane, W. S., and Varshavsky, A. (1989). Universality and structure of the N-end rule. J Biol Chem, 264(28):16700-16712. [Guindon and Gascuel, 2003] Guindon, S. and Gascuel, O. (2003). A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology, 52(5):696-704. [Hasegawa et al., 1985] Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22(2):160-174. [Hein, 2001] Hein, J. (2001). An algorithm for statistical alignment of sequences related by a binary tree. In Pacific Symposium on Biocomputing, page 179. [Hein et al., 2000] Hein, J., Wiuf, C., Knuds
162. Alignment shown in the figure (gene name, then 40 bases in blocks of 10):
tlA  CTTTTCAAGG AGTATTTCCT ATGAACGAGT TAGACGGCAT
evgA CATTGCAAAG GGAATAATCT ATGAACGCAA TAATTATTGA
ypdl CATTTTCAGG ATAACTTTCT ATGAAAGTAA ACTTAATACT
nrB  GAAAAGAAAT CGAGGCAAAA ATGAGCAAAG TCAGACTCGC
hmpA TGCAAAAAAA GGAAGACCAT ATGCTTGACG CTCAAACCAT
narQ TTTTTGTGGA GAAGACGCGT GTGATTGTTA AACGACCCGT
gtf  GTTATTAAGG ATATGTTCAT ATGTTTTTCA AAAAGAACCT
intS TACCCACCGG ATTTTTACCC ATGCTCACCG TTAAGCAGAT
yidF AATCAAAATG GAATAAAATC ATGCTACCAT CTATTTCAAT
dsdX ATCACAGGGG AAGGTGAGAT ATGCACTCTC AAATCTGGGT
sunB ACATCCAGTG AGAGAGACCG ATGCATCCGA TGCTGAACAT
Consensus AATTTAAAGG AGAATTACCT ATGAACGCAA TAATAAACAT
Figure 19.8: Ungapped sequence alignment of eleven E. coli sequences defining a start codon. The start codons start at position 1. Below the alignment is shown the corresponding sequence logo. As seen, a GTG start codon and the usual ATG start codons are present in the alignment. This can also be visualized in the logo at position 1. Calculation of sequence logos. A comprehensive walk-through of the calculation of the information content in sequence logos is beyond the scope of this document, but can be found in the original paper by Schneider and Stephens (1990). Nevertheless, the conservation of every position is defined as R_seq, which is the difference between the maximal entropy, S_max, and the observed entropy for the residue distributi
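The definition of conservation as maximal entropy minus observed entropy can be written out directly. This is a minimal sketch with a hypothetical function name; it uses S_max = log2(alphabet size) and the Shannon entropy of the observed residue frequencies, and it leaves out the small-sample correction that Schneider and Stephens also describe.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """R_seq = S_max - S_obs for one alignment column, in bits.

    S_max = log2(alphabet_size); S_obs is the Shannon entropy of the
    residue frequencies in the column. No small-sample correction.
    """
    counts = Counter(column)
    total = len(column)
    s_obs = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return math.log2(alphabet_size) - s_obs


if __name__ == "__main__":
    # Start codons of the eleven sequences in the figure: ten ATG, one GTG.
    start_codons = ["ATG"] * 10 + ["GTG"]
    for position, column in enumerate(zip(*start_codons), start=1):
        print(position, round(information_content(column), 2))
```

Positions 2 and 3 are fully conserved and reach the maximum of 2 bits for DNA, while position 1 scores lower because of the single GTG codon.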
163. Figure 16.21: A table showing all possible fragments of the specified size. The table first lists the names of the forward and reverse primers, then the length of the fragment and the region. The last column tells if there are other possible fragments fulfilling the length criteria on this sequence. This information can be used to check for competing products in the PCR. In the Side Panel you can show information about melting temperature for the primers as well as the difference between melting temperatures. You can use this table to browse the fragment regions. If you make a split view of the table and the sequence (see section 3.2.6), you can browse through the fragment regions by clicking in the table. This will cause the sequence view to jump to the start position of the fragment. There are some additional options in the fragment table. First, you can annotate the fragment on the original sequence. This is done by right-clicking (Ctrl-click on Mac) the fragment and choosing Annotate Fragment, as shown in figure 16.22.
164. Figure 12.10: Display of the output of a BLAST search in the tabular view. The hits can be sorted by the different columns, simply by clicking the column heading. Figure 12.10 is an example of a BLAST Table. The BLAST Table includes the following information. Query sequence: The sequence which was used for the search. Hit: The name of the sequences found in the BLAST search. Id: GenBank ID. Description: Text from NCBI describing the sequence. E-value: Measure of quality of the match. Higher E-values indicate that BLAST found a less homologous sequence. Score: This shows the score of the local alignment generated through the BLAST search. Bit score: This shows the bit score of the local alignment generated through the BLAST search. Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used. Hit start: Shows the start position in the hit sequence. Hit end: Shows the end position in the hit sequence. Hit length: The length of the hit. Query start: Shows the start position in the query sequence. Query end: Shows the end position in the query sequence. Overlap: Displays a percentage value for the overlap of the query sequence and hit sequence. Only the length of the loca
165. Figure 2.41: The options available in the right-click menu. Here, "Mark primer annotation on sequence" has been chosen, resulting in two annotations on the sequence above (labeled "Oligo"). Workbench user manual, linked to on this webpage: http://www.clcbio.com/download. 2.8 Tutorial: BLAST search. BLAST is an invaluable tool in bioinformatics. It has become central to identification of homologues and similar sequences, and can also be used for many other different purposes. This tutorial takes you through the steps of running a BLAST search in CLC Workbenches. If you plan to use BLAST for your research, we highly recommend that you read further about it. Understanding how BLAST works is key to setting up meaningful and efficient searches. Suppose you are working with the ATP8a1 protein sequence, which is a phospholipid-transporting ATPase expressed in the adult ho
166. 92) and PAM (Dayhoff and Schwartz, 1978). Figure 13.12: The dot plot of a low complexity region in the sequence. The sequence is artificial, and low complexity regions do not always show as a square. Different scoring matrices. PAM: The first PAM matrix (Point Accepted Mutation) was published in 1978 by Dayhoff et al. The PAM matrix was built through a global alignment of related sequences all having sequence similarity above 85% (Dayhoff and Schwartz, 1978). A PAM matrix shows the probability that any given amino acid will mutate into another in a given time interval. As an example, PAM1 gives that one amino acid out of 100 will mutate in a given time interval. In the other end of the scale, a PAM256 matrix gives the probability of 256 mutations in 100 amino acids (see figure 13.13). There are some limitations to the PAM matrices which make the BLOSUM matrices somewhat more attractive. The dataset on which the initial PAM matrices were built is very old by now, and the PAM matrices assume that all amino acids mutate at the same rate; this is not a correct assumption
167. Figure 18.39: Enzymes with compatible ends. At the top you can choose whether the enzymes considered should have an exact match or not. Since a number of restriction enzymes have ambiguous cut patterns, there will be variations in the resulting overhangs. Choosing All matches, you cannot be 100% sure that the overhang will match, and you will need to inspect the sequence further afterwards. We advise trying Exact match first, and using All matches as an alternative if a satisfactory result cannot be achieved. At the bottom of the dialog, the list of enzymes producing compatible overhangs is shown. Use the arrows to add enzymes which will be displayed on the sequence when you press Finish. When you have added the relevant enzymes, click Finish, and the enzymes will be added to the Side Panel and their cut sites displayed on the sequence. 18.3.2 Restriction site analysis from the Toolbox. Besides the dynamic restriction sites, you can do a more elaborate restriction map analysis with more output formats using the Toolbox: Toolbox | Cloning and Restriction Sites | Restriction Site Analysis. This will display the dialog shown in figure 18.40.
168. A Workbench to all users free of charge. Each user is entitled to a 30-day demo of CLC DNA Workbench. If you need more time for evaluating, another two weeks of demo can be requested. We use the concept of quid pro quo: the last two weeks of free demo time given to you are therefore accompanied by a short-form questionnaire where you have the opportunity to give us feedback about the program. The 30-day demo is offered for each major release of CLC DNA Workbench. You will therefore have the opportunity to try the next major version when it is released. If you purchase CLC DNA Workbench, the first year of updates is included. When you select to request an evaluation license, you will see the dialog shown in figure 1.2. In this dialog there are two options: Direct download: The workbench will attempt to contact the online CLC Licenses Service and download the license directly. This method requires internet access from the workbench. CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH 17 Go to License Download web page: The workbench will open a Web Browser with the License Download web page. From there you will
169. A, and subsequently a translation to proteins, occur. This is of course simplified, but is in general what is happening in order to have a steady production of proteins needed for the survival of the cell. In bioinformatics analysis of proteins it is sometimes useful to know the ancestral DNA sequence in order to find the genomic localization of the gene. Thus, the translation of proteins back to DNA/RNA is of particular interest, and is called reverse translation or back-translation. The Genetic Code. In 1968 the Nobel Prize in Medicine was awarded to Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg for their interpretation of the Genetic Code (http://nobelprize.org/medicine/laureates/1968). The Genetic Code represents translations of all 64 different codons into 20 different amino acids. Therefore it is no problem to translate a DNA/RNA sequence into a specific protein. But due to the degeneracy of the genetic code, several codons may code for only one specific amino acid. This can be seen in the table below. After the discovery of the genetic code it has been concluded that different organisms (and organelles) have genetic codes which are different from the standard genetic code. Moreover, the amino acid alphabet is no longer limited to 20 amino acids. The 21st amino acid, selenocysteine, is encoded by a UGA codon, which is normally a stop codon. The discrimination of a selenocysteine over a stop codon is carried out by the translati
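The degeneracy described in this excerpt can be made concrete with a short Python sketch. It builds the standard genetic code from the widely used compact 64-letter encoding (codons ordered with T, C, A, G at each position), inverts it, and counts how many DNA sequences would back-translate to a given peptide. This is only an illustration under those assumptions; the function names are hypothetical and it is not how CLC DNA Workbench performs reverse translation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in codon order TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every DNA codon to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons of the standard code that encode the given residue."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_back_translations(protein):
    """Count the distinct DNA sequences that translate to the given protein."""
    total = 1
    for aa in protein:
        total *= len(codons_for(aa))
    return total


if __name__ == "__main__":
    print(codons_for("L"))                 # leucine has six codons
    print(count_back_translations("MLW"))  # 1 * 6 * 1 possibilities
```

Because the product grows quickly with peptide length, a back-translation has to pick one codon per residue by some rule rather than enumerate every possible DNA sequence.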
170. A4_TO sequence | View | Split Horizontally. CHAPTER 2 TUTORIALS 99 Note that this can also be achieved by simply dragging the pcDNA4_TO sequence into the lower part of the open view. Switch to the Circular view at the bottom of the view. Zoom in on the multiple cloning site downstream of the green CMV promoter annotation. You should now have a view similar to the one shown in figure 2.24. Figure 2.24: Check cut sites. By looking at the enzymes, we can see that both HindIII and XhoI cut in the multiple cloning site of the vector and not in the Atp8a1 gene. Note that you can add more enzymes to the list in the Side Panel by clicking Manage Enzymes under the Restriction Sites group. Close both views and open the ATP8a1 fwd primer sequence. When it opens, double-click the name of the sequence to make a selection of the full sequence. If you do not see the whole sequence turn purple, please make sure you have the Selection Tool chosen and not one of the other tools available from the t
171. It is possible to add and remove sequences from the Selected Elements list. Since we had already selected the eight proteins, just click Next to adjust parameters for the alignment. Clicking Next opens the dialog shown in figure 2.53. Figure 2.53: The alignment dialog displaying the available parameters which can be adjusted. Leave the parameters at their default settings. An explanation of the parameters can be found by clicking the help button. Alternatively, a tooltip is displayed by holding the mouse cursor on the parameters. Click Finish to start the alignment process, which is shown in the Toolbox under the Processes tab. When the program is finished calculating, it displays the alignment (see figure 2.54).
172. Figure 2.44: Output of a BLAST search. By holding the mouse pointer over the lines you can get information about the sequence. Try placing your mouse cursor over a potential homologous sequence. You will see that a context box appears containing information about the sequence and the match scores obtained from the BLAST algorithm. The lines in the BLAST view are the actual sequences which are downloaded. This means that you can zoom in and see the actual alignment: Zoom in in the Tool Bar | Click in the BLAST view a number of times until you see the residues. Now we will focus our attention on sequence Q9Y2Q0, the BLAST hit that is at the top of the list. To download the full sequence: right-click the line representing sequence Q9Y2Q0 | Download Full Hit Sequence from NCBI. This opens the sequence. However, the sequence is not saved yet. Drag and drop the sequence into the Navigation Area to save it. This homologous sequence is now stored in the CLC DNA Workbench, and you can use it to gain information about the query sequence by using the various tools of the workbench, e.g. by studying its textual informat
173. Figure 16.12: The initial view of an alignment used for primer design. CHAPTER 16 PRIMERS 266 16.9.1 Specific options for alignment based primer and probe design. Compared to the primer view of a single sequence, the most notable difference is that the alignment primer view has no available graphical information. Furthermore, the selection boxes found to the left of the names in the alignment play an important role in specifying the oligo design process. This is elaborated below. The Primer Parameters group in the Side Panel has the same options for specifying primer requirements, but differs by the following (see figure 16.12): In the Mode submenu, which specifies the reaction types, the following options are found: Standard PCR: Used when the objective is to design primers, or primer pairs, for PCR amplification of a single DNA fragment. TaqMan: Used when the objective is to design a primer pair and a probe set for TaqMan quantitative PCR.
174. Aspartic acid 3.50 3.00 3.10 0.90 0.62 0.60 9.20; E Glutamic acid 3.50 3.00 1.80 0.74 0.62 0.70 8.20; F Phenylalanine 2.80 2.50 4.40 1.19 0.88 0.50 3.70; G Glycine 0.40 0.00 0.00 0.48 0.72 0.30 1.00; H Histidine 3.20 0.50 0.50 0.40 0.78 0.10 3.00; I Isoleucine 4.50 1.80 4.80 1.38 0.88 0.70 3.10; K Lysine 3.90 3.00 3.10 1.50 0.52 1.80 8.80; L Leucine 3.80 1.80 5.70 1.06 0.85 0.50 2.80; M Methionine 1.90 1.30 4.20 0.64 0.85 0.40 3.40; N Asparagine 3.50 0.20 0.50 0.78 0.63 0.50 4.80; P Proline 1.60 0.00 2.20 0.12 0.64 0.30 0.20; Q Glutamine 3.50 0.20 2.80 0.85 0.62 0.70 4.10; R Arginine 4.50 3.00 1.40 2.53 0.64 1.40 12.3; S Serine 0.80 0.30 0.50 0.18 0.66 0.10 0.60; T Threonine 0.70 0.40 1.90 0.05 0.70 0.20 1.20; V Valine 4.20 1.50 4.70 1.08 0.86 0.60 2.60; W Tryptophan 0.90 3.40 1.00 0.81 0.85 0.30 1.90; Y Tyrosine 1.30 2.30 3.20 0.26 0.76 0.40 0.70. Table 15.1: Hydrophobicity scales. This table shows seven different hydrophobicity scales which are generally used for prediction of e.g. transmembrane regions and antigenicity. work for commercial purposes. You may not alter, transform, nor build upon this work. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. 15.3 Reverse translation from protein into DNA. A protein sequence can be back-translated into DNA using CLC DNA Workbench. Due to degeneracy of the genetic code every amino
175. C DNA Workbench but are opened by other applications, e.g. pdf files, Microsoft Word files, Open Office spreadsheet files, or links to programs and web pages etc. This chapter first deals with importing and exporting data in bioinformatic data formats and as external files. Next comes an explanation of how to export graph data points to a file and how to export graphics. 7.1 Bioinformatic data formats. The different bioinformatic data formats are imported in the same way; therefore, the following description of data import is an example which illustrates the general steps to be followed regardless of which format you are handling. CHAPTER 7 IMPORT EXPORT OF DATA AND GRAPHICS 118 7.1.1 Import of bioinformatic data. CLC DNA Workbench has support for a wide range of bioinformatic data such as sequences, alignments etc. See a full list of the data formats in section G.1. The CLC DNA Workbench offers a lot of possibilities to handle bioinformatic data. Read the next sections to get information on how to import different file formats or to import data from a Vector NTI database. Import using the import dialog. To start the import using the import dialog: click Import in the Toolbar. This will show a dialog similar to figure 7.1 (depending on which platform you use). You can change which kind of file types should be shown by selecting a file format in the Files of type box.
176. Figure 13.23: Showing dynamic motifs on the sequence. This case shows the CMV promoter primer sequence, which is one of the pre-defined motifs in CLC DNA Workbench. The motif is per default shown as a faded arrow with no text. The direction of the arrow indicates the strand of the motif. Placing the mouse cursor on the arrow will display additional information about the motif, as illustrated in figure 13.24. Figure 13.24: Showing dynamic motifs on the sequence. To add Labels to the motif, select the Flag or Stacked option. They will put the name of the motif as a flag above the sequence. The stacked option will stack the labels when there is more than one motif so that all labels are shown. Below the labels option there are two options for controlling the way the sequence should be searched for motifs: Include reverse motifs: This will also find motifs on the negative strand (only available for nucleotide sequences). Exclude matches in N-regions for simple motifs: The motif search handles ambiguous characters in the way that two residues are different if they do not have any residues in common. For example: for nucleotides, N matches any
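The matching rule stated in this excerpt (two residues count as different only if they have no residue in common) can be sketched in Python as follows. The table holds the standard IUPAC nucleotide ambiguity codes; the function names are hypothetical, and this is a minimal illustration rather than the Workbench's own motif search.

```python
# Standard IUPAC nucleotide codes mapped to the set of bases each one stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def residues_match(a, b):
    """Two symbols match if the sets of bases they represent overlap."""
    return bool(IUPAC[a.upper()] & IUPAC[b.upper()])


def find_motif(sequence, motif):
    """Return the 0-based start positions where the motif matches the sequence."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(residues_match(s, m) for s, m in zip(window, motif)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(residues_match("N", "A"))      # N overlaps with every base
    print(residues_match("R", "Y"))      # purine vs pyrimidine: no overlap
    print(find_motif("ACGTTGCA", "TNG"))
```

Under this rule a run of N in the sequence matches every motif, which is presumably why an option to exclude matches in N-regions is offered.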
177. Amino acid: Mammalian, Yeast, E. coli. Ala (A): 4.4 hours, >20 hours, >10 hours; Cys (C): 1.2 hours, >20 hours, >10 hours; Asp (D): 1.1 hours, 3 min, >10 hours; Glu (E): 1 hour, 30 min, >10 hours; Phe (F): 1.1 hours, 3 min, 2 min; Gly (G): 30 hours, >20 hours, >10 hours; His (H): 3.5 hours, 10 min, >10 hours; Ile (I): 20 hours, 30 min, >10 hours; Lys (K): 1.3 hours, 3 min, 2 min; Leu (L): 5.5 hours, 3 min, 2 min; Met (M): 30 hours, >20 hours, >10 hours; Asn (N): 1.4 hours, 3 min, >10 hours; Pro (P): >20 hours, >20 hours, ?; Gln (Q): 0.8 hour, 10 min, >10 hours; Arg (R): 1 hour, 2 min, 2 min; Ser (S): 1.9 hours, >20 hours, >10 hours; Thr (T): 7.2 hours, >20 hours, >10 hours; Val (V): 100 hours, >20 hours, >10 hours; Trp (W): 2.8 hours, 3 min, 2 min; Tyr (Y): 2.8 hours, 10 min, 2 min. Table 13.2: Estimated half-life. Half-life of proteins where the N-terminal residue is listed in the first column and the half-life in the subsequent columns for mammals, yeast and E. coli. X(Ala), X(Val), X(Ile) and X(Leu) are the amino acid compositional fractions. The constants a and b are the relative volume of valine (a = 2.9) and leucine/isoleucine (b = 3.9) side chains compared to the side chain of alanine (Ikai, 1980). Estimated half-life. The half-life of a protein is the time it takes for the protein pool of that particular protein to be reduced to half. The half-life of proteins is highly dependent on the presence of
178. Database. This opens the dialog seen in figure 12.12. Figure 12.12: Add sequences for the BLAST database. Select sequences or sequence lists you wish to include in your database and click Next. In the next dialog, shown in figure 12.13, you provide the following information: Name: The name of the BLAST database. This name will be used when running BLAST searches and also as the base file name for the BLAST database files. Description: You can add more details to describe the contents of the database. Location: You can select the location to save the BLAST database files to. You can add or change the locations in this list using the Manage BLAST Databases tool (see section 12.4). CHAPTER 12 BLAST SEARCH 188
179. Figure 2.11: NCBI search view. Click Start search to commence the search in NCBI. 2.4.1 Searching for matching objects. When the search is complete, the list of hits is shown. If the desired complete human hemoglobin DNA sequence is found, the sequence can be viewed by double-clicking it in the list of hits from the search. If the desired sequence is not shown, you can click the More button below th
180. Figure 10.3 shows an artificial sequence with all the different kinds of regions. Figure 10.3: Region 1: A single residue. Region 2: A range of residues including both endpoints. Region 3: A range of residues starting somewhere before 30 and continuing up to and including 40. Region 4: A single residue somewhere between 50 and 60 inclusive. Region 5: A range of residues beginning somewhere between 70 and 80 inclusive and ending at 90 inclusive. Region 6: A range of residues beginning somewhere between 100 and 110 inclusive and ending somewhere between 120 and 130 inclusive. Region 7: A site between residues 140 and 141. Region 8: A site between two residues somewhere between 150 and 160 inclusive. Region 9: A region that covers ranges from 170 to 180 inclusive and 190 to 200 inclusive. Region 10: A region on negative strand that covers ranges from 210 to 220 inclusive. Region 11: A region on negative strand that covers range
181. Finish. This will start the secondary peak calling. A detailed history entry will be added to the history specifying all the changes made to the sequence. Chapter 18 Cloning and cutting. Contents: 18.1 Molecular cloning; 18.1.1 Introduction to the cloning editor; 18.1.2 The cloning workflow; 18.1.3 Manual cloning; 18.1.4 Insert restriction site; 18.2 Gateway cloning; 18.2.1 Add attB sites; 18.2.2 Create entry clones (BP); 18.2.3 Create expression clones (LR); 18.3 Restriction site analysis; 18.3.1 Dynamic restriction sites; 18.3.2 Restriction site analysis from the Toolbox; 18.4 Gel electrophoresis; 18.4.1 Separate fragments of sequences on gel; 18.4.2 Separate sequences on gel; 18.4.3 Gel view; 18.5 Restriction enzyme lists; 18.5.1 Create enzyme list; 18.5.2 View and modify enzyme list. CLC DNA Workbench offers graphically advanced in silico cloning and design of vectors for various purposes together with
182. Forward primer region, a Reverse primer region, or both. These are defined by making a selection on the sequence and right-clicking the selection. It is also possible to define a Region to amplify, in which case a forward and a reverse primer region are automatically placed so as to ensure that the designated region will be included in the PCR fragment. If areas are known where primers must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. If two regions are defined, it is required that at least a part of the Forward primer region is located upstream of the Reverse primer region. After exploring the available primers (see section 16.3) and setting the desired parameter values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm. When a single primer region is defined: If only a single region is defined, only single primers will be suggested by the algorithm. After pressing the Calculate button, a dialog will appear (see figure 16.7). Calculation parameters. Chosen parameters: Maximum primer length, Minimum primer length, Maximum G/C content, Minimum G/C content, Maximum melting temperature, Minimum melting temperature, Maximum self annealing, Maximum self end annealing, Maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Mispriming parameters: Use mispriming as exclusion criteria, Exact match, Minimum number of b
183. Figure 19.5: The top figure shows the original alignment. In the bottom panel, a single sequence with four inserted X's is aligned to the original alignment. This introduces gaps in all sequences of the original alignment. All other positions in the original alignment are fixed. This feature is useful if you wish to add extra sequences to an existing alignment, in which case you just select the alignment and the extra sequences and choose not to redo the alignment. It is also useful if you have created an alignment where the gaps are not placed correctly. In this case you can realign the alignment with different gap cost parameters. 19.1.4 Fixpoints. With fixpoints, you can get full control over the alignment algorithm. The fixpoints are points on the sequences that are forced to align to each other. Fixpoints are added to sequences or alignments before clicking Create alignment. To add a fixpoint, open the sequence or alignment and: select the region you want to use as a fixpoint | right-click the selection | Set alignment fixpoint here. This will add an annotation labeled Fixpoint to the sequence s
184. Figure 7.16: Selected elements in a Folder Content view. When the elements are selected, do the following to copy the selected elements: right-click one of the selected elements | Edit | Copy. Then: right-click in the cell A1 | Paste. The outcome might appear unorganized, but with a few operations the structure of the view in CLC DNA Workbench can be reproduced (except the icons, which are replaced by file references in Excel). Note that all tables can also be Exported directly in Excel format. Chapter 8 History log. Contents: 8.1 Element history; 8.1.1 Sharing data with history. CLC DNA Workbench keeps a log of all operations you make in the program. If, for example, you rename a sequence, align sequences, create a phylogenetic tree or translate a sequence, you can always go back and check what you have done. In this way, you are able to document and reproduce previous operations. This can be useful in several situations: It can be used for documentation purposes, where you can specify exactly how your data has been created and modified. It can also be useful if you return to a project after some time and want to refresh your memory
185. Figure 18.35: Choosing enzymes to be considered. At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists. Below there are two panels: To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzym
186. In the example above, if we want to group the reads according to sample ID and gene name, these two parts should be checked as shown in figure 17.4. Figure 17.4: Splitting up the name at every underscore (_) and using the sample ID and gene name for grouping. At the middle of the dialog there is a preview panel listing: Sequence name: This is the name of the first sequence that has been chosen. It is shown here in the dialog in order to give you a sample of what the names in the list look like. Resulting group: The name of the group that this sequence would belong to if you proceed with the current settings. Number of sequences: The number of sequences chosen in the first step. Number of groups: The number of groups that would be produced when you proceed with the current settings. This preview cannot be changed. It is shown to guide you when finding the appropriate settings. Click Next if you wish to adjust how to handle the results (see section 9.2). If n
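The grouping described in this excerpt (split each read name at every underscore, then build the group name from the checked parts) can be sketched in Python. The read names below are hypothetical; they assume a layout of well, gene name, direction, sample ID and date, in which selecting the gene name and sample ID gives a group such as Asp016. The function names are illustrative and are not part of CLC DNA Workbench.

```python
from collections import defaultdict


def group_key(name, parts_to_use, separator="_"):
    """Join the selected parts of a name (0-based indices) into a group name."""
    parts = name.split(separator)
    return "".join(parts[i] for i in parts_to_use)


def group_reads(names, parts_to_use, separator="_"):
    """Bin read names by the group name derived from the selected parts."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name, parts_to_use, separator)].append(name)
    return dict(groups)


# Hypothetical read names: well_gene_direction_sampleID_date.
READS = [
    "A02_Asp_F_016_2007-01-10",
    "A03_Asp_R_016_2007-01-10",
    "B02_Asp_F_017_2007-01-10",
    "B03_Gln_F_016_2007-01-11",
]

# Use part 1 (gene name) and part 3 (sample ID) for grouping.
GROUPS = group_reads(READS, parts_to_use=(1, 3))

if __name__ == "__main__":
    for group, members in sorted(GROUPS.items()):
        print(group, len(members))
```

Previewing the key for the first name before grouping everything, as the dialog does, is a cheap way to check that the right parts were selected.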
187. Figure 1.15: A license has been downloaded. The progress of getting the license is shown, and when the license is downloaded, you will be able to click Next. Go to license download web page. Selecting the second option, Go to license download web page, opens the license web page as shown in figure 1.16. Figure 1.16: The license web page where you can download a license. Click the Request Evaluation License button, and you will be able to save the license on your computer, e.g. on the Desktop. Back in the Workbench window, you will now see the dialog shown in figure 1.17. Click the Choose License File button and browse to find the license file you saved before (e.g. on your Desktop). When you have selected the file, click Next. Figure 1.17: Importing the license downloaded from the web site. Accepting the license agreement. Reg
188. MDINUC.fsa (FASTA format) from our own Desktop into the new My folder. (This file is chosen for demonstration purposes only; you may have another file on your desktop which you can use to follow this tutorial. You can import all kinds of files.) In order to import the HUMDINUC.fsa file: select My folder | Import in the Toolbar | navigate to HUMDINUC.fsa on the desktop | Select. The sequence is imported into the folder that was selected in the Navigation Area before you clicked Import. Double-click the sequence in the Navigation Area to view it. The final result looks like figure 2.2.
189. select the part of the sequence you want to delete | right-click the selection | Edit Selection | Delete the text in the dialog | Replace. The selection shown in the dialog will be replaced by the text you enter. If you delete the text, the selection will be replaced by an empty text, i.e. deleted. To delete entire columns: select the part of the alignment you want to delete | right-click the selection | Delete columns. The selection may cover one or more sequences, but the Delete columns function will always apply to the entire alignment. 19.3.4 Copy annotations to other sequences. Annotations on one sequence can be transferred to other sequences in the alignment: right-click the annotation | Copy Annotation to other Sequences. This will display a dialog listing all the sequences in the alignment. Next to each sequence is a checkbox which is used for selecting which sequences the annotation should be copied to. Click Copy to copy the annotation. If you wish to copy all annotations on the sequence, click Copy All Annotations to other Sequences. 19.3.5 Move sequences up and down. Sequences can be moved up and down in the alignment: drag the name of the sequence up or down. When you move the mouse pointer over the label, the pointer will turn into a vertical arrow indicating that the sequence can be moved. The sequences can also be sorted automatically to let you save time moving the sequences around. To sort the sequenc
190. Figure 12.7: Examples of parameters that can be set before submitting a BLAST search. See section 12.1.1 for information about these limitations. There is one setting available for local BLAST jobs that is not relevant for remote searches at the NCBI: Number of processors: You can specify the number of processors which should be used if your Workbench is installed on a multi-processor system. 12.1.4 BLAST a partial sequence against a local database. You can search a database using only a part of a sequence directly from the sequence view: select the region that you wish to BLAST | right-click the selection | BLAST Selection Against Local Database. This will go directly to the dialog shown in figure 12.6, and the rest of the options are the same as when performing a BLAST search with a full sequence. 12.2 Output from BLAST searches. The output of a BLAST search is similar whether you have chosen to run your search locally or at the NCBI. If a single query sequence was used, then the results will show the hits found in that database with that single sequence. If more than one sequence was used to query a database, the default view of the results is a summary table showing the des
191. Figure 12.13: Providing a name and description for the database, and the location to save the files to. Click Finish to create the BLAST database. Once the process is complete, the new database will be available in the Manage BLAST Databases dialog (see section 12.4) and when running local BLAST (see section 12.1.3). 12.4 Manage BLAST databases. The BLAST databases available as targets for running local BLAST searches (see section 12.1.3) can be managed through the Manage BLAST Databases dialog (see figure 12.14): Toolbox | BLAST | Manage BLAST Databases. Figure 12.14: Overview of available BLAST databases. At the top of the dialog, there is a list of the BLAST database locations. These locations are folders where the Workbench will look for vali
192. COMPARISON OF WORKBENCHES AND THE VIEWER. [Feature table; columns: Viewer, Protein, DNA, RNA, Main, Genomics.] Sequence alignment: Multiple sequence alignments (two algorithms); Advanced re-alignment and fix-point alignment options; Advanced alignment editing options; Join multiple alignments into one; Consensus sequence determination and management; Conservation score along sequences; Sequence logo graphs along alignments; Gap fraction graphs; Copy annotations between sequences in alignments; Pairwise comparison. RNA secondary structure: Advanced prediction of RNA secondary structure; Integrated use of base pairing constraints; Graphical view and editing of secondary structure; Info about energy contributions of structure elements; Prediction of multiple sub-optimal structures; Evaluate structure hypothesis; Structure scanning; Partition function. Dot plots: Dot plot based analyses. Phylogenetic trees: Neighbor-joining and UPGMA phylogenies; Maximum likelihood phylogeny of nucleotides. Pattern discovery: Search for sequence match; Motif search for basic patterns; Motif search with regular expressions; Motif search with ProSite patterns; Pattern discovery. APPENDIX A. COMPARISON OF WORKBENCHES AND THE VIEWER P
193. Output from the contig; 17.7.6 Extract parts; 17.7.7 Variance table; 17.8 Reassemble contig; 17.9 Secondary peak calling. CLC DNA Workbench lets you import, trim and assemble DNA sequence reads from automated sequencing machines. A number of different formats are supported (see section 7.1.1). This chapter first explains how to trim sequence reads. Next follows a description of how to assemble reads into contigs, both with and without a reference sequence. In the final section, the options for viewing and editing contigs are explained. 17.1 Importing and viewing trace data. A number of different binary trace data formats can be imported into the program, including Standard Chromatogram Format (SCF), ABI sequencer data files (ABI and AB1), PHRED output files (PHD) and PHRAP output files (ACE) (see section 7.1.1). After import, the sequence reads and their trace data are saved as DNA sequences. This means that all analyses which apply to DNA sequences can be performed on the sequence reads, including e.g. BLAST and open reading frame prediction. You can see additional information about the quality of the traces by holding the mouse cursor on the imported sequence. This will display a tool tip as shown in figure 17.1.
194. Panel showing the available choices of information to display. [Primer table residue: primers GCGTGGATAGCGGTTTGA and GAGGCTGGTTGATGAAGA, with columns including Self annealing Fwd and Self annealing alignment Fwd.] Clicking a primer pair in the table will make a corresponding selection on the sequence in the view above. At this point you can either settle on a specific primer pair or save the table for later. If you want to use e.g. the first primer pair for your experiment, right-click this primer pair in the table and save the primers. You can also mark the position of the primers on the sequence by selecting Mark primer annotation on sequence in the right-click menu (see figure 2.41). This tutorial has shown some of the many options of the primer design functionalities of CLC DNA Workbench. You can read much more using the program's Help function or in the CLC DNA CHAPTER 2. TUTORIALS 61 [Figure residue: primer designer view of pcDNA3-atp8a1 with the Primer parameters and Primer Table Settings panels.] Standard primers for pcDNA3 atp
195. Figure 12.5: Choose one or more sequences to conduct a BLAST search. [Dialog residue: Local BLAST. Steps: 1. Select sequences of same type; 2. Choose program and target. BLAST program: blastp (Protein sequence and database). Target: Sequences, or BLAST database (uniprotfun, Protein).] Figure 12.6: Choose a BLAST program and a target database. • Sequences: When you choose this option, you can use sequence data from the Navigation Area as database by clicking the Browse and select icon. A temporary BLAST database will be created from these sequences and used for the BLAST search. It is deleted afterwards. If you want to be able to click in the BLAST result to retrieve the hit sequences from the BLAST database at a later point, you should not use this option; create a BLAST database first (see section 12.3.3). • BLAST Database: Select a database already available in one of your designated BLAST database folders. Read more in section 12.4. When a database or a set of sequences has been selected, click Next. This opens the dialog seen in figure 12.7. Local BLAST dialog residue, step 3 (Set BLAST parameters): Number of threads 1; Filter low complexity; Choose filter
196. [Alignment residue: seven calpastatin sequences (P49342, P20810, P27321, P08855, P12675, P20811, Q95208), positions 1 to 50, shown twice.] Figure 19.3: The first 50 positions of two different alignments of seven calpastatin sequences. The top alignment is made with cheap end gaps, while the bottom alignment is made with end gaps having the same price as any other gaps. In this case it seems that the latter scoring scheme gives the best result. [Alignment residue: NM_173881 CDS and NM_000559, shown twice.] Figure 19.4: The alignment of the coding sequence of bovine myoglobin with the full mRNA of human gamma globin. The top alignment is made with free end gaps, while the bottom alignment is made with end gaps treated as any other. The yellow annotation is the coding sequence in both seq
197. Figure 14.5: Choosing sequences for translation. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Clicking Next generates the dialog seen in figure 14.6. [Dialog residue: Translate to Protein. Steps: 1. Select nucleotide sequences; 2. Set parameters. Translation of whole sequence: Reading frame +1, +2, +3, -1, -2, -3. Translation of coding regions: Translate CDS, Translate ORF. Genetic code translation table.] Figure 14.6: Choosing +1 and +3 reading frames and the standard translation table. Here you have the following options: • Reading frames: If you wish to translate the whole sequence, you must specify the reading frame for the translation. If you select e.g. two reading frames, two protein sequences are generated. • Translate coding regions: You can choose to translate regions marked by a CDS or ORF annotation. This will generate a protein sequence for each CDS or ORF annotation on the sequence. • Genetic code translation table: Lets you specify the genetic code for the translation. The translation tables are occasionally updated from NCBI. The tables are not available
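The reading-frame option described in excerpt 197 can be illustrated with a small sketch. It is not taken from the manual and does not describe how the Workbench implements translation; it only shows the general idea of translating a nucleotide sequence in frames +1 to +3 and -1 to -3 with the standard genetic code (NCBI translation table 1). The function names `translate_frame` and `reverse_complement` are hypothetical, and codons containing characters other than A, C, G and T are reported as `X` by assumption.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq, frame):
    """Translate seq in one reading frame (+1, +2, +3, -1, -2 or -3).

    Negative frames are read on the reverse complement.
    Stop codons are written as '*', unrecognized codons as 'X'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = seq.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    dna = "ATGGCCTAA"
    # Selecting two reading frames yields two protein sequences.
    for frame in (1, 3):
        print(frame, translate_frame(dna, frame))
```

Selecting several frames simply means calling the function once per frame, which mirrors the statement that two selected reading frames generate two protein sequences.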
198. Reverse zoom function: Shift + click in view. Select multiple elements: Ctrl (⌘ on Mac) + click elements. Select multiple elements: Shift + click elements. Elements in this context refers to elements and folders in the Navigation Area, selections on sequences and rows in tables. Chapter 4. Searching your data. Contents: 4.1 What kind of information can be searched?; 4.2 Quick search; 4.2.1 Quick search results; 4.2.2 Special search expressions; 4.2.3 Quick search history; 4.3 Advanced search; 4.4 Search index. There are two ways of doing text-based searches of your data, as described in this chapter: • Quick search directly from the search field in the Navigation Area. • Advanced search, which makes it easy to make more specific searches. In most cases quick search will find what you need, but if you need to be more specific in your search criteria, the advanced search is preferable. 4.1 What kind of information can be searched? Below is a list of the different kinds of information that you can search for (applies to both quick search and the advanced search). • Name: The name of a sequence, an alignment or any other kind of element. The name is what is displayed in the Navigation Area by default. • Length: The length of the sequenc
199. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the trimming process. Views of each trimmed sequence will be shown, and you can inspect the result by looking at the Trim annotations (they are colored red by default). If there are no trim annotations, the sequence has not been trimmed. 17.4 Assemble sequences. This section describes how to assemble a number of sequence reads into a contig without the use of a reference sequence (a known sequence that can be used for comparison with the other sequences, see section 17.5). To perform the assembly: select sequences to assemble | Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences. This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. Note: You can assemble a maximum of 2000 sequences at a time. To assemble more sequences, you need the CLC Genomics Workbench (see http://www.clcbio.com/genomics). When the sequences are selected, click Next. This will show the dialog in figure 17.16. Assemble Sequences dialog residue: 1. Select at least two nucleotide sequences; 2. Set assembly parameters. Trimming: Trim sequence ends before assembly. Alignment options: Minimum aligned read length 50; Alignment stringency Medium. Conflicts: Vote (A, C, G, T); Unkno
200. [Dialog residue: forward and reverse inserts (press Shift + F1 for options), with a preview of the primer additions flanking the sequence of interest, labeled attB1 and Shine-Dalgarno.] Figure 18.18: A Shine-Dalgarno sequence has been inserted. [Dialog residue: Add attB Sites. Steps: Select nucleotide sequences; Specify primer additions; Set primer parameters. Primer extensions: Forward primer extension 20; Reverse primer extension 20.] Figure 18.19: Specifying the length of the template-specific part of the primers. Besides the main output, which is a copy of the input sequence(s) now including attB sites and primer additions, you can get a list of primers as output. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The attB sites, the primer additions and the primer regions are annotated in the final result as shown in figure 18.21. There will be one output sequence for each sequence you have selected for adding attB sites. Save the resulting sequence, as it will be the input to the next part of the Gateway cloning work flow (see section 18.2.2). When you open the sequence again, you may need to switch on the relevant annotation types to show the sites and primer additions as illustrated in figure 18.21. CHAPTER 18. CLONING AND CUTTING 323 Add attB Sites dialog residue: Select nucleotide sequences; Specify primer additions; Set primer parameters; Result ha
201. T search to meet your requirements. [Dialog residue: BLAST at NCBI. Steps: 1. Select sequences of same type; 2. Choose program; 3. Set BLAST parameters. Choose parameters: Limit by Entrez query (all organisms); Filter low complexity; Mask lower case; Expect 10; Word size 3; Matrix BLOSUM62; Gap cost Existence 11, Extension 1; Max number of hit sequences 100.] Figure 12.4: Parameters that can be set before submitting a BLAST search. When choosing BLASTx or tBLASTx to conduct a search, you get the option of selecting a translation table for the genetic code. The standard genetic code is set as default. This setting is particularly useful when working with organisms or organelles that have a genetic code different from the standard genetic code. The following description of BLAST search parameters is based on information from http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml. • Limit by Entrez query: BLAST searches can be limited to the results of an Entrez query against the database chosen. This can be used to limit searches to subsets of entries in the BLAST databases. Any terms can be entered that would normally be allowed in an Entrez search session. More information about Entrez queries can be found at http://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options. The syntax described there is the same as woul
202. [Residue of BLAST pairwise alignments (Query and Sbjct lines). One alignment header reads: Score 224 bits (113), Expect 6e-56, Identities 161/161 (100%), Gaps 0/161 (0%), Strand Plus/Plus.] Figure 12.21: Alignment view of BLAST results. Individual alignments are represented together with BLAST scores and more. 12.5.7 I want to BLAST against my own sequence database, is this possible? It is possible to download the entire BLAST program package and use it on your own computer, institution computer cluster or similar. This is preferred if you want to search in proprietary sequences or sequences unavailable in the public databases stored at NCBI. The downloadable BLAST package can either be installed as a web-based tool or as a command-line tool. It is available for a wide range of different operating systems. The BLAST package can be downloaded free of charge from the following location: http://www.ncbi.nlm.nih.gov/BLAST/download.shtml Pre-formatted
203. [Residue of a sequence view of Atp8a1 and ATP8a1 with hydrophobicity graphs (Kyte-Doolittle, Hopp-Woods) and a Window setting.] Figure 15.6: The different ways of displaying the hydrophobicity scores using the Kyte-Doolittle scale. The latter option offers you the same possibilities of amplifying the scores as applies for coloring of letters. The different ways to display the scores when choosing graphs are displayed in figure 15.6. Notice that you can choose the height of the graphs underneath the sequence. 15.2.3 Bioinformatics explained: Protein hydrophobicity. Calculation of hydrophobicity is important to the identification of various protein features. This can be membrane-spanning regions, antigenic sites, exposed loops or buried residues. Usually, these calculations are shown as a plot along the protein sequence, making it easy to identify the location of potential protein features. [Figure residue: sequence Q6H1U7, positions 1 to 40, with a hydrophobicity graph.] Figure 15.7: Plot of hydrophobicity along the amino acid sequence. Hydrophobic regions on the sequence have higher numbers according to the graph below the sequence; furthermore, hydrophobic regions are colored on the sequence. Red indicates regions with high hydrophobicity and blue indicates regions with low hydrophobicity.
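A hydrophobicity plot of the kind described in excerpt 203 is usually obtained by averaging per-residue scores over a sliding window. The sketch below is an illustration only and is not taken from the manual: it uses the published Kyte-Doolittle hydropathy values, while the function name `hydropathy_profile`, the default window length of 9 and the handling of window edges (only full windows are reported) are assumptions of this example, not documented Workbench behavior.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean Kyte-Doolittle score of every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Higher values indicate more hydrophobic stretches.
    for position, value in enumerate(hydropathy_profile("MKTLLILAVVAAALA", window=5), start=1):
        print(position, round(value, 2))
```

Larger windows smooth the curve and favor long features such as membrane-spanning regions, whereas small windows emphasize local variation.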
204. [Residue of mapped reads with their coordinates.] Figure 7.14: A graph displayed along the mapped reads. Right-click the graph to export the data points to a file. will be shown. If the graph is covering a set of aligned sequences with a main sequence, such as read mappings and BLAST results, the dialog shown in figure 7.15 will be displayed. These kinds of graphs are located under Alignment info in the Side Panel. In all other cases, a normal file dialog will be shown, letting you specify name and location for the file. [Dialog residue: Export Graphics. 1. Output options. Export options: Export excluding gaps.] Figure 7.15: Choosing to include data points with gaps. In this dialog, select whether you wish to include positions where the main sequence (the reference sequence for read mappings and the query sequence for BLAST results) has gaps. If you are exporting e.g. coverage information from a read mapping, you would probably want to exclude gaps if you want the positions in the exported file to match the reference
205. [Residue of the primer designer view with the reaction types Standard PCR, TaqMan, Nested PCR and Sequencing.] Figure 16.1: The initial view of the sequence used for primer design. 16.1.1 General concept. The concept of the primer view is that the user first chooses the desired reaction type for the session in the Primer Parameters preference group, e.g. Standard PCR. Reflecting the choice of reaction type, it is now possible to select one or more regions on the sequence and to use the right-click mouse menu to designate these as primer or probe regions (see figure 16.2). CHAPTER 16. PRIMERS 250 [Right-click menu residue: Forward primer region here; Reverse primer region here; Region to amplify; No primers here; Copy Selection; Open Selection in New View; Edit Selection; Delete Selection; Add Annotation; Add Enzymes Cutting Selection to Panel; Insert Restriction Site After Selection; Insert Restriction Site Before Selection; Base Pair Constraint; Set Numbers Relative to This Selection; BLAST Selection Against NCBI; BLAST Selection Against Local Database.] Figure 16.2: Right-click menu allowing you to specify regions for the primer design. When a region is chosen, graphical information about the properties of all possible primers in this region will appear in lines beneath it. By default, information is shown using a compact mode
206. The created object can also be saved and exported as a text file. See figure 16.23. [Figure residue: a primer order listing 4 primers with their names and sequences.] Figure 16.23: A primer order for 4 primers. Chapter 17. Sequencing data analyses and Assembly. Contents: 17.1 Importing and viewing trace data; 17.1.1 [title illegible]; 17.1.2 Trace settings in the Side Panel; 17.2 Multiplexing; 17.2.1 Sort sequences by name; 17.2.2 Process tagged sequences; 17.3 Trim sequences; 17.3.1 Manual trimming; 17.3.2 Automatic trimming; 17.4 Assemble sequences; 17.5 Assemble to reference sequence; 17.6 Add sequences to an existing contig; 17.7 View and edit contigs; 17.7.1 View settings in the Side Panel; 17.7.2 Editing the contig; 17.7.3 [title illegible]; 17.7.4 [title illegible]; 17.7.5
207. Utils/wprintgc.cgi?mode=c Codon usage database: http://www.kazusa.or.jp/codon/ Wikipedia on the genetic code: http://en.wikipedia.org/wiki/Genetic_code Creative Commons License. All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. Chapter 16. Primers. Contents: 16.1 Primer design - an introduction; 16.1.1 General concept; 16.1.2 Scoring primers; 16.2 Setting parameters for primers and probes; 16.2.1 Primer Parameters; 16.3 Graphical display of primer information; 16.3.1 Compact information mode; 16.3.2 Detailed information mode; 16.4 Output from primer design; 16.4.1 Saving primers
208. V model [Yang, 1994a] models. All models are time-reversible. The JC and K80 models assume equal base frequencies, and the HKY and GTR models allow the frequencies of the four bases to differ (they will be estimated by the observed frequencies of the bases in the alignment). In the JC model all substitutions are assumed to occur at equal rates; in the K80 and HKY models transition and transversion rates are allowed to differ. The GTR model is the general time-reversible model and allows all substitutions to occur at different rates. In case of the K80 and HKY models, the user may set a transition/transversion ratio value, which will be used as starting value or fixed, depending on the level of estimation chosen by the user (see below). For the substitution rate matrices describing the substitution models, we use the parametrization of Yang [Yang, 1994a]. • Rate variation: In CLC DNA Workbench, substitution rates may be allowed to differ among the individual nucleotide sites in the alignment by selecting the include rate variation box. When selected, the discrete gamma model of Yang [Yang, 1994b] is used to model rate variation among sites. The number of categories used in the discretization of the gamma distribution, as well as the gamma distribution parameter, may be adjusted by the user (as the gamma distribution is restricted to have mean 1, there is only one parameter in the distribution). • Estimation: Estimation is done according to the maximum likelihood p
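The difference between the JC and K80 models in excerpt 208 can be made concrete with a small sketch of their rate matrices. This is a textbook-style illustration under stated assumptions and not the parametrization used by the Workbench (the manual refers to Yang, 1994a, for that): rows and columns are ordered A, C, G, T; `kappa` is the transition/transversion rate ratio, which may differ from the ratio value entered in the dialog; and the matrix is scaled so that the expected substitution rate is 1 under equal base frequencies. The function names are hypothetical.

```python
# Row and column order of the rate matrices (an assumption of this sketch).
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine-to-purine or pyrimidine-to-pyrimidine substitutions."""
    return x != y and ((x in PURINES) == (y in PURINES))


def k80_rate_matrix(kappa):
    """Return the 4x4 K80 rate matrix as a list of rows.

    Transitions have relative rate kappa and transversions relative rate 1.
    The result is scaled so that, with equal base frequencies, the expected
    number of substitutions per unit time is 1.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    scale = 1.0 / (kappa + 2.0)
    matrix = []
    for x in BASES:
        row = [
            0.0 if x == y else (kappa if is_transition(x, y) else 1.0) * scale
            for y in BASES
        ]
        # Diagonal entries make every row sum to zero.
        row[BASES.index(x)] = -sum(row)
        matrix.append(row)
    return matrix


def jc_rate_matrix():
    """JC is the special case of K80 in which all substitutions have equal rates."""
    return k80_rate_matrix(1.0)


if __name__ == "__main__":
    for row in k80_rate_matrix(2.0):
        print(["%.3f" % value for value in row])
```

HKY and GTR extend this scheme with unequal base frequencies and, for GTR, a separate rate for each pair of bases; they are left out to keep the example short.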
209. [Residue of a folder-content table listing cloning vectors such as pBR 322 and pBR 325, with columns including Name, Modified, Modified by, Description, Length and Linear, and Side Panel options Show column, Select All and Deselect All; button Move to Recycle Bin.] Figure 3.6: Viewing the elements in a folder. Sorting the elements in a view does not affect the ordering of the elements in the Navigation Area. Note: The view only displays one
210. able for all views that can be zoomed in and out. In figure 7.8 is a view of a circular sequence which is zoomed in so that you can only see a part of it. Figure 7.8: A circular sequence as it looks on the screen. When selecting Export visible area, the exported file will only contain the part of the sequence that is visible in the view. The result from exporting the view from figure 7.8 and choosing Export visible area can be seen in figure 7.9. Figure 7.9: The exported graphics file when selecting Export visible area. On the other hand, if you select Export whole view, you will get a result that looks like figure 7.10. This means that the graphics file will also include the part of the sequence which is not visible when you have zoomed in. Click Next when you have chosen which part of the view to export. 7.3.2 Save location and file formats. In this step, you can choose name and save location for the graphics file (see figure 7.11). CLC DNA Workbench supports the following file formats for graphics export: CHAPTER 7. IMPORT/EXPORT OF DATA AND GRAPHICS 126 Figure 7.10: The exported graphics file when selecting Export whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence. Export Graphics dialog residue: 1. Output options; 2. Save in file; Look in: Desktop; Recent Items, Desktop, Documents
211. act our Support function: E-mail: support@clcbio.com 1.2 Download and installation. The CLC DNA Workbench is developed for Windows, Mac OS X and Linux. The software for any of these platforms can be downloaded from http://www.clcbio.com/download. 1.2.1 Program download. The program is available for download on http://www.clcbio.com/download. Before you download the program, you are asked to fill in the Download dialog. In the dialog you must choose: • Which operating system you use. • Whether you would like to receive information about future releases. Depending on your operating system and your Internet browser, you are taken through some download options. When the download of the installer (an application which facilitates the installation of the program) is complete, follow the platform-specific instructions below to complete the installation procedure. (Footnote: You must be connected to the Internet throughout the installation process.) 1.2.2 Installation on Microsoft Windows. Starting the installation process is done in one of the following ways: CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH 13 • If you have downloaded an installer: Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop. • If you are installing from a CD: Insert the CD into your CD-ROM drive. Choose Install CLC DNA Workbench from the menu displayed. Installing the program is done in the following steps:
212. aded and opened. • Open at NCBI: Opens the corresponding sequence(s) at GenBank at NCBI, where additional information regarding the selected sequence(s) is stored. The default Internet browser is used for this purpose. • Open structure: If the hit sequence contains structure information, the sequence is opened in a text view or a 3D view (3D view in CLC Protein Workbench and CLC Main Workbench). You can do a text-based search in the information in the BLAST table by using the filter at the upper right part of the view. In this way you can search for e.g. species or other information which is typically included in the Description field. The table is integrated with the graphical view described in section 12.2.3, so that selecting a hit in the table will make a selection on the corresponding sequence in the graphical view. 12.3 Local BLAST databases. BLAST databases on your local system can be made available for searches via your CLC DNA Workbench (section 12.3.1). To make adding databases even easier, you can download pre-formatted BLAST databases from the NCBI from within your CLC DNA Workbench (section 12.3.2). You can also easily create your own local BLAST databases from sequences within your CLC DNA Workbench (section 12.3.3). 12.3.1 Make pre-formatted BLAST databases available. To use databases that have been downloaded or created outside the Workbench, you can either: • Put the database files in one of
213. al parsing is also available. The default layout of the NCBI BLAST result is a graphical representation of the hits found, a table of sequence identifiers of the hits together with scoring information, and alignments of the query sequence and the hits. The graphical output (shown in figure 12.19) gives a quick overview of the query sequence and the resulting hit sequences. The hits are colored according to the obtained alignment scores. The table view (shown in figure 12.20) provides more detailed information on each hit and furthermore acts as a hyperlink to the corresponding sequence in GenBank. In the alignment view, one can manually inspect the individual alignments generated by the BLAST algorithm. This is particularly useful for detailed inspection of the sequence hit found (Sbjct) and the corresponding alignment. In the alignment view, all scores are described for each alignment. [Figure residue: color key for alignment scores with the bins below 40, 40-50, 50-80, 80-200 and 200 or above, followed by the query axis and hit bars.] Figure 12.19: BLAST graphical view. A si
214. alignments. CLC DNA Workbench can join several alignments into one. This feature can for example be used to construct "supergenes" for phylogenetic inference by joining alignments of several disjoint genes into one spliced alignment. Note that when alignments are joined, all their annotations are carried over to the new spliced alignment. Alignments can be joined by: CHAPTER 19. SEQUENCE ALIGNMENT 360 select alignments to join | Toolbox in the Menu Bar | Alignments and Trees | Join Alignments, or select alignments to join | right-click either selected alignment | Toolbox | Alignments and Trees | Join Alignments. This opens the dialog shown in figure 19.10. [Dialog residue: Join Alignments, step 1 (Select alignments of same type), with alignment 1 and alignment 2 among the Selected Elements.] Figure 19.10: Selecting two alignments to be joined. If you have selected some alignments before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove alignments from the selected elements. Clicking Next opens the dialog shown in figure 19.11. Join Alignments dialog residue: 1. Select alignments
215. an also be saved by dragging it into the Navigation Area. It is possible to select more sequences and drag all of them into the Navigation Area at the same time. CHAPTER 11. ONLINE DATABASE SEARCH 170 Download GenBank search results using right-click menu. You may also select one or more sequences from the list and download using the right-click menu (see figure 11.2). Choosing Download and Save lets you select a folder where the sequences are saved when they are downloaded. Choosing Download and Open opens a new view for each of the selected sequences. [Right-click menu residue: Download and Open; Download and Save; Open at NCBI.] Figure 11.2: By right-clicking a search result, it is possible to choose how to handle the relevant sequence. Copy/paste from GenBank search results. When using copy/paste to bring the search results into the Navigation Area, the actual files are downloaded from GenBank. To copy/paste files into the Navigation Area: select one or more of the search results | Ctrl + C (⌘ + C on Mac) | select a folder in the Navigation Area | Ctrl + V. Note: Search results are downloaded before they are saved. Downloading and saving several files may take some time. However, since the process runs in the background (displayed in the Status bar), it is possible to continue other tasks in the program. Like the search process, the download process can be stopped. Thi
216. an select one or more entry clones (see how to create an entry clone in section 18.2.2). If you wish to perform separate LR reactions with multiple entry clones, you should run the Create Expression Clone in batch mode (see section 9.1). When you have selected your entry clone(s), click Next. This will display the dialog shown in figure 18.25. Figure 18.25: Selecting one or more destination vectors. Clicking the Browse button opens a dialog where you can select a destination vector. You can download donor vectors from Invitrogen's web site (http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4) and import them into the CLC DNA Workbench. Note that the Workbench looks for the specific sequences of the attR sites in the sequences that you select in this dialog. See how to change the definition of sites in appendix F. Note that the CLC DNA Workbench only checks that valid attR sites are found; it does not check that they correspond to the attL sites of the selected fragments at this step. If the right combination of attL and attR sites is not found, no entry clones will be produced. When performing multi-site gateway cloning, the CLC DNA Workbench will insert the fragments contained in entry clones by matching the sites that are compatible. If the sites have been defined correctly, an express
217. Figure 3.18: An empty Workspace. Workspace in the Toolbar | Select the Workspace to activate, or: Workspace in the Menu Bar | Select Workspace | choose which Workspace to activate | OK. The name of the selected Workspace is shown after "CLC DNA Workbench" at the top left corner of the main window (in figure 3.18 it says: default). 3.5.3 Delete Workspace. Deleting a Workspace can be done in the following way: Workspace in the Menu Bar | Delete Workspace | choose which Workspace to delete | OK. Note: Be careful to select the right Workspace when deleting. The delete action cannot be undone. (However, no data is lost, because a workspace is only a representation of data.) It is not possible to delete the default workspace. 3.6 List of shortcuts. The keyboard shortcuts in CLC DNA Workbench are listed below. Action: Adjust selection; Change between tabs; Close; Close all views; Copy; Cut; Delete; Exit; Export; Export graphics; Find Next Conflict; Find Previous Conflict; Help; Import; Maximize/restore size of View; Move gaps in alignment; Navigate sequence views; New Folder; New Sequence View; Paste; Print; Redo; Rename; Save; Search local data; Search within a sequence; Search NCBI; Search UniProt; Select All
218. and exported in the same way as bioinformatics files (see section 7.1.1). Bioinformatics files not recognized by CLC DNA Workbench are also treated as external files. 7.3 Export graphics to files. CLC DNA Workbench supports export of graphics into a number of formats. This way, the visible output of your work can easily be saved and used in presentations, reports etc. The Export Graphics function is found in the Toolbar. CLC DNA Workbench uses a WYSIWYG principle for graphics export: What You See Is What You Get. This means that you should use the options in the Side Panel to change how your data, e.g. a sequence, looks in the program. When you export it, the graphics file will look exactly the same way. It is not possible to export graphics of elements directly from the Navigation Area. They must first be opened in a view in order to be exported. To export graphics of the contents of a view: select tab of View | Graphics. This will display the dialog shown in figure 7.7. Figure 7.7: Selecting to export whole view or to export only the visible area. 7.3.1 Which part of the view to export. In this dialog you can choose to: Export visible area, or Export whole view. These options are avail
219. and stability and is of major importance to the study of proteins (Knudsen and Miyamoto, 2001). Knowledge of the underlying phylogeny is, however, paramount to comparative methods of inference, as the phylogeny describes the underlying correlation from shared history that exists between data from different species. In molecular epidemiology of infectious diseases, phylogenetic inference is also an important tool. The very fast substitution rate of microorganisms, especially the RNA viruses, means that these show substantial genetic divergence over the time scale of months and years. Therefore, the phylogenetic relationship between the pathogens from individuals in an epidemic can be resolved and contribute valuable epidemiological information about transmission chains and epidemiologically significant events (Leitner and Albert, 1999; Forsberg et al., 2001). 20.2.3 Reconstructing phylogenies from molecular data. Traditionally, phylogenies have been constructed from morphological data, but following the growth of genetic information it has become common practice to construct phylogenies based on molecular data, known as molecular phylogeny. The data is most commonly represented in the form of DNA or protein sequences, but can also be in the form of e.g. restriction fragment length polymorphism (RFLP). Methods for constructing molecular phylogenies can be distance based or character based. Distance based methods: Two common algorithms, both based on
220. ar sequences. The simplest example of a dot plot is obtained by plotting two homologous sequences of interest. If very similar or identical sequences are plotted against each other, a diagonal line will occur. The dot plot in figure 13.7 shows two related sequences of the Influenza A virus nucleoproteins infecting ducks and chickens. Accession numbers of the two sequences are DQ232610 and DQ023146. Both sequences can be retrieved directly from http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi. Figure 13.7: Dot plot of DQ232610 vs. DQ023146 (Influenza A virus nucleoproteins) showing an overall similarity. Repeated regions: Sequence repeats can also be identified using dot plots. A repeat region will typically show up as lines parallel to the diagonal line. If the dot plot shows more than one diagonal in the same region of a sequence, the regions corresponding to the other sequence are repeated. In figure 13.9 you can see a sequence with repeats. Figure 13.8: Direct and inverted repeats shown on an amino acid sequence generated for demonstration purposes (direct repeats: ACDEFGHIACDEFGHIACDEFGHIACDEFGHI; inverted repeats: ACDEFGHIIHGFEDCAACDEFGHIIHGFEDCA). Figure 13.9: The dot plot of a sequence showing repeated elements. See also figure 13.8. Frame shifts: Frame shifts in a nucleotide sequence can occur due to insertions, deletions or mutations. Such frame shif
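The idea that repeats show up as lines parallel to the main diagonal can be illustrated with a small sketch. This is not how the Workbench draws its dot plots; the function names `dot_plot` and `diagonal_offsets`, the window size and the threshold are assumptions chosen for illustration. The input is the direct-repeat demonstration sequence from figure 13.8, plotted against itself.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the set of (i, j) positions where the windows starting at
    seq_a[i] and seq_b[j] are identical (a very plain dot-plot model)."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                dots.add((i, j))
    return dots


def diagonal_offsets(dots, min_length):
    """Group dots by diagonal (offset j - i) and keep the offsets that
    carry at least min_length dots, i.e. the visible lines."""
    counts = {}
    for i, j in dots:
        counts[j - i] = counts.get(j - i, 0) + 1
    return sorted(offset for offset, n in counts.items() if n >= min_length)


# Direct repeat used for demonstration purposes: ACDEFGHI repeated four times.
repeat = "ACDEFGHI" * 4
dots = dot_plot(repeat, repeat, window=3)
offsets = diagonal_offsets(dots, min_length=6)
```

Offset 0 is the main diagonal; every other offset is a parallel line, and here the offsets are spaced by the repeat length of 8 residues.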
221. arameters: Max percentage point difference in G/C content; Max difference in melting temperatures within a primer pair; Max hydrogen bonds between pairs; Max hydrogen bonds between pair ends; Maximum length of amplicon. Figure 16.13: Calculation dialog shown when designing alignment based PCR primers. 16.9.3 Alignment-based TaqMan probe design. CLC DNA Workbench allows the user to design solutions for TaqMan quantitative PCR, which consist of four oligos: a general primer pair which will amplify all sequences in the alignment, a specific TaqMan probe which will match the group of included sequences but not match the excluded sequences, and a specific TaqMan probe which will match the group of excluded sequences but not match the included sequences. As above, the selection boxes are used to indicate the status of a sequence: if the box is checked, the sequence belongs to the included sequences; if not, it belongs to the excluded sequences. We use the terms included and excluded here to be consistent with the section above, although a probe solution is presented for both groups. In TaqMan mode, primers are not allowed degeneracy or mismatches to any template sequence in the alignment; variation is only allowed/required in the TaqMan probes. Pushing the Calculate button will cause the dialog shown in figure 16.14 to appear. The top part of this dialog is identical to the Standard PCR dialog for designing primer pairs described above. The cen
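Two of the primer-pair parameters listed above (maximum percentage point difference in G/C content and maximum length of amplicon) are simple enough to sketch. The following is only an illustration of what such limits mean, not the Workbench's scoring code; `gc_percent`, `pair_within_limits` and all argument names are hypothetical, and the melting temperature and hydrogen bond criteria are left out because they depend on a thermodynamic model that is not described here.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def pair_within_limits(forward, reverse, max_gc_diff,
                       amplicon_length, max_amplicon_length):
    """Check a primer pair against two of the dialog's limits:
    G/C difference in percentage points and amplicon length."""
    gc_diff = abs(gc_percent(forward) - gc_percent(reverse))
    return gc_diff <= max_gc_diff and amplicon_length <= max_amplicon_length


# Made-up primers, far shorter than real ones, to keep the arithmetic obvious.
accepted = pair_within_limits("ATGC", "ATCG", 10, 300, 500)
rejected_gc = pair_within_limits("ATGC", "GGCC", 10, 300, 500)
rejected_length = pair_within_limits("ATGC", "ATCG", 10, 600, 500)
```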
222. ardless of which option you chose above, you will now see the dialog shown in figure 1.18. Figure 1.18: Read the license agreement carefully. Please read the License agreement carefully before clicking I accept these terms and Finish. 1.4.5 Configure license server connection. If your organization has installed a license server, you can use a floating license. The license server has a set of licenses that can be used on all computers on the network. If the server has e.g. 10 licenses, it means that a maximum of 10 computers can use a license simultaneously. When you have selected this option and clicked Next, you will see the dialog shown in figure 1.19. This dialog lets you specify how to connect to the license server: Connect
223. Figure 2.31: Press and hold the Ctrl key while you click first the HindIII site and next the XhoI site. Figure 2.32: Showing the insertion point of the vector side and the fragment in the middle. The fragment can be reverse complemented by clicking the Reverse complement fragment button, but this is not necessary in this case. Click Finish and your new construct will be opened. When saving your work, there are
224. arting with the term. E.g. searching for brca* will find both brca1 and brca2. Search related words: If you don't know the exact spelling of a word, you can append a question mark to the search term. E.g. brac1 will find sequences with a brca1 gene. Include both terms (AND): If you write two search terms, you can define if your results have to match both search terms by combining them with AND. E.g. search for brca1 AND human will find sequences where both terms are present. Include either term (OR): If you write two search terms, you can define that your results have to match either of the search terms by combining them with OR. E.g. search for brca1 OR brca2 will find sequences where either of the terms is present. Name search (name:): Search only the name of element. Organism search (organism:): For sequences, you can specify the organism to search for. This will look in the Latin name field, which is seen in the Sequence Info view (see section 10.4). Length search (length:[START TO END]): Search for sequences of a specific length. E.g. search for sequences between 1000 and 2000 residues: length:[1000 TO 2000]. If you do not use this special syntax, you will automatically search for both name, description, organism etc., and search terms will be combined as if you had put OR between them. 4.2.3 Quick search history. You can access the 10 most recent searches by clicking th
225. arting at the position that you want to be the new starting point | right-click the selection | Move Starting Point to Selection Start. Note: This can only be done for sequences that have been marked as circular. 10.3 Working with annotations. Annotations provide information about specific regions of a sequence. A typical example is the annotation of a gene on a genomic DNA sequence. Annotations derive from different sources: Sequences downloaded from databases like GenBank are annotated. In some of the data formats that can be imported into CLC DNA Workbench, sequences can have annotations (GenBank, EMBL and Swiss-Prot format). The result of a number of analyses in CLC DNA Workbench are annotations on the sequence (e.g. finding open reading frames and restriction map analysis). You can manually add annotations to a sequence (described in section 10.3.2). Note: Annotations are included if you export the sequence in GenBank, Swiss-Prot, EMBL or CLC format. When exporting in other formats, annotations are not preserved in the exported file. 10.3.1 Viewing annotations. Annotations can be viewed in a number of different ways: As arrows or boxes in the sequence views: linear and circular view of sequences; alignments; graphical view of sequence lists; BLAST views (only the query sequence at the top can have annotations); Cloning editor. i
226. arts before the first sequenced residue and continues up to and including residue 888. 1..>888: The region starts at the first sequenced residue and continues beyond residue 888. 102.110: Indicates that the exact location is unknown, but that it is one of the residues between residues 102 and 110, inclusive. 123^124: Points to a site between residues 123 and 124. join(12..78,134..202): Regions 12 to 78 and 134 to 202 should be joined to form one contiguous sequence. complement(34..126): Start at the residue complementary to 126 and finish at the residue complementary to residue 34 (the region is on the strand complementary to the presented strand). complement(join(2691..4571,4918..5163)): Joins regions 2691 to 4571 and 4918 to 5163, then complements the joined segments (the region is on the strand complementary to the presented strand). join(complement(4918..5163),complement(2691..4571)): Complements regions 4918 to 5163 and 2691 to 4571, then joins the complemented segments (the region is on the strand complementary to the presented strand). Annotations: In this field, you can add more information about the annotation, like comments and links. Click the Add qualifier/key button to enter information. Select a qualifier which describes the kind of information you wish to add. If an appropriate qualifier is not present in the list, you can type your own qualifier. The pre-defined qualifiers are derived from
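To make the region notation concrete, here is a minimal sketch of a parser for location strings written in the GenBank feature-table style, such as join(12..78,134..202) or complement(34..126). It is an illustration only and not the Workbench's own parser: the names `parse_location` and `split_top_level` are hypothetical, the partial markers < and > are accepted but not recorded, and the single-residue-in-range (102.110) and between-residues (123^124) forms are deliberately not handled.

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(text):
    """Return a list of (start, end, strand) tuples, 1-based and inclusive.
    strand is +1 for the presented strand and -1 for its complement.
    The parts are listed in the order in which they are read."""
    text = text.replace(" ", "")
    if text.startswith("complement(") and text.endswith(")"):
        inner = parse_location(text[len("complement("):-1])
        # Complementing flips each strand and reverses the reading order.
        return [(start, end, -strand) for start, end, strand in reversed(inner)]
    if text.startswith("join(") and text.endswith(")"):
        parts = []
        for piece in split_top_level(text[len("join("):-1]):
            parts.extend(parse_location(piece))
        return parts
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", text)
    if match is None:
        raise ValueError("unsupported location: " + text)
    return [(int(match.group(1)), int(match.group(2)), 1)]


joined = parse_location("join(12..78,134..202)")
outer = parse_location("complement(join(2691..4571,4918..5163))")
inner = parse_location("join(complement(4918..5163),complement(2691..4571))")
```

With this representation, the two complement/join spellings of the same region reduce to the same list of parts, which matches the descriptions given for them above.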
227. as shown in figure 17.21. Note: This is only possible when you can see the residues on the reads. This means that you need to have zoomed in to 100% or more and chosen Compactness levels "Not compact", "Low" or "Packed". Otherwise the handles for dragging are not available (this is done in order to make the visual overview more simple). If reads have been reversed, this is indicated by red. Otherwise, the residues are colored green. The colors can be changed in the Side Panel as described in section 17.7.1. If you find out that the reversed reads should have been the forward reads and vice versa, you can reverse complement the whole contig (imagine flipping the whole contig): right-click in the empty white area of the contig | Reverse Complement. 17.7.1 View settings in the Side Panel. Apart from this, the view resembles that of alignments (see section 19.2) but has some extra preferences in the Side Panel: Read layout: A new preference group located at the top of the Side Panel. Compactness: The compactness is an overall setting that lets you control the level of detail to be displayed on the sequencing reads. Please note that this setting affects many of the other settings in the Side Panel and the general behavior of the view as well. For example, if the compactness is set to Compact, you will not be able to see quality scores o
228. ase pairs required for a match; Number of consecutive base pairs required in 3' end. Figure 16.7: Calculation dialog for PCR primers when only a single primer region has been defined. The top part of this dialog shows the parameter settings chosen in the Primer parameters preference group, which will be used by the design algorithm. The lower part contains a menu where the user can choose to include mispriming as a criterion in the design process. If this option is selected, the algorithm will search for competing binding sites of the primer within the sequence. The adjustable parameters for the search are: Exact match: Choose only to consider exact matches of the primer, i.e. all positions must base pair with the template for mispriming to occur. Minimum number of base pairs required for a match: How many nucleotides of the primer must base pair to the sequence in order to cause mispriming. Number of consecutive base pairs required in 3' end: How many consecutive 3' end base pairs in the primer MUST be present for mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur. Note: Including a search for potential mispriming sites will prolong the search time substantially if long sequences are used as template and if the minimum number of base pairs required for a match is low. If the region to be amplified is part of a ve
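The two numeric mispriming parameters can be illustrated with a short sketch. It is a simplification under stated assumptions, not the Workbench's algorithm: the function `mispriming_sites` and its arguments are hypothetical, the primer is compared by identity against a template written in the same orientation (a real search would also examine the complementary strand), and no gaps are allowed.

```python
def mispriming_sites(primer, template, min_matches, min_3prime_run,
                     intended_start=None):
    """Return template positions where the primer could misprime.

    A position counts when at least min_matches primer bases match and the
    last min_3prime_run bases (the 3' end of the primer) all match.
    intended_start, if given, is the real binding site and is skipped."""
    sites = []
    n = len(primer)
    for start in range(len(template) - n + 1):
        if start == intended_start:
            continue
        window = template[start:start + n]
        matches = [p == t for p, t in zip(primer, window)]
        three_prime_ok = all(matches[n - min_3prime_run:])
        if sum(matches) >= min_matches and three_prime_ok:
            sites.append(start)
    return sites


# Made-up template: the primer binds perfectly at position 0 and, with one
# mismatch but an intact 3' end, at position 10.
template = "ACGTACTTTTAGGTACTTTT"
relaxed = mispriming_sites("ACGTAC", template, min_matches=4,
                           min_3prime_run=3, intended_start=0)
exact = mispriming_sites("ACGTAC", template, min_matches=6,
                         min_3prime_run=3, intended_start=0)
```

Setting min_matches to the primer length corresponds to the Exact match option; lowering it finds more candidate sites, which is also why the search takes longer on long templates.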
229. ated as the fragment, first click one of the cut sites you wish to use. Then press and hold the Ctrl key (⌘ on Mac) while you click the second cut site. You can also right-click the cut sites and use Select This Site to select a site. When this is done, the panel below will update to reflect the selections (see figure 18.4). In this example you can see that there are now three options listed in the panel below the view. Figure 18.4: HindIII and XhoI cut sites selected to cut out fragment. This is because there are now three options for selecting the fragment that should be used for cloning. The fragment selected per default is the one that is in bet
230. ation of local BLAST database PubMed lookup Web based lookup of sequence data Search for structures at NCBI Main Main E Main Main E 3 7 Genomics E Genomics E Genomics Genomics Li APPENDIX A COMPARISON OF WORKBENCHES AND THE VIEWER General sequence analyses Linear sequence view Circular sequence view Text based sequence view Editing sequences Adding and editing sequence annotations Advanced annotation table Join multiple sequences into one Sequence statistics Shuffle sequence Local complexity region analyses Advanced protein statistics Comprehensive protein characteristics repor Nucleotide analyses Basic gene finding Reverse complement without loss of annota tion Restriction site analysis Advanced interactive restriction site analysis Translation of sequences from DNA to pro teins Interactive translations of Sequences anc alignments G C content analyses and graphs Protein analyses 3D molecule view Hydrophobicity analyses Antigenicity analysis Protein charge analysis Reverse translation from protein to DNA Proteolytic cleavage detection Prediction of signal peptides SignalP Transmembrane helix prediction TMHMM Secondary protein structure prediction PFAM domain search Viewer E LI Viewer E E Viewer Protein E Protein o EI Protein T DNA RNA Main DNA RNA Main DNA RNA Main 378 Genomics E Genomics E E Genomics E APPENDIX A C
231. ations: This will add an annotation to the sequence when a motif is found (an example is shown in figure 13.27). Create table: This will create an overview table of all the motifs found for all the input sequences. Figure 13.27: Sequence view displaying the pattern found. The search string was "tataaa". 13.7.3 Java regular expressions. A regular expression is a string that describes or matches a set of strings according to certain syntax rules. They are usually used to give a concise description of a set without having to list all elements. The simplest form of a regular expression is a literal string. The syntax used for the regular expressions is the Java regular expression syntax (see http://java.sun.com/docs/books/tutorial/essential/regex/index.html). Below are listed some of the most important syntax rules, which are also shown in the help pop-up when you press Shift + F1. [A-Z] will match the characters A through Z (Range). You can also put single characters between the brackets: The expression [AGT] matches the characters A, G or T. [A-D[M-P]] will match the characters A through D and M through P (Union). You can also put single characters between the brackets: The expression [AG[M-P]] matches the characters A, G and M through P. [A-M&&[H-P]] will match the characters between A and M lying between H and P (Intersection). You
232. ats for exporting graphics. All data displayed in a graphical format can be exported using these formats. Data represented in lists and tables can only be exported in .pdf format (see section 7.3 for further details).

Format                      Suffix  Type
Portable Network Graphics   .png    bitmap
JPEG                        .jpg    bitmap
Tagged Image File           .tif    bitmap
PostScript                  .ps     vector graphics
Encapsulated PostScript     .eps    vector graphics
Portable Document Format    .pdf    vector graphics
Scalable Vector Graphics    .svg    vector graphics

Appendix H IUPAC codes for amino acids. (Single-letter codes based on International Union of Pure and Applied Chemistry.) The information is gathered from: http://www.ebi.ac.uk/2can/tutorials/aa.html

One-letter  Three-letter  Description
A           Ala           Alanine
R           Arg           Arginine
N           Asn           Asparagine
D           Asp           Aspartic acid
C           Cys           Cysteine
Q           Gln           Glutamine
E           Glu           Glutamic acid
G           Gly           Glycine
H           His           Histidine
J           Xle           Leucine or Isoleucine
L           Leu           Leucine
I           Ile           Isoleucine
K           Lys           Lysine
M           Met           Methionine
F           Phe           Phenylalanine
P           Pro           Proline
O           Pyl           Pyrrolysine
U           Sec           Selenocysteine
S           Ser           Serine
T           Thr           Threonine
W           Trp           Tryptophan
Y           Tyr           Tyrosine
V           Val           Valine
B           Asx           Aspartic acid or Asparagine
Z           Glx           Glutamic acid or Glutamine
X           Xaa           Any amino acid

Appendix I IUPAC codes for nucleotides. (Single-letter codes based o
233. automatically translate the nucleotide sequence selected as database. As Target, select NC_000011 that you downloaded. If you are used to BLAST, you will know that you usually have to create a BLAST database before BLASTing, but the Workbench does this "on the fly" when you just select one or more sequences. Click Next, leave the parameters at their default, click Next again, and then Finish. Inspect BLAST result: When the BLAST result appears, make a split view so that both the table and graphical view are visible (see figure 2.46). This is done by pressing Ctrl (⌘ on Mac) while clicking the table view at the bottom of the view. In the table, start out by showing two additional columns: Positive and Query start. These should simply be checked in the Side Panel. Now, sort the BLAST table view by clicking the column header Positive. Then press and hold the Ctrl button (⌘ on Mac) and click the header Query start. Now you have sorted the table first on Positive hits and then on the start position of the query sequence. Now you see that you actually have three regions with a 100% positive hit, but at different locations on the chromosome sequence (see figure 2.46). Why did we find, on the protein level, three identical regions between our query protein sequence and nucleotide database? The beta-globin gene is known to have three exons, and this is exactly what we find in the BLAST search. Each translated exon will hit the corresp
234. b delimited text Vector NTI archives Vector NTI Database Zip export Zip import Suffix fsa fasta abt abi clc cmo CSV CSV str strider bsml embl geg bk gb gp gck pro seq NXS NeEXUS phd pir any SCf SCf sdn SWp txt 393 Import Export Description X X X XxX X gt lt X K K X XK X X X X X X X X X X ma4 pa4 0a4 X Zip Zip gzip tar X X X Simple format name amp description Including chromatograms Including chromatograms Rich format including all information Annotations in csv format One sequence per line name de scription optional sequence Only nucleotide sequence Rich information incl annotations Rich information incl annotations Including chromatograms Simple format name amp description Only sequence no name Including chromatograms Including chromatograms Rich information only proteins Annotations in tab delimited text for mat Archives in rich format Special import full database Selected files in CLC format Contained files folder structure APPENDIX G FORMATS FOR IMPORT AND EXPORT 394 G 1 2 Contig formats File type Suffix Import Export Description ACE ace X X No chromatogram or quality score CLC cle X X Rich format including all information Zip export Zip X Selected files in CLC format Zip import zip gzip tar X Contained files folder structure G 1 3 Alignment formats File type Suffix
235. be inserted at the 5' end of the primer as shown in figure 2.27. Figure 2.27: Adding restriction sites to a primer. Perform the same process for the ATP8a1 rev primer, this time using XhoI instead. This time, you should also add a few bases at the 5' end, as was done in figure 18.14 when inserting the HindIII site. Note: The ATP8a1 rev primer is designed to match the negative strand, so the restriction site should be added at the 5' end of this sequence as well (Insert Restriction Site before Selection). Save the two primers and close the views, and you are ready for the next step. 2.6.3 Simulate PCR to create the fragment. Now, we want to extract the PCR product from the template ATP8a1 mRNA sequence using the two primers with restriction sites: Toolbox | Primers and Probes | Find Binding Sites and Create Fragments. Select the ATP8a1 mRNA sequence and click Next. In this dialog, use the Browse button to select the two primer sequences. Click Next and adjust the output options as shown in figure 2.28. Click Finish and you will now see the fragment table displaying the PCR product. In the Side Panel you can choose to show information about melting temperature for the primers. Right-click the fragment and select Open Fragment as shown in figure 2.29. This will create a new sequence representing the PCR product. Save the sequence in the Cloning folder an
236. bench allows you to search the NCBI GenBank database directly from the program, giving you the opportunity to both open, view, analyze and save the search results without using any other applications. To conduct a search in NCBI GenBank from CLC DNA Workbench, you must be connected to the Internet. This tutorial shows how to find a complete human hemoglobin DNA sequence in a situation where you do not know the accession number of the sequence. To start the search: Search | Search for Sequences at NCBI. This opens the search view. We are searching for a DNA sequence, hence: Nucleotide. Now we are going to adjust parameters for the search. By clicking Add search parameters, you activate an additional set of fields where you can enter search criteria. Each search criterion consists of a drop-down menu and a text field. In the drop-down menu, you choose which part of the NCBI database to search, and in the text field you enter what to search for: Click Add search parameters until three search criteria are available | choose Organism in the first drop-down menu | write "human" in the adjoining text field | choose All Fields in the second drop-down menu | write "hemoglobin" in the adjoining text field | choose All Fields in the third drop-down menu | write "complete" in the adjoining text field.
237. Figure 20.6: Algorithm choices for phylogenetic inference. The bottom shows a tree found by the neighbor joining algorithm, while the top shows a tree found by the UPGMA algorithm. The latter algorithm assumes that the evolution occurs at a constant rate in different lineages. Neighbor Joining: The neighbor joining algorithm (Saitou and Nei, 1987), on the other hand, builds a tree where the evolutionary rates are free to differ in different lineages, i.e., the tree does not have a particular root. Some programs always draw trees with roots for practical reasons, but for neighbor joining trees, no particular biological hypothesis is postulated by the placement of the root. The method works very much like UPGMA. The main difference is that instead of using pairwise distance, this method subtracts the distance to all other nodes from the pairwise distance. This is done to take care of situations where the two closest nodes are not neighbors in the "real" tree. The neighbor joining algorithm is generally considered to be fairly good and is widely used. Algorithms that improve its cubic time performance exist. The improvement is only significant for quite large datasets. Character based methods
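The remark that neighbor joining subtracts the distance to all other nodes from the pairwise distance can be made concrete with the selection criterion commonly given for the algorithm, Q(a, b) = (n - 2) d(a, b) - sum of d(a, k) - sum of d(b, k). The sketch below applies one selection step to a made-up four-taxon distance matrix; the function names and the data are assumptions for illustration, and nothing here is taken from the Workbench's implementation.

```python
def closest_pair(dist):
    """Pair with the smallest raw pairwise distance (the UPGMA-style choice)."""
    names = sorted(dist)
    pairs = [(dist[a][b], a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs)


def nj_pair(dist):
    """Pair minimising the neighbor joining criterion
    Q(a, b) = (n - 2) * d(a, b) - sum_k d(a, k) - sum_k d(b, k)."""
    names = sorted(dist)
    n = len(names)
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    pairs = [((n - 2) * dist[a][b] - totals[a] - totals[b], a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs)


# Distances read off a made-up tree ((A,B),(C,D)) in which A and C sit on
# long branches (length 3) and B, D and the internal branch have length 1.
dist = {
    "A": {"B": 4, "C": 7, "D": 5},
    "B": {"A": 4, "C": 5, "D": 3},
    "C": {"A": 7, "B": 5, "D": 4},
    "D": {"A": 5, "B": 3, "C": 4},
}
raw_choice = closest_pair(dist)
nj_choice = nj_pair(dist)
```

In this example B and D are the closest pair by raw distance although they are not neighbors in the tree the distances were read from, whereas the corrected criterion picks A and B (C and D score equally well), which is the situation the paragraph above describes.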
238. Figure 1.20: No more licenses available on the server. In this case, please contact your organization's license server administrator. To purchase additional licenses, contact sales@clcbio.com. You can also click the Limited Mode button (see section 1.4.6). If your connection to the license server is lost, you will see a dialog as shown in figure 1.21. Figure 1.21: Unable to contact license server. In this case, you need to make sure that you have access to the license server, and that the server is running. However, there may be situations wher
239. c: Genomic sequences from NCBI Reference Sequence Project. est: Database of GenBank + EMBL + DDBJ sequences from EST division. est_human: Human subset of est. est_mouse: Mouse subset of est. est_others: Subset of est other than human or mouse. gss: Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences. htgs: Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr. pat: Nucleotides from the Patent division of GenBank. pdb: Sequences derived from the 3-dimensional structure records from Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record. month: All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days. alu: Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994). dbsts: Database of Sequence Tag Site entries from the STS division of GenBank + EMBL + DDBJ. chromosome: Complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic. wgs: Assemblies of Whole Genome Shotgun sequences. env_nt: Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is Sagarss
240. can also put single characters between the brackets: The expression [A-M&&[HGTDA]] matches those of the characters H, G, T, D and A that lie in the range A through M, which is H, G, D or A. [^A-M] will match any character except those between A and M.
Excluding: You can also put single characters between the brackets: The expression [^AG] matches any character except A and G. [A-Z&&[^M-P]] will match any character A through Z except those between M and P.
Subtraction: You can also put single characters between the brackets: The expression [A-P&&[^CG]] matches any character between A and P except C and G.
The symbol . matches any character.
X{n} will match a repetition of an element, indicated by following that element with a numerical value or a numerical range between the curly brackets. For example, ACG{2} matches the string ACGG and (ACG){2} matches ACGACG.
X{n,m} will match a certain number of repetitions of an element, indicated by following that element with two numerical values between the curly brackets. The first number is a lower limit on the number of repetitions and the second number is an upper limit on the number of repetitions. For example, ACT{1,3} matches ACT, ACTT and ACTTT.
X{n,} represents a repetition of an element at least n times. For example, (AC){2,} matches all strings ACAC, ACACAC, ACACACAC, ...
The symbol ^ restricts the search to the beginning of your sequence. For example, if you search through a sequence with the regular expression ^AC
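The quantifier and exclusion examples above can be checked with an ordinary regular expression engine. The following is a minimal sketch using Python's standard `re` module, on the assumption that it behaves like the Workbench's engine for these constructs. The `&&` class intersection is not supported by `re` and is left out, and the names `CASES`, `check_cases` and `starts_with_ac` are hypothetical.

```python
import re

# Each entry: (pattern, strings that must match in full, strings that must not).
CASES = [
    (r"ACG{2}", ["ACGG"], ["ACGACG"]),      # {2} applies to the G only
    (r"(ACG){2}", ["ACGACG"], ["ACGG"]),    # {2} applies to the whole group
    (r"ACT{1,3}", ["ACT", "ACTT", "ACTTT"], ["AC", "ACTTTT"]),
    (r"(AC){2,}", ["ACAC", "ACACAC", "ACACACAC"], ["AC"]),
    (r"[^AG]", ["C", "T"], ["A", "G"]),     # any single character except A and G
]


def check_cases(cases):
    """Return True if every pattern accepts and rejects the listed strings."""
    for pattern, accepted, rejected in cases:
        if not all(re.fullmatch(pattern, s) for s in accepted):
            return False
        if any(re.fullmatch(pattern, s) for s in rejected):
            return False
    return True


def starts_with_ac(sequence):
    """The ^ symbol anchors the match to the beginning of the sequence."""
    return re.search(r"^AC", sequence) is not None
```

Calling `check_cases(CASES)` confirms that the braces bind to the preceding element only, which is why the parentheses in (ACG){2} change the result.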
241. ccept the License agreement and click Next.
• Choose where you would like to install the application and click Next.
• Choose if CLC DNA Workbench should be used to open CLC files and click Next.
• Choose whether you would like to create a desktop icon for launching CLC DNA Workbench and click Next.
• Choose if you would like to associate .clc files to CLC DNA Workbench. If you check this option, double-clicking a file with a "clc" extension will open the CLC DNA Workbench.
• Wait for the installation process to complete, choose whether you would like to launch CLC DNA Workbench right away, and click Finish.
When the installation is complete, the program can be launched from your Applications folder or from the desktop shortcut you chose to create. If you like, you can drag the application icon to the dock for easy access.
1.2.4 Installation on Linux with an installer
Navigate to the directory containing the installer and execute it. This can be done by running a command similar to: sh CLCDNAWorkbench_6_JRE.sh
If you are installing from a CD, the installers are located in the "linux" directory. Installing the program is done in the following steps:
• On the welcome screen, click Next.
• Read and accept the License agreement and click Next.
• Choose where you would like to install the application and click Next. For a system-wide installation you can choose for example /opt or /usr/l
242. ce Contents
3.1 Navigation Area
3.1.1 …
3.1.2 Create new folders
3.1.3 Sorting folders
3.1.4 Multiselecting elements
3.1.5 Moving and copying elements
3.1.6 Change element names
3.1.7 Delete elements
3.1.8 Show folder elements in a table
3.2 View Area
3.2.1 …
3.2.2 Show element in another view
3.2.3 Close views
3.2.4 Save changes in a view
3.2.5 Undo/Redo
3.2.6 Arrange views in View Area
3.2.7 Side Panel
3.3 Zoom and selection in View Area
3.3.1 …
3.3.2 …
3.3.3 …
3.3.4 …
3.3.5 …
3.3.6 …
3.3.7 Changing compactness
3.4 Toolbox and Status Bar
3.4.1 Processes
3.4.2 …
243. ce correction matrices (substitution matrices) take into account the likeliness of one amino acid changing to another.
• Window size. A residue by residue comparison (window size = 1) would undoubtedly result in a very noisy background due to a lot of similarities between the two sequences of interest. For DNA sequences the background noise will be even more dominant, as a match between only four nucleotides is very likely to happen. Moreover, a residue by residue comparison (window size = 1) can be very time consuming and computationally demanding. Increasing the window size will make the dot plot more "smooth".
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. Figure 13.4: Setting the dot plot parameters.
13.2.2 View dot plots
A view of a dot plot can be seen in figure 13.5. You can select Zoom in in the Toolbar and click the dot plot to zoom in to see the details of particular areas. Figure 13.5: A view is opened showi
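The effect of the window size on background noise can be illustrated with a small sketch. It assumes a simplified scoring in which a dot is drawn when enough positions in two windows are identical. The Workbench scores windows with a substitution matrix such as BLOSUM62, which is not reproduced here, and the function name `dot_plot` and its parameters are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=9, min_matches=None):
    """Return the set of (i, j) window start positions that count as a dot.

    A dot is placed at (i, j) when at least `min_matches` of the `window`
    aligned positions starting at seq_a[i] and seq_b[j] are identical.
    By default every position in the window has to match.
    """
    if min_matches is None:
        min_matches = window
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots
```

Comparing a repetitive sequence such as "AAAA" with itself gives a dot in every cell at window size 1 but a single dot at window size 4, which is the smoothing effect described for larger windows.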
244. Figure 1.6: Read the license agreement carefully. Please read the License agreement carefully before clicking I accept these terms and Finish.
1.4.2 Download a license
When you purchase a license, you will get a license ID from CLC bio. Using this option, you will get a license based on this ID. When you have clicked Next, you will see the dialog shown in figure 1.7. At the top, enter the ID (paste using Ctrl + V or ⌘ + V on Mac).
245. Figure 1.10: Importing the license downloaded from the web site. Click the Choose License File button and browse to find the license file you saved before (e.g. on your Desktop). When you have selected the file, click Next.
Accepting the license agreement
Regardless of which option you chose above, you will now see the dialog shown in figure 1.11. Please read the License agreement carefully before clicking I accept these terms and Finish.
246. character and R matches A, G. For proteins, X matches any character and Z matches E, Q. Genome sequences often have large regions with unknown sequence. These regions are very often padded with N's. Ticking this checkbox will not display hits found in N-regions, and if one residue in a motif matches to an N, it will be treated as a mismatch. The list of motifs shown in figure 13.22 is a pre-defined list that is included with the CLC DNA Workbench. You can define your own set of motifs to use instead. In order to do this, you first need to create a Motif list. This will bring up the dialog shown in figure 13.25. Figure 13.25: Managing the motifs to be shown.
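The ambiguity handling described above can be sketched by expanding each motif character into a character class. This is an illustration only: the small code table, the function names and the choice to always treat an N in the searched sequence as a mismatch (the behaviour with the checkbox ticked) are assumptions, and only the R code is taken from the text.

```python
import re

# Illustrative subset of nucleotide ambiguity codes used in a motif.
# R = A or G comes from the text; N = any base is the usual IUPAC meaning.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Expand a motif with ambiguity codes into a regular expression."""
    return "".join(NUCLEOTIDE_CODES[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) hits.

    The character classes contain only A, C, G and T, so an N in the
    sequence never matches and is therefore treated as a mismatch.
    """
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, `find_motif("GRT", "AGATGGTGNT")` reports the hits GAT and GGT but skips the final GNT, because the N in the sequence counts as a mismatch.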
247. choice of regions to clone. However, the cloning editor has a special layout with three distinct areas (in addition to the Side Panel found in other sequence views as well):
• At the top, there is a panel to switch between the sequences selected as input for the cloning. You can also specify whether the sequence should be visualized as circular or as a fragment. At the right-hand side, there is a button to change the status of the sequence currently shown to vector.
• In the middle, the selected sequence is shown. This is the central area for defining how the cloning should be performed. This is explained in detail below.
• At the bottom, there is a panel where the selection of fragments and target vector is performed (see elaboration below).
248. chromatogram traces: A = green, C = blue, G = black, and T = red.
- Foreground color. Sets the color of the letter.
- Background color. Sets the background color of the residues.
Nucleotide info
These preferences only apply to nucleotide sequences.
• Translation. Displays a translation into protein just below the nucleotide sequence. Depending on the zoom level, the amino acids are displayed with three letters or one letter.
- Frame. Determines where to start the translation.
* ORF/CDS. If the sequence is annotated, the translation will follow the CDS or ORF annotations. If annotations overlap, only one translation will be shown. If only one annotation is visible, the Workbench will attempt to use this annotation to mark the start and stop for the translation. In cases where this is not possible, the first annotation will be used (i.e. the one closest to the 5' end of the sequence).
* Selection. This option will only take effect when you make a selection on the sequence. The translation will start from the first nucleotide selected. Making a new selection will automatically display the corresponding translation. Read more about selecting in section 10.1.3.
* +1 to -1. Select one of the six reading frames.
* All forward/All reverse. Shows either all forward or all reverse reading frames.
* All. Select all reading frames at once. The translations will be displayed on top of each other.
249. click the selection | Realign selection. This will open Step 2 in the "Create alignment" dialog, allowing you to set the parameters for the realignment (see section 19.1). It is possible for an alignment to become shorter or longer as a result of the realignment of a region. This is because gaps may have to be inserted in, or deleted from, the sequences not selected for realignment. This will only occur for entire columns of gaps in these sequences, ensuring that their relative alignment is unchanged. Realigning a selection is a very powerful tool for editing alignments in several situations:
• Removing changes. If you change the alignment in a specific region by hand, you may end up being unhappy with the result. In this case you may of course undo your edits, but another option is to select the region and realign it.
• Adjusting the number of gaps. If you have a region in an alignment which has too many gaps in your opinion, you can select the region and realign it. By choosing a relatively high gap cost you will be able to reduce the number of gaps.
• Combine with fixpoints. If you have an alignment where two residues are not aligned, but you know that they should have been, you can set an alignment fixpoint on each of the two residues, select the region and realign it using the fixpoints. Now the two residues are aligned with each other, and everything in the selected region around them is adjusted to accommodate this change.
19.4 Join
250. Figure 2.49: Settings for searching for primer binding sites.
Low complexity filter, Expect value. Standard BLAST: blastp, BLOSUM62. Remote homologues: blastp, 20000, PAM30.
These settings are shown in figure 2.50. Figure 2.50: Settings for searching for remote homologues.
2.9.4 Further reading
A valuable source of information about BLAST can be found at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide. Remember that BLAST is a heuristic method. This means that certain assumptions are made to allow searches to be done in a reasonable amount of time. Thus you cannot trust BLAST search results to be accurate. For very accurate results you should consider using other algorithms, such as Smith-Waterman. You can read "Bioinformatics explained: BLAST versus Smith-Waterman" here: http://www.clcbio.com/BE.
2.10 Tutorial: Align protein sequences
This tutorial outlines some of the alignment functionality of the CLC DNA Workbench. In addition to creati
251. cription of the top database hit against each query sequence and the number of hits found.
12.2.1 Graphical overview for each query sequence
Double-clicking on a given row of a tabular BLAST table opens a graphical overview of the BLAST results for a particular query sequence, as shown in figure 12.8. In cases where only one sequence was entered into a BLAST search, such a graphical overview is the default output. Figure 12.8 shows an example of a BLAST result for an individual query sequence in the CLC DNA Workbench. Detailed descriptions of the overview BLAST table and the graphical BLAST results view are given below.
12.2.2 Overview BLAST table
In the overview BLAST table for a multi-sequence BLAST search, as shown in figure 12.9, there is one row for each query sequence. Each row represents the BLAST result for this query sequence. Double-clicking a row will open the BLAST result for this query sequence, allowing more detailed investigation of the result. You can also select one or more rows and click the Open BLAST Output button at the bottom of the view. Clicking the Open Query Sequence button will open a sequence list with the selected query sequences. This can be useful in work flows where BLAST is used as a filtering mechanism: you can filter the table to include e.g. sequences that have a certain top hit and then extract those.
252. Figure 17.15: Setting parameters for trimming. The following parameters can be adjusted in the dialog:
• Ignore existing trim information. If you have previously trimmed the sequences, you can check this to remove existing trimming annotation prior to analysis.
• Trim using quality scores. If the sequence files contain quality scores from a base-caller algorithm, this information can be used for trimming sequence ends. The program uses the modified-Mott trimming algorithm for this purpose (Richard Mott, personal communication). Quality scores in the Workbench are on a Phred scale (formats using other scales are converted during import). The first step in the trim process is to convert the quality score ($Q$) to an error probability: $p_{error} = 10^{Q/-10}$. (This now means that low values are high quality bases.) Next, for every base a new value is calculated: $Limit - p_{error}$. This value will be negative for low quality bases, where the error probability is high
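The first two steps of the quality trimming can be written out as a short sketch. It covers only the conversion from a Phred score to an error probability and the per-base value Limit minus that probability; the remaining steps of the modified-Mott algorithm are not shown. The function names are hypothetical, and the default limit of 0.02 is taken from the dialog in figure 17.15.

```python
def error_probability(quality):
    """Convert a Phred quality score Q to an error probability 10^(Q / -10)."""
    return 10 ** (quality / -10)


def trim_values(quality_scores, limit=0.02):
    """Return Limit - p_error for every base.

    Values are negative for low quality bases (high error probability)
    and positive for bases whose error probability is below the limit.
    """
    return [limit - error_probability(q) for q in quality_scores]
```

With the limit of 0.02, a base with Q = 20 (error probability 0.01) gets a positive value, while a base with Q = 10 (error probability 0.1) gets a negative one.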
253. Figure 2.58: Selecting output for restriction map analysis. Click Finish to start the restriction map analysis.
View restriction site
The restriction sites are shown in two views: one view is in a tabular format and the other view displays the sites as annotations on the sequence. The result is shown in figure 2.59. Figure 2.59: The result of the restriction map analysis is displayed in a table at the bottom and as annotations on the sequence in the view at the top. The restriction map at the bottom can also be shown as a table of fragments produced by cutting the sequence with the enzymes: Click the Fragments button at the bottom of the view. In a similar way, the fragments can be shown on a virtual gel: Click the Gel button at the bottom of the view.
Part II Core Functionalities
Chapter 3 User interfa
254. Figure 18.44: Choosing to add restriction sites as annotations or creating a restriction map.
• Create restriction map. When a restriction map is created, it can be shown in three different ways:
- As a table of restriction sites, as shown in figure 18.46. If more than one sequence was selected, the table will include the restriction sites of all the sequences. This makes it easy to compare the result of the restriction map analysis for two sequences.
- As a table of fragments, which shows the sequence fragments that would be the result of cutting the sequence with the selected enzymes (see figure 18.47).
- As a virtual gel simulation, which shows the fragments as bands on a gel (see figure 18.49). For more information about gel electrophoresis, see section 18.4.
The following sections will describe these output formats in more detail. In order to complete the analysis, click Finish (see section 9.2 for information about the Save and Open options).
Restriction sites as annotation on the sequence
If you chose to add the restriction sites as annotation to the sequence, the result will be similar to the sequence sho
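The relation between a table of restriction sites and a table of fragments can be sketched as follows. This is an illustrative calculation, not the Workbench's implementation: the function name `fragment_lengths` is hypothetical, and cut positions are assumed to be counts of bases before the cut, lying strictly inside the sequence.

```python
def fragment_lengths(sequence_length, cut_positions, circular=False):
    """Return the fragment lengths produced by cutting at the given positions.

    A linear sequence with n distinct cuts gives n + 1 fragments; a circular
    sequence gives n fragments, because the piece after the last cut joins
    the piece before the first cut.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [sequence_length]
    if circular:
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(sequence_length - cuts[-1] + cuts[0])
        return lengths
    bounds = [0] + cuts + [sequence_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

The same list of lengths is what a virtual gel would display as bands, one band per fragment.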
255. d BLAST databases. These can either be created from within the Workbench using the Create BLAST Database tool (see section 12.3.3), or they can be pre-formatted BLAST databases. The list of locations can be modified using the Add Location and Remove Location buttons. Once the Workbench has scanned the locations, it will keep a cache of the databases in order to improve performance. If you have added new databases that are not listed, you can press Refresh Locations to clear the cache and search the database locations again. By default, a BLAST database location will be added under your home area in a folder called CLCdatabases. This folder is scanned recursively, through all subfolders, to look for valid databases. All other folder locations are scanned only at the top level. Below the list of locations, all the BLAST databases are listed with the following information:
• Name. The name of the BLAST database.
• Description. Detailed description of the contents of the database.
• Date. The date the database was created.
• Sequences. The number of sequences in the database.
• Type. The type can be either nucleotide (DNA) or protein.
• Total size (1000 residues). The number of residues in the database, either bases or amino acids.
• Location. The location of the database.
Below the list of BLAST databases, there is a button to Remove Database. This option will delete the database files belonging to the database se
256. d Tongaonkar, 1990] Kolaskar, A. S. and Tongaonkar, P. C. (1990). A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett, 276(1-2):172-174.
[Kyte and Doolittle, 1982] Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157(1):105-132.
[Larget and Simon, 1999] Larget, B. and Simon, D. (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol Biol Evol, 16:750-759.
[Leitner and Albert, 1999] Leitner, T. and Albert, J. (1999). The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc Natl Acad Sci U S A, 96(19):10752-10757.
[Maizel and Lenk, 1981] Maizel, J. V. and Lenk, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci U S A, 78(12):7665-7669.
[McGinnis and Madden, 2004] McGinnis, S. and Madden, T. L. (2004). BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res, 32(Web Server issue):W20-W25.
[Meyer et al., 2007] Meyer, M., Stenzel, U., Myles, S., Prüfer, K., and Hofreiter, M. (2007). Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res, 35(15):e97.
[Michener and Sokal, 1957] Michener, C. and Sokal, R. (1957). A quantitative approach to a problem in classification. Evolution, 11:130-162.
[Purvis, 1995] Purvis, A. (1995). A composite e
257. d be accepted in the CLC interface. Some commonly used Entrez queries are pre-entered and can be chosen in the drop down menu.
• Choose filter
- Low complexity. Mask off segments of the query sequence that have low compositional complexity. Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g. hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
- Mask lower case. If you have a sequence with regions denoted in lower case, and other regions in upper case, then choosing this option would keep any of the regions in lower case from being considered in your BLAST search.
• Expect. The threshold for reporting matches against database sequences: the default value is 10, meaning that under the circumstances of this search, 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). Details of how E-values are calculated can be found at the NCBI: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html. If the E-value ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Increasing the threshold results in more matches being reported, b
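The reporting rule for the Expect threshold can be made concrete with a small sketch. The hit records below are made-up illustrations, not real BLAST output, and the function name `reportable_hits` and the dictionary keys are hypothetical.

```python
def reportable_hits(hits, expect=10.0):
    """Keep only hits whose E-value does not exceed the EXPECT threshold.

    A hit with an E-value greater than the threshold is not reported, so
    lowering `expect` makes the filter more stringent.
    """
    return [hit for hit in hits if hit["e_value"] <= expect]


# Made-up example records for illustration only.
EXAMPLE_HITS = [
    {"name": "hit_a", "e_value": 1e-50},
    {"name": "hit_b", "e_value": 0.5},
    {"name": "hit_c", "e_value": 12.0},
]
```

With the default threshold of 10, `hit_c` is dropped; lowering the threshold to 0.001 also drops `hit_b`, while raising it to 20000 (the value used for remote homologues in figure 2.50) reports all three.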
258. d close the views. You do not need to save the fragment table. Figure 2.28: Creating the fragment table including fragments up to 4000 bp. Figure 2.29: Opening the fragment as a sequence.
2.6.4 Specify restriction sites and perform cloning
The final step in this tutorial is to insert the fragment into the cloning vector: Toolbox in the Menu Bar | Cloning and Restriction Sites | Cloning. Select the Fragment (ATP8a1 mRNA (ATP8a1 fwd - ATP8a1 rev)) sequence you just saved and click Next. In this dialog, use the Browse button to select the pcDNA4_TO cloning vector, also located in the Cloning folder. Click Finish. You will now see the cloning editor, where you will see the pcDNA4_TO vector in a circular view. Press and hold the Ctrl (⌘ on Mac) key while you click first the HindIII site and next the XhoI site (see figure 2.30). At the bottom of the view you can now see inf
259. d in the View Area by their tabs. The order of the views can be changed using drag and drop. E.g. drag the tab of one view onto the tab of another. The tab of the first view is now placed at the right side of the other tab. If a tab is dragged into a view, an area of the view is made gray (see fig. 3.12), illustrating that the view will be placed in this part of the View Area. The result of this action is illustrated in figure 3.13. You can also split a View Area horizontally or vertically using the menus. Splitting horizontally may be done this way: right-click a tab of the view | View | Split Horizontally. This action opens the chosen view below the existing view (see figure 3.14). When the split is made vertically, the new view opens to the right of the existing view. Splitting the View Area can be undone by dragging e.g. the tab of the bottom view to the tab of the top view. This is marked by a gray area on the top of the view. Figure 3.12: When dragging a view, a gray area indicates where the view will be shown.
260. d since you opened it, you are asked if you want to save. When saving a new view that has not been opened from the Navigation Area (e.g. when opening a sequence from a list of search hits), a save dialog appears (figure 3.11). In the dialog you select the folder in which you want to save the element. After naming the element, press OK.
3.2.5 Undo/Redo
If you make a change in a view, e.g. remove an annotation in a sequence or modify a tree, you can undo the action. In general, Undo applies to all changes you can make when right-clicking in a view. Undo is done by: Click undo in the Toolbar, or Edit | Undo, or Ctrl + Z. If you want to undo several actions, just repeat the steps above. To reverse the undo action: Click the redo icon in the Toolbar, or Edit | Redo, or Ctrl + Y. Figure 3.11: Save dialog.
Note: Actions in the Navigation Area, e.g. renaming and moving elements, cannot be undone. However, you can restore deleted elements (see section 3.1.7). You can set the number of possible undo actions in the Preferences dialog (see section 5).
3.2.6 Arrange views in View Area
Views are arrange
261. d the origin of DNA and protein sequences. Selections or the entire text of the Sequence Text View can be copied and pasted into other programs. Much of the information is also displayed in the Sequence info, where it is easier to get an overview (see section 10.4). In the Side Panel, you find a search field for searching the text in the view.
10.6 Creating a new sequence
A sequence can either be imported, downloaded from an online database or created in the CLC DNA Workbench. This section explains how to create a new sequence: New in the toolbar. Figure 10.14: Creating a sequence. The Create Sequence dialog (figure 10.14) reflects the information needed in the GenBank format, but you are free to enter anything into the fields. The following description is a guideline for entering information about a sequence:
• Name. The name of the sequence. This is used for saving the sequence.
• Common name. A common name for the species.
• Latin name. Th
262. dd Default Rows. Note that this will not reset the table but only add all the default rows to the existing rows.
18.2.2 Create entry clones (BP)
The next step in the Gateway cloning work flow is to recombine the attB-flanked sequence of interest into a donor vector to create an entry clone, the so-called BP reaction: Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Create Entry Clone. This will open a dialog where you can select one or more sequences that will be the sequence of interest to be recombined into your donor vector. Note that the sequences you select should be flanked with attB sites (see section 18.2.1). You can select more than one sequence as input, and the corresponding number of entry clones will be created. When you have selected your sequence(s), click Next. This will display the dialog shown in figure 18.23. Figure 18.23: Selecting one or more donor vectors. Clicking the Browse button opens a dialog where you can select a donor vector. You can download donor vectors from Invitrogen's web site (http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4) and import them into the CLC DNA Workbench. Note that the Workbench looks for the specific sequences of the attP sites in the sequences that y
263. dd to define the first element. This will bring up the dialog shown in figure 17.7. At the top of the dialog, you can choose which kind of element you wish to define. Figure 17.7: Defining an element of the barcode system.
• Linker. This is a sequence which should just be ignored: it is neither the barcode nor the sequence of interest. Following the example in figure 17.6, it would be the four nucleotides of the SrfI site. For this element, you simply define its length, nothing else.
• Barcode. The barcode is the stretch of nucleotides used to group the sequences. For that, you need to define what the valid bases are. This is done when you click Next. In this dialog, you simply need to specify the length of the barcode.
• Sequence. This element defines the sequence of interest. You can define a length interval for how long you expect this sequence to be. The sequence part is the only part of the read that is retained in the output. Both barcodes and linkers are removed.
The concept when adding elements is that you add e.g. a linker, a barcode and a sequence in the desired sequential order to describe the structure of each sequencing read. You can of course edit and delete elements by selecting them and clicking the buttons below. For the example from
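The element structure described above (a linker, then a barcode, then the sequence of interest) can be sketched as a simple slicing of each read. This is an illustration under the assumption of that one fixed order; the function name `split_read`, its parameters and the example read are hypothetical.

```python
def split_read(read, linker_length, barcode_length, barcodes,
               min_length, max_length):
    """Split a read laid out as linker + barcode + sequence.

    Returns (barcode, sequence) when the barcode is one of the valid
    `barcodes` and the sequence length lies within the given interval.
    Returns None otherwise, which corresponds to a "not grouped" read.
    The linker and the barcode are removed; only the sequence is kept.
    """
    barcode_end = linker_length + barcode_length
    barcode = read[linker_length:barcode_end]
    sequence = read[barcode_end:]
    if barcode not in barcodes:
        return None
    if not (min_length <= len(sequence) <= max_length):
        return None
    return barcode, sequence
```

For a read such as "GCCCAATACGTACGT" with a four-nucleotide linker and a three-nucleotide barcode, the linker GCCC is ignored, AAT is used for grouping, and ACGTACGT is the part that is retained.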
264. [Figure 17.12: A preview of the result.] With this data set we got the four groups as expected (shown in figure 17.13). The Not grouped list contains 445,560 reads that will have to be discarded since they do not have any of the barcodes. [Figure 17.13: The result is one sequence list per barcode and a list with the remainders.] 17.3 Trim sequences. CLC DNA Workbench offers a number of ways to trim your sequence reads prior to assembly. Trimming can be done either as a separate task before assembling, or it can be performed as an integrated part of the assembly process (see section 17.4). Trimming as a separate task can be done either manually or automatically. In both instances, trimming of a sequence does not cause data to be deleted; instead, both the manual and automatic trimming will put a Trim annotation on the trimmed parts as an indication to the assembly algorithm that this part of the data is to be ignored (see figure 17.14). This means that the effect of different trimming schemes can easily be explored without the loss of data. To remove existing trimming from a sequence, simply remove its trim annotation (see section 10.3.2). [Figure 17.14 shows the sequence CAGCACAGAGGTCATACTGGCATTCTGAACG with a Trim annotation.] Figure 17.14: Trimming creates ann
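The idea of non-destructive trimming from excerpt 264 can be sketched in a few lines of Python. The function name `untrimmed` and the trim coordinates are made up for illustration; regions are assumed to be 0-based and end-exclusive, which is not necessarily how the Workbench stores annotations.

```python
def untrimmed(sequence, trim_regions):
    """Return the bases that are not covered by any trim region.

    The sequence itself is never modified; the trim regions only mark
    which positions should be ignored.
    """
    ignored = set()
    for start, end in trim_regions:
        ignored.update(range(start, end))
    return "".join(base for i, base in enumerate(sequence) if i not in ignored)


read = "CAGCACAGAGGTCATACTGGCATTCTGAACG"
trims = [(0, 5), (28, 31)]    # hypothetical trim annotations at both ends

print(untrimmed(read, trims))  # the part an assembler would use
print(untrimmed(read, []))     # removing the annotations restores the full read
```

Because the read is stored unchanged, different trimming schemes can be compared simply by swapping the list of regions.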
265. ded when you click Finish. Note that this list is dynamically updated when you change the number of cut sites. The enzymes shown in brackets [] are enzymes which are already present in the Side Panel. If you have selected more than one region on the sequence (using Ctrl or ⌘), they will be treated as individual regions. This means that the criteria for cut sites apply to each region. Show enzymes with compatible ends. Besides what is described above, there is a third way of adding enzymes to the Side Panel and thereby displaying them on the sequence. It is based on the overhang produced by cutting with an enzyme and will find enzymes producing a compatible overhang: right-click the restriction site | Show Enzymes with Compatible Ends. This will display the dialog shown in figure 18.39. [Figure 18.39: the Show Enzymes with Compatible Ends dialog for TaqI, with the options Exact matches only and All matches, and tables (Name, Overhang, Methylation, Popularity) listing enzymes with a compatible 5' cg overhang such as ClaI, HpaII and MspI.]
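As a rough illustration of what "compatible ends" means in excerpt 265, the sketch below derives the single-stranded overhang of a few well-known palindromic recognition sites and groups enzymes that produce the same one. The small enzyme table and the function names are assumptions for this example only; the sketch handles only exact overhang matches for 5' overhangs and does not reflect how the Workbench performs the search.

```python
# Recognition site and top-strand cut position (number of bases before the cut).
# Illustrative subset only.
ENZYMES = {
    "TaqI": ("TCGA", 1),     # T^CGA
    "ClaI": ("ATCGAT", 2),   # AT^CGAT
    "HpaII": ("CCGG", 1),    # C^CGG
    "MspI": ("CCGG", 1),     # C^CGG
    "EcoRI": ("GAATTC", 1),  # G^AATTC
}


def overhang(site, cut):
    """5' overhang of a palindromic site cut symmetrically on both strands."""
    return site[cut:len(site) - cut]


def compatible_with(name, enzymes=ENZYMES):
    """Names of the other enzymes that leave exactly the same overhang."""
    target = overhang(*enzymes[name])
    return sorted(
        other
        for other, (site, cut) in enzymes.items()
        if other != name and overhang(site, cut) == target
    )


print(overhang(*ENZYMES["TaqI"]))  # CG
print(compatible_with("TaqI"))     # ['ClaI', 'HpaII', 'MspI']
```

EcoRI leaves an AATT overhang, so it is not reported as compatible with TaqI in this exact-match sketch.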
266. ded displaying this name. Sequence: The actual sequence to be inserted. The sequence is always defined on the sense strand, although the reverse primer would be reverse complement. [Figure 18.22: Configuring the list of primer additions available when adding attB sites.] Annotation type: The annotation type used for the annotation that is added to the fragment. Forward primer addition: Whether this addition should be visible in the list of additions for the forward primer. Reverse primer addition: Whether this addition should be visible in the list of additions for the reverse primer. You can either change the existing elements in the table by double-clicking any of the cells, or you can use the buttons below to Add Row or Delete Row. If you have accidentally deleted or modified some of the default primer additions, you can press A
268. e. Organism: Sequences which contain information about organism can be searched. In this way, you could search for e.g. Homo sapiens sequences. Database fields: If your data is stored in a CLC Bioinformatics Database, you will be able to search for custom-defined information. Read more in the database user manual. Only the first item in the list, Name, is available for all kinds of data. The rest is only relevant for sequences. If you wish to perform a search for sequence similarity, use Local BLAST (see section 12.1.3) instead. 4.2 Quick search. At the bottom of the Navigation Area there is a text field, as shown in figure 4.1. [Figure 4.1: Search simply by typing in the text field and press Enter.] To search, simply enter a text to search for and press Enter. 4.2.1 Quick search results. To show the results, the search pane is expanded as shown in figure 4.2. [Figure 4.2: Search results.] If there are many hits, only the first 50 hits are immediately shown. At the bottom of the pane you can click Next to see the next 50 hits (see figure 4.3). If a search giv
269. e: The name of the motif. In the result of a motif search, this name will appear as the name of the annotation and in the result table. Motif: The actual motif. See section 13.7.2 for more information about the syntax of motifs. Description: You can enter a description of the motif. In the result of a motif search, the description will appear in the result table and be added as a note to the annotation on the sequence (visible in the Annotation table or by placing the mouse cursor on the annotation). Type: You can enter three different types of motifs: simple motifs, Java regular expressions or PROSITE regular expressions. Read more in section 13.7.2. The motif list can contain a mix of different types of motifs. This is practical because some motifs can be described with the simple syntax, whereas others need the more advanced regular expression syntax. Instead of manually adding motifs, you can Import From Fasta File. This will show a dialog where you can select a fasta file on your computer and use this to create motifs. This will automatically take the name, description and sequence information from the fasta file and put it into the motif list. The motif type will be simple. Besides adding new motifs, you can also edit and delete existing motifs in the list. To edit a motif, either double-click the motif in the list, or select it and click the Edit button at the bottom of the vie
270. O'Reilly book on BLAST: http://www.oreilly.com/catalog/blast/. Explanation of scoring, substitution matrices and more: http www clcbio com be. Creative Commons License. All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. Chapter 13 General sequence analyses. Contents: 13.1 Shuffle sequence. 13.2 Dot plots. 13.2.2 View dot plots. 13.2.3 Bioinformatics explained: Dot plots. 13.2.4 Bioinformatics explained: Scoring matrices. 13.3 Local complexity plot. 13.4 Sequence statistics. 13.4.1 Bioinformatics explained: Protein statistics. 13.5 Join sequences.
271. e we welcome all requests and feedback from users and hope you will suggest new features or more general improvements to the program at support@clcbio.com. 1.5.2 Report program errors. CLC bio is doing everything possible to eliminate program errors. Nevertheless, some errors might have escaped our attention. If you discover an error in the program, you can use the Report a Program Error function in the Help menu of the program to report it. In the Report a Program Error dialog you are asked to write your e-mail address (optional). This is because we would like to be able to contact you for further information about the error or for helping you with the problem. Note: No personal information is sent via the error report. Only the information which can be seen in the Program Error Submission Dialog is submitted. You can also write an e-mail to support@clcbio.com. Remember to specify how the program error can be reproduced. All errors will be treated seriously and with gratitude. We appreciate your help. Start in safe mode. If the program becomes unstable on start-up, you can start it in Safe mode. This is done by pressing and holding down the Shift button while the program starts. When starting in safe mode, the user settings (e.g. the settings in the Side Panel) are deleted and cannot be restored. Your data stored in the Navigation Area is not deleted. When started in safe mode, some of the funct
272. e 16.3. There are two different ways to display the information relating to a single primer: the detailed and the compact view. Both are shown below the primer regions selected on the sequence. 16.3.1 Compact information mode. This mode offers a condensed overview of all the primers that are available in the selected region. When a region is chosen, primer information will appear in lines beneath it (see figure 16.4). [Figure 16.4: Compact information mode.] The number of information lines reflects the chosen length interval for primers and probes. One line is shown for every possible primer length; if the length interval is widened, more lines will appear. At each potential primer starting position, a circle is shown which indicates whether the primer fulfills the requirements set in the primer parameters preference group. A green primer indicates a
273. e Ctrl button while making selections. Holding down the Shift button lets you extend or reduce an existing selection to the position you clicked. To select a part of a sequence covered by an annotation: right-click the annotation | Select annotation, or double-click the annotation. To select a fragment between two restriction sites that are shown on the sequence: double-click the sequence between the two restriction sites. (Read more about restriction sites in section 10.1.2.) Open a selection in a new view. A selection can be opened in a new view and saved as a new sequence: right-click the selection | Open selection in New View. This opens the annotated part of the sequence in a new view. The new sequence can be saved by dragging the tab of the sequence view into the Navigation Area. The process described above is also the way to manually translate coding parts of sequences (CDS) into protein. You simply translate the new sequence into protein. This is done by: right-click the tab of the new sequence | Toolbox | Nucleotide Analyses | Translate to Protein. A selection can also be copied to the clipboard and pasted into another program: make a selection | Ctrl + C (⌘ + C on Mac). Note: The annotations covering the selection will not be copied. A selection of a sequence can be edited as described in the following section. 10.1.4 Editing the sequence. When you make a selection i
274. e Latin name for the species. Type: Select between DNA, RNA and protein. Circular: Specifies whether the sequence is circular. This will open the sequence in a circular view as default (applies only to nucleotide sequences). Description: A description of the sequence. Keywords: A set of keywords separated by semicolons (;). Comments: Your own comments to the sequence. Sequence: Depending on the type chosen, this field accepts nucleotides or amino acids. Spaces and numbers can be entered, but they are ignored when the sequence is created. This allows you to paste (Ctrl + V on Windows and ⌘ + V on Mac) in a sequence directly from a different source, even if the residue numbers are included. Characters that are not part of the IUPAC codes cannot be entered. At the top right corner of the field, the number of residues is counted. The counter does not count spaces or numbers. Clicking Finish opens the sequence. It can be saved by clicking Save or by dragging the tab of the sequence view into the Navigation Area. 10.7 Sequence Lists. The Sequence List shows a number of sequences in a tabular format, or it can show the sequences together in a normal sequence view. Having sequences in a sequence list can help organizing sequence data. The sequence list may originate from an NCBI search (chapter 11.1). Moreover, if a multiple sequence fasta file is imported, it is possible to stor
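The behaviour of the Sequence field in excerpt 274 (spaces and numbers are ignored, non-IUPAC characters are rejected) can be approximated as follows. This is a minimal sketch for nucleotide input only; the function name and the choice to raise an error for invalid characters are assumptions, since the Workbench simply prevents such characters from being typed.

```python
# IUPAC nucleotide codes, including the ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def clean_pasted_sequence(text):
    """Drop whitespace and digits, then check the remaining residues."""
    residues = [ch for ch in text if not (ch.isspace() or ch.isdigit())]
    invalid = sorted({ch for ch in residues if ch.upper() not in IUPAC_NUCLEOTIDES})
    if invalid:
        raise ValueError(f"not IUPAC nucleotide codes: {invalid}")
    return "".join(residues)


pasted = "1 acgt acgt\n11 ggcc"          # sequence pasted with residue numbers
sequence = clean_pasted_sequence(pasted)
print(sequence, len(sequence))           # the counter ignores spaces and numbers
```

For protein input the same approach would apply with the amino acid alphabet in place of `IUPAC_NUCLEOTIDES`.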
275. e. On the welcome screen, click Next. Read and accept the License agreement and click Next. Choose where you would like to install the application and click Next. Choose a name for the Start Menu folder used to launch CLC DNA Workbench and click Next. Choose if CLC DNA Workbench should be used to open CLC files and click Next. Choose where you would like to create shortcuts for launching CLC DNA Workbench and click Next. Choose if you would like to associate .clc files to CLC DNA Workbench. If you check this option, double-clicking a file with a .clc extension will open the CLC DNA Workbench. Wait for the installation process to complete, choose whether you would like to launch CLC DNA Workbench right away, and click Finish. When the installation is complete, the program can be launched from the Start Menu or from one of the shortcuts you chose to create. 1.2.3 Installation on Mac OS X. Starting the installation process is done in one of the following ways. If you have downloaded an installer: Locate the downloaded installer and double-click the icon. The default location for downloaded files is your desktop. If you are installing from a CD: Insert the CD into your CD-ROM drive and open it by double-clicking on the CD icon on your desktop. Launch the installer by double-clicking on the CLC DNA Workbench icon. Installing the program is done in the following steps: On the welcome screen, click Next. Read and a
276. e a codon but with respect to its frequency in the organism. As an example, we want to translate an alanine to the corresponding codon. Four different codons can be used for this reverse translation: GCU, GCC, GCA or GCG. By picking either one by random choice we will get an alanine. The most frequent codon coding for an alanine in E. coli is GCG, encoding 33.7% of all alanines. Then comes GCC (25.5%), GCA (20.3%) and finally GCU (15.3%). The data are retrieved from the Codon usage database (see below). Always picking the most frequent codon does not necessarily give the best answer. By selecting codons from a distribution of calculated codon frequencies, the DNA sequence obtained after the reverse translation holds the correct (or nearly correct) codon distribution. It should be kept in mind that the obtained DNA sequence is not necessarily identical to the original one encoding the protein in the first place, due to the degeneracy of the genetic code. In order to obtain the best possible result of the reverse translation, one should use the codon frequency table from the correct organism or a closely related species. The codon usage of the mitochondrial chromosome is often different from that of the native chromosome(s); thus mitochondrial codon frequency tables should only be used when working specifically with mitochondria. Other useful resources: The Genetic Code at NCBI: http://www.ncbi.nlm.nih.gov/Taxonomy
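The two reverse-translation strategies contrasted in excerpt 276 (always taking the most frequent codon versus drawing codons from the frequency distribution) can be sketched like this. The frequencies are the E. coli alanine values quoted in the excerpt; the function names are illustrative and this is not the Workbench's own code.

```python
import random

# Percentage of alanines encoded by each codon in E. coli, as quoted above.
ALANINE_CODONS = {"GCG": 33.7, "GCC": 25.5, "GCA": 20.3, "GCU": 15.3}


def most_frequent_codon(freqs):
    """Strategy 1: always use the codon with the highest frequency."""
    return max(freqs, key=freqs.get)


def sample_codon(freqs, rng):
    """Strategy 2: draw a codon with probability proportional to its frequency."""
    codons = list(freqs)
    weights = [freqs[codon] for codon in codons]
    return rng.choices(codons, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that the run is reproducible
draws = [sample_codon(ALANINE_CODONS, rng) for _ in range(2000)]

print(most_frequent_codon(ALANINE_CODONS))
print({codon: draws.count(codon) for codon in ALANINE_CODONS})
```

With the sampling strategy, the counts across many alanines approach the codon usage of the organism, whereas the first strategy would produce GCG every time.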
277. e advantage of this tool. Many of the databases listed are very large. Please make sure you have room for them. If you are working on a shared system, we recommend you discuss your plans with your system administrator and fellow users. Some of the databases listed are dependent on others. This will be listed in the Dependencies column of the Download BLAST Databases window. This means that while the database you are interested in may seem very small, it may require that you also download a very big database on which it depends. An example of the second item above is Swissprot. To download a database from the NCBI that would allow you to search just Swissprot entries, you need to download the whole nr database in addition to the entry for Swissprot. 12.3.3 Create local BLAST databases. In the CLC DNA Workbench you can create a local database that you can use for local BLAST searches. You can specify a location on your computer to save the BLAST database files to. The Workbench will list the BLAST databases found in these locations when you set up a local BLAST search (see section 12.1.3). DNA, RNA, and protein sequences located in the Navigation Area can be used to create BLAST databases. Any given BLAST database can only include one molecule type. If you wish to use a pre-formatted BLAST database instead, see section 12.3.1. To create a BLAST database, go to: Toolbox | BLAST | Create BLAST
278. e and hold the mouse button. By moving the mouse you move the sequence in the View. 3.3.6 Selection. The Selection mode is used for selecting in a View (selecting a part of a sequence, selecting nodes in a tree etc.). It is also used for moving e.g. branches in a tree or sequences in an alignment. When you make a selection on a sequence or in an alignment, the location is shown in the bottom right corner of the screen. E.g. 23^24 means that the selection is between two residues, 23 means that the residue at position 23 is selected, and finally 23..25 means that 23, 24 and 25 are selected. By holding Ctrl (⌘ on Mac) you can make multiple selections. 3.3.7 Changing compactness. There is a shortcut way of changing the compactness setting for read mappings: press and hold the Alt key and scroll using your mouse wheel or touchpad. 3.4 Toolbox and Status Bar. The Toolbox is placed in the left side of the user interface of CLC DNA Workbench, below the Navigation Area. The Toolbox shows a Processes tab and a Toolbox tab. 3.4.1 Processes. By clicking the Processes tab, the Toolbox displays previous and running processes, e.g. an NCBI search or a calculation of an alignment. The running processes can be stopped, paused, and resumed by clicking the small icon next to the process (see figure 3.17). Running and paused processes are not deleted.
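The selection location shown in the status bar (section 3.3.6 in excerpt 278) can be read mechanically. The sketch below assumes the GenBank-style notation in which `23^24` is a position between two residues, `23` is a single residue and `23..25` is a range; the separators are lost in the extracted text, so this notation is an assumption, and the function name is hypothetical.

```python
def selected_positions(location):
    """Return the list of selected residue positions for a location string."""
    if "^" in location:
        return []                          # between two residues: nothing selected
    if ".." in location:
        start, end = (int(part) for part in location.split(".."))
        return list(range(start, end + 1))  # both ends are included
    return [int(location)]                  # a single residue


for location in ("23^24", "23", "23..25"):
    print(location, selected_positions(location))
```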
279. e conflict or resolved. Conflict: Initially, all the rows in the table have this status. This means that there is one or more differences between the sequences at this position. Resolved: If you edit the sequences, e.g. if there was an error in one of the sequences, and they now all have the same residue at this position, the status is set to Resolved. Note: Can be used for your own comments on this conflict. Right-click in this cell of the table to add or edit the comments. The comments in the table are associated with the conflict annotation in the graphical view. Therefore, the comments you enter in the table will also be attached to the annotation on the consensus sequence (the comments can be displayed by placing the mouse cursor on the annotation for one second, see figure 17.24). The comments are saved when you Save. By clicking a row in the table, the corresponding position is highlighted in the graphical view. Clicking the rows of the table is another way of navigating the contig, apart from using the Find Conflict button or using the Space bar. You can use the up and down arrow keys to navigate the rows of the table. 17.8 Reassemble contig. If you have edited a contig, changed trimmed regions, or added or removed reads, you may wish to reassemble the contig. This can be done in two ways: Toolbox in the Menu Bar | Sequencing Data Analyses | Reassemble Contig | select the contig and click Next, or righ
280. e displayed in the annotation's box. Over annotation: The labels are displayed above the annotations. Before annotation: The labels are placed just to the left of the annotation. Flag: The labels are displayed as flags at the beginning of the annotation. Stacked: The labels are offset so that the text of all labels is visible. This means that there is varying distance between each sequence line to make room for the labels. Show arrows: Displays the end of the annotation as an arrow. This can be useful to see the orientation of the annotation (for DNA sequences). Annotations on the negative strand will have an arrow pointing to the left. Use gradients: Fills the boxes with gradient color. In the Annotation Types group, you can choose which kinds of annotations should be displayed. This group lists all the types of annotations that are attached to the sequence(s) in the view. For sequences with many annotations, it can be easier to get an overview if you deselect the annotation types that are not relevant. Unchecking the checkboxes in the Annotation Layout will not remove this type of annotation from the sequence; it will just hide them from the view. Besides selecting which types of annotations should be displayed, the Annotation Types group is also used to change the color of the annotations on the sequence. Click the colored square next to the relevant annotation typ
281. 10.7.3 Extract sequences. CLC DNA Workbench offers five different ways of viewing and editing single sequences, as described in the first five sections of this chapter. Furthermore, this chapter also explains how to create a new sequence and how to gather several sequences in a sequence list. 10.1 View sequence. When you double-click a sequence in the Navigation Area, the sequence will open automatically, and you will see the nucleotides or amino acids. The zoom options described in section 3.3 allow you to e.g. zoom out in order to see more of the sequence in one view. There are a number of options for viewing and editing the sequence, which are all described in this section. All the options described in this section also apply to alignments (further described in section 19.2). 10.1.1 Sequence settings in Side Panel. Each view of a sequence has a Side Panel located at the right side of the view (see figure 10.1). [Figure 10.1: Overview of the Side Panel, which is always shown to the right of a view. Its groups are Sequence layout, Annotation layout, Annotation types, Restriction sites, Residue coloring, Nucleotide info, Find and Text Format.] When you make changes in the Side Panel, the view of the sequence is instantly updated. To show or hide the Side Panel: select the View | Ctrl + U o
282. Contents (continued): 3.5.1 Create Workspace. 3.5.2 Select Workspace. 3.5.3 Delete Workspace. 3.6 List of shortcuts. This chapter provides an overview of the different areas in the user interface of CLC DNA Workbench. As can be seen from figure 3.1, this includes a Navigation Area, View Area, Menu Bar, Toolbar, Status Bar and Toolbox. [Figure 3.1: screenshot of the CLC DNA Workbench main window with the Menu Bar, Toolbar, Navigation Area, View Area and Toolbox labeled.]
283. e icon next to the search field (see figure 4.5). [Figure 4.5: Recent searches.] Clicking one of the recent searches will conduct the search again. 4.3 Advanced search. As a supplement to the Quick search described in the previous section, you can use the more advanced search: Search | Local Search, or Ctrl + F (⌘ + F on Mac). This will open the search view as shown in figure 4.6. The first thing you can choose is which location should be searched. All the active locations are shown in this list. You can also choose to search all locations. Read more about locations in section 3.1.1. Furthermore, you can specify what kind of elements should be searched: All sequences, Nucleotide sequences, Protein sequences, or All data. [Figure 4.6: Advanced search.] When searching for sequences, you will also get alignments, sequence lists etc. as results, if they contain a sequence which matches the search criteria. Below are the search criteria. First, select a relevant search filter in the Add filter list. For sequences you can search for Name, Length and Organism. See section 4.2.2 for more information on individual search terms. F
284. e list to see more hits. 2.4.2 Saving the sequence. The sequences which are found during the search can be displayed by double-clicking in the list of hits. However, this does not save the sequence. You can save one or more sequences by selecting them and clicking Download and Save, or by dragging the sequences into the Navigation Area. 2.5 Tutorial: Assembly. In this tutorial, you will see how to assemble data from automated sequencers into a contig and how to find and inspect any conflicts that may exist between different reads. This tutorial shows how to assemble sequencing data generated by conventional Sanger sequencing techniques. For high-throughput sequencing data, we refer to the CLC Genomics Workbench (see http://www.clcbio.com/genomics). The data used in this tutorial are the sequence reads in the Sequencing reads folder in the Sequencing data folder of the Example data in the Navigation Area. If you do not have the example data, please go to the Help menu to import it. 2.5.1 Trimming the sequences. The first thing to do when analyzing sequencing data is to trim the sequences. Trimming serves a dual purpose: it both takes care of parts of the reads with poor quality, and it removes potential vector contamination. Trimming the sequencing data gives a better result in the further analysis. Toolbox in the Menu Bar | Sequencing Data Analyses | Trim Sequences. Select the 9 sequences and click Next.
286. e needs to be fulfilled (Match any). For each filter criterion, you first have to select which column it should apply to. Next, you choose an operator. For numbers, you can choose between: = (equal to); < (smaller than); > (greater than); <> (not equal to); abs value < (absolute value smaller than), which is useful if it doesn't matter whether the number is negative or positive; abs value > (absolute value greater than), which is useful if it doesn't matter whether the number is negative or positive. For text-based columns, you can choose between: contains (the text does not have to be in the beginning); doesn't contain; = (the whole text in the table cell has to match, also lower/upper case). Once you have chosen an operator, you can enter the text or numerical value to use. If you wish to reset the filter, simply remove all the search criteria. Note that the last one will not disappear; it will be reset and allow you to start over. Figure C.3 shows an example of an advanced filter which displays the open reading frames larger than 400 that are placed on the negative strand. [Figure C.3: filtered table with the columns Start, End, Length, Found at strand and Start codon.]
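The advanced table filter described in excerpt 286 combines criteria of the form (column, operator, value) with Match all or Match any. The following Python sketch mimics that logic on a list of dictionaries. The operator names follow the excerpt, but the function, the example rows and the case-insensitive behaviour of contains are assumptions made for illustration.

```python
import operator

NUMERIC_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<>": operator.ne,
    "abs value <": lambda cell, value: abs(cell) < value,
    "abs value >": lambda cell, value: abs(cell) > value,
}
TEXT_OPS = {
    # Assumption: "contains" ignores case, while "=" is case-sensitive.
    "contains": lambda cell, text: text.lower() in cell.lower(),
    "doesn't contain": lambda cell, text: text.lower() not in cell.lower(),
    "=": lambda cell, text: cell == text,
}


def filter_rows(rows, criteria, match_all=True):
    """Keep rows fulfilling all criteria (Match all) or at least one (Match any)."""
    combine = all if match_all else any

    def matches(row, column, op, value):
        cell = row[column]
        ops = TEXT_OPS if isinstance(cell, str) else NUMERIC_OPS
        return ops[op](cell, value)

    return [row for row in rows if combine(matches(row, *c) for c in criteria)]


# Made-up open reading frames, loosely modelled on the example in the text.
rows = [
    {"Start": 14, "End": 1851, "Length": 1838, "Strand": "negative"},
    {"Start": 3462, "End": 3887, "Length": 426, "Strand": "negative"},
    {"Start": 100, "End": 400, "Length": 301, "Strand": "negative"},
    {"Start": 500, "End": 1100, "Length": 601, "Strand": "positive"},
]
criteria = [("Length", ">", 400), ("Strand", "=", "negative")]

print(filter_rows(rows, criteria, match_all=True))
```

Switching to `match_all=False` keeps every row that satisfies at least one criterion, which for these example rows is all four.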
287. e of the Side Panel of a sequence view. [Figure 5.8: The Side Panel of a sequence contains several groups: Sequence layout, Annotation types, Annotation layout etc. Several of these groups are present in more views. E.g. Sequence layout is also in the Side Panel of alignment views.] By clicking the black triangles or the corresponding headings, the groups can be expanded or collapsed. An example is shown in figure 5.9, where the Sequence layout is expanded. The content of the groups is described in the sections where the functionality is explained. E.g. Sequence Layout for sequences is described in chapter 10.1.1. When you have adjusted a view of e.g. a sequence, your settings in the Side Panel can be saved. When you open other sequences which you want to display in a similar way, the saved settings can be applied. The options for saving and applying are available in the top of the Side Panel (see figure 5.10). To save and apply the saved settings, click the button seen in figure 5.10. This opens a menu where the following options are available: Save Settings: This brings up a dialog, as shown in figure 5.11, where you can enter a name for your settings. Furthermore, by clicking the checkbox Always apply these settings, you can choose to use these settings every time you open a new view of this type
288. e reporter dye. It is recommended that the melting temperature of the TaqMan probe is about 10 degrees Celsius higher than that of the primer pair. Primer design for TaqMan technology involves designing a primer pair and a TaqMan probe. In TaqMan, the user must thus define three regions: a Forward primer region, a Reverse primer region, and a TaqMan probe region. The easiest way to do this is to designate a TaqMan primer/probe region spanning the sequence region where TaqMan amplification is desired. This will automatically add all three regions to the sequence. If more control is desired over the placing of primers and probes, the Forward primer region, Reverse primer region and TaqMan probe region can all be defined manually. If areas are known where primers or probes must not bind (e.g. repeat-rich areas), one or more No primers here regions can be defined. The regions are defined by making a selection on the sequence and right-clicking the selection. It is required that at least a part of the Forward primer region is located upstream of the TaqMan probe region, and that the TaqMan probe region is located upstream of a part of the Reverse primer region. In TaqMan mode, the Inner melting temperature menu in the primer parameters panel is activated, allowing the user to set a separate melting temperature interval for the TaqMan probe. After exploring the available primers (see section 16.3) and setting the desired parameter
289. e the data in a sequences list. A Sequence List can also be generated using a dialog, which is described here: select two or more sequences | right-click the elements | New Sequence List. This action opens a Sequence List dialog. [Figure 10.15: A Sequence List dialog.] The dialog allows you to select more sequences to include in the list, or to remove already chosen sequences from the list. Clicking Finish opens the sequence list. It can be saved by clicking Save or by dragging the tab of the view into the Navigation Area. Opening a sequence list is done by: right-click the sequence list in the Navigation Area | Show | Graphical Sequence List OR Table. The two different views of the same sequence list are shown in split screen in figure 10.16.
290. e to change the color. This will display a dialog with three tabs: Swatches, HSB and RGB. They represent three different ways of specifying colors. Apply your settings and click OK. When you click OK, the color settings cannot be reset. The Reset function only works for changes made before pressing OK. Furthermore, the Annotation types can be used to easily browse the annotations by clicking the small button next to the type. This will display a list of the annotations of that type (see figure 10.8). [Figure 10.8: Browsing the gene annotations on a sequence.] Clicking an annotation in the list will select this region on the sequence. In this way, you can quickly find a specific annotation on a long sequence. View Annotations in a table: Annotations can also be viewed in a table: select the sequence in the Navigation Area | Show | Annotation Table; or, if the sequence is already open, click Show Annotation Table at the lower left part of the view. This will open a view similar to the one in figure 10.9. In the Side Panel you can show or hide individual annotation types in the table. E.g. if you
291. e way your sequences, alignments and other data are shown. You will also see how to save the changes that you made in the Side Panel. Open the protein alignment located under Protein orthologs in the Example data. The initial view of the alignment has colored the residues according to the Rasmol color scheme, and the alignment is automatically wrapped to fit the width of the view (shown in figure 2.5). [Figure 2.5: screenshot of the protein alignment view with its Side Panel; caption truncated in the source.]
292. e you wish to use another license or see information about the license you currently use. In this case, open the license manager: Help | License Manager. The license manager is shown in figure 1.22. Besides letting you borrow licenses (see section 1.4.5), this dialog can be used to: See information about the license (e.g. what kind of license, when it expires). [Figure 1.22: The license manager.] Configure how to connect to a license server (Configure License Server, the button at the lower left corner). Clicking this button will display a dialog similar to figure 1.19. Upgrade from an evaluation license by clicking the Upgrade license button. This will display the dialog shown in figure 1.1. If you wish to switch away from using a floating license, click Configure License Server and choose not to connect to a
293. eason why the Enzyme lists folder is not listed as a batch unit is that it does not contain any sequences. In this overview dialog, the Workbench has filtered the data so that only the types of data accepted by the tool are shown (DNA sequences in the example above). 9.1.2 Batch filtering and counting. At the bottom of the dialog shown in figure 9.3, the Workbench counts the number of files that will be run in total (90 in this case). This is counted across all the batch units. In some situations it is useful to filter the input for the batching based on names. As an example, this could be to include only paired reads for a mapping by only allowing names where "paired" is part of the name. This is achieved using the Only use elements containing and Exclude elements containing text fields. Note that the count is dynamically updated to reflect the number of input files based on the filtering. If a complete batch unit should be removed, you can select it, right-click and choose Remove Batch Unit. You can also remove items from the contents of each batch unit using right-click and Remove Element. 9.1.3 Setting parameters for batch runs. For some tools, the subsequent dialogs depend on the input data. In this case, one of the units is specified as parameter prototype and will be used to guide the choices in the dialogs. By default, this will be the first batch unit (marked in bold), but this can be changed by right-clicking another batch unit and
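The name-based filtering in 9.1.2 can be pictured with a small Python sketch. The helper name and the case-sensitive substring test are assumptions made for illustration; the excerpt does not say how the Workbench compares names.

```python
def filter_batch_elements(names, only_containing="", excluding=""):
    """Keep names that contain `only_containing` and do not contain `excluding`.

    An empty string switches the corresponding filter off. Hypothetical helper,
    not a Workbench API; matching is a plain case-sensitive substring test.
    """
    kept = []
    for name in names:
        if only_containing and only_containing not in name:
            continue
        if excluding and excluding in name:
            continue
        kept.append(name)
    return kept


reads = ["sample1_paired", "sample1_single", "sample2_paired_trimmed"]
selected = filter_batch_elements(reads, only_containing="paired", excluding="trimmed")
# The running count shown in the dialog would correspond to len(selected).
print(len(selected), selected)
```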
294. [Table of contents residue; recoverable entries: 1.5.2 Report program errors; 1.5.3 CLC Sequence Viewer vs. Workbenches; 1.6 When the program is installed: Getting started; 1.6.2 Import of example data; 1.7.1 Installing plug-ins; 1.7.2 Uninstalling plug-ins; 1.7.3 Updating plug-ins; 1.8 Network configuration; 1.9 The format of the user manual.] Welcome to CLC DNA Workbench, a software package supporting your daily bioinformatics work. We strongly encourage you to read this user manual in order to get the best possible basis for working with the software package. This software is for research purposes only. 1.1 Contact information. The CLC DNA Workbench is developed by: CLC bio A/S, Science Park Aarhus, Finlandsgade 10-12, 8200 Aarhus N, Denmark. http://www.clcbio.com. VAT no.: DK 28 30 50 87. Telephone: +45 70 22 55 09. Fax: +45 70 22 55 19. E-mail: info@clcbio.com. If you have questions or comments regarding the program, you are welcome to cont
295. ee figure 19.6). Use this procedure to add fixpoints to the other sequence(s) that should be forced to align to each other. When you click Create alignment and go to Step 2, check Use fixpoints in order to force the alignment algorithm to align the fixpoints in the selected sequences to each other. In figure 19.7, the result of an alignment using fixpoints is illustrated. You can add multiple fixpoints; e.g. adding two fixpoints to the sequences that are aligned will force their first fixpoints to be aligned to each other, and their second fixpoints will also be [Figure 19.6: Adding a fixpoint to a sequence in an existing alignment. At the top you can see a fixpoint that has already been added.]
296. eins, you can search with a Prosite regular expression, and you should enter a protein pattern from the PROSITE database. Accuracy: If you search with a simple motif, you can adjust the accuracy of the motif to the match on the sequence. If you type in a simple motif and let the accuracy be 80, the motif search algorithm runs through the input sequence and finds all subsequences of the same length as the simple motif such that the fraction of identity between the subsequence and the simple motif is at least 80 %. A motif match is added to the sequence as an annotation with the exact fraction of identity between the subsequence and the simple motif. If you use a list of motifs, the accuracy applies only to the simple motifs in the list. Search for reverse motif: This enables searching on the negative strand on nucleotide sequences. Exclude unknown regions: Genome sequences often have large regions with unknown sequence. These regions are very often padded with N's. Ticking this checkbox will not display hits found in N-regions. Motif search handles ambiguous characters in the way that two residues are different if they do not have any residues in common. For example: for nucleotides, N matches any character and R matches A, G. For proteins, X matches any character and Z matches E, Q. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. There are two types of results that can be produced: Add annot
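The accuracy rule and the handling of ambiguous characters can be sketched as a sliding-window comparison in Python. This is a minimal illustration, not the Workbench's implementation: the function names are hypothetical, only a subset of the IUPAC nucleotide codes is included, and accuracy is taken as a percentage as in the text.

```python
# Subset of the IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def residues_match(a, b):
    """Two residues differ only if they have no bases in common."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def find_simple_motif(sequence, motif, accuracy=80):
    """Return (start, subsequence, identity %) for every window meeting the accuracy."""
    hits = []
    m = len(motif)
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        identical = sum(residues_match(x, y) for x, y in zip(window, motif))
        # Integer comparison avoids floating-point edge cases at the threshold.
        if 100 * identical >= accuracy * m:
            hits.append((start, window, 100 * identical / m))
    return hits


hits = find_simple_motif("TTGACGTAGGACTT", "GACNT", accuracy=80)
for start, window, identity in hits:
    print(start, window, identity)
```

With accuracy 80 and a motif of length 5, at least four of the five positions must be compatible; lowering the accuracy to 60 would also admit windows with three compatible positions.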
297. [Figure 13.3: Selecting sequences for the dot plot.] If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the selected elements. Click Next to adjust dot plot parameters. Clicking Next opens the dialog shown in figure 13.4. Notice! Calculating dot plots takes up a considerable amount of memory in the computer. Therefore, you see a warning if the sum of the number of nucleotides/amino acids in the sequences is higher than 8000. If you insist on calculating a dot plot with more residues, the Workbench may shut down, allowing you to save your work first. However, this depends on your computer's memory configuration. Adjust dot plot parameters: There are two parameters for calculating the dot plot: Distance correction (only valid for protein sequences): In order to treat evolutionary transitions of amino acids, a distance correction measure can be used when calculating the dot plot. These distan
298. elf annealing score is measured in number of hydrogen bonds between two copies of primer molecules, with A-T base pairs contributing 2 hydrogen bonds and G-C base pairs contributing 3 hydrogen bonds. Self end annealing: Determines the maximum self end annealing value of all primers and probes. This determines the number of consecutive base pairs allowed between the 3' end of one primer and another copy of that primer. This score is calculated in number of hydrogen bonds (the example below has a score of 4, derived from 2 A-T base pairs each with 2 hydrogen bonds). AATTCCCTACAATCCCCAAA AAACCCCTAACATCCCTTAA. Secondary structure: Determines the maximum score of the optimal secondary DNA structure found for a primer or probe. Secondary structures are scored by the number of hydrogen bonds in the structure, and 2 extra hydrogen bonds are added for each stacking base pair in the structure. 3' end G/C restrictions: When this checkbox is selected, it is possible to specify restrictions concerning the number of G and C molecules in the 3' end of primers and probes. A low G/C content of the primer/probe 3' end increases the specificity of the reaction. A high G/C content facilitates a tight binding of the oligo to the template but also increases the possibility of mispriming. Unfolding the preference groups yields the following options: End length: The number of consecutive terminal nucleotides for which to consider the C/G cont
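The hydrogen-bond scoring (2 per A-T pair, 3 per G-C pair) can be illustrated with a short Python sketch. The search strategy here is an assumption: it slides one copy of the primer past an antiparallel second copy without gaps and keeps the best total, which is a common textbook approach and not necessarily how the Workbench computes its score.

```python
# Hydrogen bonds per Watson-Crick base pair, as stated in the excerpt.
BONDS = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}


def self_annealing_score(primer):
    """Best hydrogen-bond total over all ungapped antiparallel alignments
    of two copies of the primer (hypothetical helper, not a Workbench API)."""
    top = primer.upper()      # first copy, written 5'->3'
    bottom = top[::-1]        # second copy, written 3'->5' underneath
    n = len(top)
    best = 0
    for shift in range(-(n - 1), n):
        score = 0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                # Non-complementary pairs contribute no hydrogen bonds.
                score += BONDS.get((top[i], bottom[j]), 0)
        best = max(best, score)
    return best


for primer in ("AATT", "GGCC", "AAAA"):
    print(primer, self_annealing_score(primer))
```

A palindromic primer such as AATT pairs fully with its own copy, whereas a primer made of a single base cannot pair with itself at all.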
299. ely to predict genes which are not real. Setting a relatively high minimum length of the ORFs will reduce the number of false positive predictions, but at the same time short genes may be missed (see figure 14.9). [Figure 14.9: The first 12,000 positions of the E. coli sequence NC_000913 downloaded from GenBank. The blue (dark) annotations are the genes, while the yellow (brighter) annotations are the ORFs with a length of at least 100 amino acids. On the positive strand around position 11,000, a gene starts before the ORF. This is due to the use of the standard genetic code rather than the bacterial code. This particular gene starts with CTG, which is a start codon in bacteria. Two short genes are entirely missing, while a handful of open reading frames do not correspond to any of the annotated genes.] Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. Finding open reading frames is often a good first step in annotating sequences such as cloning vectors or bacterial genomes. For eukaryotic genes, ORF determination may not always be very helpful, since the intron/exon structure is not part of the algorithm. Chapter 15: Protein analyses. Contents: 15.1 Protein charge; 15.1.1 Modifying the layout
300. ement | Delete; or select the element | press Delete key. This will cause the element to be moved to the Recycle Bin, where it is kept until the recycle bin is emptied. This means that you can recover deleted elements later on. For deleting annotations instead of folders or elements, see section 10.3.4. Restore Deleted Elements: The elements in the Recycle Bin can be restored by dragging the elements with the mouse into the folder where they used to be. If you have deleted large amounts of data taking up very much disk space, you can free this disk space by emptying the Recycle Bin: Edit in the Menu Bar | Empty Recycle Bin. Note! This cannot be undone, and you will therefore not be able to recover the data present in the recycle bin when it was emptied. 3.1.8 Show folder elements in a table. A location or a folder might contain large amounts of elements. It is possible to view their elements in the View Area: select a folder or location | Show in the Toolbar | Contents. An example is shown in figure 3.6. When the elements are shown in the view, they can be sorted by clicking the heading of each of the columns. You can further refine the sorting by pressing Ctrl (⌘ on Mac) while clicking the heading of another column. [Screenshot residue: folder contents shown in a table with columns including Name, Modified, Description, Length.]
301. en, B., Møller, M. B., and Wibling, G. (2000). Statistical alignment: computational properties, homology testing and goodness-of-fit. J Mol Biol, 302(1):265-279. [Henikoff and Henikoff, 1992] Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 89(22):10915-10919. [Hopp and Woods, 1983] Hopp, T. P. and Woods, K. R. (1983). A computer program for predicting protein antigenic determinants. Mol Immunol, 20(4):483-489. [Ikai, 1980] Ikai, A. (1980). Thermostability and aliphatic index of globular proteins. J Biochem (Tokyo), 88(6):1895-1898. [Janin, 1979] Janin, J. (1979). Surface and inside volumes in globular proteins. Nature, 277(5696):491-492. [Jukes and Cantor, 1969] Jukes, T. and Cantor, C. (1969). Mammalian Protein Metabolism, chapter Evolution of protein molecules, pages 21-32. New York: Academic Press. [Karplus and Schulz, 1985] Karplus, P. A. and Schulz, G. E. (1985). Prediction of chain flexibility in proteins. Naturwissenschaften, 72:212-213. [Kimura, 1980] Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol, 16(2):111-120. [Knudsen and Miyamoto, 2001] Knudsen, B. and Miyamoto, M. M. (2001). A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc Natl Acad Sci U S A, 98(25):14512-14517. [Kolaskar an
302. en this is run on a CLC Server (see http://clcbio.com/server), all the processes are placed in the queue, and the queue then takes care of distributing the jobs. This means that if the server set-up includes multiple nodes, the jobs can be run in parallel. If you need to stop the whole batch run, you need to stop the master process. 9.1.5 Running de novo assembly and read mapping in batch. De novo assembly and read mapping are special in batch mode because they usually have the option of assigning individual mapping parameters to each input file. When running in batch mode, this is not possible. Instead, you can change the default parameters used for long and short reads, respectively. You can also set the paired distance for paired data. Note that this means that you cannot use a combination of paired-end and mate-pair data for batching. Figure 9.4 shows the parameter dialog when running read mapping in batch. Note that you can only specify one setting for all short reads and one setting for all long reads. When the analysis is run, the reads are automatically categorized as either long or short, and the parameters specified in the dialog are applied. The same goes for all reads that are imported as paired, where the minimum and maximum distances are applied. 9.2 How to handle results of analyses. This section will explain how results generated from tools in the Toolbox are handled by CLC DNA Workbench. Note that this also applies to too
303. ence: Click Zoom Out in the toolbar | click in the view until you reach a satisfying zoom level; or Press on your keyboard. The last option for zooming out is only available if you have a mouse with a scroll wheel: or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse backwards. When you choose the Zoom Out mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode. Note! You might have to click in the view before you can use the keyboard or the scroll wheel to zoom. If you want to get a quick overview of a sequence or a tree, use the Fit Width function instead of the Zoom Out function. If you press Shift while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom Out mode toolbar item is selected zooms in instead of zooming out. 3.3.3 Fit Width. The Fit Width function adjusts the content of the View so that both ends of the sequence, alignment, or tree are visible in the View in question. This function does not change the mode of the mouse pointer. 3.3.4 Zoom to 100%. The Zoom to 100% function zooms the content of the View so that it is displayed with the highest degree of detail. This function does not change the mode of the mouse pointer. 3.3.5 Move. The Move mode allows you to drag the content of a View. E.g. if you are studying a sequence, you can click anywhere in the sequenc
304. ence logo graph. Color: The sequence logo can be displayed in black or Rasmol colors. For protein alignments, a polarity color scheme is also available, where hydrophobic residues are shown in black, hydrophilic residues in green, acidic residues in red and basic residues in blue. 19.2.1 Bioinformatics explained: Sequence logo. In the search for homologous sequences, researchers are often interested in conserved sites/residues or positions in a sequence which tend to differ a lot. Most researchers use alignments (see Bioinformatics explained: multiple alignments) for visualization of homology on a given set of either DNA or protein sequences. In proteins, active sites in a given protein family are often highly conserved. Thus, in an alignment these positions (which are not necessarily located in proximity) are fully or nearly fully conserved. On the other hand, antigen binding sites in the Fab unit of immunoglobulins tend to differ quite a lot, whereas the rest of the protein remains relatively unchanged. In DNA, promoter sites or other DNA binding sites are highly conserved (see figure 19.8). This is also the case for repressor sites, as seen for the Cro repressor of bacteriophage λ. When aligning such sequences, regardless of whether they are highly variable or highly conserved at specific sites, it is very difficult to generate a consensus sequence which covers the actual variability of a given position. In order to better understand the in
305. ension cost 349 fraction 354 378 insert 357 open cost 349 Gateway cloning add attB sites 319 create entry clones 324 create expression clones 326 Gb Division 160 gbk file format 395 GC content 252 GCG Alignment file format 394 GCG Sequence file format 393 gck file format 395 GCK Gene Construction Kit file format 393 Gel INDEX separate sequences without restriction en zyme digestion 341 tabular view of fragments 339 Gel electrophoresis 340 380 marker 343 view 341 view preferences 341 when finding restriction sites 338 GenBank view sequence in 161 file format 393 search 167 377 search sequence in 1 1 tutorial 43 Gene Construction Kit file format 393 Gene expression analysis 377 Gene finding 234 General preferences 104 General Sequence Analyses 199 Genetic code reverse translation 245 Getting started tutorial 37 gff file format 395 Google sequence 1 1 Graph export data points in csv format 128 Graph Side Panel 381 Graphics data formats 395 export 124 gzip file format 395 Gzip file format 395 Half life 216 Handling of results 136 Header 116 Heat map 377 Help 29 Heterozygotes discover via secondary peaks 305 Hide show Toolbox 94 High throughput sequencing 376 History 131 export 123 preserve when exporting 132 source elements 132 Homology pairwise comparison of sequences in alignments 363 Hydrophobicity 239 378 Bioinformatics explained 2
306. ent. Max no. of G/C: The maximum number of G and C nucleotides allowed within the specified length interval. Min no. of G/C: The minimum number of G and C nucleotides required within the specified length interval. 5' end G/C restrictions: When this checkbox is selected, it is possible to specify restrictions concerning the number of G and C molecules in the 5' end of primers and probes. A high G/C content facilitates a tight binding of the oligo to the template but also increases the possibility of mispriming. Unfolding the preference groups yields the same options as described above for the 3' end. Mode: Specifies the reaction type for which primers are designed. Standard PCR: Used when the objective is to design primers, or primer pairs, for PCR amplification of a single DNA fragment. Nested PCR: Used when the objective is to design two primer pairs for nested PCR amplification of a single DNA fragment. Sequencing: Used when the objective is to design primers for DNA sequencing. TaqMan: Used when the objective is to design a primer pair and a probe for TaqMan quantitative PCR. Each mode is described further below. Calculate: Pushing this button will activate the algorithm for designing primers. 16.3 Graphical display of primer information. The primer information settings are found in the Primer information preference group in the Side Panel to the right of the view (see figur
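The end-restriction options (End length, Min no. of G/C, Max no. of G/C) amount to counting G and C nucleotides in a terminal window. A minimal Python sketch, assuming the primer is written 5' to 3' and using a hypothetical function name:

```python
def end_gc_within_limits(primer, end_length, min_gc, max_gc, end="3"):
    """Check the G/C count in the last (3') or first (5') `end_length` bases.

    Assumes `primer` is written 5'->3'. Hypothetical helper for illustration;
    the parameter names mirror the options listed in the excerpt.
    """
    primer = primer.upper()
    window = primer[-end_length:] if end == "3" else primer[:end_length]
    gc_count = sum(base in "GC" for base in window)
    return min_gc <= gc_count <= max_gc


primer = "AATTCCCTACAATCCCCAAA"
# 3' end window of 5 bases is "CCAAA": two G/C nucleotides.
print(end_gc_within_limits(primer, end_length=5, min_gc=1, max_gc=3, end="3"))
# 5' end window of 5 bases is "AATTC": one G/C nucleotide.
print(end_gc_within_limits(primer, end_length=5, min_gc=2, max_gc=4, end="5"))
```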
307. entical sequences. A long stretch of the query protein is matched to the database. 10e-50 < E-value < 10e-10: Closely related sequences, could be a domain match or similar. 10e-10 < E-value < 1: Could be a true homologue, but it is a gray area. E-value > 1: Proteins are most likely not related. E-value > 10: Hits are most likely junk unless the query sequence is very short. Gap costs: For blastp it is possible to specify gap cost for the chosen substitution matrix. There is only a limited number of options for these parameters. The open gap cost is the price of introducing gaps in the alignment, and extension gap cost is the price of every extension past the initial opening gap. Increasing the gap costs will result in alignments with fewer gaps. Filters: It is possible to set different filter options before running the BLAST search. Low-complexity regions have a very simple composition compared to the rest of the sequence and may result in problems during the BLAST search [Wootton and Federhen, 1993]. A low-complexity region of a protein can, for example, look like this: 'fftfflllsss', which in this case is a region as part of a signal peptide. In the output of the BLAST search, low-complexity regions will be marked in lowercase gray characters (default setting). The low-complexity region cannot be thought of as a significant match; thus, disabling the low-complexity filter is likely to generate
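The E-value rules of thumb above can be written as a simple lookup. The Python sketch below is illustrative only: the function name is hypothetical, the label of the first bucket is truncated in this excerpt and is paraphrased here, and values falling exactly on a boundary (for example E = 1) are assigned to the higher bucket as an arbitrary choice. The thresholds are kept in the manual's notation, so `10e-50` is the Python literal for 10 x 10^-50.

```python
def interpret_e_value(e_value):
    """Map a BLAST E-value to the rule-of-thumb categories in the excerpt."""
    if e_value < 10e-50:
        # Bucket label is truncated in the source ("...entical sequences").
        return "identical or near-identical sequences"
    if e_value < 10e-10:
        return "closely related sequences (could be a domain match or similar)"
    if e_value < 1:
        return "could be a true homologue, but it is a gray area"
    if e_value <= 10:
        return "proteins are most likely not related"
    return "most likely junk unless the query sequence is very short"


for e in (1e-60, 1e-20, 0.01, 5, 50):
    print(e, "->", interpret_e_value(e))
```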
308. ents window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. Note! This is not the same as a reverse complement. If you wish to create the reverse complement, please refer to section 14.3. 14.5 Translation of DNA or RNA to protein. In CLC DNA Workbench you can translate a nucleotide sequence into a protein sequence using the Toolbox tools. Usually, you use the +1 reading frame, which means that the translation starts from the first nucleotide. Stop codons result in an asterisk being inserted in the protein sequence at the corresponding position. It is possible to translate in any combination of the six reading frames in one analysis. To translate: select a nucleotide sequence | Toolbox in the Menu Bar | Nucleotide Analyses | Translate to Protein; or right-click a nucleotide sequence | Toolbox | Nucleotide Analyses | Translate to Protein. This opens the dialog displayed in figure 14.5. [Figure 14.5: screenshot of the Translate to Protein dialog; caption truncated in the source.]
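Translation in a chosen reading frame, with stop codons written as asterisks, can be sketched in Python as follows. This is a minimal illustration using the standard genetic code; the function names, the frame numbering (+1 to +3 on the given strand, -1 to -3 on the reverse complement) and the use of X for codons containing ambiguous bases are assumptions, not the Workbench's documented behaviour.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (base order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate DNA or RNA in reading frame +1..+3 or -1..-3.

    Stop codons become '*'; incomplete trailing codons are dropped;
    codons with unrecognised bases become 'X' (an assumption).
    """
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


dna = "ATGGCCTAA"
for frame in (1, 2, 3, -1, -2, -3):
    print(frame, translate(dna, frame))
```

Looping over all six frames, as in the usage lines, mirrors the option of translating any combination of reading frames in one analysis.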
309. eotides, respectively, will not be included. Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. Max hydrogen bonds between pairs: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. Maximum length of amplicon: determines the maximum length of the PCR fragment. The output of the design process is a table of single primers or primer pairs, as described for primer design based on single sequences. These primers are specific to the included sequences in the alignment according to the criteria defined for specificity. The only novelty in the table is that melting temperatures are displayed with both a maximum, a minimum and an average value, to reflect that degenerate primers or primers with mismatches may have heterogeneous behavior on the different templates in the group of included sequences. [Screenshot residue: the Calculation parameters dialog, listing the chosen parameters (primer length, G/C content, melting temperature, self annealing, self end annealing, secondary structure, 3' and 5' end G/C requirements) and the exclusion parameters (minimum number of mismatches, minimum number of mismatches in 3' end, length of 3' end).]
310. epresented either by a simple sequence or a more advanced regular expression. These advanced search capabilities are available for use in both DNA and protein sequences. There are two ways to access this functionality: When viewing sequences, it is possible to have motifs calculated and shown on the sequence in a similar way as restriction sites (see section 18.3.1). This approach is called Dynamic motifs and is an easy way to spot known sequence motifs when working with sequences for cloning etc. A more refined and systematic search for motifs can be performed through the Toolbox. This will generate a table and optionally add annotations to the sequences. The two approaches are described below. 13.7.1 Dynamic motifs. In the Side Panel of sequence views, there is a group called Motifs (see figure 13.22). [Figure 13.22: Dynamic motifs in the Side Panel.] The Workbench will look for the listed motifs in the sequence that is open, and by clicking the check box next to the motif it will be shown in the view, as illustrated in figure 13.23.
311. equence in the table sequence list is displayed with: Name; Accession; Description; Modification date; Length. The number of sequences in the list is reported as the number of Rows at the top of the table view. Learn more about tables in section C. Adding and removing sequences from the list is easy: adding is done by dragging the sequence from another list or from the Navigation Area and dropping it in the table. To delete sequences, simply select them and press Delete. You can also create a subset of the sequence list: select the relevant sequences | right-click | Create New Sequence List. This will create a new sequence list which only includes the selected sequences. 10.7.3 Extract sequences. It is possible to extract individual sequences from a sequence list in two ways. If the sequence list is opened in the tabular view, it is possible to drag (with the mouse) one or more sequences into the Navigation Area. This allows you to extract specific sequences from the entire list. Another option is to extract all sequences found in the list. This can also be done for: Alignments; Contigs and read mappings; Read mapping tables; BLAST result; BLAST overview tables; RNA-Seq samples; and of course sequence lists. For mappings and BLAST results, the main sequences (i.e. reference/consensus and query sequence) will not be extracted. To ext
312. equences by Name. This opens a dialog where you can add the sequences you wish to sort. You can also add sequence lists or the contents of an entire folder by right-clicking the folder and choosing Add folder contents. When you click Next, you will be able to specify the details of how the grouping should be performed. First, you have to choose how each part of the name should be identified. There are three options: • Simple. This will simply use a designated character to split up the name. You can choose a character from the list: Underscore (_), Dash (-), Hash (number sign / pound sign, #), Pipe (|), Tilde (~), Dot (.). • Positions. You can define a part of the name by entering the start and end positions, e.g. from character number 6 to 14. For this to work, the names have to be of equal lengths. • Java regular expression. This is an option for advanced users where you can use a special syntax to have total control over the splitting. See more below. In the example above, it would be sufficient to use a simple split with the underscore (_) character, since this is how the different parts of the name are divided. When you have chosen a way to divide the name, the parts of the name will be listed in the table at the bottom of the dialog. There is a checkbox next to each part of the name. This checkbox is used to specify which of the name parts should be used for grouping
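The three ways of identifying name parts described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample names are invented, the helper names are hypothetical, and Python's `re` syntax is used in place of Java regular expressions (the two are similar but not identical).

```python
import re
from collections import defaultdict


def split_simple(name, separator="_"):
    """Simple: split the name on one designated character."""
    return name.split(separator)


def split_positions(name, start, end):
    """Positions: take characters start..end (1-based, inclusive)."""
    return name[start - 1:end]


def split_regex(name, pattern):
    """Regular expression: each capture group becomes one name part."""
    m = re.fullmatch(pattern, name)
    return list(m.groups()) if m else []


def group_names(names, key):
    """Group names by the part returned by the `key` function."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return dict(groups)


# Hypothetical read names: plate, gene and direction separated by underscores.
names = ["plateA_geneX_fwd", "plateA_geneX_rev", "plateA_geneY_fwd"]
by_gene = group_names(names, lambda n: split_simple(n)[1])
```

Here `by_gene` groups the two `geneX` reads together and keeps the `geneY` read in its own group, which corresponds to ticking only the second name part for grouping.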
313. equirements. The system requirements of CLC DNA Workbench are these: • Windows XP, Windows Vista, or Windows 7, Windows Server 2003 or Windows Server 2008 • Mac OS X 10.5 or newer. PowerPC G4, G5 or Intel CPU required • Linux: RedHat 5 or later. SUSE 10 or later • 32 or 64 bit • 256 MB RAM required • 512 MB RAM recommended • 1024 x 768 display recommended. 1.4 Licenses. When you have installed CLC DNA Workbench and start it for the first time, you will meet the license assistant, shown in figure 1.1. The following options are available. They will be described in detail in the following sections. • Request an evaluation license. The license is a fully functional, time-limited license (see below). • Download a license. When you purchase a license, you will get a license ID from CLC bio. Using this option, you will get a license based on this ID. • Import a license from a file. If CLC bio has provided a license file, or if you have downloaded a license from our web-based licensing system, you can import it using this option. • Upgrade license. If you have already used a previous version of CLC DNA Workbench, and you are entitled to upgrade to the new CLC DNA Workbench 6.6, select this option to get a license upgrade. [CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH 16]
314. er you have changed the preference, you have to re-open your tables to see the effect. 5.2.2 Import and export Side Panel settings. If you have created a special set of settings in the Side Panel that you wish to share with other CLC users, you can export the settings in a file. The other user can then import the settings. To export the Side Panel settings, first select the views that you wish to export settings for. Use Ctrl-click (⌘-click on Mac) or Shift-click to select multiple views. Next, click the Export button. Note that there is also another export button at the very bottom of the dialog, but this will export the other settings of the Preferences dialog (see section 5.5). A dialog will be shown (see figure 5.6) that allows you to select which of the settings you wish to export. When multiple views are selected for export, all the view settings for the views will be shown in the dialog. Click Export and you will now be able to define a save folder and name for the exported file. The settings are saved in a file with a .vsf extension (View Settings File). [CHAPTER 5 USER PREFERENCES AND SETTINGS 108] [Figure 5.6: Exporting all settings for circular views.] To import a Side Panel settings file, make sure you are at the bottom of the View panel of the Preferences dialog, and click the
315. erent Side Panel settings that are saved for each view. See section 5.6 for more about how to create and save style sheets. If there are other settings beside CLC Standard Settings, you can use this overview to choose which of the settings should be used per default when you open a view (see an example in figure 5.4). In this example, the CLC Standard Settings is chosen as default. [Figure 5.4: Selecting the default view setting.] 5.2.1 Number formatting in tables. In the preferences, you can specify how the numbers should be formatted in tables (see figure 5.5). [Figure 5.5: Number formatting of tables.] The examples below the text field are updated when you change the value so that you can see the effect. Aft
316. eria concern single primers, as primer pairs are not generated until the Calculate button is pressed. Parameters regarding primer and probe sets are described in detail for each reaction mode (see below). • Length. Determines the length interval within which primers can be designed by setting a maximum and a minimum length. The upper and lower lengths allowed by the program are 50 and 10 nucleotides respectively. • Melting temperature. Determines the temperature interval within which primers must lie. When the Nested PCR or TaqMan reaction type is chosen, the first pair of melting temperature interval settings relates to the outer primer pair, i.e. not the probe. Melting temperatures are calculated by a nearest-neighbor model which considers stacking interactions between neighboring bases in the primer-template complex. The model uses state-of-the-art thermodynamic parameters [SantaLucia, 1998] and considers the important contribution from the dangling ends that are present when a short primer anneals to a template sequence [Bommarito et al., 2000]. A number of parameters concerning the reaction mixture, which influence melting temperatures, can be adjusted (see below). Melting temperatures are corrected for the presence of monovalent cations using the model of [SantaLucia, 1998], and temperatures are further corrected for the presence of magnesium, deoxynucleotide triphosphates (dNTP) and dimethyl sulfoxide (DMSO) using the model of von Ahsen et al
317. ers that are general for all primers in an alignment, simply add them all to the set of included sequences by checking all selection boxes. Specificity of priming is determined by criteria set by the user in the dialog box which is shown when the Calculate button is pressed (see below). Different options can be chosen concerning the match of the primer to the template sequences in the included group: [CHAPTER 16 PRIMERS 267] • Perfect match. Specifies that the designed primers must have a perfect match to all relevant sequences in the alignment. When selected, primers will thus only be located in regions that are completely conserved within the sequences belonging to the included group. • Allow degeneracy. Designs primers that may include ambiguity characters where heterogeneities occur in the included template sequences. The allowed fold of degeneracy is user defined and corresponds to the number of possible primer combinations formed by a degenerate primer. Thus, if a primer covers two 4-fold degenerate sites and one 2-fold degenerate site, the total fold of degeneracy is 4 x 4 x 2 = 32, and the primer will, when supplied from the manufacturer, consist of a mixture of 32 different oligonucleotides. When scoring the available primers, degenerate primers are given a score which decreases with the fold of degeneracy. • Allow mismatches. Designs primers which are allowed a specified number of mismatches to the included template sequences. The melti
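The fold-of-degeneracy arithmetic in the "Allow degeneracy" option (4 x 4 x 2 = 32) can be checked with a short Python sketch. It uses the standard IUPAC nucleotide ambiguity codes; the function names and the example primer are hypothetical and say nothing about how the Workbench scores primers internally.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fold_degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand_primer(primer):
    """List every concrete oligonucleotide in the degenerate mixture."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with two 4-fold sites (N) and one 2-fold site (R).
example = "ACNGTNAR"
fold = fold_degeneracy(example)
```

For this example primer, `fold` equals 32 and `expand_primer(example)` lists the 32 individual oligonucleotides in the mixture.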
318. es (R, Y, etc.). The contig will display an ambiguity nucleotide reflecting the different nucleotides found in the reads. For an overview of ambiguity codes, see Appendix. • Vote (A, C, G, T). The conflict will be solved by counting instances of each nucleotide and then letting the majority decide the nucleotide in the contig. In case of equality, A, C, G and T are given priority over one another in the stated order. Note that conflicts will always be highlighted no matter which of the options you choose. Furthermore, each conflict will be marked as an annotation on the contig sequence and will be present if the contig sequence is extracted for further analysis. As a result, the details of any experimental heterogeneity can be maintained and used when the results of single-sequence analyses are interpreted. When the parameters have been adjusted, click Next to see the dialog shown in figure 17.18. [Figure 17.18: Different options for the output of the assembly.] In this d
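The Vote rule described above (majority decides, with ties broken in the order A, C, G, T) is simple enough to sketch in Python. This is an illustration of the stated rule only, with a hypothetical function name; how the Workbench treats gaps or ambiguity codes in the reads is not covered by the text and is simply ignored here.

```python
from collections import Counter


def vote(bases):
    """Resolve a conflict column by majority vote; ties go to A, then C, G, T."""
    counts = Counter(b.upper() for b in bases if b.upper() in "ACGT")
    if not counts:
        return None  # nothing to vote on (e.g. only gaps or ambiguity codes)
    # max() returns the first maximal item, so iterating "ACGT" in order
    # implements the stated tie-breaking priority.
    return max("ACGT", key=lambda base: counts[base])
```

A column with reads `G, G, T, T` is therefore called as `G`, and a column with `C, A` as `A`.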
319. es alphabetically: right-click the name of a sequence and choose Sort Sequences Alphabetically. If you change the Sequence name in the Sequence Layout view preferences, you will have to ask the program to sort the sequences again. The sequences can also be sorted by similarity, grouping similar sequences together: right-click the name of a sequence and choose Sort Sequences by Similarity. 19.3.6 Delete, rename and add sequences. Sequences can be removed from the alignment by right-clicking the label of a sequence and choosing Delete Sequence. This can be undone by clicking Undo in the Toolbar. If you wish to delete several sequences, you can check all the sequences, right-click and choose Delete Marked Sequences. [CHAPTER 19 SEQUENCE ALIGNMENT 359] To show the checkboxes, you first have to click Show Selection Boxes in the Side Panel. A sequence can also be renamed: right-click the label and choose Rename Sequence. This will show a dialog, letting you rename the sequence. This will not affect the sequence that the alignment is based on. Extra sequences can be added to the alignment by creating a new alignment where you select the current alignment and the extra sequences (see section 19.1). The same procedure can be used for joining two alignments. 19.3.7 Realign selection. If you have created an alignment, it is possible to realign a part of it, leaving the rest of the alignment unchanged: select a part of the alignment to realign right
320. es are pooled before running the analysis. If you want individual outputs for each sequence, you would need to run the tool five times, or alternatively use the Batching mode. Batching mode is activated by clicking the Batch checkbox in the dialog where the input data is selected. Batching simply means that each data set is run separately, just as if the tool had been run manually for each one. For some analyses, this simply means that each input sequence should be run separately, but in other cases it is desirable to pool sets of files together in one run. This selection of data for a batch run is defined as a batch unit. When batching is selected, the data to be added is the folder containing the data you want to batch. The content of the folder is assigned into batch units based on this concept. [CHAPTER 9 BATCHING AND RESULT HANDLING 134]
321. es available. • To the right, there is a list of the enzymes that will be used. Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish to use EcoRV and BamHI, select these two enzymes and add them to the right side panel. If you wish to use all the enzymes in the list: click in the panel to the left, press Ctrl + A (⌘ + A on Mac), and click Add. The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang, as shown in figure 18.51. The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation, see section E. [CHAPTER 18 CLONING AND CUTTING 333]
322. es no hits, you will be asked if you wish to search for matches that start with your search term. If you accept this, an asterisk (*) will be appended to the search term. Pressing the Alt key while you click a search result will highlight the search hit in its folder in the Navigation Area. [CHAPTER 4 SEARCHING YOUR DATA 100] [Figure 4.3: Page two of the search results.] In the preferences (see chapter 5), you can specify the number of hits to be shown. 4.2.2 Special search expressions. When you write a search term in the search field, you can get help to write a more advanced search expression by pressing Shift + F1. This will reveal a list of guides as shown in figure 4.4. [Figure 4.4: Guides to help create advanced search expressions. The guides listed are: Wildcard search (*), Search related words (~), Include both terms (AND), Include either term (OR), Any field search (contents:), Name search (name:), Length search (length:[START TO END]), Organism search (organism:).] You can select any of the guides (using mouse or keyboard arrows) and start typing. If you e.g. wish to search for sequences named BRCA1, select Name search (name:) and type BRCA1. Your search expression will now look like this: name:BRCA1. The guides available are these: • Wildcard search (*). Appending an asterisk (*) to the search term will find matches st
323. [Figure 13.1: Choosing sequence for shuffling.] If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next to determine how the shuffling should be performed. In this step, shown in figure 13.2, the following parameters can be set for nucleotides: [Figure 13.2: Parameters for shuffling.] • Mononucleotide shuffling. Shuffle method generating a sequence of the exact same mononucleotide frequency. • Dinucleotide shuffling. Shuffle method generating a sequence of the exact same dinucleotide frequency. • Mononucleotide sampling from zero order Markov chain. Resampling meth
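The difference between mononucleotide shuffling (exact same composition) and sampling from a zero-order Markov chain (composition only matched on average) can be illustrated with a small Python sketch. The function names are hypothetical and this is not the Workbench's code; dinucleotide shuffling needs a more involved algorithm and is deliberately left out.

```python
import random
from collections import Counter


def mononucleotide_shuffle(seq, rng):
    """Permute the residues: the result has exactly the same composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def mononucleotide_sample(seq, rng):
    """Zero-order Markov chain: draw each position independently, with
    probabilities equal to the residue frequencies of the input."""
    counts = Counter(seq)
    letters = sorted(counts)
    weights = [counts[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=len(seq)))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
original = "ATGGCGTACGCTTAGGCTA"
shuffled = mononucleotide_shuffle(original, rng)
sampled = mononucleotide_sample(original, rng)
```

The shuffled sequence always contains the same number of each nucleotide as the input, whereas the sampled sequence has the same length but its composition will usually deviate slightly.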
324. esigning primers 2 1 Specifying a region for the forward primer. First, zoom out to get an overview of the sequence by clicking Fit Width. You can now see the blue gene annotation labeled Atp8a1, and just before that there is the green CMV promoter. This may be hidden behind restriction site annotations. Remember that you can always choose not to show these by altering the settings in the right-hand pane. [CHAPTER 2 TUTORIALS 58] In this tutorial, we want the forward primer to be in a region between positions 600 and 900, just before the gene (you may have to zoom in to make the selection). Select this region, right-click and choose Forward primer region here (see figure 2.34). [Figure 2.34: Right-clicking a selection and choosing Forward primer region here.] This will add an annotation to this region, and five rows of red and green dots are seen below, as shown in figure 2.35.
325. et the minimum and maximum sizes of the fragments to be shown. The table is described in detail below. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. An example of a binding site annotation is shown in figure 16.19. [Figure 16.19: Annotation showing a primer match.] The annotation has the following information: • Sequence of the primer. Positions with mismatches will be in lower case (see the fourth position in figure 16.19, where the primer has an a and the template sequence has a T). • Number of mismatches. • Number of other hits on the same sequence. This number can be useful to check the specificity of the primer. • Binding region. This region ends with the 3' exact match and is simply the primer length upstream. This means that if you have 5' extensions to the primer, part of the binding region covers sequence that will actually not be annealed to the primer.
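The convention of writing mismatched primer positions in lower case can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; it assumes the primer and the template region are already aligned without gaps and have the same length, and the short sequences are made up for the example.

```python
def mark_mismatches(primer, template_region):
    """Return the primer with mismatching positions in lower case,
    together with the number of mismatches."""
    if len(primer) != len(template_region):
        raise ValueError("primer and template region must have equal length")
    marked = []
    mismatches = 0
    for p, t in zip(primer.upper(), template_region.upper()):
        if p == t:
            marked.append(p)
        else:
            marked.append(p.lower())  # flag the mismatch in lower case
            mismatches += 1
    return "".join(marked), mismatches
```

With an A in the primer opposite a T in the template at the fourth position, `mark_mismatches("GCTAGACC", "GCTTGACC")` returns the primer written as `GCTaGACC` and a mismatch count of 1.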
326. ew by zooming in or out. Click Zoom in or Zoom out in the Toolbar and click the view. Finally, you can modify the format of the text heading each lane in the Text format preferences in the Side Panel. 18.5 Restriction enzyme lists. CLC DNA Workbench includes all the restriction enzymes available in the REBASE database. However, when performing restriction site analyses, it is often an advantage to use a customized list of enzymes. In this case, the user can create special lists containing e.g. all enzymes available in the laboratory freezer, all enzymes used to create a given restriction map, or all enzymes that are available from the preferred vendor. In the example data (see section 1.6.2), under Nucleotide > Restriction analysis, there are two enzyme lists: one with the 50 most popular enzymes, and another with all enzymes that are included in the CLC DNA Workbench. This section describes how you can create an enzyme list and how you can modify it. 18.5.1 Create enzyme list. CLC DNA Workbench uses enzymes from the REBASE restriction enzyme database at http://rebase.neb.com. You can customize the enzyme database for your installation, see section E. To create an enzyme list of a subset of these enzymes: File | New | Enzyme list. This opens the dialog shown in figure 18.50.
327. f annotations. Sometimes you end up with annotations which do not have a meaningful name. In that case, there is an advanced batch rename functionality: open the Annotation Table, select the annotations that you want to rename, right-click the selection and choose Advanced Rename. This will bring up the dialog shown in figure 10.11. [Figure 10.11: The Advanced Rename dialog.] In this dialog, you have two options: • Use this qualifier. Use one of the qualifiers as name. A list of all qualifiers of all the selected annotations is shown. Note that if one of the annotations does not have the qualifier you have chosen, it will not be renamed. If an annotation has multiple qualifiers of the same type, the first is used for naming. • Use annotation type as name. The annotation's type will be used as name (e.g. if you have an annotation of type Promoter, it will get Promoter as its name by using this option). A similar functionality for batch re-typing annotations is available in the right-click menu as well, in case your annotations are not typed correctly: open the Annotation Table, select the annotations that you want to retype, right-click the selection and choose Advanced Retype. This will bring up the dialog shown in figure 10.12. In this dialog, you have two options: • Use this qualifier. Use one of the qualifiers as type. A lis
329. f the plot can be set by clicking the color box. For Colors, the color box is replaced by a gradient color box as described under Foreground color. Protein info. These preferences only apply to proteins. The first nine items are different hydrophobicity scales and are described in section 15.2.2. • Kyte-Doolittle. The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. This scale can be used for identifying both surface-exposed regions as well as transmembrane regions, depending on the window size used. Short window sizes of 5 generally work well for predicting putative surface-exposed regions. Large window sizes of 19-21 are well suited for finding transmembrane domains if the values calculated are above 1.6 [Kyte and Doolittle, 1982]. These values should be used as a rule of thumb, and deviations from the rule may occur. • Cornette. Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales [Cornette et al., 1987]. This optimized scale is also suitable for prediction of alpha-helices in proteins. • Engelman. The Engelman hydrophobicity scale, also known as the GES scale, is another scale which can be used for prediction of protein hydrophobicity [Engelman et al., 1986]. Like the Kyte-Doolittle scale, this scale is useful for predicting transmembrane regions in proteins. • Eisenberg. The Eisenberg scale is a normalized consensus hydrophobic
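The window-based use of the Kyte-Doolittle scale described above can be sketched in Python. The scale values are the published Kyte and Doolittle (1982) hydropathy indices; the function names, the default window of 19 and the way the 1.6 rule of thumb is applied are illustrative assumptions, not the Workbench's implementation.

```python
# Kyte-Doolittle hydropathy index per amino acid (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window):
    """Average hydropathy for every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def transmembrane_candidates(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose average exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]
```

A short window (for example 5) gives a noisier profile suited to spotting surface-exposed stretches, while a window of 19 to 21 smooths the profile enough that windows averaging above 1.6 become transmembrane candidates, in line with the rule of thumb in the text.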
330. [Figure 7.1: The import dialog.] Next, select one or more files or folders to import and click Select. This allows you to select a place for saving the result files. If you import one or more folders, the contents of the folder are automatically imported and placed in that folder in the Navigation Area. If the folder contains subfolders, the whole folder structure is imported. In the import dialog (figure 7.1), there are three import options: • Automatic import. This will import the file, and CLC DNA Workbench will try to determine the format of the file. The format is determined based on the file extension (e.g. SwissProt files have .swp at the end of the file name) in combination with a detection of elements in the file that are specific to the individual file formats. If the file type is not recognized, it will be imported as an external file. In most cases, automatic import will yield a successful result, but if the import goes wrong, the next option can be helpful. [CHAPTER 7 IMPORT EXPORT OF DATA AND GRAPHICS 119] • Force import as type. This option should be used if CLC DNA Workbench cannot successfully determine the file format. By forcing the import as a specific type, the automatic determination of the file format is bypassed, and the file is imported as the type specified. • Force import as external file. This
331. f you select Print whole view, you will get a result that looks like figure 6.4. This means that you also print the part of the sequence which is not visible when you have zoomed in. [CHAPTER 6 PRINTING 115] [Figure 6.4: A print of the sequence selecting Print whole view. The whole sequence is shown, even though the view is zoomed in on a part of the sequence.] 6.2 Page setup. No matter whether you have chosen to print the visible area or the whole view, you can adjust the page setup of the print. An example of this can be seen in figure 6.5. [Figure 6.5: Page Setup.] In this dialog, you can adjust both the setup of the pages and specify a header and a footer by clicking the tab at the top of the dialog. You can modify the layout of the page using the following options: • Orientation. Portrait: will print with the paper oriented vertically. Landscape: will print with the paper oriented horizontally. • Paper size. Adjust the size to match the paper in your printer. • Fit to pages. Can be used to control how the graphics should be split across pages (see figure 6.6 for an example). Horizontal pages: if you set the value to e.g. 2, the printed content will be broken up horizontally and split across 2 pages. This is useful for sequences that are not wrapped. Vertical
332. [Figure 19.16: The tabular format of a multiple alignment of 24 Hemoglobin protein sequences. Sequence names appear at the beginning of each row, and the residue position is indicated by the numbers at the top of the alignment columns. The level of sequence conservation is shown on a color scale, with blue residues being the least conserved and red residues being the most conserved.] It is therefore commonplace to either ignore this complication and assume sequences to be unrelated, or to use heuristic corrections for shared ancestry. The
333. figure 17.6, the dialog should include a linker for the SrfI site, a barcode, a sequence, a barcode (now reversed) and finally a linker again, as shown in figure 17.8. If you have paired data, the dialog shown in figure 17.8 will be displayed twice, once for each part of the pair. Clicking Next will display a dialog as shown in figure 17.9. The barcodes can be entered manually by clicking the Add button. You can edit the barcodes and the names by clicking the cells in the table. The name is used for naming the results. In addition to adding barcodes manually, you can also Import barcode definitions from an Excel or CSV file. The input format consists of two columns: the first contains the barcode sequence, the second contains the name of the barcode. An acceptable CSV file would contain rows of information such as: AAAAAA,Sample1; GGGGGG,Sample2; CCCCCC,Sample3. The Preview column will show a preview of the results by running through the first 10,000 reads. At the top, you can choose to search on both strands for the barcodes (this is needed for some 454 protocols where the MID is located at either end of the read).
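The two-column barcode definition file described above can be read with Python's standard `csv` module. The sketch below is an illustration under assumptions: the function names are hypothetical, the comma is assumed as the column separator, and the read layout (a 4-nucleotide linker followed by a 6-nucleotide barcode) is taken as an example tag layout rather than a fixed rule.

```python
import csv
import io


def read_barcodes(csv_text):
    """Parse 'barcode,name' rows into a dict mapping barcode -> sample name."""
    barcodes = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        barcodes[row[0].strip().upper()] = row[1].strip()
    return barcodes


def assign_read(read, barcodes, linker_length=4, barcode_length=6):
    """Return the sample name for a read laid out as linker + barcode + sequence,
    or None when the barcode is not in the table."""
    tag = read[linker_length:linker_length + barcode_length].upper()
    return barcodes.get(tag)


definitions = "AAAAAA,Sample1\nGGGGGG,Sample2\nCCCCCC,Sample3\n"
table = read_barcodes(definitions)
sample = assign_read("TTTTGGGGGGACGTACGT", table)
```

Here the read starts with the linker `TTTT`, followed by the barcode `GGGGGG`, so it is assigned to `Sample2`; reads with an unknown barcode return `None`.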
334. formation content or significance of certain positions, a sequence logo can be used. The sequence logo displays the information content of all positions in an alignment as residues or nucleotides stacked on top of each other (see figure 19.8). The sequence logo provides a far more detailed view of the entire alignment than a simple consensus sequence. Sequence logos can help to identify protein binding sites on DNA sequences, can help to identify conserved residues in aligned domains of protein sequences, and have a wide range of other applications. Each position of the alignment, and consequently of the sequence logo, shows the sequence information in a computed score based on Shannon entropy [Schneider and Stephens, 1990]. The height of the individual letters represents the sequence information content in that particular position of the alignment. A sequence logo is a much better visualization tool than a simple consensus sequence. An example hereof is an alignment where, in one position, a particular residue is found in 70% of the sequences. If a consensus sequence is used, it typically only displays the single residue with 70% coverage. In figure 19.8, an un-gapped alignment of 11 E. coli start codons including flanking regions is shown. In this example, a consensus sequence would only display ATG as the start codon in position 1, but when looking at the sequence logo, it is seen that a GTG is also allowed as a start codon.
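The Shannon-entropy score behind a sequence logo can be worked through in a few lines of Python. This sketch follows the basic Schneider and Stephens (1990) definition, information = log2(alphabet size) minus the column entropy, and omits their small-sample correction; gap handling is not addressed, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column.

    Uses log2(alphabet_size) - H, where H is the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    return log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=4):
    """Height of each letter in the logo: its frequency times the column information."""
    info = column_information(column, alphabet_size)
    total = len(column)
    return {residue: (n / total) * info for residue, n in Counter(column).items()}
```

A fully conserved DNA column carries 2 bits, a column split evenly between two nucleotides carries 1 bit, and a column with all four nucleotides at equal frequency carries 0 bits. For a start-codon column containing mostly A with some G, `letter_heights` gives both letters a visible height, which is exactly the detail a plain consensus sequence hides.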
335. gle. Lower comparison: selects the comparison to show in the lower triangle. Choose the same comparison as in the upper triangle to show all the results of an asymmetric comparison. Lower comparison gradient: selects the color gradient to use for the lower triangle. Diagonal from upper: use this setting to show the diagonal results from the upper comparison. Diagonal from lower: use this setting to show the diagonal results from the lower comparison. No Diagonal: leaves the diagonal table entries blank. • Layout. Lock headers: locks the sequence labels and table headers when scrolling the table. Sequence label: changes the sequence labels. • Text format. Text size: changes the size of the table and the text within it. Font: changes the font in the table. Bold: toggles the use of boldface in the table. 19.6 Bioinformatics explained: Multiple alignments. Multiple alignments are at the core of bioinformatical analysis. Often the first step in a chain of bioinformatical analyses is to construct a multiple alignment of a number of homologous DNA or protein sequences. However, despite their frequent use, the development of multiple alignment algorithms remains one of the algorithmically most challenging areas in bioinformatical research. Constructing a multiple alignment corresponds to developing a hypothesis of how a number of sequences have evolved through
336. gment with additional sequences by extending the primers 5' of the template-specific part of the primer (i.e. between the template-specific part and the attB sites). See an example of this in figure 18.21, where a Shine-Dalgarno site has been added between the attB site and the gene of interest. At the top of the dialog (see figure 18.16) you can specify primer additions such as a Shine-Dalgarno site, start codon, etc. Click in the text field and press Shift+F1 to show some of the most common additions (see figure 18.17). Use the up and down arrow keys to select a tag and press Enter. This will insert the selected sequence as shown in figure 18.18. At the bottom of the dialog you can see a preview of what the final PCR product will look like. In the middle there is the sequence of interest (i.e. the sequence you selected as input). At the beginning is the attB1 site, and at the end is the attB2 site. The primer additions that you have inserted are shown in colors (like the green Shine-Dalgarno site in figure 18.18). This default list of primer additions can be modified (see section 18.2.1). [Screenshot: the Add attB Sites dialog with fields for forward and reverse additions (press Shift+F1 for options) and a preview reading GGGG ACAAGTTTGTACAAAAAAGCAGGCTTA, sequence of interest, AACCCAGCTTTCTTGTACAAAGTGGT CCCC.] Figure 18.16: Primer addit
337. ground distribution of amino acids from a range of organisms. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will open a view showing the patterns found as annotations on the original sequence (see figure 13.21). If you have selected several sequences, a corresponding number of views will be opened. Figure 13.21: Sequence view displaying two discovered patterns. 13.6.2 Pattern search output. If the analysis is performed on several sequences at a time, the method will search for patterns in the sequences and open a new view for each of the sequences in which a pattern was discovered. Each novel pattern will be represented as an annotation of the type Region. More information on each found pattern is available through the tool tip, including detailed information on the position of the pattern and quality scores. It is also possible to get a tabular view of all found patterns in one combined table. Each found pattern will then be represented with information on the obtained scores, the quality of the pattern and its position in the sequence. The emission values of the HMM that was actually used are presented in a table view. This model can be saved and used to search for a similar pattern in new or unknown sequences. 13.7 Motif Search. CLC DNA Workbench offers advanced and versatile options to search for known motifs r
338. h of the four nucleotides the trace data can be selected and unselected. • Scale traces: A slider which allows the user to scale the height of the trace area. Scaling the traces individually is described in section 17.1.1. Figure 17.3: A sequence with trace data. The preferences for viewing the trace are shown in the Side Panel. 17.2 Multiplexing. When you do batch sequencing of different samples, you can use multiplexing techniques to run different samples in the same run. There is often a data analysis challenge to separate the sequencing reads, so that the reads from one sample are mapped together. The CLC DNA Workbench supports automatic grouping of samples for two multiplexing techniques: • By name: This supports grouping of reads based on their name. • By sequence tag: This supports grouping of reads based on information within the sequence (tagged sequences). The details
339. he Frame box. The result is shown in figure 2.20. You can see that the variation is on the third base of the codon coding for threonine, so this is a synonymous substitution. That is why the T is colored yellow. If it were a non-synonymous substitution, it would be colored red. Figure 2.20: Showing the translation along the contig. 2.5.8 Getting an overview of the conflicts. Browsing the conflicts by clicking the Find Conflict button is useful in many cases, but you might also want to get an overview of all the conflicts in the entire contig. This is easily achieved by showing the contig in a table view: Press and hold the Ctrl button (⌘ on Mac) | Click Show Table at the bottom of the view. This will open a table showing the confl
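The synonymous versus non-synonymous distinction used in the tutorial can be checked with a short Python sketch. It is only an illustration under stated assumptions: it uses the standard genetic code (NCBI translation table 1), the function names are hypothetical, and the yellow/red labels simply mirror the coloring described in the text rather than any Workbench API.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons translate to the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


def conflict_color(codon_a, codon_b):
    """Hypothetical helper mirroring the coloring described in the tutorial."""
    return "yellow" if is_synonymous(codon_a, codon_b) else "red"


# Threonine is encoded by ACT, ACC, ACA and ACG, so a change in the
# third base leaves the protein unchanged.
third_base_change = conflict_color("ACG", "ACT")
second_base_change = conflict_color("ACG", "ATG")
```

Here the third-base change stays within the threonine codons, whereas the second-base change turns threonine into methionine and would therefore be flagged as non-synonymous.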
340. he information. Note: CLC files can be exported from and imported into all the different CLC Workbenches. Backup. If you wish to secure your data from computer breakdowns, it is advisable to perform regular backups of your data. Backing up data in the CLC DNA Workbench can be done in two ways: • Making a backup of each of the folders represented by the locations in the Navigation Area. • Selecting all locations in the Navigation Area and exporting them in .zip format. The resulting file will contain all the data stored in the Navigation Area and can be imported into CLC DNA Workbench if you wish to restore from the backup at some point. No matter which method is used for backup, you may have to re-define the locations in the Navigation Area if you restore your data after a computer breakdown. 7.2 External files. In order to help you organize your research projects, CLC DNA Workbench lets you import all kinds of files. E.g. if you have Word, Excel or pdf files related to your project, you can import them into the Navigation Area of CLC DNA Workbench. Importing an external file creates a copy of the file, which is stored at the location you have chosen for import. The file can now be opened by double-clicking the file in the Navigation Area. The file is opened using the default application for this file type (e.g. Microsoft Word for .doc files and Adobe Reader for .pdf). External files are imported
341. he Help menu of the program. This installs the data automatically. You can also go to http://www.clcbio.com/download and download the example data from there. If you download the file from the website, you need to import it into the program. See chapter 7.1 for more about importing data. 1.7 Plug-ins. When you install CLC DNA Workbench, it has a standard set of features. However, you can upgrade and customize the program using a variety of plug-ins. As the range of plug-ins is continuously updated and expanded, they will not be listed here. Instead we refer to http://www.clcbio.com/plug-ins for a full list of plug-ins with descriptions of their functionalities. 1.7.1 Installing plug-ins. Plug-ins are installed using the plug-in manager: Help in the Menu Bar | Plug-ins and Resources, or Plug-ins in the Toolbar. (Footnote: In order to install plug-ins on Windows Vista, the Workbench must be run in administrator mode: Right-click the program shortcut and choose "Run as Administrator". Then follow the procedure described below.) The plug-in manager has four tabs at the top: • Manage Plug-ins: This is an overview of plug-ins that are installed. • Download Plug-ins: This is an overview of available plug-ins on CLC bio's server. • Manage Resources: This is an overview of resources that are installed. • Download Resources: This is an overview of available resources on CLC bio's server.
342. he sequences: • Right-click the sequence name to the left to manipulate the whole sequence. • Right-click a selection to manipulate the selection. The two menus are described in the following. Manipulate the whole sequence. Right-clicking the sequence name at the left side of the view reveals several options on sorting, opening and editing the sequences in the view (see figure 18.9). [Screenshot: right-click menu with the entries Open Sequence in Circular View, Duplicate Sequence, Reverse Complement Sequence, Digest and Create Restriction Map, Rename Sequence, Select Sequence, Delete Sequence, Open Copy of Sequence in New View, Open This Sequence in New View, Make Sequence Linear, Sort Sequence List by Name, Sort Sequence List by Length.] Figure 18.9: Right click on the sequence in the cloning view. • Open sequence in circular view: Opens the sequence in a new circular view. If the sequence is not circular, you will be asked if you wish to make it circular or not. (This will not forge ends with matching overhangs together; use "Make Sequence Circular" instead.) • Duplicate sequence: Adds a duplicate of the selected sequence. The new sequence will be added to the list of sequences shown on the screen. • Insert sequence after this sequence: Insert another sequence after this sequence. The sequence to be inserted can be selected from
343. he work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work. SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. Chapter 20: Phylogenetic trees. Contents: 20.1 Inferring phylogenetic trees; 20.1.1 Phylogenetic tree parameters; 20.1.2 Tree View Preferences; 20.2 Bioinformatics explained: phylogenetics; 20.2.1 The phylogenetic tree; 20.2.2 Modern usage of phylogenies; 20.2.3 Reconstructing phylogenies from molecular data; 20.2.4 Interpreting phylogenies. CLC DNA Workbench offers different ways of inferring phylogenetic trees. The first part of this chapter will briefly explain the different ways of inferring trees in CLC DNA Workbench. The second part, "Bioinformatics explained", will give a more general introduction to the concept of phylogeny and the associated bioinformatics methods. 20.1 Inferring phylogenetic trees. For a given set of aligned sequences (see chapter 19) it is possible to infer their evolutionary relationships. In CLC DNA Workbench this may be done either by using a distance-based method (see "Bioinformatics explained" in section 20.2) or by us
344. here is no directionality indicated when setting parameters for melting temperature differences between the inner and outer primer pair, i.e. it is not specified whether the inner pair should have a lower or higher Tm. Instead, this is determined by the allowed temperature intervals for inner and outer primers that are set in the primer parameters preference group in the side panel. If a higher Tm of inner primers is desired, choose a Tm interval for inner primers which has higher values than the interval for outer primers. • Two radio buttons allowing the user to choose between a fast and an accurate algorithm for primer prediction. 16.6.1 Nested PCR output table. In nested PCR there are four primers in a solution: forward outer primer (FO), forward inner primer (FI), reverse inner primer (RI) and reverse outer primer (RO). The output table can show primer-pair combination parameters for all four combinations of primers and single-primer parameters for all four primers in a solution. (See the section on Standard PCR for an explanation of the available primer-pair and single-primer information.) The fragment length in this mode refers to the length of the PCR fragment generated by the inner primer pair, and this is also the PCR fragment which can be exported. 16.7 TaqMan. CLC DNA Workbench allows the user to design primers and probes for TaqMan PCR applications. TaqMan probes are oligonucleotides that contain a fluorescent repor
345. hey will be cut off. [Screenshot: the Add Sequences to Contig dialog, Set parameters step, with Alignment options (Minimum aligned read length: 50; Alignment stringency: Medium), Trimming options (Use existing trim information; generally not necessary since a reference sequence is used) and Output options (Show tabular view of contigs).] Figure 17.19: Setting assembly parameters when assembling to an existing contig. 17.7 View and edit contigs. The result of the assembly process is one or more contigs where the sequence reads have been aligned (see figure 17.20). Figure 17.20: The view of a contig. Notice that you can zoom to a very detailed level in contigs. You can see that the color of the residues and trace at the end of one of the reads has been faded. This indicates that this region has not contributed to the contig. This may be due to trimming before or during the assembly, or due to misalignment to the other reads. You can easily adjust the trimmed area to include more of the read in the contig: simply drag the edge of the faded area
346. horizontal axis (x axis). Enter a value in Min and Max, and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated. • Vertical axis range: Sets the range of the vertical axis (y axis). Enter a value in Min and Max, and press Enter. This will update the view. If you wait a few seconds without pressing Enter, the view will also be updated. • X-axis at zero: This will draw the x axis at y = 0. Note that the axis range will not be changed. • Y-axis at zero: This will draw the y axis at x = 0. Note that the axis range will not be changed. • Show as histogram: For some data series it is possible to see the graph as a histogram rather than a line plot. The Lines and plots group below contains the following settings: • Dot type: None, Cross, Plus, Square, Diamond, Circle, Triangle, Reverse triangle, Dot. • Dot color: Allows you to choose between many different colors. Click the color box to select a color. • Line width: Thin, Medium, Wide. • Line type: None, Line, Long dash, Short dash. • Line color: Allows you to choose between many different colors. Click the color box to select a color. For graphs with multiple data series, you can select which curve the dot and line preferences should apply to. This setting is at the top of the Side Panel group. Note that the graph title and the axes
347. how to draw the line no matter what the zoom factor is, thereby always giving a correct image. This format is good for e.g. graphs and reports, but less usable for e.g. dot plots. If the image is to be resized or edited, vector graphics are by far the best format to store graphics. If you open a vector graphics file in an application like Adobe Illustrator, you will be able to manipulate the image in great detail. Graphics files can also be imported into the Navigation Area. However, no kinds of graphics files can be displayed in CLC DNA Workbench. See section 7.2 for more about importing external files into CLC DNA Workbench. 7.3.3 Graphics export parameters. When you have specified the name and location to save the graphics file, you can either click Next or Finish. Clicking Next allows you to set further parameters for the graphics export, whereas clicking Finish will export using the parameters that you set the last time you made a graphics export in that file format (if it is the first time, it will use default parameters). Parameters for bitmap formats. For bitmap files, clicking Next will display the dialog shown in figure 7.12. Screenshot of the Export Graphics dialog, Export size step, Choose resolution: Screen resolution (530x3072 pixels, 9 MB memory usage); Low resolution (286x1660 pixels, 2 MB memory usage); Medium resolution (1145x6640 pixels, 43 MB memory usage); High resolution 4582x2656
348. ialog you can specify more options: • Minimum aligned read length: The minimum number of nucleotides in a read which must be successfully aligned to the contig. If this criterion is not met by a read, the read is excluded from the assembly. • Alignment stringency: Specifies the stringency of the scoring function used by the alignment step in the contig assembly algorithm. A higher stringency level will tend to produce contigs with fewer ambiguities, but will also tend to omit more sequencing reads and to generate more and shorter contigs. Three stringency levels can be set: Low, Medium, High. • Use existing trim information: When using a reference sequence, trimming is generally not necessary, but if you wish to use trimming you can check this box. It requires that the sequence reads have been trimmed beforehand (see section 17.3 for more information about trimming). • Show tabular view of contigs: A contig can be shown in both a graphical and a tabular view. If you select this option, a tabular view of the contig will also be opened. (Even if you do not select this option, you can show the tabular view of the contig later on by clicking Show and selecting Table.) For more information about the tabular view of contigs, see section 1. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the asse
349. ical level the CLC DNA Workbench uses the NCBI s blast software see ftp ftp ncbi nlm nih gov blast executables blast LATEST Thus the results of using a particular data set to search the same database with the same search parameters would give the same results whether run locally or at the NCBI There are a number of options for what you can search against e You create a database based on data already imported into your Workbench see sec tion 12 3 3 e You can add pre formatted databases see section 12 3 1 e You can use sequence data from the Navigation Area directly without creating a database first To conduct a BLAST search or Toolbox BLAST Local BLAST 2 This opens the dialog seen in figure 12 5 Select one or more sequences of the same type DNA or protein and click Next This opens the dialog seen in figure 12 6 At the top you can choose between different BLAST programs See section 12 1 1 for information about these methods You then specify the target database to use CHAPTER 12 BLAST SEARCH 179 e Local BLAST 1 Select sequences of same peido tek ee Navigation Area Selected Elements 1 Ga CLC Data ss ATPSal ka Example Data su gt 5 Protein orthologs x ATP8al MRNA Protein analyses B Cloning XX ATP8al genomic sequence tj Sequencing data F Primers Eq RNA secondary structure a E e R A R za
350. icts. You can right-click the Note field and enter your own comment. In this dialog, enter a new text in the Name and click OK. When you edit a comment, this is reflected in the conflict annotation on the consensus sequence. This means that when you use this sequence later on, you will easily be able to see the comments you have entered. The comment could be e.g. your interpretation of the conflict. 2.5.9 Documenting your changes. Whenever you make a change, like deleting a T, it will be noted in the contig's history. To open the history, click the History icon at the bottom of the view. In the history, you can see the details of each change (see figure 2.21). 2.5.10 Using the result for further analyses. When you have finished editing the contig, it can be saved, and you can also extract and save the consensus sequence. Figure 2.21: The history of the contig showing that a T has been deleted and that the aligned region has been moved. Right-click the name "Consensus" | Open Copy of Sequence | Save. This will make it possible to use this sequence for further analyses in the CLC DNA Workbench
351. idues and gaps 2 ee a a a a a Sof Poe NCCU BCS oo wee eh owe Oe ee ee we eee Bo eee 357 19 3 3 Delete residues and gaps 1 aoao a a a a o a a 357 19 3 4 Copy annotations to other sequences aooaa a a 358 19 3 5 Move sequences up and down a a a a a ee es 358 19 3 6 Delete rename and add sequences 0 800 ee ee eee 358 19 3 7 Realign selection inde ktathadeevRiett bee dut ae a ot 359 19 4 Join alignments 462 ce Be eee eR EES RELEASES a 6 E 359 19 4 1 How alignments are joined 2 ee ee 361 19 5 Pairwise comparison 0 0 ee ee ee 361 19 5 1 Pairwise comparison on alignment selection 361 19 5 2 Pairwise comparison parameters 0 a a a ee a a 362 19 5 3 The pairwise comparison table a a a a a ee ee ee a 363 19 6 Bioinformatics explained Multiple alignments 2 22080808 364 19 6 1 Use of multiple alignments 0 200500 eee 364 19 6 2 Constructing multiple alignments 2 0 50208 364 CLC DNA Workbench can align nucleotides and proteins using a progressive alignment algorithm see section 19 6 or read the White paper on alignments in the Science section of http www clcbio com This chapter describes how to use the program to align sequences The chapter also describes alignment algorithms in more general terms 347 CHAPTER 19 SEQUENCE ALIGNMENT 348 19 1 Create an alignment Alignments can be created from sequences sequence lis
352. ight of the graph. * Type: The type of the graph. Line plot: Displays the graph as a line plot. Bar plot: Displays the graph as a bar plot. Colors: Displays the graph as a color bar using a gradient like the foreground and background colors. * Color box: Specifies the color of the graph for line and bar plots, and specifies a gradient for colors. • Color different residues: Indicates differences in aligned residues. Foreground color: Colors the letter. Background color: Sets a background color of the residues. • Sequence logo: A sequence logo displays the frequencies of residues at each position in an alignment. This is presented as the relative heights of letters, along with the degree of sequence conservation as the total height of a stack of letters, measured in bits of information. The vertical scale is in bits, with a maximum of 2 bits for nucleotides and approximately 4.32 bits for amino acid residues. See section 19.2.1 for more details. Foreground color: Colors the residues using a gradient according to the information content of the alignment column. Low values indicate columns with high variability, whereas high values indicate columns with similar residues. Background color: Sets a background color of the residues using a gradient in the same way as described above. Logo: Displays the sequence logo at the bottom of the alignment. * Height: Specifies the height of the sequ
353. ill have entries named Element deleted An easy way to export an element with all its source elements is to use the Export Dependent Elements function described in section 7 1 3 The history view can be printed To do so click the Print icon 55 The history can also be exported as a pdf file Select the element in the Navigation Area Export ES in File of type choose History PDF Save Chapter 9 Batching and result handling Contents 9 1 Batch processing tea tet ee eb ee ns A ew Oe ee EO 133 S411 CANON cerrada eS we Se a we we 134 9 1 2 Batch filtering and counting cisnes bow Bowe E Es 135 9 1 3 Setting parameters for batch runs 2 002 ee eee 135 9 1 4 Running the analysis and organizing the results 136 9 1 5 Running de novo assembly and read mapping in batch 136 9 2 How to handle results of analyses 2 0 ee eee et es 136 9 2 1 TADIGOUIDUIG sirene dd SSR A Re ee 138 9 2 4 MEN woke eee eee ee ee ERS eRe REE ERASE ee i 138 9 1 Batch processing Most of the analyses in the Toolbox are able to perform the same analysis on several elements in one batch This means that analyzing large amounts of data is very easily accomplished As an example if you use the Find Binding Sites and Create Fragments 2 tool if you supply five sequences as shown in figure 9 1 the result table will present an overview of the results for all five sequences This is because the input Sequenc
354. imal alignment of the forward and the reverse primer in a primer pair. • Pair end annealing: the maximum score of consecutive end base-pairings found between the ends of the two primers in the primer pair, in units of hydrogen bonds. • Fragment length: the length (number of nucleotides) of the PCR fragment generated by the primer pair. 16.6 Nested PCR. Nested PCR is a modification of Standard PCR, aimed at reducing product contamination due to the amplification of unintended primer binding sites (mispriming). If the intended fragment cannot be amplified without interference from competing binding sites, the idea is to seek out a larger outer fragment which can be unambiguously amplified and which contains the smaller intended fragment. Having amplified the outer fragment to large numbers, the PCR amplification of the inner fragment can proceed and will yield amplification of this with minimal contamination. Primer design for nested PCR thus involves designing two primer pairs, one for the outer fragment and one for the inner fragment. In Nested PCR mode the user must thus define four regions: a Forward primer region (the outer forward primer), a Reverse primer region (the outer reverse primer), a Forward inner primer region, and a Reverse inner primer region. These are defined by making a selection on the sequence and right-clicking the selection. If areas are known where primers must not bind (e.g. repeat-rich areas), one or more No primers here
355. Figure 16.22: Right-clicking a fragment allows you to annotate the region on the input sequence or open the fragment as a new sequence. This will put a PCR fragment annotation on the input sequence covering the region specified in the table. As you can see from figure 16.22, you can also choose to Open Fragment. This will create a new sequence representing the PCR product that would be the result of using these two primers. Note that if you have extensions on the primers, they will be used to construct the new sequence. If you are doing restriction cloning using primers with restriction site extensions, you can use this functionality to retrieve the PCR fragment for use in the cloning editor (see section 18.1). 16.12 Order primers. To facilitate the ordering of primers and probes, CLC DNA Workbench offers an easy way of displaying and saving a textual representation of one or more primers: select primers in Navigation Area | Toolbox in the Menu Bar | Primers and Probes | Order Primers. This opens a dialog where you can choose additional primers. Clicking OK opens a textual representation of the primers (see figure 16.23). The first line states the number of primers being ordered, and after this follow the names and nucleotide sequences of the primers in 5'-3' orientation. From the editor, the primer information can be copied and pasted to web forms or e-mails
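The textual representation described above (a count line followed by one name and 5'-3' sequence per primer) can be mimicked outside the Workbench with a small Python sketch. The exact layout the Workbench produces is not given in this excerpt, so the line format, the tab separator, the primer names and sequences, and the function name below are all assumptions made for illustration.

```python
def format_primer_order(primers):
    """Build a plain-text primer order.

    `primers` is a list of (name, sequence) tuples, with each sequence
    written 5' to 3'. Assumed layout: a count line first, then one
    tab-separated "name, sequence" line per primer.
    """
    lines = [f"Number of primers: {len(primers)}"]
    for name, sequence in primers:
        lines.append(f"{name}\t5'-{sequence.upper()}-3'")
    return "\n".join(lines)


# Hypothetical primer names and sequences, for illustration only.
order_text = format_primer_order([
    ("example fwd", "ctgggcatcgactgagac"),
    ("example rev", "GATCCCCGATGGCAAG"),
])
```

Keeping the output as plain text matches the intended use in the manual, where the result is pasted into a web form or an e-mail.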
356. Figure 17.23: Selecting the reads to include. • Paired-end status. Include paired reads from broken pairs: When a pair is broken, either because only one read in the pair matches or because the distance or relative orientation is wrong, the reads are placed and colored as single reads, but you can still extract them by checking this box. Include single reads: This will include reads that are marked as single reads (as opposed to paired reads). Note that paired reads that have been broken during assembly are not included in this category. Single reads that come from trimming paired sequence lists are included in this category. • Match specificity. Include specific matches: Reads that are mapped to only one position. Include non-specific matches: Reads that have multiple equally good alignments to the reference. These reads are colored yellow by default. • Alignment quality. Include perfectly aligned reads: Reads where the full read is perfectly aligned to the reference sequence (or consensus sequence for de novo assemblies). Note that at the end of the contig, reads may extend beyond the contig; this is not visible unless you make a selection on the read and observe the posi
357. in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (in the appendix). Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The newly created protein is shown, but is not saved automatically. To save a protein sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. 14.5.1 Translate part of a nucleotide sequence. If you want to make separate translations of all the coding regions of a nucleotide sequence, you can check the option "Translate CDS and ORF" in the translation dialog (see figure 14.6). If you want to translate a specific coding region which is annotated on the sequence, use the following procedure: Open the nucleotide sequence | right-click the ORF or CDS annotation | Translate CDS/ORF | choose a translation table | OK. If the annotation contains information about the translation, this information will be used, and you do not have to specify a translation table. The CDS and ORF annotations are colored yellow by default. 14.6 Find open reading frames. The CLC DNA Workbench Find Open Reading Frames function can be used to find all open reading frames (ORF) in a sequence, or, by choosing particular start codons to use, it can be used as a rudimentary gene finder. ORFs identified will be shown as annotations on the sequence. You have
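The idea behind an open reading frame search with selectable start codons can be illustrated with a short Python sketch. It is not the Workbench's algorithm: it is a simplified scan of the three forward frames only, assuming the standard stop codons TAA, TAG and TGA, and the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_orfs(sequence, start_codons=("ATG",), min_codons=2):
    """Return (start, end) positions of ORFs on the forward strand.

    Positions are 0-based with an exclusive end, and the stop codon is
    included in the ORF. Each frame is scanned codon by codon; an ORF
    opens at the first start codon and closes at the next in-frame stop.
    The reverse strand is not searched in this sketch.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Restricting or widening the allowed start codons changes what is found:
atg_only = find_orfs("GTGAAATAA")
with_gtg = find_orfs("GTGAAATAA", start_codons=("ATG", "GTG"))
```

With only ATG allowed, the example sequence yields no ORF, while adding GTG as a start codon reports the full nine-base frame, which is the kind of difference the choice of start codons makes.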
358. Figure 18.40: Choosing sequence ATP8a1 mRNA for restriction map analysis. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Selecting, sorting and filtering enzymes. Clicking Next lets you define which enzymes to use as the basis for finding restriction sites on the sequence. At the top, you can choose to Use existing enzyme list. Clicking this option lets you select an enzyme list which is stored in the Navigation Area. See section 18.5 for more about creating and modifying enzyme lists. Below there are two panels: • To the left, you see all the enzymes that are in the list selected above. If you have not chosen to use an existing enzyme list, this panel shows all the enzymes available. (The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation; see section E.) • To the right, there is a list of the enzymes that will be used. Select enzymes in the left side panel and add them to the right panel by double-clicking or clicking the Add button. If you e.g. wish
359. ing Data Analyses 140 Relative to 1 Eal Primers and Probes fa Cloning and Restriction Sites HUMDINUC CACACACACACACACACACACACTGC Ce BLAST Search 160 180 Follow selection 5A Database Search v E E H E E H EE E Processes Toolbox _ Idle 1 element s are selected Figure 2 2 The HUMDINUC file is imported and opened 2 2 Tutorial View sequence This brief tutorial will take you through some different ways to display a sequence in the program The tutorial introduces zooming on a sequence dragging tabs and opening selection in new view We will be working with the sequence called pcDNAS atp8a1 located in the Cloning folder in the Example data Double click the sequence in the Navigation Area to open it The sequence is displayed with annotations above it See figure 2 3 ack pcONAS atp al 140 160 pcONA3 atpsal TTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAA CMV promoter Ee em pcONAs atpsal GAATCTGCTTAGGCGTTAGGCGTTTTGCGCTGCTTCGCGATGTA promoter pcONAs atpsal CGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATT fe O Ea ODE Figure 2 3 Sequence pcDNA3 atp8al opened in a view CHAPTER 2 TUTORIALS 40 As default CLC DNA Workbench displays a sequence with annotations colored arrows on the sequence like the green promoter region annotation in figure 2 3 and zoomed to see the residues In this tutorial we want to have an overview of the whole sequence
360. ing restriction sites to use for cloning and inserting the fragment into the vector. 2.6.1 Locating the data to use. Open the Example data folder in the Navigation Area. Open the Cloning folder, and inside this folder, open the Primer folder. If you do not have the example data, please go to the Help menu to import it. The data to use in these folders is shown in figure 2.23. Figure 2.23: The data to use in this tutorial. Double-click the ATP8a1 mRNA sequence and zoom to Fit Width, and you will see the yellow annotation which is the coding part of the gene. This is the part that we want to insert into the pcDNA4 TO vector. The primers have already been designed using the primer design tool in CLC DNA Workbench (to learn more about this, please refer to the Primer design tutorial). 2.6.2 Add restriction sites to primers. First, we add restriction sites to the primers. In order to see which restriction enzymes can be used, we create a split view of the vector and the fragment to insert. In this way we can easily make a visual check to find enzymes from the multiple cloning site in the vector that do not cut in the gene of interest. To create the split view, double-click the pcDN
361. ing the data to use
    2.6.2 Add restriction sites to primers
    2.6.3 Simulate PCR to create the fragment
    2.6.4 Specify restriction sites and perform cloning
  2.7 Tutorial: Primer design
    2.7.1 Specifying a region for the forward primer
    2.7.2 Examining the primer suggestions
    2.7.3 Calculating a primer pair
  2.8 Tutorial: BLAST search
    2.8.1 Performing the BLAST search
    2.8.2 Inspecting the results
    2.8.3 Using the BLAST table view
  2.9 Tutorial: Tips for specialized BLAST searches
    2.9.1 Locate a protein sequence on the chromosome
    2.9.2 BLAST for primer binding sites
    2.9.3 Finding remote protein homologues
    2.9.4 Further reading
  2.10 Tutorial: Align protein sequences
    2.10.1 The alignment dialog
  2.11 Tutorial: Create and modify a phylogenetic tree
    2.11.1 [title illegible]
  2.12 Tutorial: Find restriction sites
    2.12.1 The Side Panel wa
362. ing the statistically founded maximum likelihood (ML) approach [Felsenstein, 1981]. Both approaches generate a phylogenetic tree. The tools are found in: Toolbox | Alignments and trees. To generate a distance-based phylogenetic tree, choose Create Tree, and to generate a maximum likelihood based phylogenetic tree, choose Maximum Likelihood Phylogeny. In both cases the dialog displayed in figure 20.1 will be opened. [CHAPTER 20: PHYLOGENETIC TREES] Figure 20.1: Creating a Tree. If an alignment was selected before choosing the Toolbox action, this alignment is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove elements from the Navigation Area. Click Next to adjust parameters. 20.1.1 Phylogenetic tree parameters. Distance-based methods. (The dialog in the figure shows Algorithm: Neighbor Joining, and Bootstrapping: Perform bootstrap analysis, Replicates: 100.) Figure 20.2: Adjusting pa
363. Figure 2.8: Saving the settings of the Side Panel. This will open the dialog shown in figure 2.9. Figure 2.9: Dialog for saving the settings of the Side Panel. In this way you can save the current state of the settings in the Side Panel so that you can apply them to alignments later on. If you check Always apply these settings, these settings will be applied every time you open a view of the alignment. Type "My settings" in the dialog and click Save. 2.3.2 Applying saved settings. When you click the Save/Restore Settings button again and select Apply Saved Settings, you will see "My settings" in the menu together with some pre-defined settings that the CLC DNA Workbench has created for you (see figure 2.10). Figure 2.10: Menu for applying saved settings. Whenever you open an alignment, you will be able to apply these settings. Each kind of view has its own list of settings that can be applied. At the bottom of the list you will see the CLC Standard Settings, which are the default settings for the view. 2.4 Tutorial: GenBank search and download. The CLC DNA Work
364. ino acids are the basic components of proteins. The amino acid distribution in a protein is simply the percentage of the different amino acids represented in a particular protein of interest. Amino acid composition is generally conserved through family classes in different organisms, which can be useful when studying a particular protein or enzyme across species borders. Another interesting observation is that amino acid composition varies slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for prediction of subcellular localization. [CHAPTER 13: GENERAL SEQUENCE ANALYSES] Annotation table: This table provides an overview of all the different annotations associated with the sequence and their incidence. Dipeptide distribution: This measure is simply a count, or frequency, of all the observed adjacent pairs of amino acids (dipeptides) found in the protein. It is only possible to report neighboring amino acids. Knowledge of dipeptide composition has previously been used for prediction of subcellular localization. Creative Commons License: All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labeled as
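The two measures described in this chunk, amino acid distribution (percentages) and dipeptide distribution (counts of adjacent pairs), can be sketched in a few lines of Python. This is only an illustration of the definitions given in the text, not the Workbench's own implementation; the function names and the short example protein are made up for the sketch.

```python
from collections import Counter


def amino_acid_distribution(protein):
    """Return the percentage of each amino acid in the protein string."""
    counts = Counter(protein)
    total = len(protein)
    # An empty protein gives an empty Counter, so no division by zero occurs.
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def dipeptide_counts(protein):
    """Count every adjacent pair of residues (overlapping pairs included)."""
    return Counter(protein[i:i + 2] for i in range(len(protein) - 1))


if __name__ == "__main__":
    example = "MKKLLA"  # hypothetical toy sequence, not from the manual
    print(amino_acid_distribution(example))
    print(dipeptide_counts(example))
```

A protein of length n contains n - 1 overlapping dipeptides, which is why only neighboring residues can be reported.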
365. ion by studying its annotation or by aligning it to the query sequence. 2.8.3 Using the BLAST table view. As an alternative to the graphic BLAST view, you can click the Table View at the bottom. This will display a tabular view of the BLAST hits as shown in figure 2.45. Figure 2.45: Output of a BLAST search shown in a table. This view provides more statistics about the hits, and you can use the filter to search for e.g. a specific type of protein etc. If you wish to d
366. ion clone containing all the fragments will be created. You can find an explanation of the multi-site gateway system at http://tools.invitrogen.com/downloads/gateway-multisite-seminar.html. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The output is a number of expression clones, depending on how many entry clones and destination vectors you selected. The attL and attR sites have been used for the recombination, and the expression clone is now equipped with attB sites as shown in figure 18.26. Figure 18.26: The resulting expression clone opened in a circular view. You can choose to create a sequence list with the by-products as well. 18.3 Restriction site analysis. There are two ways of finding and showing restriction sites. In many cases, the dynamic restriction sites found in the Side Panel of sequence views will be useful, since it is a quick and easy way of showing restriction sites. In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g. a table of restriction sites, and you can perform the same restriction map analysis on several sequences in one step. This chapter first describes the dynamic restriction sites fol
367. ion of DNA or RNA to protein
    Find open reading frames
  15 Protein analyses
    15.1 Protein charge
    15.2 Hydrophobicity
    15.3 Reverse translation from protein into DNA
  16 Primers
    16.1 Primer design: an introduction
    16.2 Setting parameters for primers and probes
    16.3 Graphical display of primer information
    16.4 Output from primer design
    16.5 Standard PCR
    16.6 Nested PCR
    16.7 [title illegible]
    16.8 Sequencing primers
    16.9 Alignment-based primer and probe design
    16.10 Analyze primer properties
    16.11 Find binding sites and create fragments
    16.12 Order primers
  17 Sequencing data analyses and Assembly
    17.1 Importing and viewing trace data
    17.2 [title illegible]
    17.3 Trim sequences
    17.4 Assemble sequences
    17.5 Assemble to reference sequence
    17.6 Add sequences to an existing contig
    17.7 View and edit contigs
    17.8 Reassemble contig
368. ionalities are missing, and you will have to restart the CLC DNA Workbench again without pressing Shift. 1.5.3 CLC Sequence Viewer vs. Workbenches. The advanced analyses of the commercial workbenches (CLC Protein Workbench, CLC RNA Workbench and CLC DNA Workbench) are not present in CLC Sequence Viewer. Likewise, some advanced analyses are available in CLC DNA Workbench but not in CLC RNA Workbench or CLC Protein Workbench, and vice versa. All types of basic and advanced analyses are available in CLC Main Workbench. However, the output of the commercial workbenches can be viewed in all other workbenches. This allows you to share the result of your advanced analyses from e.g. CLC Main Workbench with people working with e.g. CLC Sequence Viewer. They will be able to view the results of your analyses, but not redo the analyses. The CLC Workbenches and the CLC Sequence Viewer are developed for Windows, Mac and Linux platforms. Data can be exported/imported between the different platforms in the same easy way as when exporting/importing between two computers with e.g. Windows. 1.6 When the program is installed: Getting started. CLC DNA Workbench includes an extensive Help function, which can be found in the Help menu of the program's Menu bar. The Help can also be shown by pressing F1. The help topics are sorted in a table of contents, and the topics can be searched. We also recommend our Online presentations where a product specialist from CLC bi
369. ions 5' of the template-specific part of the primer. Figure 18.17: Pressing Shift + F1 shows some of the common additions (the figure lists, among others, Shine-Dalgarno, Kozak, start codon, His tag, Lumio tag, TEV cleavage site and EK cleavage site). This default list can be modified, see section 18.2.1. You can also manually type a sequence with the keyboard or paste in a sequence from the clipboard by pressing Ctrl + V (Cmd + V on Mac). Clicking Next allows you to specify the length of the template-specific part of the primers as shown in figure 18.19. The CLC DNA Workbench is not doing any kind of primer design when adding the attB sites. As a user, you simply specify the length of the template-specific part of the primer, and together with the attB sites and optional primer additions, this will be the primer. The primer region will be annotated in the resulting attB-flanked sequence, and you can also get a list of primers, as you can see when clicking Next (see figure 18.20).
370. is inserted, it will be marked with a selection. Figure 18.13: One sequence is now inserted into the cloning vector. The sequence inserted is automatically selected. 18.1.4 Insert restriction site. If you make a selection on the sequence and right-click, you find this option for inserting the recognition sequence of a restriction enzyme before or after the region you selected. This will display a dialog as shown in figure 18.14. At the top, you can select an existing enzyme list, or you can use the full list of enzymes (default). Select an enzyme, and you will see its recognition sequence in the text field below the list (AAGCTT). If you wish to insert additional residues such as tags etc., these can be typed into the text fields adjacent to the recognition sequence. Clicking OK will insert the sequence before or after the selection. If the enzyme selected was not already present in the list in the Side Panel, it will now be added and selected. Furthermore, a restriction site annotation is added. 18.2 Gateway cloning. CLC DNA Workbench offers tools to perform in silico Gateway cloning, including Multi-site Gateway cloning. The three tools for doing Gateway cloning in the CLC DNA Workbench mimic the procedure followed in the lab. Gateway is a registered trademark of I
371. is option is useful when comparing sequence reads to a closely related reference sequence, e.g. when sequencing for SNP characterization. Only include part of the reference sequence in the contig: If the aligned sequence reads only cover a small part of the reference sequence, it may not be desirable to include the whole reference sequence in the contig data object. When selected, this option lets you specify how many residues from the reference sequence should be kept on each side of the region spanned by sequencing reads, by entering the number in the Extra residues field. Do not include reference sequence in contig(s): This will produce a contig data object without the reference sequence. The contig is created in the same way as when you make an ordinary assembly (see section 17.4), but the reference sequence is omitted in the resulting contig. In the assembly process, the reference sequence is only used as a scaffold for alignment. This option is useful when performing assembly with a reference sequence that is not closely related to the sequencing reads. Conflicts resolved with: If there is a conflict, i.e. a position where there is disagreement about the residue (A, C, T or G), you can specify how the contig sequence should reflect this conflict. Unknown nucleotide (N): The contig will be assigned an 'N' character in all positions with conflicts. Ambiguity nucleotid
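The conflict options above can be pictured with a small Python sketch. The description of the ambiguity option is cut off in this chunk, so its behaviour here is an assumption: the sketch maps the set of disagreeing bases to the standard IUPAC ambiguity code. The function name `resolve_column` and the mode strings are hypothetical and are not Workbench settings.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def resolve_column(bases, mode="unknown"):
    """Return the contig residue for one alignment column.

    bases: the residues (A, C, G or T) that the reads show at this position.
    mode:  "unknown"   -> any disagreement becomes N
           "ambiguity" -> disagreement becomes the IUPAC code (assumed behaviour)
    """
    observed = frozenset(b.upper() for b in bases)
    if not observed:
        raise ValueError("column has no residues")
    if len(observed) == 1:
        return next(iter(observed))  # no conflict at this position
    if mode == "unknown":
        return "N"
    if mode == "ambiguity":
        return IUPAC[observed]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    print(resolve_column("AAG"))               # conflict, reported as N
    print(resolve_column("AAG", "ambiguity"))  # conflict, reported as R (A or G)
```

Using N discards which bases disagreed, while an ambiguity code keeps that information at the cost of a less conventional consensus string.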
372. is term limits the query to proteins of human origin. Figure 2.43: The BLAST search is limited to homo sapiens [ORGN] (the dialog in the figure shows Expect: 10, Word size: 3, Matrix: BLOSUM62, Gap cost: Existence 11, Extension 1). The remaining parameters are left as default. Choose to Open your results. Click Finish to accept the parameter settings and begin the BLAST search. The computer now contacts NCBI and places your query in the BLAST search queue. After a short while, the result should be received and opened in a new view. 2.8.2 Inspecting the results. The output is shown in figure 2.44 and consists of a list of potential homologs that are sorted by their BLAST match score and shown in descending order below the query sequence. Hit details shown in the figure: Probable phospholipid-transporting ATPase IB; Score = 1567.8 bits (4058), Expect = 0E00, Identities = 779/1144 (68%), Positives = 933/1144 (82%), Gaps = 29/1144 (2%).
373. ision: Abbreviation of GenBank divisions. See section 3.3 in the GenBank release notes for a full list of GenBank divisions. Length: The length of the sequence. Modification date: Modification date from the database. This means that this date does not reflect your own changes to the sequence. See the history (section 8) for information about the latest changes to the sequence after it was downloaded from the database. Organism: Scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines). The information available depends on the origin of the sequence. Sequences downloaded from databases like NCBI and UniProt (see section 12) have this information. On the other hand, some sequence formats like FASTA format do not contain this information. Some of the information can be edited by clicking the blue Edit text. This means that you can add your own information to sequences that do not derive from databases. Note that for other kinds of data, the Element info will only have Name and Description. 10.5 View as text. A sequence can be viewed as text without any layout and text formatting. This displays all the information about the sequence in the GenBank file format. [CHAPTER 10: VIEWING AND EDITING SEQUENCES] To view a sequence as text: select a sequence in the Navigation Area | Show in the Toolbar | As text. This way it is possible to see background information about e.g. the authors an
374. ist, alignment or contig, you have two additional options: right-click an annotation | Delete | Delete All Annotations from All Sequences; or right-click an annotation | Delete | Delete Annotations of Type "type" from All Sequences. 10.4 Element information. The normal view of a sequence (by double-clicking) shows the annotations as boxes along the sequence, but often there is more information available about sequences. This information is available through the Element info view. To view the sequence information: select a sequence in the Navigation Area | Show in the Toolbar | Element info. This will display a view similar to fig. 10.13. All the lines in the view are headings, and the corresponding text can be shown by clicking the text. Figure 10.13: The initial display of sequence info for the HUMHBB DNA sequence from the Example data. Name: The name of the sequence, which is also shown in sequence views and in the Navigation Area. Description: A description of the sequence. Comments: The author's comments about the sequence. Keywords: Keywords describing the sequence. Db source: Accession numbers in other databases concerning the same sequence. Gb Div
375. ites | Restriction Site Analysis. Click Next to set parameters for the restriction map analysis. In this step, first select Use existing enzyme list and click the Browse for enzyme list button. Select the Popular enzymes in the Cloning folder under Enzyme lists. Then write 3' into the filter below to the left. Select all the enzymes and click the Add button. The result should be like in figure 2.57. Figure 2.57: Selecting enzymes. Click Next. In this step you specify that you want to show enzymes that cut the sequence only once. This means that you should de-select the Two restriction sites checkbox. Click Next and select that you want to Add restriction sites as annotations on sequence and Create restriction map. See figure 2.58.
376. ity scale which shares many features with the other hydrophobicity scales [Eisenberg et al., 1984]. Rose: The hydrophobicity scale by Rose et al. is correlated to the average area of buried amino acids in globular proteins [Rose et al., 1985]. This results in a scale which does not show the helices of a protein, but rather the surface accessibility. Janin: This scale also provides information about the accessible and buried amino acid residues of globular proteins [Janin, 1979]. Hopp-Woods: Hopp and Woods developed their hydrophobicity scale for identification of potentially antigenic sites in proteins. This scale is basically a hydrophilic index where apolar residues have been assigned negative values. Antigenic sites are likely to be predicted when using a window size of 7 [Hopp and Woods, 1983]. Welling: Welling et al. used information on the relative occurrence of amino acids in antigenic regions to make a scale which is useful for prediction of antigenic regions [Welling et al., 1985]. This method is better than the Hopp-Woods scale of hydrophobicity, which is also used to identify antigenic regions. Kolaskar-Tongaonkar: A semi-empirical method for prediction of antigenic regions has been developed [Kolaskar and Tongaonkar, 1990]. This method also includes information of surface accessibility and flexibility, and at the time of publication the method was able to predict antigenic deter
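All of the scales above are typically applied in the same way: each residue gets a value from the scale, and the values are averaged over a sliding window (the text mentions a window size of 7 for Hopp-Woods). The following Python sketch shows that averaging step only. The function name is hypothetical, and the two-letter scale in the example is a made-up toy scale, not one of the published scales.

```python
def window_average(sequence, scale, window=7):
    """Average per-residue scale values over a sliding window.

    Returns a list of (center_index, mean_value) pairs, one per full window.
    The window must be odd so that it has a well-defined center residue.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[residue] for residue in sequence]
    half = window // 2
    return [
        (start + half, sum(values[start:start + window]) / window)
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    toy_scale = {"K": 1.0, "L": -1.0}  # illustrative values only
    for center, mean in window_average("KKKLLLL", toy_scale, window=3):
        print(center, round(mean, 2))
```

A larger window smooths the profile and highlights longer stretches, while a smaller window reacts to individual residues.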
377. Primer designer (both for single sequences and alignments); Contig mapping view; In the table of annotations; In the text view of sequences. In the following sections, these view options will be described in more detail. In all the views except the text view, annotations can be added, modified and deleted. This is described in the following sections. View Annotations in sequence views. Figure 10.6 shows an annotation displayed on a sequence. Figure 10.6: An annotation showing a coding region on a genomic DNA sequence. The various sequence views listed in section 10.3.1 have different default settings for showing annotations. However, they all have two groups in the Side Panel in common: Annotation Layout and Annotation Types. The two groups are shown in figure 10.7. In the Annotation layout group, you can specify how the annotations should be displayed (notice that there are some minor differences between the different sequence views). Show annotations: Determines whether the annotations are shown. Position: On sequence: The annotations are placed on the sequence. The residues are visible through the annotations (if you have zoomed in to 100%). Next to
378. kbench. Figure 1.14: An old license is detected. When you click Next, the Workbench checks on CLC bio's web server to see if you are entitled to upgrade your license. Note: If you should be entitled to get an upgrade, and you do not get one automatically in this process, please contact support@clcbio.com. In this dialog, there are two options. Direct download: The workbench will attempt to contact the online CLC Licenses Service and download the license directly. This method requires internet access from the workbench. Go to license download web page: The workbench will open a Web Browser with the License Download web page when you click Next. From there you will be able to download your license as a file and import it. This option allows you to get a license even though the Workbench does not have direct access to the CLC Licenses Service. If you select the first option, and it turns out that you do not have internet access from the Workbench (because of a firewall, proxy server etc.), you will be able to click Previous and use the other option instead. [CHAPTER 1: INTRODUCTION TO CLC DNA WORKBENCH] Direct download. Selecting the first option takes you to the dialog shown in figure 1.15.
379. king Next will show the dialog in figure 18.38. Figure 18.38: Deciding number of cut sites inside and outside the selection. At the top of the dialog, you see the selected region, and below are two panels. Inside selection: Specify how many times you wish the enzyme to cut inside the selection. In the example described above, One cut site (1) should be selected to only show enzymes cutting once in the selection. Outside selection: Specify how many times you wish the enzyme to cut outside the selection (i.e. the rest of the sequence). In the example above, No cut sites (0) should be selected. These panels offer a lot of flexibility for combining number of cut sites inside and outside the selection, respectively. To give a hint of how many enzymes will be added based on the combination of cut sites, the preview panel at the bottom lists the enzymes which will be ad
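The inside/outside filter described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the Workbench's algorithm: sites are located by plain string search on the forward strand (sufficient for the palindromic recognition sequences used here), a site counts as "inside" only if the whole recognition sequence lies within the selection, and the selection is a 0-based half-open interval. The function names and the test sequence are hypothetical.

```python
def find_sites(sequence, recognition):
    """Return 0-based start positions of every occurrence, overlaps included."""
    positions = []
    start = sequence.find(recognition)
    while start != -1:
        positions.append(start)
        start = sequence.find(recognition, start + 1)
    return positions


def count_inside_outside(sequence, recognition, sel_start, sel_end):
    """Count sites fully inside [sel_start, sel_end) and all remaining sites."""
    inside = outside = 0
    for pos in find_sites(sequence.upper(), recognition.upper()):
        if sel_start <= pos and pos + len(recognition) <= sel_end:
            inside += 1
        else:
            outside += 1
    return inside, outside


def enzymes_matching(sequence, enzymes, sel_start, sel_end,
                     inside_allowed, outside_allowed):
    """Names of enzymes whose (inside, outside) counts are both allowed."""
    selected = []
    for name, recognition in enzymes.items():
        inside, outside = count_inside_outside(
            sequence, recognition, sel_start, sel_end)
        if inside in inside_allowed and outside in outside_allowed:
            selected.append(name)
    return selected


if __name__ == "__main__":
    seq = "TTAAGCTTCCGCGGCCGCAAGAATTCTTAAGCTTGG"  # made-up test sequence
    enzymes = {"HindIII": "AAGCTT", "NotI": "GCGGCCGC", "EcoRI": "GAATTC"}
    # One cut site inside the selection, no cut sites outside it.
    print(enzymes_matching(seq, enzymes, 8, 20, {1}, {0}))
```

Passing sets such as `{1, 2}` for the allowed counts mirrors ticking several checkboxes in one panel of the dialog.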
380. l. This feature is useful when e.g. designing an experiment which will allow the differentiation of a successful and an unsuccessful cloning experiment on the basis of a restriction map. There are two main ways to simulate gel separation of nucleotide sequences. One or more sequences can be digested with restriction enzymes, and the resulting fragments can be separated on a gel. A number of existing sequences can be separated on a gel. There are several ways to apply these functionalities, as described below. 18.4.1 Separate fragments of sequences on gel. This section explains how to simulate a gel electrophoresis of one or more sequences which are digested with restriction enzymes. There are two ways to do this. When performing the Restriction Site Analysis from the Toolbox, you can choose to create a restriction map which can be shown as a gel. This is explained in section 18.3.2. From all the graphical views of sequences, you can right-click the name of the sequence and choose Digest Sequence with Selected Enzymes and Run on Gel. The views where this option is available are listed below: Circular view (see section 10.2); Ordinary sequence view (see section 10.1); Graphical view of sequence lists (see section 10.7); Cloning editor (see section 18.1); Primer designer (see section 16.3). Furthermore, you can also right-click an empty part of the view of the graphical view of
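What a simulated gel displays is essentially the list of fragment lengths produced by the digest. A minimal Python sketch of that calculation is given below, assuming a linear (not circular) sequence and top-strand cut positions only; the cut offsets within the recognition sequence (1 for HindIII, A^AGCTT, and 1 for EcoRI, G^AATTC) are supplied by the caller. The function names and test sequence are hypothetical, and this is not how the Workbench itself is implemented.

```python
def cut_positions(sequence, recognition, offset):
    """Positions where the top strand is cut: site start plus cut offset."""
    positions = []
    start = sequence.find(recognition)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(recognition, start + 1)
    return positions


def fragment_lengths(seq_length, cuts):
    """Fragment lengths of a linear molecule, longest first (as on a gel)."""
    inner = sorted({c for c in cuts if 0 < c < seq_length})
    edges = [0] + inner + [seq_length]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


if __name__ == "__main__":
    seq = "TTAAGCTTCCGCGGCCGCAAGAATTCTTAAGCTTGG"  # made-up test sequence
    hind3 = cut_positions(seq, "AAGCTT", 1)
    ecor1 = cut_positions(seq, "GAATTC", 1)
    print(fragment_lengths(len(seq), hind3))          # single digest
    print(fragment_lengths(len(seq), hind3 + ecor1))  # double digest
```

Comparing the fragment lists expected for a successful and an unsuccessful clone is the kind of differentiation the text refers to.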
381. l alignment is taken into account and not the full length query sequence. Identity: Shows the number of identical residues in the query and hit sequence. %Identity: Shows the percentage of identical residues in the query and hit sequence. [CHAPTER 12: BLAST SEARCH] Positive: Shows the number of similar but not necessarily identical residues in the query and hit sequence. %Positive: Shows the percentage of similar but not necessarily identical residues in the query and hit sequence. Gaps: Shows the number of gaps in the query and hit sequence. %Gaps: Shows the percentage of gaps in the query and hit sequence. Query Frame/Strand: Shows the frame or strand of the query sequence. Hit Frame/Strand: Shows the frame or strand of the hit sequence. In the BLAST table view you can handle the hit sequences. Select one or more sequences from the table and apply one of the following functions. Download and Open: Download the full sequence from NCBI and open it. If multiple sequences are selected, they will all open (if the same sequence is listed several times, only one copy of the sequence is downloaded and opened). Download and Save: Download the full sequence from NCBI and save it. When you click the button, there will be a save dialog letting you specify a folder to save the sequences. If multiple sequences are selected, they will all open (if the same sequence is listed several times, only one copy of the sequence is downlo
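The percentage columns relate to the count columns in a simple way, which the following Python sketch spells out. It assumes that all three percentages use the length of the local alignment as the denominator, which is consistent with the remark at the top of this chunk but is not stated for every column. The function name is hypothetical, and the numbers in the example are only illustrative.

```python
def hit_percentages(identical, positive, gaps, alignment_length):
    """Convert raw hit counts into percentages of the local alignment length."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return {
        "%identity": 100.0 * identical / alignment_length,
        "%positive": 100.0 * positive / alignment_length,
        "%gaps": 100.0 * gaps / alignment_length,
    }


if __name__ == "__main__":
    # Illustrative counts for a hit aligned over 1144 positions.
    for column, value in hit_percentages(779, 933, 29, 1144).items():
        print(column, round(value, 1))
```

Because identical residues are also counted as similar, the positive percentage is never lower than the identity percentage.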
382. l will not be saved. Figure 2.6: The Annotation Layout and the Annotation Types in the Side Panel. Figure 2.7: The alignment when all the above settings have been changed. This means that you would have to perform the changes again next time you open the alignment. To save the changes to the Side Panel, click the Save/Restore Settings button at the top of the Side Panel and click Save Settings (see figure 2.8).
383. layer at a time: the content of subfolders is not visible in this view. Also note that only sequences have the full span of information like organism etc. Batch edit folder elements. You can select a number of elements in the table, right-click and choose Edit to batch edit the elements. In this way, you can change e.g. the description or common name of several elements in one go. In figure 3.7 you can see an example where the common name of five sequences is changed in one go. In this example, a dialog with a text field will be shown, letting you enter a new common name for these five sequences. Note: This information is saved directly, and the change cannot be undone. 3.2 View Area. The View Area is the right-hand part of the screen, displaying your current work. The View Area may consist of one or more Views, represented by tabs at the top of the View Area. This is illustrated in figure 3.8. The tab concept is central to working with CLC DNA Workbench, because several operations can be performed by dragging the tab of a view, and extended right-click menus can be activated from the tabs. [CHAPTER 3: USER INTERFACE]
384. lected.
12.4.1 Migrating from a previous version of the Workbench
In versions released before 2011, the BLAST database management was very different from this. In order to migrate from the older versions, please add the folders of the old BLAST databases as locations in the BLAST database manager (see section 12.4). The old representations of the BLAST databases in the Navigation Area can be deleted. If you have saved the BLAST databases in the default folder, they will automatically appear, because the default database location used in CLC DNA Workbench 6.6 is the same as the default folder specified for saving BLAST databases in the old version.
12.5 Bioinformatics explained: BLAST
BLAST (Basic Local Alignment Search Tool) has become the de facto standard in search and alignment tools [Altschul et al., 1990]. The BLAST algorithm is still actively being developed, and the paper describing it is one of the most cited ever written in this field of biology. Many researchers use BLAST as an initial screening of their sequence data from the laboratory and to get an idea of what they are working on. BLAST is far from being basic as the name indicates; it is a highly advanced algorithm which has become very popular due to availability, speed, and accuracy. In short, a BLAST search identifies homologous sequences by searching one or more databases, usually hosted by NCBI (http://www.ncbi.nlm.nih.gov/), for matches to the query sequence of interest [McGinnis and Madden, 2004]. BLAST is
385. lection to All Reads. The opposite is also possible: make a selection on one of the reads, right-click and Transfer Selection to Contig Sequence.
17.7.5 Output from the contig
Due to the integrated nature of CLC DNA Workbench, it is easy to use the consensus sequences as input for additional analyses. There are three options when you are viewing a mapping:
   right-click the name of the consensus sequence (to the left) | Open Copy of Sequence | Save the new sequence
   right-click the name of the consensus sequence (to the left) | Open Copy of Sequence Including Gaps | Save the new sequence
   right-click the name of the consensus sequence (to the left) | Open This Sequence
Open Copy of Sequence creates a copy of the sequence, omitting all gap regions, which can be saved and used independently.
Open Copy of Sequence Including Gaps replaces all gaps with Ns. Any regions that appear to be deletions will be removed if this option is chosen. For example:
   reference  CCCGGAAAGGTTT
   consensus  CCC--AAA--TTT
   match1     CCC--AAA
   match2               TTT
Here, if you chose to open a copy of the consensus with gaps, you would get this output: CCCAAANNTTT
Open This Sequence will not create a new sequence but simply let you see the sequence in a sequence view. This means that the sequence still belongs to the contig and will be saved together with the contig. It also means that if you add annotations to the
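The worked example for Open Copy of Sequence Including Gaps can be reproduced with a short Python sketch. It rests on assumptions that the manual text does not spell out: the blanks in the example alignment are read as '-' gap characters, a position covered by no read becomes N, and a position where every covering read shows a gap is treated as a deletion and dropped. The function name and the (start, aligned text) read representation are hypothetical and are not part of the Workbench.

```python
from collections import Counter

def consensus_including_gaps(reference_length, reads):
    """Build a consensus in which uncovered positions become N.

    reads: list of (start, aligned_text) tuples, where start is the
    0-based reference position of the first aligned symbol and '-'
    marks a position the read skips (an apparent deletion).
    """
    consensus = []
    for pos in range(reference_length):
        # Collect the symbol that each covering read shows at this position.
        symbols = [text[pos - start] for start, text in reads
                   if start <= pos < start + len(text)]
        if not symbols:
            consensus.append("N")  # no coverage: the gap is replaced by N
            continue
        bases = [s for s in symbols if s != "-"]
        if bases:
            # Majority vote among the bases present (ties: first seen wins).
            consensus.append(Counter(bases).most_common(1)[0][0])
        # If every covering read has '-', the position is dropped.
    return "".join(consensus)

# match1 and match2 from the example, placed on a 13-base reference.
reads = [(0, "CCC--AAA"), (10, "TTT")]
print(consensus_including_gaps(13, reads))  # CCCAAANNTTT
```

The two kinds of gap are told apart purely by coverage, which is why the first gap disappears while the second one turns into NN.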
386. ligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg2+, deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem, 47(11):1956-1961.
[Welling et al., 1985] Welling, G. W., Weijer, W. J., van der Zee, R., and Welling-Wester, S. (1985). Prediction of sequential antigenic regions in proteins. FEBS Lett, 188(2):215-218.
[Wootton and Federhen, 1993] Wootton, J. C. and Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers in Chemistry, 17:149-163.
[Yang, 1994a] Yang, Z. (1994a). Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution, 39(1):105-111.
[Yang, 1994b] Yang, Z. (1994b). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution, 39(3):306-314.
[Yang and Rannala, 1997] Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Mol Biol Evol, 14(7):717-24.
Part V: Index
contig extract from selection; 454 sequencing data; AB1 file format; Abbreviations amino acids; ABI file format; About CLC Workbenches; Accession number display; ace file format; ACE file format; Add annotations; sequences to alignment; sequences t
387. lity and at the time of publication the method was able to predict antigenic determinants with an accuracy of 75%.
   • Surface Probability: Display of surface probability based on the algorithm by [Emini et al., 1985]. This algorithm has been used to identify antigenic determinants on the surface of proteins.
   • Chain Flexibility: Display of backbone chain flexibility based on the algorithm by [Karplus and Schulz, 1985]. It is known that chain flexibility is an indication of a putative antigenic determinant.
Many more scales have been published throughout the last three decades. Even though more advanced methods have been developed for prediction of membrane-spanning regions, the simple and very fast calculations are still widely used.
Other useful resources
AAindex: Amino acid index database, http://www.genome.ad.jp/dbget/aaindex.html
Creative Commons License
All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this [Chapter 15: Protein Analyses]
   aa  aa        Kyte-      Hopp-   Cornette  Eisenberg  Rose  Janin  Engelman
                 Doolittle  Woods                                      (GES)
   A   Alanine   1.80       -0.50   0.20      0.62       0.74  0.30   1.60
   C   Cysteine  2.50       -1.00   4.10      0.29       0.91  0.90   2.00
   D
388. lowed by the toolbox way. This section also includes an explanation of how to simulate a gel with the selected enzymes. The final section in this chapter focuses on enzyme lists, which represent an easy way of managing restriction enzymes.
18.3.1 Dynamic restriction sites
If you open a sequence, a sequence list etc., you will find the Restriction Sites group in the Side Panel. As shown in figure 18.27, you can display restriction sites as colored triangles and lines on the sequence. The Restriction Sites group in the Side Panel shows a list of enzymes, represented by different colors corresponding to the colors of the triangles on the sequence. By selecting or deselecting the enzymes in the list, you can specify which enzymes' restriction sites should be displayed. [Chapter 18: Cloning and Cutting] [Figure 18.27: Showing restriction sites of ten restriction enzymes. The Side Panel groups the enzymes into Non-cutters, Single cutters, Double cutters and Multiple cutters.]
The color of the restriction enzyme can be changed by clicking the colored box next to the enzyme's name. The name of the enzyme can also be shown next to the restriction site by selecting Show name flags above the list of restriction enzymes. There is also an option
389. ls Molecular Biology Resources, Promega Corporation, EURx Ltd. [Figure 18.52: Showing additional information about an enzyme like recognition sequence or a list of commercial vendors.]
18.5.2 View and modify enzyme list
An enzyme list is shown in figure 18.53. The list can be sorted by clicking the columns. [Figure 18.53 (residue): table of restriction enzymes, 1362 rows, with a filter field. The rows visible in the figure are:]
   Name     Recognition sequence  Overhang   Suppliers   Methylation sensitivity  Star activity
   EcoRV    gatatc                Blunt      GE Healthc  N-methyladenosine        Yes
   BglII    agatct                5' - gatc  GE Healthc  N4-methylcytosine        No
   SalI     gtcgac                5' - tcga  GE Healthc  N6-methyladenosine       Yes
   XhoI     ctcgag                5' - tcga  GE Healthc  N-methyladenosine        No
   HindIII  aagctt                5' - agct  GE Healthc  N-methyladenosine        Yes
   XbaI     tctaga                5' - ctag  GE Healthc  N-methyladenosine        Yes
   EcoRI    gaattc                5' - aatt  GE Healthc  N-methyladenosine        Yes
   PstI     ctgcag                3' - tgca  GE Healthc  N6-methyladenosine       Yes
   BamHI    ggatcc                5' - gatc  GE Healthc  N-methylcytosine         Yes
   ClaI     atcgat                5' - cg    GE Healthc  N-methyladenosine        No
   NotI     gcggccgc              5' - ggcc  GE Healthc  N-methylcytosine         No
   NdeI     catatg                5' - ta    GE Healthc  N-methyladenosine        Yes
   SacI     gagctc                3' - agct  GE Healthc  5-methylcytosine         Yes
   PvuII    cagctg                Blunt      GE Healthc  N4-methylcytosine        Yes
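To make the link between an enzyme list and the sites shown on a sequence concrete, here is a minimal Python sketch that scans a sequence for three of the recognition sequences listed in the table (EcoRI, BamHI, HindIII) and sorts the enzymes by number of hits, mirroring the Non-cutters / Single cutters / Double cutters / Multiple cutters grouping of the Side Panel. This is an illustration only: the function names are hypothetical, the sequence is treated as linear, and only the position of the recognition site is reported, not the exact cut position.

```python
import re

# Recognition sequences as listed in the enzyme table (lower case, 5'->3').
# All three are palindromic, so scanning one strand is sufficient.
ENZYMES = {"EcoRI": "gaattc", "BamHI": "ggatcc", "HindIII": "aagctt"}

def find_sites(sequence, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    seq = sequence.lower()
    # A lookahead pattern also reports overlapping occurrences.
    return {name: [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
            for name, site in enzymes.items()}

def group_by_cuts(hits):
    """Sort enzymes into groups by how many sites were found."""
    labels = ["Non-cutters", "Single cutters", "Double cutters"]
    groups = {label: [] for label in labels + ["Multiple cutters"]}
    for name, positions in hits.items():
        count = len(positions)
        label = labels[count] if count < 3 else "Multiple cutters"
        groups[label].append(name)
    return groups

hits = find_sites("ttgaattcaaggatccttgaattc")
print(hits)                 # {'EcoRI': [3, 19], 'BamHI': [11], 'HindIII': []}
print(group_by_cuts(hits))
```

For non-palindromic recognition sequences the reverse complement would have to be scanned as well; that case is left out to keep the sketch short.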
390. ls not running in batch mode (see above). All the analyses in the Toolbox are performed in a step-by-step procedure. First you select elements for analyses, and then there are a number of steps where you can specify parameters (some of the analyses have no parameters, e.g. when translating DNA to RNA). The final step concerns the handling of the results of the analysis, and it is almost identical for all the analyses, so we explain it in this section in general. [Chapter 9: Batching and Result Handling] [Figure 9.4: Read mapping parameters in batch.] [Figure 9.5: The last step of the analyses, exemplified by Translate DNA to RNA.] In this step, shown in figure 9.5, you have
391. lysis shown as annotations.
   • Overhangs: If there is an overhang, this is displayed with an abbreviated version of the fragment and its overhangs. The two rows of dots represent the two strands of the fragment, and the overhang is visualized on each side of the dots with the residue(s) that make up the overhang. If there are only the two rows of dots, it means that there is no overhang.
   • Left end: The enzyme that cuts the fragment to the left (5' end).
   • Right end: The enzyme that cuts the fragment to the right (3' end).
   • Conflicting enzymes: If more than one enzyme cuts at the same position, or if an enzyme's recognition site is cut by another enzyme, a fragment is displayed for each possible combination of cuts. At the same time, this column will display the enzymes that are in conflict. If there are conflicting enzymes, they will be colored red to alert the user. If the same experiment were performed in the lab, conflicting enzymes could lead to wrong results. For this reason, this functionality is useful to simulate digestions with complex combinations of restriction enzymes.
If views of both the fragment table and the sequence are open, clicking in the fragment table will select the corresponding region on the sequence.
Gel
The restriction map can also be shown as a gel. This is described in section 18.4.1.
18.4 Gel electrophoresis
CLC DNA Workbench enables the user to simulate the separation of nucleotide sequences on a ge
392. [Figure 16.9: Calculation dialog.]
   • Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in a pair are allowed to differ. This criterion is applied to both primer pairs independently.
   • Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in a primer pair. This criterion is applied to all possible combinations of primers.
   • Minimum difference in the melting temperature of primers in the inner and outer primer pair: all comparisons between the melting temperature of primers from the two pairs must be at least this different, otherwise the primer set is excluded. This option is applied to ensure that the inner and outer PCR reactions can be initiated at different annealing temperatures. Please note that to ensure flexibility t
393. mat; Circular view of sequence; clc file format; CLC Standard Settings; CLC Workbenches; CLC file format; associating with CLC DNA Workbench; Clone Manager file format; Cloning; insert fragment; Close view; Clustal file format; Coding sequence translate to protein; Codon frequency tables reverse translation; usage; col file format; Color residues; Comments; Common name batch edit; Compare workbenches; Compatible ends; Complexity plot; Configure network; Conflicting enzymes; Conflicts overview in assembly; Consensus sequence; open; Consensus sequence extract; Conservation; graphs; Contact information; Contig; ambiguities; BLAST; create; reverse complement; view and edit; Copy; annotations in alignments; elements in Navigation Area; into sequence; search results GenBank; sequence; sequence selection; text selection; cpf file format; chp file format; Create alignment; dot plots; enzyme list; local BLAST database; new folder; workspace; Create index file BLAST database; CSV export graph data points; formatting of decimal numbers; csv file format; CSV file format; ct file format; Custom annotation types; Dark color o
394. mbly process. See section 17.7 on how to use the resulting contigs.
17.6 Add sequences to an existing contig
This section describes how to assemble sequences to an existing contig. This feature can be used, for example, to provide a steady workflow when a number of exons from the same gene are sequenced one at a time and assembled to a reference sequence. Note that the new sequences will be added to the existing contig, which will not be extended. If the new sequences extend beyond the existing contig, they will be cut off. To start the assembly:
   select one contig and a number of sequences | Toolbox in the Menu Bar | Sequencing Data Analyses | Add Sequences to Contig
   or right-click in the empty white area of the contig | Add Sequences to Contig
This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. When the elements are selected, click Next and you will see the dialog shown in figure 17.19. The options in this dialog are similar to the options that are available when assembling to a reference sequence (see section 17.5). Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will start the assembly process. See section 17.7 on how to use the resulting contig.
395. me. The following parameters can be added to the search:
   • All fields. Text, searches in all parameters in the NCBI database at the same time.
   • Organism. Text.
   • Description. Text.
   • Modified Since. Between 30 days and 10 years.
   • Gene Location. Genomic DNA/RNA, Mitochondrion, or Chloroplast.
   • Molecule. Genomic DNA/RNA, mRNA or rRNA.
   • Sequence Length. Number for maximum or minimum length of the sequence.
   • Gene Name. Text.
The search parameters are the most recently used. The All fields option allows searches in all parameters in the NCBI database at the same time. All fields also provides an opportunity to restrict a search to parameters which are not listed in the dialog. E.g. writing gene[Feature key] AND mouse in All fields generates hits in the GenBank database which contain one or more genes and where 'mouse' appears somewhere in the GenBank file. [Chapter 11: Online Database Search] You can also write e.g. CD9 NOT homo sapiens in All fields. Note: The 'Feature Key' option is only available in GenBank when searching for nucleotide sequences. For more information about how to use this syntax, see http www ncbi nlm ninagovs beoks NBRScs7 7
When you are satisfied with the parameters you have entered, click Start search. Note: When conducting a search, no files are downloaded. Instead, the program produces a list of links to the files in the NCBI database. This ensures a much faster search.
11.1.2 Handling of GenBank
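The All fields syntax is the general NCBI Entrez query syntax, so the same search terms can be tried outside the Workbench. The Python sketch below only builds a request URL for NCBI's public E-utilities ESearch service; it does not send the request. How the Workbench itself talks to NCBI is not described in this text, so treating ESearch as an equivalent is an assumption, and the function name and default values are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_search_url(term, database="nuccore", max_hits=50):
    """Build an ESearch URL for an Entrez query term (no request is sent)."""
    params = {"db": database, "term": term, "retmax": max_hits}
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"

# The two example queries from the text above.
print(build_search_url("gene[Feature key] AND mouse"))
print(build_search_url("CD9 NOT homo sapiens"))
```

Like the search described above, an ESearch request returns identifiers rather than sequence files, so it stays fast even for broad queries.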
396. [Figure 17.5: Dividing the sequence into three groups based on the number in the middle of the name.] bottom of the dialog. In this example, we actually did not need the first and last set of brackets, so the expression could also have been x x, in which case only one group would be listed in the table at the bottom of the dialog.
17.2.2 Process tagged sequences
Multiplexing as described in section 17.2.1 is of course only possible if proper sequence names could be assigned from the sequencing process. With many of the new high-throughput technologies, this is not possible. However, there is a need for being able to input several different samples to the same sequencing run, so multiplexing is still relevant; it just has to be based on another way of identifying the sequences. A method has been proposed to tag the sequences with a unique identifier during the preparation of the sample for sequencing [Meyer et al., 2007]. With this technique, each sequence will have a sample-specific tag, a special sequence of nucleotides before and after the seque
397. mer.
   • Self annealing: the maximum self annealing score of the primer in units of hydrogen bonds.
   • Self annealing alignment: a visualization of the highest-scoring self annealing alignment.
   • Self end annealing: the maximum score of consecutive end base pairings allowed between the ends of two copies of the same molecule in units of hydrogen bonds.
   • GC content: the fraction of G and C nucleotides in the primer.
   • Melting temperature of the primer-template complex.
   • Secondary structure score: the score of the optimal secondary DNA structure found for the primer. Secondary structures are scored by adding the number of hydrogen bonds in the structure, and 2 extra hydrogen bonds are added for each stacking base pair in the structure.
   • Secondary structure: a visualization of the optimal DNA structure found for the primer.
If both a forward and a reverse region are selected, a table of primer pairs is shown, where the above columns (excluding the score) are represented twice, once for the forward primer (designated by the letter F) and once for the reverse primer (designated by the letter R). Before these, and following the score of the primer pair, are the following columns pertaining to primer pair information: [Chapter 16: Primers]
   • Pair annealing: the number of hydrogen bonds found in the optimal alignment of the forward and the reverse primer in a primer pair.
   • Pair annealing alignment: a visualization of the opt
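Two of the columns above, GC content and the annealing scores expressed in hydrogen bonds, are easy to illustrate. The Python sketch below is a simplified model and not the Workbench's algorithm: it assumes 2 hydrogen bonds per A-T pair and 3 per G-C pair, considers only ungapped antiparallel alignments, and counts every complementary position in the overlap. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # A-T pairs: 2, G-C pairs: 3

def gc_content(primer):
    """Fraction of G and C nucleotides in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def pair_annealing(forward, reverse):
    """Highest hydrogen-bond count over all ungapped antiparallel alignments."""
    top = forward.upper()
    bottom = reverse.upper()[::-1]  # reversed so the strands run antiparallel
    best = 0
    # Slide the bottom strand along the top strand, one offset at a time.
    for offset in range(-(len(bottom) - 1), len(top)):
        score = 0
        for i, base in enumerate(top):
            j = i - offset
            if 0 <= j < len(bottom) and COMPLEMENT.get(base) == bottom[j]:
                score += HYDROGEN_BONDS[base]
        best = max(best, score)
    return best

def self_annealing(primer):
    """Annealing score of a primer against a second copy of itself."""
    return pair_annealing(primer, primer)

print(gc_content("GGCCAATT"))          # 0.5
print(self_annealing("GAATTC"))        # 14
print(pair_annealing("AAAA", "TTTT"))  # 8
```

GAATTC is palindromic, so two copies pair along their full length: two G-C pairs and four A-T pairs give 2 x 3 + 4 x 2 = 14 hydrogen bonds.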
398. [Figure 14.8: Create Reading Frame dialog. Settings shown: Start codon (AUG; Any; All start codons in genetic code; Other: AUG, CUG, UUG), Both strands, Open-ended sequence, Genetic code: 1 Standard, Minimum length (codons): 100, Include stop codon in result.]
      - AUG: Most commonly used start codon.
      - Any: Find all open reading frames.
      - All start codons in genetic code.
      - Other: Here you can specify a number of start codons separated by commas.
   • Both strands: Finds reading frames on both strands.
   • Open-ended Sequence: Allows the ORF to start or end outside the sequence. If the sequence studied is a part of a larger sequence, it may be advantageous to allow the ORF to start or end outside the sequence.
   • Genetic code translation table.
   • Include stop codon in result: The ORFs will be shown as annotations, which can include the stop codon if this option is checked. The translation tables are occasionally updated from NCBI. The tables are not available in this printable version of the user manual. Instead, the tables are included in the Help menu in the Menu Bar (in the appendix). [Chapter 14: Nucleotide Analyses]
   • Minimum Length: Specifies the minimum length for the ORFs to be found. The length is specified as number of codons.
Using open reading frames for gene finding is a fairly simple approach which is lik
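The parameters above map naturally onto a small ORF scanner. The following Python sketch is a simplified illustration under stated assumptions, not the Workbench's implementation: it works on DNA letters (so the start codon AUG is written ATG), uses the stop codons of the standard genetic code, counts the minimum length in codons without the stop codon, and does not implement the open-ended option. The function and parameter names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, start_codons=("ATG",), min_codons=100,
              both_strands=True, include_stop=True):
    """Return (strand, start, end, orf) tuples; positions are 0-based,
    end is exclusive, and minus-strand positions refer to the reverse
    complement of the input."""
    strands = [("+", seq.upper())]
    if both_strands:
        strands.append(("-", reverse_complement(seq.upper())))
    results = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in start_codons:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # length without the stop codon
                    if n_codons >= min_codons:
                        end = i + 3 if include_stop else i
                        results.append((strand, start, end, s[start:end]))
                    start = None  # look for the next ORF in this frame
    return results

print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))
# [('+', 2, 14, 'ATGAAATTTTAG')]
```

The default of 100 codons follows the value visible in the dialog; the demo call lowers it so that a toy sequence produces a hit.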
399. minants with an accuracy of 75%.
   • Surface Probability: Display of surface probability based on the algorithm by [Emini et al., 1985]. This algorithm has been used to identify antigenic determinants on the surface of proteins.
   • Chain Flexibility: Display of backbone chain flexibility based on the algorithm by [Karplus and Schulz, 1985]. It is known that chain flexibility is an indication of a putative antigenic determinant.
Find
The Find function can also be invoked by pressing Ctrl + Shift + F (⌘ + Shift + F on Mac). The Find function can be used for searching the sequence. Clicking the find button will search for the first occurrence of the search term. Clicking the find button again will find the next occurrence, and so on. If the search string is found, the corresponding part of the sequence will be selected.
   • Search term: Enter the text to search for. The search function does not discriminate between lower and upper case characters.
   • Sequence search: Search the nucleotides or amino acids. For amino acids, the single-letter abbreviations should be used for searching. The sequence search also has a set of advanced search parameters:
      - Include negative strand: This will search on the negative strand as well.
      - Treat ambiguous characters as wildcards in search term: If you search for e.g. ATN, you will find both ATG and ATC. If you wish to find literally exact matches for ATN (i.e. only find ATN, not ATG), this option should not be
400. mines how conserved the sequences must be in order to agree on a consensus. Here you can also choose IUPAC, which will display the ambiguity code when there are differences between the sequences. E.g. an alignment with an A and a G at the same position will display an R in the consensus line if the IUPAC option is selected. The IUPAC codes can be found in section and H.
      - No gaps: Checking this option will not show gaps in the consensus.
      - Ambiguous symbol: Select how ambiguities should be displayed in the consensus line (as N or ...). This option has no effect if IUPAC is selected in the Limit list above.
   The Consensus Sequence can be opened in a new view simply by right-clicking the Consensus Sequence and clicking Open Consensus in New View.
   • Conservation: Displays the level of conservation at each position in the alignment. The height of the bar, or the gradient of the color, reflects how conserved that particular position is in the alignment. If one position is 100% conserved, the bar will be shown in full height, and it is colored in the color specified at the right side of the gradient slider. [Chapter 19: Sequence Alignment]
      - Foreground color: Colors the letters using a gradient, where the right side color is used for highly conserved positions and the left side color is used for positions that are less conserved.
      - Background color: Sets a background color
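The IUPAC consensus and the conservation level can be illustrated with a few lines of Python. The sketch assumes a nucleotide alignment given as equal-length strings with '-' for gaps, ignores gaps when choosing the ambiguity code, and measures conservation as the fraction of sequences that share the most common symbol in a column. The formula the Workbench actually uses is not given in this text, so both functions are illustrative assumptions with hypothetical names.

```python
from collections import Counter

# Map each set of DNA bases to its IUPAC code (e.g. {A, G} -> R).
IUPAC_CODE = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

def iupac_consensus(alignment):
    """Consensus line that shows an ambiguity code where sequences differ."""
    consensus = []
    for column in zip(*alignment):
        bases = frozenset(b.upper() for b in column if b != "-")
        # Columns with only gaps stay gaps; unknown symbols fall back to N.
        consensus.append(IUPAC_CODE.get(bases, "N") if bases else "-")
    return "".join(consensus)

def conservation(alignment):
    """Per-column fraction of sequences sharing the most common symbol."""
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

alignment = ["ACGT", "GCGT", "ACGA"]
print(iupac_consensus(alignment))  # RCGW
print(conservation(alignment))     # 2/3 for the outer columns, 1.0 in the middle
```

The first column holds A and G and therefore shows R, matching the example in the text; the fully conserved middle columns score 1.0, which corresponds to a full-height bar.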
401. more hits to sequences which are not truly related.
Word size
Change of the word size has a great impact on the seeded sequence space, as described above. But one can change the word size to find sequence matches which would otherwise not be found using the default parameters. For instance, the word size can be decreased when searching for primers or short nucleotides. For blastn, a suitable setting would be to decrease the default word size of 11 to 7, increase the E-value significantly (1000) and turn off the complexity filtering. For blastp, a similar approach can be used: decrease the word size to 2, increase the E-value and use a more stringent substitution matrix, e.g. a PAM30 matrix. Fortunately, the optimal search options for finding short, nearly exact matches can already be found on the BLAST web pages: http://www.ncbi.nlm.nih.gov/BLAST/.
Substitution matrix
For protein BLAST searches, a default substitution matrix is provided. If you are looking at distantly related proteins, you should either choose a high-numbered PAM matrix or a low-numbered BLOSUM matrix. See Bioinformatics Explained on scoring matrices on http://www.clcbio.com/be/. The default scoring matrix for blastp is BLOSUM62.
12.5.6 Explanation of the BLAST output
The BLAST output comes in different flavors. On the NCBI web page the default output is html, and the following description will use the html output as example. Ordinary text and xml output for easy computation
402. mple graphical overview of the hits found, aligned to the query sequence. The alignments are color coded, ranging from black to red as indicated in the color label at the top. [Figure residue: BLAST hit table headed "Sequences producing significant alignments" (click headers to sort columns). Transcript hits listed: NM_174886.1, NM_173210.1, NM_173209.1, NM_173211.1, NM_173207.1, NM_173208.1, NM_170695.2 and NM_003244.2 (Homo sapiens TGFB-induced factor, TALE family homeobox, TGIF); NM_003246.2 (Homo sapiens thrombospondin 1, THBS1, mRNA); NM_177965.2 (Homo sapiens chromosome 8 open reading frame 37, C8orf37). Genomic sequences (shown first): NT_010859.14 (Homo sapiens chromosome 18 genomic contig, reference assembly); NW_926940.1 (Homo sapiens chrom]
403. ms in the right side of the Toolbar apply to the function of the mouse pointer. When e.g. Zoom Out is selected, you zoom out each time you click in a view where zooming is relevant (texts, tables and lists cannot be zoomed). The chosen mode is active until another mode toolbar item is selected. (Fit Width and Zoom to 100% do not apply to the mouse pointer.) [Figure 3.16: The mode toolbar items (Fit Width, 100%, Pan, Zoom In, Zoom Out).]
3.3.1 Zoom In
There are four ways of zooming in:
   Click Zoom In in the toolbar | click the location in the view that you want to zoom in on
   or Click Zoom In in the toolbar | click and drag a box around a part of the view | the view now zooms in on the part you selected
   or Press on your keyboard
The last option for zooming in is only available if you have a mouse with a scroll wheel:
   or Press and hold Ctrl (⌘ on Mac) | Move the scroll wheel on your mouse forward
When you choose the Zoom In mode, the mouse pointer changes to a magnifying glass to reflect the mouse mode. Note: You might have to click in the view before you can use the keyboard or the scroll wheel to zoom. If you press the Shift button on your keyboard while clicking in a View, the zoom function is reversed. Hence, clicking on a sequence in this way while the Zoom In mode toolbar item is selected zooms out instead of zooming in.
3.3.2 Zoom Out
It is possible to zoom out step by step on a sequ
404. [Figure 2.30: Press and hold the Ctrl key while you click first the HindIII site and next the XhoI site. The sequence details in the figure show pcDNA4_TO as a 5,078 bp circular vector; the target vector from the XhoI cut to the HindIII cut is 5,004 bp, and the part from the HindIII cut to the XhoI cut is 74 bp.] [Figure residue: cloning experiment view, sequence 2 of 2, fragment ATP8a1 mRNA shown as linear, with vector pcDNA4_TO.]
405. n International Union of Pure and Applied Chemistry. The information is gathered from http://www.iupac.org and http://www.ebi.ac.uk/2can/tutorials/aa.html.
   Code  Description
   A     Adenine
   C     Cytosine
   G     Guanine
   T     Thymine
   U     Uracil
   R     Purine (A or G)
   Y     Pyrimidine (C, T, or U)
   M     C or A
   K     T, U, or G
   W     T, U, or A
   S     C or G
   B     C, T, U, or G (not A)
   D     A, T, U, or G (not C)
   H     A, T, U, or C (not G)
   V     A, C, or G (not T, not U)
   N     Any base (A, C, G, T, or U)
Appendix J: Custom codon frequency tables
You can edit the list of codon frequency tables used by CLC DNA Workbench. Note: Please be aware that this process needs to be handled carefully, otherwise you may have to re-install the Workbench to get it to work. In the Workbench installation folder under res, there is a folder named codonfreq. This folder contains all the codon frequency tables, organized into subfolders in a hierarchy. In order to change the tables, you simply add, delete or rename folders and the files in the folders. If you wish to add new tables, please use the existing ones as template. Restart the Workbench to have the changes take effect. Please note that when updating the Workbench to a new version, this information is not preserved. This means that you should keep this information in a separate place as back-up. (The ability to change the tables is mainly aimed at centrally deployed installations of the Workbench.)
Bibliography
[Altschul a
406. n add data by adding a new location (see section 3.1.1). If a file or another element is dropped on a folder, it is placed at the bottom of the folder. If it is dropped on another element, it will be placed just below that element. If the element already exists in the Navigation Area, you will be asked whether you wish to create a copy.
3.1.2 Create new folders
In order to organize your files, they can be placed in folders. Creating a new folder can be done in two ways:
   right-click an element in the Navigation Area | New | Folder
   or File | New | Folder
If a folder is selected in the Navigation Area when adding a new folder, the new folder is added at the bottom of this folder. If an element is selected, the new folder is added right above that element. You can move the folder manually by selecting it and dragging it to the desired destination.
3.1.3 Sorting folders
You can sort the elements in a folder alphabetically:
   right-click the folder | Sort Folder
On Windows, subfolders will be placed at the top of the folder, and the rest of the elements will be listed below in alphabetical order. On Mac, both subfolders and other elements are listed together in alphabetical order.
3.1.4 Multiselecting elements
Multiselecting elements means that you select more than one element at the same time. This can be done in the following ways:
   • Holding down the <Ctrl> key (⌘ on Mac) while clicking o
407. n be brought to front by clicking its tab. Note: If you right-click an open tab of any element, click Show, and then choose a different view of the same element, this new view is automatically opened in a split view, allowing you to see both views. See section 3.1.5 for instructions on how to open a view using drag and drop.
3.2.2 Show element in another view
Each element can be shown in different ways. A sequence, for example, can be shown as linear, circular, text etc. In the following example, you want to see a sequence in a circular view. If the sequence is already open in a view, you can change the view to a circular view:
   Click Show As Circular at the lower left part of the view
The buttons used for switching views are shown in figure 3.9. [Figure 3.9: The buttons shown at the bottom of a view of a nucleotide sequence. You can click the buttons to change the view to e.g. a circular view or a history view.]
If the sequence is already open in a linear view, and you wish to see both a circular and a linear view, you can split the views very easily:
   Press Ctrl (⌘ on Mac) while you | Click Show As Circular at the lower left part of the view
This will open a split view with a linear view at the bottom and a circular view at the top (see 10.5). You can also show a circular view of a sequence without opening the sequence first:
   Select the sequence in the Navigation Area | Show | As Circular
408. n multiple elements selects the elements that have been clicked. • Selecting one element and then selecting another element while holding down the Shift key selects all the elements listed between the two locations (the two end locations included). • Selecting one element and moving the cursor with the arrow keys while holding down the Shift key enables you to increase the number of elements selected. 3.1.5 Moving and copying elements. Elements can be moved and copied in several ways: • Using Copy, Cut and Paste from the Edit menu. • Using Ctrl + C (⌘ + C on Mac), Ctrl + X (⌘ + X on Mac) and Ctrl + V (⌘ + V on Mac). • Using Copy, Cut and Paste in the Toolbar. • Using drag and drop to move elements. • Using drag and drop while pressing Ctrl / Command to copy elements. In the following, all of these possibilities for moving and copying elements are described in further detail. Copy, cut and paste functions. Copies of elements and folders can be made with the copy/paste function, which can be applied in a number of ways: select the files to copy | right-click one of the selected files | Copy | right-click the location to insert files into | Paste; or select the files to copy | Ctrl + C (⌘ + C on Mac) | select where to insert files | Ctrl + V (⌘ + V on Mac); or select the files to copy | Edit in the Menu Bar | Copy | select where to insert files | Edit
409. n name accession Common name Common name accession. Annotation Layout and Annotation Types: see section 10.3.1. Restriction sites: see section 10.1.2. CHAPTER 10. VIEWING AND EDITING SEQUENCES Motifs: see section 13.7.1. Residue coloring: these preferences make it possible to set both the color of the residue letter and a background color for the residue. • Non-standard residues. For nucleotide sequences, this colors the residues that are not C, G, A, T or U. For amino acids, only B, Z and X are colored as non-standard residues. Foreground color: sets the color of the letter. Click the color box to change the color. Background color: sets the background color of the residues. Click the color box to change the color. • Rasmol colors. Colors the residues according to the Rasmol color scheme. See http://www.openrasmol.org/doc/rasmol.html. Foreground color: sets the color of the letter. Click the color box to change the color. Background color: sets the background color of the residues. Click the color box to change the color. • Polarity colors (only protein). Colors the residues according to the polarity of amino acids. Foreground color: sets the color of the letter. Click the color box to change the color. Background color: sets the background color of the residues. Click the color box to change the color. • Trace colors (only DNA). Colors the residues according to the color conventions of
410. n of se quences in alignments 363 Personal information 28 Pfam domain search 378 phr file format 395 PHR file format 395 Phred file format 393 phy file format 395 Phylip file format 394 Phylogenetic tree 366 3 9 tutorial 71 Phylogenetics Bioinformatics explained 3 1 pir file format 395 PIR NBRF file format 393 Plot dot plot 201 local complexity 211 Plug ins 30 png format export 126 Polarity colors 144 Portrait Print orientation 115 Positively charged residues 217 PostScript export 126 Preference group 110 Preferences 104 advanced 109 Data 108 export 109 General 104 import 109 style sheet 110 toolbar 106 View 106 view 90 Primer 2 1 analyze 209 based on alignments 205 Buffer properties 252 design 3 9 design from alignments 3 9 display graphically 254 INDEX length 252 mode 253 nested PCR 253 order 2 5 sequencing 253 Standard 253 TaqMan 253 tutorial 57 Primers find binding sites 2 1 Print 113 dot plots 203 preview 116 visible area 114 whole view 114 pro file format 395 Problems when starting up 29 Processes 93 Properties batch edit 84 Protein charge 237 378 hydrophobicity 241 Isoelectric point 215 report 3 statistics 215 translation 243 Proteolytic cleavage 3 8 Proxy server 33 ps format export 126 psi file format 395 PubMed references search 1 1 PubMed references search 377 Quality of chromatogram trace
411. n the contig, move the vertical slider at position 2073 to the left (see figure 17.21). You will now see how the gaps in the consensus sequence are replaced by real sequence information. Note that you can only move the sliders when you are zoomed in to see the sequence residues. 2.5.6 Inspecting the traces. Clicking the Find Conflict button again will find the next conflict. Here both reads differ from the reference sequence. We now inspect the traces in more detail. To see the details, we zoom in on this position: Zoom in in the Tool Bar | Click the selected base | Click again three times. Now you have zoomed in on the trace (see figure 2.18). CHAPTER 2. TUTORIALS Figure 2.18: Now you can see all the details of the traces. This gives more space between the residues, but if we would like to inspect the peaks even more, simply drag the peaks up and down with your mouse (see figure 17 2). Figure 2.19: Grab the traces to scale. 2.5.7 Synonymous substitutions. In this case we have sequenced the coding part of a gene. Often you want to know what a variation like this would mean on the protein level. To do this, show the translation along the contig: Nucleotide info in the Side Panel | Translation | Show | Select ORF CDS in t
412. Figure 1.7: Entering a license ID provided by CLC bio (the license ID in this example is artificial). In this dialog, there are two options: • Direct download. The workbench will attempt to contact the online CLC Licenses Service and download the license directly. This method requires internet access from the workbench. • Go to license download web page. The workbench will open a Web Browser with the License Download web page when you click Next. From there you will be able to download your license as a file and import it. This option allows you to get a license even though the Workbench does not have direct access to the CLC Licenses Service. If you select the first option and it turns out that you do not have internet access from the Workbench (because of a firewall, proxy server etc.), you will be able to click Previous and u
413. name to display in the Workbench. Restart the Workbench and the new database will be visible in the BLAST dialog. Appendix E Restriction enzymes database configuration. CLC DNA Workbench uses enzymes from the REBASE restriction enzyme database at http://rebase.neb.com. If you wish to add enzymes to this list, you can do this manually using the procedure described here. Note: Please be aware that this process needs to be handled carefully; otherwise you may have to re-install the Workbench to get it to work. First, download the following file: http www clcbio com wbsettings link emboss e custom. In the Workbench installation folder under settings, create a folder named rebase and place the extracted link emboss e custom file here. Open the file in a text editor. The top of the file contains information about the format, and at the bottom there are two example enzymes that you should replace with your own. Restart the Workbench to have the changes take effect. Appendix F Technical information about modifying Gateway cloning sites. The CLC DNA Workbench comes with a pre-defined list of Gateway recombination sites. These sites and the recombination logic can be modified by downloading and editing a properties file. Note that this is a technical procedure only needed if the built-in functionality is not sufficient for your needs. The properties file can be downloaded from http www clcbio com wbsettings gatewaycloning
414. nce of interest. This principle is shown in figure 17.6 (please refer to Meyer et al., 2007 for more detailed information). Figure 17.6: Tagging the target sequence. Figure from Meyer et al., 2007. The sample-specific tag, also called the barcode, can then be used to distinguish between the different samples when analyzing the sequence data. This post-processing of the sequencing data has been made easy by the multiplexing functionality of the CLC DNA Workbench, which simply divides the data into separate groups prior to analysis. Note that there is also an example using Illumina data at the end of this section. The first step is to separate the imported sequence list into sublists based on the barcode of the sequences: Toolbox | High-throughput Sequencing | Multiplexing | Process Tagged Sequences. This opens a dialog where you can add the sequences you wish to sort. You can also add sequence lists. When you click Next, you will be able to specify the details of how the de-multiplexing should be performed. At the bottom of the dialog, there are three buttons which are used to Add, Edit and Delete the elements that describe how the barcode is embedded in the sequences. First click A
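The idea of dividing tagged reads into separate groups by barcode can be sketched in a few lines of Python. This is only an illustration and not the Workbench's own implementation: the function name `split_by_barcode`, the assumption that the barcode sits at the very start of each read, and the example barcodes are all hypothetical.

```python
from collections import defaultdict


def split_by_barcode(reads, barcodes):
    """Group reads by a barcode found at the start of each read.

    reads    -- iterable of sequence strings
    barcodes -- dict mapping barcode -> sample name (hypothetical layout)
    Returns a dict: sample name -> list of reads with the barcode removed.
    Reads that match no barcode are collected under the key None.
    """
    groups = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.upper().startswith(barcode):
                # Strip the barcode so only the target sequence remains.
                groups[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            groups[None].append(read)
    return dict(groups)


# Made-up barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample1", "TTGA": "sample2"}
reads = ["ACGTGGGCATCGAC", "TTGAGGGCTCGCTG", "CCCCAAAA"]
result = split_by_barcode(reads, barcodes)
```

A real tag layout may also contain linkers or fixed sequences around the barcode, which is what the Add, Edit and Delete buttons in the dialog are used to describe; the sketch ignores these.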
415. nd Gish 1996 Altschul S F and Gish W 1996 Local alignment statistics Methods Enzymol 266 460 480 Altschul et al 1990 Altschul S F Gish W Miller W Myers E W and Lipman D J 1990 Basic local alignment search tool J Mol Biol 215 3 403 410 Andrade et al 1998 Andrade M A O Donoghue S l and Rost B 1998 Adaptation of protein surfaces to subcellular location J Mol Biol 276 2 51 7 525 Bachmair et al 1986 Bachmair A Finley D and Varshavsky A 1986 In vivo half life of a protein is a function of its amino terminal residue Science 234 4 773 1 79 186 Bommarito et al 2000 Bommarito S Peyret N and SantaLucia J 2000 Thermodynamic parameters for DNA sequences with dangling ends Nucleic Acids Res 28 9 1929 1934 Clote et al 2005 Clote P Ferr F Kranakis E and Krizanc D 2005 Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency RNA 11 5 5 8 591 Cornette et al 1987 Cornette J L Cease K B Margalit H Spouge J L Berzofsky J A and DeLisi C 1987 Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins J Mol Biol 195 3 659 685 Cronn et al 2008 Cronn R Liston A Parks M Gernandt D S Shen R and Mockler T 2008 Multiplex sequencing of plant chloroplast genomes using solexa sequencing by synthesis technology Nucleic Aci
416. nding parameters. This opens the dialog displayed in figure 16.17. At the top, select one or more primers by clicking the browse button. In CLC DNA Workbench, primers are just DNA sequences like any other, but there is a filter on the length of the sequence: only sequences up to 400 bp can be added. CHAPTER 16. PRIMERS Figure 16.17: Search parameters for finding primer binding sites. The Match criteria for matching a primer to a sequence are: • Exact match. Choose only to consider exact matches of the primer, i.e. all positions must base pair with the template. • Minimum number of base pairs required for a match. How many nucleotides of the primer must base pair to the sequence in order to cause priming/mispriming. • Number of consecutive base pairs required in 3' end. How many consecutive 3' end base pairs in the primer MUST be present for priming/mispriming to occur. This option is included since 3' terminal base pairs are known to be essential for priming to occur
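The three match criteria can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the Workbench algorithm: the primer is written 5' to 3' and compared directly against the strand that has the same sequence as its binding site (so a "base pair" is counted as an identical letter), and the function name `binding_sites` and its parameters are hypothetical.

```python
def binding_sites(primer, template, min_matches, min_3prime_run, exact=False):
    """Return (start, matches, 3'-run) for every window that passes the criteria.

    min_matches    -- minimum number of matching positions in the window
    min_3prime_run -- minimum number of consecutive matches at the primer's 3' end
    exact          -- if True, every position must match
    """
    primer = primer.upper()
    template = template.upper()
    n = len(primer)
    sites = []
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        matches = [p == t for p, t in zip(primer, window)]
        total = sum(matches)
        # Count consecutive matches from the 3' end (the last character).
        run = 0
        for is_match in reversed(matches):
            if not is_match:
                break
            run += 1
        if exact:
            ok = total == n
        else:
            ok = total >= min_matches and run >= min_3prime_run
        if ok:
            sites.append((start, total, run))
    return sites


template = "TTTACGTACGGTTT"
exact_hits = binding_sites("ACGTACGG", template, 0, 0, exact=True)
mismatch_5prime = binding_sites("TCGTACGG", template, 7, 5)
mismatch_3prime = binding_sites("ACGTACGA", template, 7, 5)
```

A primer with a single mismatch at its 5' end still passes, while the same number of mismatches at the 3' end is rejected, which reflects why the 3' criterion is listed separately.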
417. Figure 18.20: Besides the main output, which is a copy of the input sequence(s) now including attB sites and primer additions, you can get a list of primers as output. Figure 18.21: The attB site plus the Shine-Dalgarno primer addition is annotated. Extending the pre-defined list of primer additions. The list of primer additions shown when pressing Shift + F1 in the dialog shown in figure 18.16 can be configured and extended. If there is a tag that you use a lot, you can add it to the list for convenient and easy access later on. This is done in the Preferences: Edit | Preferences | Advanced. In the advanced preferences dialog, scroll to the part called Gateway cloning primer additions (see figure 18.22). Each element in the list has the following information: Name: The name of the sequence. When the sequence fragment is extended with a primer addition, an annotation will be ad
418. nel. If you wish to use all the enzymes in the list: click in the panel to the left | press Ctrl + A (⌘ + A on Mac) | Add. The enzymes can be sorted by clicking the column headings, i.e. Name, Overhang, Methylation or Popularity. This is particularly useful if you wish to use enzymes which produce e.g. a 3' overhang. In this case, you can sort the list by clicking the Overhang column heading, and all the enzymes producing 3' overhangs will be listed together for easy selection. When looking for a specific enzyme, it is easier to use the Filter. If you wish to find e.g. HindIII sites, simply type HindIII into the filter, and the list of enzymes will shrink automatically to only include the HindIII enzyme. This can also be used to only show enzymes producing e.g. a 3' overhang, as shown in figure 18.51. The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation (see section E). CHAPTER 18. CLONING AND CUTTING
419. netic inference. The bottom shows a tree found by neighbor joining, while the top shows a tree found by UPGMA. The latter method assumes that evolution occurs at a constant rate in different lineages. Maximum likelihood phylogeny. Figure 20.4: Adjusting parameters for ML phylogeny. Figure 20.4 shows the parameters that can be set for the ML phylogenetic tree reconstruction: • Starting tree: the user is asked to specify a starting tree for the tree reconstruction. There are three possibilities: Neighbor joining, UPGMA, or Use tree from file. CHAPTER 20. PHYLOGENETIC TREES • Select substitution model: CLC DNA Workbench allows maximum likelihood tree estimation to be performed under the assumption of one of four substitution models: the Jukes Cantor (Jukes and Cantor, 1969), the Kimura 80 (Kimura, 1980), the HKY (Hasegawa et al., 1985) and the GTR (also known as the RE
420. nformatics explained 355 new 162 region types 149 search 14 select 148 shuffle 199 Statistics 212 view 141 view as text 161 view circular 150 view format 82 web info 1 0 Sequence logo 354 Sequencing data 3 6 Sequencing primers 3 9 Share data 78 3 6 Share Side Panel Settings 107 Shared BLAST database 186 Shortcuts 95 Show enzymes cutting selection 331 results from a finished process 93 Show dialogs 105 Show enzymes with compatible ends 334 INDEX Show hide Toolbox 94 Shuffle sequence 199 377 Side Panel tutorial 41 Side Panel Settings export 107 import 107 Share with others 107 Side Panel location of 106 Signal peptide 378 Single base editing in contig 299 in sequences 149 Single cutters 329 SNP detection 3 76 Solexa see Illumina Genome Analyzer SOLID data 3 0 Sort sequences alphabetically 358 sequences by similarity 358 Sort sequences by name 2 9 Sort folders 80 Source element 132 Species display name 82 Staden file format 393 Standard layout trees 3 0 Standard Settings CLC 111 Star activity 343 Start Codon 234 Start up problems 29 Statistics about sequence 3 protein 215 sequence 212 Status Bar 93 94 illustration Str file format 395 Structure scanning 3 9 Style sheet preferences 110 Subcontig extract part of a contig 301 Support mail 12 Surface probability 146 svg format export 126 Swiss Prot file format 393 Swi
421. Figure 14.2: Translating RNA to DNA. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will open a new view in the View Area displaying the new DNA sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to activate a save dialog. CHAPTER 14. NUCLEOTIDE ANALYSES Note: You can select multiple RNA sequences and sequence lists at a time. If the sequence list contains DNA sequences as well, they will not be converted. 14.3 Reverse complements of sequences. CLC DNA Workbench is able to create the reverse complement of a nucleotide sequence. By doing that, a new sequence is created which also has all the annotations reversed, since they now occupy the opposite strand of their previous location. To quickly obtain the reverse complement of a sequence or part of a sequence, you may select a region on the negative strand and open it in a new view: right click
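As a rough illustration of what a reverse complement involves, including how an annotation ends up on the opposite strand, here is a minimal Python sketch. The function names and the 1-based, inclusive coordinate convention are assumptions made for this example and are not taken from the Workbench.

```python
# Complement table for unambiguous DNA letters plus N (an assumed, minimal alphabet).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_annotation(start, end, strand, seq_length):
    """Map a 1-based inclusive region onto the reverse-complemented sequence.

    The region keeps its length but moves to the mirrored position,
    and its strand (+1 or -1) is inverted.
    """
    return seq_length - end + 1, seq_length - start + 1, -strand


rc = reverse_complement("ATGCCG")
flipped = flip_annotation(1, 3, +1, 6)
```

Here the region covering positions 1 to 3 on the positive strand of a 6 bp sequence becomes positions 4 to 6 on the negative strand.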
422. ng alignments of nucleotide or peptide sequences the software offers several ways to view alignments The alignments can then be used for building phylogenetic trees Sequences must be available via the Navigation Area to be included in an alignment If you have sequences open in a View that you have not saved then you just need to select the view tab and press Ctrl S or S on Mac to save them In this tutorial six protein sequences from the Example data folder will be aligned See figure 2 51 Example data Xx ATPS8al genomic sequence XxX ATPSal mRNA sys ATP8al Figure 2 51 Six protein sequences in Sequences from the Protein orthologs folder of the Example data To align the sequences select the sequences from the Protein folder under Sequences Toolbox Alignments and Trees Create Alignment iE 2 10 1 The alignment dialog This opens the dialog shown in figure 2 52 q Create Alignment Xu 1 Select sequences of same Bad eiilbsicibeiesiMlli9 1 Projects Selected Elements 6 094296 P39524 P57792 Q29449 QONTI2 Q95x33 Elta CLC Data gt Example Data XxX ATP8al genomit XxX ATP8al mRNA ht ATP8al feces Protein analyse 5 Protein ortholog SEE ATPSal orth 222222 RNA secondary Sequencing dat j Qy lt enter search term gt Figure 2 52 The alignment dialog displaying the six protein sequences CH
423. ng temperature algorithm employed includes the latest thermodynamic parameters for calculating Tm when single-base mismatches occur. When in Standard PCR mode, clicking the Calculate button will prompt the dialog shown in figure 16.13. The top part of this dialog shows the single-primer parameter settings chosen in the Primer parameters preference group, which will be used by the design algorithm. The central part of the dialog contains parameters pertaining to primer specificity (this is omitted if all sequences belong to the included group). Here, three parameters can be set: • Minimum number of mismatches: the minimum number of mismatches that a primer must have against all sequences in the excluded group to ensure that it does not prime these. • Minimum number of mismatches in 3' end: the minimum number of mismatches that a primer must have in its 3' end against all sequences in the excluded group to ensure that it does not prime these. • Length of 3' end: the number of consecutive nucleotides to consider for mismatches in the 3' end of the primer. The lower part of the dialog contains parameters pertaining to primer pairs (this is omitted when only designing a single primer). Here, three parameters can be set: • Maximum percentage point difference in G/C content: if this is set at e.g. 5 points, a pair of primers with 45% and 49% G/C nucleotides, respectively, will be allowed, whereas a pair of primers with 45% and 51% G/C nucl
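The percentage point rule for G/C content can be written out as a short Python sketch. The helper names `gc_percent`, `within_points` and `gc_pair_allowed` are hypothetical, and the sketch assumes the limit is inclusive, which the text does not state.

```python
def gc_percent(primer):
    """Percentage of G and C nucleotides in a primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def within_points(gc_a, gc_b, max_diff_points):
    """True if two G/C percentages differ by at most max_diff_points."""
    return abs(gc_a - gc_b) <= max_diff_points


def gc_pair_allowed(primer_a, primer_b, max_diff_points):
    """Apply the percentage point rule to a pair of primer sequences."""
    return within_points(gc_percent(primer_a), gc_percent(primer_b), max_diff_points)


# The figures from the text: with a limit of 5 points,
# 45% vs 49% is allowed and 45% vs 51% is not.
allowed = within_points(45, 49, 5)
rejected = within_points(45, 51, 5)

# Two made-up 20-mers with 9 and 11 G/C bases (45% and 55%).
primer_45 = "GCGCGCGCG" + "A" * 11
primer_55 = "GCGCGCGCGCG" + "A" * 9
pair_ok = gc_pair_allowed(primer_45, primer_55, 5)
```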
424. ng the dot plot. The Side Panel to the right lets you specify the dot plot preferences. The gradient color box can be adjusted to get the appropriate result by dragging the small pointers at the top of the box. Moving the slider from the right to the left lowers the thresholds, which can be directly seen in the dot plot, where more diagonal lines will emerge. You can also choose another color gradient by clicking on the gradient box and choosing from the list. Adjusting the sliders above the gradient box is also practical when producing an output for printing, where too much background color might not be desirable. By crossing one slider over the other (the two sliders change side), the colors are inverted, allowing for a white background if you choose a color gradient which includes white (see figure 13.5). CHAPTER 13. GENERAL SEQUENCE ANALYSES Figure 13.6: Dot plot with inverted colors, practical for printing. 13.2.3 Bioinformatics explained: Dot plots. Realization of dot plots. Dot plots are two-dimensional plots where the x-axis and y-axis each represent a sequence, and the plot itself shows a comparison of these two sequences by a calculated score for each position of the sequence. If a window of fixed size on one sequence (one axis) matches the other sequence, a dot is drawn at
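The windowed comparison behind a dot plot can be sketched in Python as follows. This is a minimal illustration that assumes a simple identity score (one point per identical position in the window); the function name `dot_plot` and its parameters are hypothetical, and the Workbench may score windows differently.

```python
def dot_plot(seq_x, seq_y, window=3, threshold=3):
    """Return the (x, y) positions where a dot would be drawn.

    A window of fixed size is slid along both sequences. For every pair of
    window positions the number of identical residues is counted, and a dot
    is recorded when that score reaches the threshold.
    """
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            score = sum(
                a == b
                for a, b in zip(seq_x[i:i + window], seq_y[j:j + window])
            )
            if score >= threshold:
                dots.append((i, j))
    return dots


# Comparing a sequence with itself gives dots along the main diagonal.
diagonal = dot_plot("ACGT", "ACGT", window=3, threshold=3)
```

Lowering `threshold` below the window size lets imperfect windows through, which corresponds to more diagonal lines emerging when the threshold slider is lowered.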
425. ngs, open CLC DNA Workbench, go to the Advanced tab of the Preferences dialog (figure 1.28) and enter the appropriate information. The Preferences dialog is opened from the Edit menu. CHAPTER 1. INTRODUCTION TO CLC DNA WORKBENCH Figure 1.28: Adjusting proxy preferences. You have the choice between an HTTP proxy and a SOCKS proxy. CLC DNA Workbench only supports the use of a SOCKS proxy that does not require authorization. Exclude hosts can be used if there are some hosts that should be contacted directly and not through the proxy server. The value can be a list of hosts, each separated by a |, and in addition a wildcard character * can be used for matching. For example: *.foo.com|localhost. If you have any problems with these settings, you should contact your systems administrator. 1.9 The format of the user manual. This user manual offers support to Windows, Mac OS X and Linux users. The software is
426. nsus MVHLTXEEKN AVTGLWGKVN VDEVGGEALG Auto wrap Fixed wrap Sequence log asi avraLihek EE a Numbers on sequences Relative to 1 P68046 MEENE ET REEDSEGcDES sPDANMGNPR 59 Cia ei p6s053 REENNMPWTO REBDSEcCDBs sPBalmcnPk 59 p6s225 REGUURPWTQ RERESEcBEs sPBamcnPkK co numb P68873 REBWMNPWTQ RERESEGDES TPBANMGNPK 60 Hd label p6s226 REBMUMPWTR RERESEGDES TABANMNNPK 60 Falta mio at ot v 1 elements are selected Figure 3 15 A maximized view The function hides the Navigation Area and the Toolbox 3 2 7 Side Panel The Side Panel allows you to change the way the contents of a view are displayed The options in the Side Panel depend on the kind of data in the view and they are described in the relevant sections about sequences alignments trees etc Side Panel are activated in this way select the view Ctrl U 36 U on Mac or right click the tab of the view View Show Hide Side Panel 15 Note Changes made to the Side Panel will not be saved when you save the view See how to save the changes in the Side Panel in chapter 5 The Side Panel consists of a number of groups of preferences depending on the kind of data CHAPTER 3 USER INTERFACE 91 being viewed which can be expanded and collapsed by clicking the header of the group You can also expand or collapse all the groups by clicking the icons at the top 3 3 Zoom and selection in View Area The mode toolbar ite
427. Figure 18.12: Drag the handles to adjust overhangs. At the top is a button to reverse complement the inserted sequence. Below is a visualization of the insertion details. The inserted sequence is in the middle, shown in red, and the vector has been split at the insertion point, with the ends shown at each side of the inserted sequence. If the overhangs of the sequence and the vector do not match, you can blunt-end or fill in the overhangs using the drag handles. Whenever you drag the handles, the status of the insertion point is indicated below: • The overhangs match. • The overhangs do not match. In this case, you will not be able to click Finish. Drag the handles to make the overhangs match. At the bottom of the dialog is a summary field which records all the changes made to the overhangs. The contents of the summary will also be written in the history when you click Finish. When you click Finish and the sequence
428. nter concerning the properties of primer/probe pairs or sets, e.g. primer pair annealing and Tm difference between primers. If the latter is desired, the user can use the Calculate button at the bottom of the Primer parameter preference group. This will activate a dialog, the contents of which depend on the chosen mode. Here the user can set primer-pair-specific settings, such as allowed or desired Tm difference, and view the single-primer parameters which were chosen in the Primer parameters preference group. Upon pressing Finish, an algorithm will generate all possible primer sets and rank these based on their characteristics and the chosen parameters. A list will appear displaying the 100 highest-scoring sets and information pertaining to these. The search result can be saved to the navigator. From the result table, suggested primers or primer/probe sets can be explored, since clicking an entry in the table will highlight the associated primers and probes on the sequence. It is also possible to save individual primers or sets from the table through the mouse right-click menu. For a given primer pair, the amplified PCR fragment can also be opened or saved using the mouse right-click menu. 16.1.2 Scoring primers. CLC DNA Workbench employs a proprietary algorithm to rank primer and probe solutions. The algorithm considers both the parameters pertaining to single oligos, such as e.g. the secondary structure score, and parameters
429. nts in the right-hand side of the dialog. To run the contents of the Cloning folder in batch, double-click to select it. When the Cloning folder is selected and you click Next, a batch overview is shown. 9.1.1 Batch overview. The batch overview lists the batch units to the left and the contents of the selected unit to the right (see figure 9.3). CHAPTER 9. BATCHING AND RESULT HANDLING Figure 9.3: Overview of the batch run. In this example, the two sequences are defined as separate batch units because they are located at the top level of the Cloning folder. There were also three folders in the Cloning folder (see figure 9.2), and two of them are listed as well. This means that the contents of these folders are pooled in one batch run (you can see the contents of the Cloning vector library batch run in the panel at the right-hand side of the dialog). The r
430. nvitrogen Corporation. Figure 18.14: Inserting the HindIII recognition sequence. • First, attB sites are added to a sequence fragment. • Second, the attB-flanked fragment is recombined into a donor vector (the BP reaction) to construct an entry clone. • Finally, the target fragment from the entry clone is recombined into an expression vector (the LR reaction) to construct an expression clone. For Multi-site Gateway cloning, multiple entry clones can be created that can recombine in the LR reaction. During this process, both the attB-flanked fragment and the entry clone can be saved. For more information about the Gateway technology, please visit http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/Cloning/Gateway-Cloning.html. To perform these analyses in the CLC DNA Workbench, you need to import donor and expression vectors. These can be downloaded from Invitrogen's web site and directly imported into the Workbench: http://tools.invitrogen.com/downloads/Gateway%20vectors.ma4. 18.2.1 Add attB sites. The first step in the Gateway cloning p
431. 15.2 Hydrophobicity. 15.2.1 Hydrophobicity plot. 15.2.2 Hydrophobicity graphs along sequence. 15.2.3 Bioinformatics explained: Protein hydrophobicity. 15.3 Reverse translation from protein into DNA. 15.3.1 Reverse translation parameters. 15.3.2 Bioinformatics explained: Reverse translation. CLC DNA Workbench offers analyses of proteins as described in this chapter. 15.1 Protein charge. In CLC DNA Workbench you can create a graph of the electric charge of a protein as a function of pH. This is particularly useful for finding the net charge of the protein at a given pH. This knowledge can be used e.g. in relation to isoelectric focusing on the first dimension of 2D gel electrophoresis. The isoelectric point (pI) is found where the net charge of the protein is zero. The calculation of the protein charge does not include knowledge about any potential post-translational modifications the protein may have. The pKa values reported in the literature may differ slightly, thus resulting in different-looking graphs of the protein charge plot compared to other programs. In order to calculate the protein charge: Select a protein sequence | Toolbox in the Menu Bar | Protein Analyses | Create Protein Charge Plot, or right-click a protein sequence | Toolbox
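To make the relationship between pH, net charge and the isoelectric point concrete, the following Python sketch estimates the net charge with the Henderson-Hasselbalch equation and finds the pI by bisection. It is not the Workbench's calculation: the pKa values below are illustrative assumptions only (published tables differ, as noted above), and the function names are hypothetical.

```python
# Illustrative pKa values (assumptions; literature tables differ slightly).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq, ph):
    """Estimated net charge of a protein at the given pH.

    Post-translational modifications are ignored.
    """
    # Free amino terminus (positive) and carboxyl terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in seq.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(seq, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH where the net charge is zero by bisection.

    This works because the net charge decreases steadily as pH rises.
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


peptide = "MKDEHRCY"  # made-up peptide for illustration
pi_estimate = isoelectric_point(peptide)
```

Evaluating `net_charge` over a range of pH values gives the points of a charge plot of the kind described in this section.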
432. o Sea project. This does overlap with nucleotide nr.
APPENDIX D BLAST DATABASES
D.3 Adding more databases
Besides the databases that are part of the default configuration, you can add more databases located at NCBI by configuring files in the Workbench installation directory. The list of databases that can be added is here: http www ncbi nlm nih gov staff tao URLAPI remote blastdblist html
In order to add a new database, find the settings folder in the Workbench installation directory (e.g. C:\Program files\CLC Genomics Workbench 4). Download, unzip and place the following files in this directory to replace the built-in list of databases:
• Nucleotide databases: http www clcbio com wbsettings NCBI BlastNucleotideDataba Zip
• Protein databases: http www clcbio com wbsettings NCBI_BlastProteinDatabases zip
Open the file you have downloaded into the settings folder (e.g. NCBI_BlastProteinDatabases proper) in a text editor, and you will see that the contents look like this:
nr clcdefault Non-redundant protein sequences
refseq_protein Reference proteins
swissprot Swiss-Prot protein sequences
pat Patented protein sequences
pdb Protein Data Bank proteins
env_nr Environmental samples
month New or revised GenBank sequences
Simply add another database as a new line, with the first item being the database name taken from http www ncbi nlm nih gov staff tao URLAPI remote blastdblist html and the second part is the
433. o contig 295 Adjust selection 148 Adjust trim 296 Advanced preferences 109 Advanced search 101 Algorithm alignment 347 neighbor joining 3 3 UPGMA 372 Align alignments 350 protein sequences tutorial 69 sequences 3 8 Alignment see Alignments Alignment Primers Degenerate primers 266 267 PCR primers 266 Primers with mismatches 266 267 Primers with perfect match 266 267 TaqMan Probes 266 Alignment based primer design 265 Alignments 347 378 add sequences to 359 compare 361 create 348 design primers for 265 edit 357 fast algorithm 349 join 359 multiple Bioinformatics explained 364 remove sequences from 358 view 353 view annotations on 153 Aliphatic index 215 aln file format 395 Alphabetical sorting of folders 80 Ambiguities reverse translation 246 Amino acid composition 217 Amino acids abbreviations 396 UIPAC codes 396 Analyze primer properties 269 Annotation select 148 Annotation Layout in Side Panel 153 Annotation types define your own 157 Annotation Types in Side Panel 153 Annotations add 156 copy to other sequences 358 edit 156 158 in alignments 358 introduction to 152 links 1 2 overview of 155 show hide 153 table of 155 trim 288 types of 153 view on sequence 153 viewing 152 Annotations add links to 158 Antigenicity 378 Append wildcard search 168 Arrange layout of sequence 39 405 INDEX views in View Area 88
434. o demonstrates our software. This is a very easy way to get started using the program. Read more about online presentations here: http://clcbio.com/presentation
1.6.1 Quick start
When the program opens for the first time, the background of the workspace is visible. In the background are three quick start shortcuts, which will help you get started. These can be seen in figure 1.23.
Figure 1.23: Three Quick start shortcuts available in the background of the workspace.
CHAPTER 1 INTRODUCTION TO CLC DNA WORKBENCH
The function of the three quick start shortcuts is explained here:
• Import data. Opens the Import dialog, which lets you browse for and import data from your file system.
• New sequence. Opens a dialog which allows you to enter your own sequence.
• Read tutorials. Opens the tutorials menu with a number of tutorials. These are also available from the Help menu in the Menu bar.
1.6.2 Import of example data
It might be easier to understand the logic of the program by trying to do simple operations on existing data. Therefore CLC DNA Workbench includes an example data set. When downloading CLC DNA Workbench you are asked if you would like to import the example data set. If you accept, the data is downloaded automatically and saved in the program. If you didn't download the data, or for some other reason need to download the data again, you have two options: You can click Install Example Data in t
435. o find annotations or a subset of annotations.
• You can copy and paste annotations, e.g. from one sequence to another.
• If you wish to edit many annotations consecutively, the double-click editing makes this very fast (see section 10.3.2).
10.3.2 Adding annotations
CHAPTER 10 VIEWING AND EDITING SEQUENCES
Adding annotations to a sequence can be done in two ways: open the sequence in a sequence view (double-click in the Navigation Area) | make a selection covering the part of the sequence you want to annotate | right-click the selection | Add Annotation, or select the sequence in the Navigation Area | Show | Annotations | Add Annotation. This will display a dialog like the one in figure 10.10.
[Figure 10.10: The Add Annotation dialog.]
The left-hand part of the dialog lists a number of Annotation types. When you have selected an annotation type, it appears in Type to the right. You can also select an annotation directly in this list. Choosing an annotation type is mandatory. If you wish to use an annotation type which is not present in the list, simply ente
436. o the clipboard, which will enable it for use in other programs.
• Duplicate Selection. If a selection on the sequence is duplicated, the selected region will be added as a new sequence to the cloning editor, with a new sequence name representing the length of the fragment. When a sequence region between two restriction sites is double-clicked, the entire region will automatically be selected. This makes it very easy to make a new sequence from a fragment created by cutting with two restriction sites (right-click the selection and choose Duplicate Selection).
• Open Selection in New View. This will open the selected region in the normal sequence view.
• Edit Selection. This will open a dialog box in which it is possible to edit the selected residues.
• Delete Selection. This will delete the selected region of the sequence.
• Add Annotation. This will open the Add annotation dialog box.
• Show Enzymes Only Cutting Selection. This will add enzymes cutting this selection to the Side Panel.
• Insert Restriction Sites before/after Selection. This will show a dialog where you can choose from a list of restriction enzymes (see section 18.1.4).
Insert one sequence into another
Sequences can be inserted into each other in several ways, as described in the lists above. When you choose to insert one sequence into another, you will be presented with a dialog where all sequences in the view are p
437. obes based on an alignment of multiple sequences. The primer designer for alignments can be accessed in two ways: select alignment | Toolbox | Primers and Probes | Design Primers | OK, or, if the alignment is already open: click Primer Designer at the lower left part of the view.
In the alignment primer view (see figure 16.12), the basic options for viewing the template alignment are the same as for the standard view of alignments. See section 19 for an explanation of these options.
Note: This means that annotations such as known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Since the definition of groups of sequences is essential to the primer design, the selection boxes of the standard view are shown as default in the alignment primer view.
[Figure residue: alignment primer view showing the sequences PERH2BD and PERH3BC with their consensus, and the Primer parameters panel (Length, Melt. temp.) in the Side Panel.]
438. ocal. If you do not have root privileges, you can choose to install in your home directory.
• Choose where you would like to create symbolic links to the program. DO NOT create symbolic links in the same location as the application. Symbolic links should be installed in a location which is included in your environment PATH. For a system-wide installation you can choose, for example, /usr/local/bin. If you do not have root privileges, you can create a bin directory in your home directory and install symbolic links there. You can also choose not to create symbolic links.
• Wait for the installation process to complete and click Finish.
If you choose to create symbolic links in a location which is included in your PATH, the program can be executed by running the command: clcdnawb6
Otherwise you start the application by navigating to the location where you chose to install it and running the command: ./clcdnawb6
1.2.5 Installation on Linux with an RPM package
Navigate to the directory containing the rpm package and install it using the rpm tool, by running a command similar to: rpm -ivh CLCDNAWorkbench_6_JRE.rpm
If you are installing from a CD, the rpm packages are located in the RPMS directory. Installation of RPM packages usually requires root privileges. When the installation process is finished, the program can be executed by running the command: clcdnawb6
1.3 System r
439. od generating a sequence of the same expected mononucleotide frequency.
CHAPTER 13 GENERAL SEQUENCE ANALYSES
• Dinucleotide sampling from first order Markov chain: Resampling method generating a sequence of the same expected dinucleotide frequency.
For proteins, the following parameters can be set:
• Single amino acid shuffling: Shuffle method generating a sequence of the exact same amino acid frequency.
• Single amino acid sampling from zero order Markov chain: Resampling method generating a sequence of the same expected single amino acid frequency.
• Dipeptide shuffling: Shuffle method generating a sequence of the exact same dipeptide frequency.
• Dipeptide sampling from first order Markov chain: Resampling method generating a sequence of the same expected dipeptide frequency.
For further details of these algorithms, see [Clote et al., 2005]. In addition to the shuffle method, you can specify the number of randomized sequences to output. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. This will open a new view in the View Area displaying the shuffled sequence. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (Cmd + S on Mac) to activate a save dialog.
13.2 Dot plots
Dot plots provide a powerful visual comparison of two sequences. Dot plots can also be used to compare regions of similarity within a sequence
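The difference between "shuffling" (exact same frequency) and "sampling from a zero order Markov chain" (same expected frequency) can be illustrated with a short Python sketch. This is not the Workbench's implementation or the algorithm of Clote et al.; the function names are made up for the example, and only the single-residue (zero order) case is shown.

```python
import random
from collections import Counter


def shuffle_sequence(sequence, rng):
    """Permute the residues: the output has exactly the same composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def sample_zero_order(sequence, rng, length=None):
    """Draw each residue independently from the observed frequencies.

    The output has the same *expected* composition, but an individual
    sample will usually deviate from it.
    """
    counts = Counter(sequence)
    symbols = sorted(counts)
    weights = [counts[symbol] for symbol in symbols]
    n = len(sequence) if length is None else length
    return "".join(rng.choices(symbols, weights=weights, k=n))
```

Passing a seeded `random.Random` instance as `rng` makes the randomized sequences reproducible, which is convenient when several randomized sequences are generated for comparison.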
440. oes not meet the requirements set in the Primer parameters (see figure 2.38).
[Figure 2.38: The Primer parameters.]
The default maximum melting temperature is 58. This is the reason why the primer in figure 2.31, with a melting temperature of 58.55, does not meet the requirements and is colored red. If you raise the maximum melting temperature to 59, the primer will meet the requirements and the dot becomes green. In figure 2.37 there is an asterisk (*) before the melting temperature. This indicates that this primer does not meet the requirements regarding melting temperature. In this way, you can easily see why a specific primer (represented by a dot) fails to meet the requirements. By adjusting the Primer parameters you can define primers to meet your specific needs. Since the dots are dynamically updated, you can immediately see how a change in the primer parameters affects the number of red and green dots.
CHAPTER 2 TUTORIALS
2.3 Calculating a primer pair
Until now, we have been looking at the forward primer. To mark a region for the reverse primer, make a selection from position 1200 to 1400 and: right-click the selection | Reverse primer region here. The two regions should now be located as shown in figure 2.39.
441. of the sites, right-click the site on the sequence and choose De-select This Site. When the right target vector is selected, you are ready to Perform Cloning (see below).
Perform cloning
Once selections have been made for both fragments and vector, click Perform Cloning. This will display a dialog to adapt overhangs and change orientation, as shown in figure 18.6.
[Figure residue: cloning editor view of pcDNA4_TO (5,078 bp circular vector) with the target vector fragments defined between the XhoI and HindIII cut sites, and the Restriction sites settings in the Side Panel.]
442. of these two functionalities are described below.
17.2.1 Sort sequences by name
With this functionality you will be able to group sequencing reads based on their file name. A typical example would be that you have a list of files named like this:
A02 Asp F 016 2007-01-10
A02 Asp R 016 2007-01-10
A02 Gln F 016 2007-01-11
A02 Gln R 016 2007-01-11
A03 Asp F 031 2007-01-10
A03 Asp R 031 2007-01-10
A03 Gln F 031 2007-01-11
A03 Gln R 031 2007-01-11
CHAPTER 17 SEQUENCING DATA ANALYSES AND ASSEMBLY
In this example, the names have five distinct parts (we take the first name as an example):
• A02, which is the position on the 96-well plate
• Asp, which is the name of the gene being sequenced
• F, which describes the orientation of the read (forward/reverse)
• 016, which is an ID identifying the sample
• 2007-01-10, which is the date of the sequencing run
To start mapping these data, you probably want to have them divided into groups instead of having all reads in one folder. If, for example, you wish to map each sample separately, or if you wish to map each gene separately, you cannot simply run the mapping on all the sequences in one step. That is where Sort Sequences by Name comes into play. It will allow you to specify which part of the name should be used to divide the sequences into groups. We will use the example described above to show how it works: Toolbox | High-throughput Sequencing | Multiplexing | Sort S
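The idea of dividing reads into groups by one part of their name can be sketched in Python as follows. This is an illustration only, not the Workbench's Sort Sequences by Name tool. The function name is made up, and the underscore delimiter is an assumption: the separator used in the original file names is not recoverable from the text above.

```python
from collections import defaultdict


def group_by_name_part(names, part_index, delimiter="_"):
    """Group read names by one delimited part of the name.

    part_index is zero-based: with the five-part names in the example,
    0 is the plate position, 1 the gene, 2 the orientation,
    3 the sample ID and 4 the date.
    """
    groups = defaultdict(list)
    for name in names:
        parts = name.split(delimiter)
        groups[parts[part_index]].append(name)
    return dict(groups)


# Hypothetical read names following the five-part scheme.
reads = [
    "A02_Asp_F_016_2007-01-10",
    "A02_Asp_R_016_2007-01-10",
    "A02_Gln_F_016_2007-01-11",
    "A03_Asp_F_031_2007-01-10",
]
by_gene = group_by_name_part(reads, part_index=1)
by_sample = group_by_name_part(reads, part_index=3)
```

Grouping by part 1 collects all Asp reads and all Gln reads, while grouping by part 3 keeps the reads of each sample together, which mirrors the two use cases mentioned in the text.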
443. omitted, whereas the found BLAST hits will automatically be placed right below the query sequence.
• Compactness. You can control the level of sequence detail to be displayed:
  - Not compact. Full detail and spaces between the sequences.
  - Low. The normal settings where the residues are visible (when zoomed in) but with no extra spaces between.
  - Medium. The sequences are represented as lines and the residues are not visible. There is some space between the sequences.
  - Compact. Even less space between the sequences.
• BLAST hit coloring. You can choose whether to color hit sequences, and you can adjust the coloring.
• Coverage. In the Alignment info in the Side Panel, you can visualize the number of hit sequences at a given position on the query sequence. The level of coverage is relative to the overall number of hits included in the result.
  - Foreground color. Colors the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.
  - Background color. Colors the background of the letters using a gradient, where the left side color is used for low coverage and the right side is used for maximum coverage.
  - Graph. The coverage is displayed as a graph beneath the query sequence. Learn how to export the data behind the graph in section 4.
    * Height. Specifies the height of the graph.
    * Type. The graph can be displayed as Line plot, Bar plot or as a Color bar.
444. on
$$R_{seq} = S_{max} - S_{obs} = \log_2 N - \left( -\sum_{n=1}^{N} p_n \log_2 p_n \right)$$
$p_n$ is the observed frequency of an amino acid residue or nucleotide of symbol $n$ at a particular position, and $N$ is the number of distinct symbols for the sequence alphabet, either 20 for proteins or four for DNA/RNA. This means that the maximal sequence information content per position is $\log_2 4 = 2$ bits for DNA/RNA and $\log_2 20 \approx 4.32$ bits for proteins.
The original implementation by Schneider does not handle sequence gaps. We have slightly modified the algorithm so an estimated logo is presented in areas with sequence gaps. If amino acid residues or nucleotides of one sequence are found in an area containing gaps, we have chosen to show the particular residue as the fraction of the sequences. Example: if one position in the alignment contains 9 gaps and only one alanine (A), the A represented in the logo has a height of 0.1.
Other useful resources
The website of Tom Schneider: http://www.lmmb.ncifcrf.gov/~toms/
WebLogo: http://weblogo.berkeley.edu/ [Crooks et al., 2004]
CHAPTER 19 SEQUENCE ALIGNMENT
19.3 Edit alignments
19.3.1 Move residues and gaps
The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment (see section 19.1). However, gaps and residues can also be moved after the alignment is created: select one or more gaps or residues in the alignment | drag the selection to move. This can be done both fo
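As a worked illustration of the information content formula, the following Python sketch computes the value for a single alignment column. It follows the original gap-free definition, so the column is assumed to contain no gap characters, and the modified gap handling described above is not reproduced. The function name is chosen for the example.

```python
import math
from collections import Counter


def information_content(column, alphabet_size):
    """Information content (bits) of one alignment column.

    column: string of the residues observed at this position (no gaps).
    alphabet_size: N, i.e. 4 for DNA/RNA or 20 for proteins.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # Observed entropy: -sum(p_n * log2(p_n)) over the symbols present.
    observed_entropy = -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
    return math.log2(alphabet_size) - observed_entropy
```

A fully conserved DNA column reaches the maximum of 2 bits, and a column with all four nucleotides at equal frequency carries 0 bits, which matches the limits stated in the text.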
445. [Figure 13.26: Setting parameters for the motif search.]
The options for the motif search are:
• Motif types. Choose what kind of motif to be used:
  - Simple motif. Choosing this option means that you enter a simple motif, e.g. ATGATGNNATG.
  - Java regular expression. See section 13.7.3.
  - Prosite regular expression. For proteins, you can enter different protein patterns from the PROSITE database (protein patterns using regular expressions and describing specific amino acid sequences). The PROSITE database contains a great number of patterns and has been used to identify related proteins (see http://www.expasy.org/cgi-bin/prosite-list.pl).
  - Use motif list. Clicking the small button will allow you to select a saved motif list (see section 13.7.4).
• Motif. If you choose to search with a simple motif, you should enter a literal string as your motif. Ambiguous amino acids and nucleotides are allowed. Example: ATGATGNNATG. If your motif type is Java regular expression, you should enter a regular expression according to the syntax rules described in section 13.7.3. Press the Shift + F1 key for options. For prot
446. on machinery. Selenocysteines are very rare amino acids. The table below shows the Standard Genetic Code, which is the default translation table.
CHAPTER 15 PROTEIN ANALYSES
TTT F Phe    TCT S Ser    TAT Y Tyr    TGT C Cys
TTC F Phe    TCC S Ser    TAC Y Tyr    TGC C Cys
TTA L Leu    TCA S Ser    TAA * Ter    TGA * Ter
TTG L Leu    TCG S Ser    TAG * Ter    TGG W Trp
CTT L Leu    CCT P Pro    CAT H His    CGT R Arg
CTC L Leu    CCC P Pro    CAC H His    CGC R Arg
CTA L Leu    CCA P Pro    CAA Q Gln    CGA R Arg
CTG L Leu    CCG P Pro    CAG Q Gln    CGG R Arg
ATT I Ile    ACT T Thr    AAT N Asn    AGT S Ser
ATC I Ile    ACC T Thr    AAC N Asn    AGC S Ser
ATA I Ile    ACA T Thr    AAA K Lys    AGA R Arg
ATG M Met    ACG T Thr    AAG K Lys    AGG R Arg
GTT V Val    GCT A Ala    GAT D Asp    GGT G Gly
GTC V Val    GCC A Ala    GAC D Asp    GGC G Gly
GTA V Val    GCA A Ala    GAA E Glu    GGA G Gly
GTG V Val    GCG A Ala    GAG E Glu    GGG G Gly
Challenge of reverse translation
A particular protein follows from the translation of a DNA sequence, whereas the reverse translation need not have a specific solution according to the Genetic Code. The Genetic Code is degenerate, which means that a particular amino acid can be encoded by more than one codon. Hence there are ambiguities in the reverse translation.
Solving the ambiguities of reverse translation
In order to solve these ambiguities of reverse translation, you can define how to prioritize the codon selection, e.g.:
• Choose a codon randomly.
• Select the most frequent codon in a given organism.
• Randomiz
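The degeneracy of the Genetic Code, and the "choose a codon randomly" strategy, can be sketched in Python as below. This is an illustrative sketch, not the Workbench's reverse translation: the function names are invented for the example, and organism-specific codon frequencies are not included because no frequency table is given in the text.

```python
import random

# Standard Genetic Code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))

# Invert the table: each amino acid maps to all codons that encode it.
CODONS_FOR = {}
for codon, amino_acid in CODON_TO_AA.items():
    CODONS_FOR.setdefault(amino_acid, []).append(codon)


def reverse_translate(protein, rng):
    """Pick one codon at random for every amino acid."""
    return "".join(rng.choice(CODONS_FOR[aa]) for aa in protein.upper())


def translate(dna):
    """Translate a DNA sequence codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))
```

Different random choices give different DNA sequences, but translating any of them returns the original protein. Only Met and Trp, which have a single codon each, reverse translate unambiguously.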
447. on sequences, alignments or trees. See section 3.2.5 for more on this topic.
• Audit Support. If this option is checked, all manual editing of sequences will be marked with an annotation on the sequence (see figure 5.2). Placing the mouse on the annotation will reveal additional details about the change made to the sequence (see figure 5.3). Note that no matter whether Audit Support is checked or not, all changes are also recorded in the History (see section 8).
• Number of hits. The number of hits shown in CLC DNA Workbench when, e.g., searching NCBI. (The sequences shown in the program are not downloaded until they are opened or dragged/saved into the Navigation Area.)
• Locale Setting. Specify which country you are located in. This determines how punctuation is used in numbers all over the program.
• Show Dialogs. A lot of information dialogs have a checkbox: "Never show this dialog again". When you see a dialog and check this box in the dialog, the dialog will not be shown again. If you regret this and wish to have the dialog displayed again, click the button in the General Preferences: Show Dialogs. Then all the dialogs will be shown again.
[Figure 5.2: Annotations added when the sequence is edited.]
[Figure 5.3: Details of the editing.]
CHAPTER 5 USER PREFERENCES AND SETTINGS
448. [Figure 18.43: Selecting number of cut sites.]
If you wish the output of the restriction map analysis only to include restriction enzymes which cut the sequence a specific number of times, use the checkboxes in this dialog:
• No restriction site (0)
• One restriction site (1)
• Two restriction sites (2)
• Three restriction sites (3)
• N restriction sites (Minimum, Maximum)
• Any number of restriction sites (> 0)
The default setting is to include the enzymes which cut the sequence one or two times. You can use the checkboxes to perform very specific searches for restriction sites, e.g. if you wish to find enzymes which do not cut the sequence, or enzymes cutting exactly twice.
Output of restriction map analysis
Clicking Next shows the dialog in figure 18.44. This dialog lets you specify how the result of the restriction map analysis should be presented:
• Add restriction sites as annotations to sequence(s). This option makes it possible to see the restriction sites on the sequence (see figure 18.45) and save the annotations for later use.
[Figure residue: Restriction Site Analysis dialog, Result handling step.]
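Filtering enzymes by their number of cut sites can be illustrated with a small Python sketch. It is not the Workbench's restriction map analysis: the function names are invented, the sequence is treated as linear, and only three enzymes with well-known palindromic recognition sequences are included, so counting on one strand is sufficient for them.

```python
# Recognition sequences of three common enzymes (all palindromic).
ENZYMES = {
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def count_sites(sequence, site):
    """Count occurrences of a recognition site, including overlapping ones."""
    seq = sequence.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def enzymes_with_cut_count(sequence, enzymes, minimum, maximum):
    """Names of enzymes whose number of sites lies in [minimum, maximum]."""
    return sorted(
        name
        for name, site in enzymes.items()
        if minimum <= count_sites(sequence, site) <= maximum
    )
```

With `minimum=1, maximum=2` the function mimics the default setting described above (enzymes cutting once or twice), and `minimum=0, maximum=0` lists the non-cutters.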
449. onding sequence on the chromosome. If you place the mouse cursor on the sequence hits in the graphical view, you can see the reading frame, which is 1, 2 and 3 for the three hits, respectively.
Verify the result
Open NC_000011 in a view, go to the Hit start position (5 204 29) and zoom to see the blue gene annotation. You can now see the exon structure of the Human beta globin gene, showing the three exons on the reverse strand (see figure 2.47). If you wish to verify the result, make a selection covering the gene region and open it in a new view: right-click | Open Selection in New View | Save. Save the sequence, and perform a new BLAST search:
• Use the new sequence as query.
[Figure residue: BLAST result view for the query, with the Blast layout settings in the Side Panel and a hit table with the columns Description, E-value, Hit start, Query start, Query end, Identity and Positive.]
450. ontributing sequence reads will automatically be placed right below the reference. This setting is not relevant when the compactness is packed.
• Show sequence ends. Regions that have been trimmed are shown with faded traces and residues. This illustrates that these regions have been ignored during the assembly.
• Show mismatches. When the compactness is packed, you can highlight mismatches, which will get a color according to the Rasmol color scheme. A mismatch is whenever the base is different from the reference sequence at this position. This setting also causes the reads that have mismatches to be floated at the top of the view.
• Packed read height. When the compactness is packed, you can choose the height of the visible reads. When there are more reads than the height specified, an overflow graph will be displayed below the reads.
• Find Conflict. Clicking this button selects the next position where there is a conflict between the sequence reads. Residues that are different from the reference are colored (as default), providing an overview of the conflicts. Since the next conflict is automatically selected, it is easy to make changes. You can also use the Space key to find the next conflict.
• Alignment info. There is one additional parameter:
  - Coverage: Shows how many sequence reads are contributing information to a given position in the contig. The level of coverage
451. oolbox.
2.12.1 The Side Panel way of finding restriction sites
When you open a sequence, there is a Restriction sites setting in the Side Panel. By default, 10 of the most popular restriction enzymes are shown (see figure 2.56).
[Figure 2.56: Showing restriction sites of ten restriction enzymes.]
The restriction sites are shown on the sequence with an indication of cut site and recognition sequence. In the list of enzymes in the Side Panel, the number of cut sites is shown in parentheses for each enzyme (e.g. SalI cuts three times). If you wish to see the recognition sequence of the enzyme, place your mouse cursor on the enzyme in the list for a short moment, and a tool tip will appear. You can add or remove enzymes from the list by clicking the Manage enzymes button.
2.12.2 The Toolbox way of finding restriction sites
Suppose you are working with the sequence ATP8a1 mRNA from the example data, and you wish to know which restriction enzymes will cut this sequence exactly once and create a 3' overhang. Do the following: select the ATP8a1 mRNA sequence | Toolbox in the Menu Bar | Cloning and Restriction S
452. op right side of the Workbench (e.g. Pan, Zooming tools, etc.). Once the sequence is selected, right-click and choose to Insert Restriction Site Before Selection, as shown in figure 2.25.
[Figure 2.25: Adding restriction sites to a primer.]
In the Filter box, enter HindIII and click on it. At the bottom of the dialog, add a few extra bases 5' of the cut site (this is done to increase the efficiency of the enzyme), as shown in figure 18.14.
[Figure 2.20: Adding restriction sites to a primer.]
Click OK and the sequence will
453. option should be used if a file is imported as a bioinformatics file when it should just have been imported as an external file. It could be an ordinary text file which is imported as a sequence.
Import using drag and drop
It is also possible to drag a file from, e.g., the desktop into the Navigation Area of CLC DNA Workbench. This is equivalent to importing the file using the Automatic import option described above. If the file type is not recognized, it will be imported as an external file.
Import using copy/paste of text
If you have, e.g., a text file or a browser displaying a sequence in one of the formats that can be imported by CLC DNA Workbench, there is a very easy way to get this sequence into the Navigation Area: Copy the text from the text file or browser | Select a folder in the Navigation Area | Paste. This will create a new sequence based on the text copied. This operation is equivalent to saving the text in a text file and importing it into the CLC DNA Workbench. If the sequence is not formatted, i.e. if you just have a text like this: "ATGACGAATAGGAGTTC TAGCTA", you can also paste this into the Navigation Area.
Note: Make sure you copy all the relevant text, otherwise CLC DNA Workbench might not be able to interpret the text.
7.1.2 Import Vector NTI data
There are several ways of importing your Vector NTI data into the CLC Workbench. The best way to go depends on how your data is currently stored in Vector NTI:
• Your data is stored
454. opy of the selected sequence in a normal sequence view.
• Open this sequence. This will open the selected sequence in a normal sequence view.
• Make sequence circular. This will convert a sequence from a linear to a circular form. If the sequence has matching overhangs at the ends, they will be merged together. If the sequence has incompatible overhangs, a dialog is displayed, and the sequence cannot be made circular. The circular form is represented by >> and << at the ends of the sequence.
• Make sequence linear. This will convert a sequence from a circular to a linear form, removing the << and >> at the ends.
Manipulate parts of the sequence
Right-clicking a selection reveals several options on manipulating the selection (see figure 18.10).
[Figure 18.10: Right click on a seq
455. or all other data, you can only search for name. If you use Any field, it will search all of the above plus the following:
• Description
• Keywords
• Common name
• Taxonomy name
CHAPTER 4 SEARCHING YOUR DATA
To see this information for a sequence, switch to the Element Info view (see section 10.4). For each search line, you can choose if you want the exact term by selecting "is equal to", or if you only enter the start of the term you wish to find, select "begins with". An example is shown in figure 4.7.
[Figure 4.7: Searching for human sequences shorter than 10,000 nucleotides.]
This example will find human nucleotide sequences (organism is Homo sapiens), and it will only find sequences shorter than 10,000 nucleotides. Note that a search can be saved for later use. You do not save the search results
456. or remove sequences or sequence lists from the selected elements. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. Note: You can select multiple DNA sequences and sequence lists at a time. If the sequence list contains RNA sequences as well, they will not be converted.
[Figure 14.1 residue: Convert DNA to RNA dialog, step 1 "Select DNA sequences"]
Figure 14.1: Translating DNA to RNA.
14.2 Convert RNA to DNA
CLC DNA Workbench lets you convert an RNA sequence into DNA, replacing the U residues (uracil) with T residues (thymine):
select an RNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert RNA to DNA
or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert RNA to DNA
This opens the dialog displayed in figure 14.2.
[Figure 14.2 residue: Convert RNA to DNA dialog, step 1 "Select RNA sequences"]
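The conversion described in section 14.2 is a plain residue substitution. The following is a minimal sketch of that idea in Python, not the Workbench's own code; the function name `rna_to_dna` is hypothetical, and the handling of lower-case residues is an assumption.

```python
def rna_to_dna(rna: str) -> str:
    """Replace uracil (U/u) with thymine (T/t); all other symbols are kept."""
    # str.maketrans builds a one-to-one character mapping (U->T, u->t).
    return rna.translate(str.maketrans("Uu", "Tt"))


if __name__ == "__main__":
    print(rna_to_dna("AUGGCCUAA"))
```

Keeping every other symbol unchanged means gaps and ambiguity codes pass through untouched, which is a design choice of this sketch rather than documented Workbench behaviour.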
457. ormation about how the vector will be cut open. Since the vector has now been split into two fragments, you can decide which one to use as the target vector. If you selected first the HindIII site and next the XhoI site, the CLC DNA Workbench has already selected the right fragment as the target vector. If you click one of the vector fragments, the corresponding part of the sequence will be highlighted. The next step is to cut the fragment. At the top of the view you can switch between the sequences used for cloning (at this point it says pcDNA4_TO (5.078bp circular vector)). Switch to the fragment sequence and perform the same selection of cut sites as before (while pressing the Ctrl (⌘ on Mac) key). You should now see a view identical to the one shown in figure 2.31. When this is done, the Perform Cloning button at the lower right corner of the view is active because there is now a valid selection of both fragment and target vector. Click the Perform Cloning button and you will see the dialog shown in figure 2.32. This dialog lets you inspect the overhangs of the cut site, showing the vector sequence on each
[Figure residue: cloning editor showing sequence 1 of 2, pcDNA4_TO, as a circular vector with its annotations and restriction sites]
458. osen parameters
[Figure 16.11 dialog fields: Maximum primer length, Minimum primer length, Maximum G/C content, Minimum G/C content, Maximum melting temperature, Minimum melting temperature, Maximum self annealing, Maximum self end annealing, Maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Mispriming parameters: Use mispriming as exclusion criteria, Exact match, Minimum number of base pairs required for a match (12), Number of consecutive base pairs required in 3' end]
Figure 16.11: Calculation dialog for sequencing primers.
Since design of sequencing primers does not require the consideration of interactions between primer pairs, this dialog is identical to the dialog shown in Standard PCR mode when only a single primer region is chosen. See section 16.5 for a description.
16.8.1 Sequencing primers output table
In this mode primers are predicted independently for each region, but the optimal solutions are all presented in one table. The solutions are numbered consecutively according to their position on the sequence, such that the forward primer region closest to the 5' end of the molecule is designated F1, the next one F2, etc. For each solution, the single primer information described under Standard PCR is available in the table.
16.9 Alignment-based primer and probe design
CLC DNA Workbench allows the user to design PCR primers and TaqMan pr
459. [Figure 12.20 residue: table rows listing Homo sapiens chromosome 18 and chromosome 19 genomic contigs (reference and alternate assemblies, including NT_011109.15 and NW_927217.1) with their scores and E-values]
Figure 12.20: BLAST table view. A table view with one row per hit, showing the accession number and description field from the sequence file together with BLAST output scores.
The start and stop positions for the query and hit sequence are listed. The strand and orientation for query sequence and hits are also found here. In most cases the table view of the results will be easier to interpret than tens of sequence alignments.
[Figure residue: BLAST text output for NM_173209.1, Homo sapiens TGFB-induced factor (TALE family homeobox) (TGIF), transcript variant 5, mRNA, length 1382; score 339 bits (171), identities 171/171 (100%), gaps 0/171 (0%), strand Plus/Plus; the aligned query and subject lines are not recoverable]
460. osts and Lambda ratio will result in alignments which decrease the number of gaps introduced.
• Max number of hit sequences: The maximum number of database sequences, where BLAST found matches to your query sequence, to be included in the BLAST report.
The parameters you choose will affect how long BLAST takes to run. A search of a small database, requesting only hits that meet stringent criteria, will generally be quite quick. Searching large databases, or allowing for very remote matches, will of course take longer. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish.
12.1.2 BLAST a partial sequence against NCBI
You can search a database using only a part of a sequence directly from the sequence view:
select the sequence region to send to BLAST | right-click the selection | BLAST Selection Against NCBI
This will go directly to the dialog shown in figure 12.3, and the rest of the options are the same as when performing a BLAST search with a full sequence.
12.1.3 BLAST against local data
Running BLAST searches on your local machine can have several advantages over running the searches remotely at the NCBI:
• It can be faster.
• It does not rely on having a stable internet connection.
• It does not depend on the availability of the NCBI BLAST servers.
• You can use longer query sequences.
• You use your own data sets to search against.
On a techn
461. ot click Finish. A new sequence list will be generated for each group. It will be named according to the group, e.g. Asp016 will be the name of one of the groups in the example shown in figure 17.4.
Advanced splitting using regular expressions
You can see a more detailed explanation of the regular expressions syntax in section 13.7.3. In this section you will see a practical example showing how to create a regular expression. Consider a list of files as shown below:
adk-29_adk1n-F
adk-29_adk2n-R
adk-3_adk1n-F
adk-3_adk2n-R
adk-66_adk1n-F
adk-66_adk2n-R
atp-29_atpA1n-F
atp-29_atpA2n-R
atp-3_atpA1n-F
atp-3_atpA2n-R
atp-66_atpA1n-F
atp-66_atpA2n-R
In this example, we wish to group the sequences into three groups based on the number after the "-" and before the "_" (i.e. 29, 3 and 66). The simple splitting as shown in figure 17.4 requires the same character before and after the text used for grouping, and since we now have both a "-" and a "_", we need to use the regular expressions instead (note that dividing by position would not work because we have both single and double digit numbers: 3, 29 and 66). The regular expression for doing this would be _ as shown in figure 17.5. The round brackets denote the part of the name that will be listed in the groups table.
[Figure 17.5 residue: Sort Sequences by Name dialog]
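The grouping idea above can be sketched in Python. The excerpt's own regular expression did not survive extraction, so the pattern below is an assumption: it skips everything up to the first "-", captures the text up to the next "_" in round brackets, and ignores the rest. The file names are reconstructed from the garbled list and should be treated as illustrative, and `group_by_name` is a hypothetical helper, not a Workbench function.

```python
import re
from collections import defaultdict

# Assumed pattern: text before the first "-", then the captured group
# (everything up to the next "_"), then the remainder of the name.
GROUP_PATTERN = re.compile(r"[^-]*-([^_]*)_.*")


def group_by_name(names, pattern=GROUP_PATTERN):
    """Group names by the bracketed part of the pattern; None = no match."""
    groups = defaultdict(list)
    for name in names:
        match = pattern.fullmatch(name)
        key = match.group(1) if match else None
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "adk-29_adk1n-F", "adk-29_adk2n-R", "adk-3_adk1n-F", "adk-3_adk2n-R",
        "adk-66_adk1n-F", "adk-66_adk2n-R", "atp-29_atpA1n-F", "atp-29_atpA2n-R",
        "atp-3_atpA1n-F", "atp-3_atpA2n-R", "atp-66_atpA1n-F", "atp-66_atpA2n-R",
    ]
    for key, members in group_by_name(names).items():
        print(key, members)
```

Splitting by character position would fail here because the captured number is one digit in some names and two in others, which is why a pattern with explicit delimiters is used.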
462. otate with GFF file. CLC bio (support@clcbio.com), version 1.03. Using this plug-in it is possible to annotate a sequence from a list of annotations found in a GFF file. Located in the Toolbox.
Extract Annotations. CLC bio (support@clcbio.com), version 1.02. Extracts annotations from one or more sequences. The result is a sequence list containing sequences covered by the specified annotations.
Figure 1.25: The plug-in manager with plug-ins installed.
Click the plug-in | Uninstall
If you do not wish to completely uninstall the plug-in, but you don't want it to be used next time you start the Workbench, click the Disable button. When you close the dialog, you will be asked whether you wish to restart the Workbench. The plug-in will not be uninstalled before the Workbench is restarted.
1.7.3 Updating plug-ins
If a new version of a plug-in is available, you will get a notification during start-up as shown in figure 1.26. In this list, select which plug-ins you wish to update, and click Install Updates. If you press Cancel, you will be able to install the plug-ins later by clicking Check for Updates in the Plug-in manager (see figure 1.25).
1.7.4 Resources
Resources are downloaded, installed, un-installed and updated the same way as plug-ins. Click the Download Resources tab at the top of the plug-in manager, and you will see a list of available resources (see figure 1.27). Currently the only re
463. otations on the regions that will be ignored in the assembly process.
17.3.1 Manual trimming
Sequence reads can be trimmed manually while inspecting their trace and quality data. Trimming sequences manually corresponds to adding annotation (see also section 10.3.2) but is special in the sense that trimming can only be applied to the ends of a sequence:
double-click the sequence to trim in the Navigation Area | select the region you want to trim | right-click the selection | Trim sequence left/right to determine the direction of the trimming
This will add trimming annotation to the end of the sequence in the selected direction.
17.3.2 Automatic trimming
Sequence reads can be trimmed automatically based on a number of different criteria. Automatic trimming is particularly useful in the following situations:
• If you have many sequence reads to be trimmed.
• If you wish to trim vector contamination from sequence reads.
• If you wish to ensure that the trimming is done according to the same criteria for all the sequence reads.
To trim sequences automatically:
select sequence(s) or sequence lists to trim | Toolbox in the Menu Bar | Sequencing Data Analyses | Trim Sequences
This opens a dialog where you can alter your choice of sequences. When the sequences are selected, click Next. This opens the dialog displayed in figure 17.15.
[Figure 17.15 residue: Trim Sequences dialog]
464. [Figure 16.20 residue: binding site table with Side Panel "Show column" options: Primer name, Orientation, Region, Mismatches, Number of other matches, Melt. temp., Self annealing, Self annealing alignment, Self end annealing, GC content, Secondary structure score, Secondary structure, plus Select All and Deselect All]
Figure 16.20: A table showing all binding sites.
An example of the primer binding site table is shown in figure 16.20. The information here is the same as in the primer annotation, and furthermore you can see additional information about melting temperature etc. by selecting the options in the Side Panel. See a more detailed description of this information in section 16.5.2. You can use this table to browse the binding sites. If you make a split view of the table and the sequence (see section 3.2.6), you can browse through the binding positions by clicking in the table. This will cause the sequence view to jump to the position of the binding site. An example of a fragment table is shown in figure 16.21.
[Figure 16.21 residue: fragment table with Fwd primer, Rev primer, Fragments and Fragment length columns]
465. ou choose an appropriate codon frequency table.
• Map annotations to reverse translated sequence: If this checkbox is checked, then all annotations on the protein sequence will be mapped to the resulting DNA sequence. In the tooltip on the transferred annotations, there is a note saying that the annotation derives from the original sequence.
The Codon Frequency Table is used to determine the frequencies of the codons. Select a frequency table from the list that fits the organism you are working with. A translation table of an organism is created on the basis of counting all the codons in the coding sequences. Every codon in a Codon Frequency Table has its own count, frequency (per thousand) and fraction, which are calculated in accordance with the occurrences of the codon in the organism. You can customize the list of codon frequency tables for your installation, see section J. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The newly created nucleotide sequence is shown, and if the analysis was performed on several protein sequences, there will be a corresponding number of views of nucleotide sequences. The new sequence is not saved automatically. To save the sequence, drag it into the Navigation Area or press Ctrl + S (⌘ + S on Mac) to show the save dialog.
15.3.2 Bioinformatics explained: Reverse translation
In all living cells containing hereditary material such as DNA, a transcription to mRN
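The three per-codon values mentioned for a Codon Frequency Table (count, frequency per thousand, fraction) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the Workbench's implementation: it uses the standard genetic code, and it assumes "fraction" means the share of a codon among all codons for the same amino acid. The function name `codon_usage` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(coding_sequences):
    """Return {codon: {"count", "per_thousand", "fraction"}} for the given CDSs."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in frame 1, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous residues
                counts[codon] += 1
    total = sum(counts.values())
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: {
            "count": n,
            "per_thousand": 1000 * n / total,
            # Assumed definition: share among synonymous codons.
            "fraction": n / per_amino_acid[CODON_TABLE[codon]],
        }
        for codon, n in counts.items()
    }


if __name__ == "__main__":
    for codon, stats in sorted(codon_usage(["ATGGCTGCTGCCTAA"]).items()):
        print(codon, stats)
```

A reverse translation could then pick, for each amino acid, the codon with the highest fraction, or sample codons in proportion to their fractions; which strategy applies depends on the options chosen in the dialog.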
466. ou have to perform separate mappings for each sequence list.
An example using Illumina barcoded sequences
The data set in this example can be found at the Short Read Archive at NCBI: http://www.ncbi.nlm.nih.gov/sra/SRX014012. It can be downloaded directly in fastq format via the URL http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=dload&run_list=SRR030730&format=fastq. The file you download can be imported directly into the Workbench. The barcoding was done using the following tags at the beginning of each read: CCT, AAT, GGT, CGT (see supplementary material of Cronn et al. 2008 at http://nar.oxfordjournals.org/cgi/data/gkn502/DC1). The settings in the dialog should thus be as shown in figure 17.11.
[Figure 17.11 residue: Process Tagged Sequences dialog, step "Define tags", with a tag list containing Barcode (length 3, barcodes defined in the next step) and Sequence (length from 1 to 500 nucleotides)]
Figure 17.11: Setting the barcode length at three.
Click Next to specify the bar codes as shown in figure 17.12 (use the Add button).
[Figure 17.12 residue: Process Tagged Sequences dialog, step "Set barcode options", listing the barcodes CCT, AAT, GGT and CGT with their names]
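The barcode settings above (a three-nucleotide barcode followed by a sequence of 1 to 500 nucleotides) amount to a simple prefix-based split. The sketch below shows that logic in Python under stated assumptions: reads are plain strings, barcodes must match exactly, and the names `split_by_barcode` and "Not grouped" are hypothetical rather than taken from the Workbench.

```python
from collections import defaultdict

# Barcodes from the example data set; all have the same length (three).
BARCODES = ("CCT", "AAT", "GGT", "CGT")


def split_by_barcode(reads, barcodes=BARCODES, min_len=1, max_len=500):
    """Group reads by their leading barcode and strip the barcode off.

    Reads with an unknown barcode, or whose remaining sequence falls outside
    min_len..max_len, are collected unchanged under "Not grouped".
    """
    tag_length = len(barcodes[0])
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length].upper(), read[tag_length:]
        if tag in barcodes and min_len <= len(insert) <= max_len:
            groups[tag].append(insert)
        else:
            groups["Not grouped"].append(read)
    return dict(groups)


if __name__ == "__main__":
    print(split_by_barcode(["CCTACGT", "AATGG", "TTTACG", "CGT"]))
```

Exact matching is the strictest choice; tolerating a mismatch in the barcode would recover more reads at the cost of possible misassignment, which matters with barcodes as short as three nucleotides.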
467. ou select in this dialog (see how to change the definition of sites in appendix F). Note that the CLC DNA Workbench only checks that valid attP sites are found; it does not check that they correspond to the attB sites of the selected fragments at this step. If the right combination of attB and attP sites is not found, no entry clones will be produced. Below there is a preview of the fragments selected and the attB sites that they contain. This can be used to get an overview of which entry clones should be used and to check that the right attB sites have been added to the fragments. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The output is one entry clone per sequence selected. The attB and attP sites have been used for the recombination, and the entry clone is now equipped with attL sites as shown in figure 18.24.
[Figure 18.24 residue: atp8a1_CDS pDONR221 Entry Clone, 6004 bp]
Figure 18.24: The resulting entry vector opened in a circular view.
Note that the by-product of the recombination is not part of the output.
18.2.3 Create expression clones (LR)
The final step in the Gateway cloning work flow is to recombine the entry clone into a destination vector to create an expression clone, the so-called LR reaction:
Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Create Expression Clone
This will open a dialog where you c
468. ownload several of the hit sequences, this is easily done in this view. Simply select the relevant sequences and drag them into a folder in the Navigation Area.
2.9 Tutorial: Tips for specialized BLAST searches
Here, you will learn how to:
• Use BLAST to find the gene coding for a protein in a genomic sequence.
• Find primer binding sites on genomic sequences.
• Identify remote protein homologues.
Following through these sections of the tutorial requires some experience using the Workbench, so if you get stuck at some point, we recommend going through the more basic tutorials first.
2.9.1 Locate a protein sequence on the chromosome
If you have a protein sequence but want to see the actual location on the chromosome, this is easy to do using BLAST. In this example we wish to map the protein sequence of the human beta-globin protein to a chromosome. We know in advance that the beta-globin is located somewhere on chromosome 11. Data used in this example can be downloaded from GenBank:
Search | Search for Sequences at NCBI
Human chromosome 11 (NC_000011) consists of 134452384 nucleotides, and the beta-globin (AAA16334) protein has 147 amino acids.
BLAST configuration
Next, conduct a local BLAST search:
Toolbox | BLAST Search | Local BLAST
Select the protein sequence as query sequence and click Next. Since you wish to BLAST a protein sequence against a nucleotide sequence, use tblastn which will
469. parts of the sequences will be ignored in the subsequent assembly. A natural question is: Why not simply delete the trimmed regions instead of annotating them? In some cases, deleting the regions would do no harm, but in other cases, these regions could potentially contain valuable information, and this information would be lost if the regions were deleted instead of annotated. We will see an example of this later in this tutorial.
2.5.2 Assembling the sequencing data
The next step is to assemble the sequences. This is the technical term for aligning the sequences where they overlap and reversing the reverse reads to make a contiguous sequence (also called a contig). In this tutorial we will use assembly to a reference sequence. This can be used when you have a reference sequence that you know is similar to your sequencing data:
Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences to Reference
In the first dialog, select the nine sequencing reads and click Next to go to the second step of the assembly, where you select the reference sequence. Click the Browse and select button and select the ATP8a1 mRNA reference from the Sequencing data folder (see figure 2.14). You can leave the other options in this window set to their defaults.
[Figure 2.14 residue: Assemble Sequences to Reference dialog, step 2 "Set reference parameters", with the reference sequence chooser]
470. [Figure 18.53 residue: enzyme list with Side Panel columns including Suppliers, Methylation sensitivity, Recognizes palindrome, Star activity and Popularity]
Figure 18.53: An enzyme list.
and you can use the filter at the top right corner to search for specific enzymes, recognition sequences etc. If you wish to remove or add enzymes, click the Add/Remove Enzymes button at the bottom of the view. This will present the same dialog as shown in figure 18.50 with the enzyme list shown to the right. If you wish to extract a subset of an enzyme list:
open the list | select the relevant enzymes | right-click | Create New Enzyme List from Selection
If you combine this method with the filter located at the top of the view, you can extract a very specific set of enzymes. E.g. if you wish to create a list of enzymes sold by a particular distributor, type the name of the distributor into the filter, and select and create a new enzyme list from the selection.
Chapter 19 Sequence alignment
Contents
19.1 Create an alignment
19.1.1 …
19.1.2 Fast or accurate alignment algorithm
19.1.3 Aligning alignments
19.1.4 …
19.2 View alignments
19.2.1 Bioinformatics explained: Sequence logo
19.3 Edit alignments
19.3.1 Move res
471. port them as an archive through the File menu. This will produce a file with a ma4, pa4 or oa4 extension. Back in the CLC Workbench, click Import and select the file.
Importing single files
In Vector NTI, you can save a sequence in a file instead of in the database (see figure 7.6). This will give you a file with a .gb extension. This file can be easily imported into the CLC Workbench:
Import | select the file | Select
You don't have to import one file at a time. You can simply select a bunch of files or an entire folder, and the CLC Workbench will take care of the rest, even if the files are in different formats. You can also simply drag and drop the files into the Navigation Area of the CLC Workbench. The Vector NTI import is a plug-in which is pre-installed in the Workbench. It can be uninstalled and updated using the plug-in manager (see section 1.7).
[Figure 7.6 residue: Vector NTI Save As dialog with file name Adeno2.gb and file format "DNA/RNA Documents (*.gb)"]
Figure 7.6: Saving a sequence as a file in Vector NTI.
7.1.3 Export of bioinformatics data
CLC DNA Workbench can export bioinformatic data in most of the formats that can be imported. There are a few exceptions; see section 7.1.1. To export a file:
select the element to export | Export | choose whe
472. quence has number 59 in front of the sequence, this means that 58 residues are found upstream of this position, but these are not included in the alignment. By right-clicking the sequence name in the Graphical BLAST output, it is possible to download the full hit sequence from NCBI with accompanying annotations and information. It is also possible to just open the actual hit sequence in a new view.
12.2.4 BLAST table
In addition to the graphical display of a BLAST result, it is possible to view the BLAST results in a tabular view. The tabular view gives a quick overview of the results. Here you can also select multiple sequences and download or open all of these in one single step. Moreover, there is a link from each sequence to the sequence at NCBI. These possibilities are available either through a right-click with the mouse or by using the buttons below the table. If the BLAST table view was not selected in Step 4 of the BLAST search, the table can be shown in the following way:
Click the Show BLAST Table button at the bottom of the view
[Figure residue: BLAST table for query CAA26204 with 103 rows and columns Hit, Description, E-value, Score and Bit score, listing haemoglobin chain entries]
473. [Figure 13.17 residue: Join Sequences dialog, step 1 "Select sequences of same type", with two protein sequences selected]
Figure 13.17: Selecting two sequences to be joined.
[Figure 13.18 residue: Join Sequences dialog, step 2 "Set parameters", with "Set order of concatenation (top first)"]
Figure 13.18: Setting the order in which sequences are joined.
Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result is shown in figure 13.19.
Figure 13.19: The result of joining sequences is a new sequence containing the annotations of the joined sequences (they each had a HBB annotation).
13.6 Pattern Discovery
With CLC DNA Workbench you can perform pattern discovery on both DNA and protein sequences. Advanced hidden Markov models can help to identify unknown sequence patterns across single or even multiple sequences. In order to search for unknown patterns:
Select DNA or protein
474. r
Click the icon at the top right corner of the Side Panel to hide | Click the gray Side Panel button to the right to show
Below, each group of settings will be explained. Some of the preferences are not the same for nucleotide and protein sequences, but the differences will be explained for each group of settings. Note: When you make changes to the settings in the Side Panel, they are not automatically saved when you save the sequence. Click Save/restore Settings to save the settings (see section 5.6 for more information).
Sequence Layout
These preferences determine the overall layout of the sequence:
• Spacing: Inserts a space at a specified interval.
  - No spacing: The sequence is shown with no spaces.
  - Every 10 residues: There is a space every 10 residues, starting from the beginning of the sequence.
  - Every 3 residues, frame 1: There is a space every 3 residues, corresponding to the reading frame starting at the first residue.
  - Every 3 residues, frame 2: There is a space every 3 residues, corresponding to the reading frame starting at the second residue.
  - Every 3 residues, frame 3: There is a space every 3 residues, corresponding to the reading frame starting at the third residue.
• Wrap sequences: Shows the sequence on more than one line.
  - No wrap: The sequence is displayed on one line.
  - Auto wrap: Wraps the sequence to fit the width of the view, not
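The Spacing options can be pictured with a small Python sketch. It is only an illustration of the described layouts, not the Workbench's rendering code; `space_sequence` is a hypothetical name, and showing the residues that precede the chosen frame as a short leading group is an assumption.

```python
def space_sequence(seq: str, interval: int = 0, frame: int = 1) -> str:
    """Insert a space every `interval` residues, starting at reading `frame`.

    interval=0 means "No spacing". For frames 2 and 3, the residues before
    the frame start form a shorter leading group (assumed behaviour).
    """
    if not interval:
        return seq
    offset = (frame - 1) % interval
    head = [seq[:offset]] if offset else []
    body = [seq[i:i + interval] for i in range(offset, len(seq), interval)]
    return " ".join(head + body)


if __name__ == "__main__":
    dna = "ATGGCCTAA"
    print(space_sequence(dna))        # no spacing
    print(space_sequence(dna, 3, 1))  # every 3 residues, frame 1
    print(space_sequence(dna, 3, 2))  # every 3 residues, frame 2
    print(space_sequence(dna, 3, 3))  # every 3 residues, frame 3
```

With frame 1 the groups line up with codons starting at the first residue, so the three frame settings give a quick visual check of which reading frame contains an open reading frame.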
475. r all of the nodes the tree contains.
  - Text size: The size of the text representing the nodes can be set to tiny, small, medium, large or huge.
  - Font: Sets the font of the text of all nodes.
  - Bold: Sets the text bold if enabled.
• Tree Layout: Different layouts for the tree.
  - Node symbol: Changes the symbol of nodes into box, dot, circle or none if you don't want a node symbol.
  - Layout: Displays the tree layout as standard or topology.
  - Show internal node labels: This allows you to see labels for the internal nodes. Initially, there are no labels, but right-clicking a node allows you to type a label.
  - Label color: Changes the color of the labels on the tree nodes.
  - Branch label color: Modifies the color of the labels on the branches.
  - Node color: Sets the color of all nodes.
  - Line color: Alters the color of all lines in the tree.
• Labels: Specifies the text to be displayed in the tree.
  - Nodes: Sets the annotation of all nodes either to name or to species.
  - Branches: Changes the annotation of the branches to bootstrap, length or none if you don't want annotation on branches.
Note: Dragging in a tree will change it. You are therefore asked if you want to save this tree when the Tree View is closed. You may select part of a tree by clicking on the nodes that you want to select. Right-clicking a selected node opens a menu with the following options:
• Set root above node: defines the root of the tree to
476. r annotations on the reads, no matter how this is specified in the respective settings. And when the compactness is Packed, it is not possible to edit the bases of any of the reads. There is a shortcut way of changing the compactness: press and hold the Alt key while you scroll using your mouse wheel or touchpad.
  - Not compact: The normal setting with full detail.
  - Low: Hides trace data and quality scores, and puts the reads' annotations on the sequence.
  - Medium: The labels of the reads and their annotations are hidden, and the residues of the reads cannot be seen.
  - Compact: Even less space between the reads.
  - Packed: All the other compactness settings will stack the reads on top of each other, but the packed setting will use all space available for displaying the reads. When zoomed in to 100%, you can see the residues, but when zoomed out the reads will be represented as lines just as with the Compact setting. Please note that the packed mode is special because it does not allow any editing of the read sequences and selections, and furthermore the color coding that can be specified elsewhere in the Side Panel does not take effect. An example of the packed compactness setting is shown in figure 17.22.
• Gather sequences at top: Enabling this option affects the view that is shown when scrolling horizontally. If selected, the sequence reads which did not contribute to the visible part of the mapping will be omitted, whereas the c
477. r of hydrophobicity scales, which are further explained in section 15.2.3. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The result can be seen in figure 15.4. See section B in the appendix for information about the graph view.
15.2.2 Hydrophobicity graphs along sequence
Hydrophobicity graphs along the sequence can be displayed easily by activating the calculations from the Side Panel for a sequence:
right-click protein sequence in Navigation Area | Show | Sequence | open Protein info in Side Panel
or double-click protein sequence in Navigation Area | Show | Sequence | open Protein info in Side Panel
[Figure 15.4 residue: hydrophobicity plot of ATP8a1 with Engelman, Eisenberg and Kyte-Doolittle curves; x-axis: Position]
Figure 15.4: The result of the hydrophobicity plot calculation and the associated Side Panel.
These actions result in the view displayed in figure 15.5.
[Figure 15.5 residue: Protein info group in the Side Panel listing Kyte-Doolittle, Cornette, Engelman, Eisenberg, Rose, Janin, Hopp-Woods, Welling, Kolaskar-Tongaonkar, Surface Probability and Chain Flexibility]
Figure 15.5: The different available scales in Protein info in CLC DNA Workbench.
The level of hydrophobicity is calculated on the basis of the different scales. The different scales add different values to each type of amino
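A hydrophobicity graph of this kind is commonly computed as a sliding-window average of per-residue scale values. The Python sketch below illustrates that approach with the published Kyte-Doolittle values; the window size of 11 and the function name `hydrophobicity_profile` are assumptions, and the Workbench's own calculation may differ in detail.

```python
# Kyte-Doolittle hydropathy values per amino acid (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=11, scale=KYTE_DOOLITTLE):
    """Return (position, mean score) pairs for each full window.

    Positions are 1-based and refer to the residue at the window centre.
    The window must be odd so that it has a well-defined centre.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[residue] for residue in protein.upper()]
    half = window // 2
    return [
        (centre + 1, sum(values[centre - half:centre + half + 1]) / window)
        for centre in range(half, len(values) - half)
    ]


if __name__ == "__main__":
    for position, score in hydrophobicity_profile("MKTLLILAVVAAALA", window=5):
        print(position, round(score, 2))
```

Swapping in another scale only requires a different dictionary, which mirrors how the Side Panel lets several scales be plotted for the same sequence.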
478. r sequence names. This is useful for searching sequence lists, mapping results and BLAST results. This concludes the description of the View Preferences. Next, the options for selecting and editing sequences are described.
Text format
These preferences allow you to adjust the format of all the text in the view (both residue letters, sequence name and translations if they are shown):
• Text size: Five different sizes.
• Font: Shows a list of fonts available on your computer.
• Bold residues: Makes the residues bold.
10.1.2 Restriction sites in the Side Panel
Please see section 18.3.1.
10.1.3 Selecting parts of the sequence
You can select parts of a sequence:
Click Selection in Toolbar | Press and hold down the mouse button on the sequence where you want the selection to start | move the mouse to the end of the selection while holding the button | release the mouse button
Alternatively, you can search for a specific interval using the find function described above. If you have made a selection and wish to adjust it:
drag the edge of the selection (you can see the mouse cursor change to a horizontal arrow)
or press and hold the Shift key while using the right and left arrow keys to adjust the right side of the selection
If you wish to select the entire sequence:
double-click the sequence name to the left
Selecting several parts at the same time (multiselect)
You can select several parts of a sequence by holding down th
479. r single sequences but also for multiple sequences by making a selection covering more than one sequence. When you have made the selection, the mouse pointer turns into a horizontal arrow indicating that the selection can be moved (see figure 19.9). Note: Residues can only be moved when they are next to a gap.
[Figure 19.9 residue: alignment excerpt with a selected block of residues]
Figure 19.9: Moving a part of an alignment. Notice the change of mouse pointer to a horizontal arrow.
19.3.2 Insert gaps
The placement of gaps in the alignment can be changed by modifying the parameters when creating the alignment. However, gaps can also be added manually after the alignment is created. To insert extra gaps:
select a part of the alignment | right-click the selection | Add gaps before/after
If you have made a selection covering e.g. five residues, a gap of five will be inserted. In this way you can easily control the number of gaps to insert. Gaps will be inserted in the sequences that you selected. If you make a selection in two sequences in an alignment, gaps will be inserted into these two sequences. This means that these two sequences will be displaced compared to the other sequences in the alignment.
19.3.3 Delete residues and gaps
Residues or gaps can be deleted for individual sequences or for the whole alignment. For individual sequences
480. r this type into the Type field (note that your own annotation types will be converted to "unsure" when exporting in GenBank format; as long as you use the sequence in CLC format, your own annotation type will be preserved). The right-hand part of the dialog contains the following text fields: • Name: The name of the annotation, which can be shown on the label in the sequence views. Whether the name is actually shown depends on the Annotation Layout preferences (see section 10.3.1). • Type: Reflects the left-hand part of the dialog as described above. You can also choose directly in this list or type your own annotation type. • Region: If you have already made a selection, this field will show the positions of the selection. You can modify the region further using the conventions of DDBJ, EMBL and GenBank. The following are examples of how to use the syntax (based on http://www.ncbi.nlm.nih.gov/collab/FT): 467: Points to a single residue in the presented sequence. 340..565: Points to a continuous range of residues bounded by and including the starting and ending residues. <345..500: Indicates that the exact lower boundary point of a region is unknown. The location begins at some residue previous to the first residue specified (which is not necessarily contained in the presented sequence) and continues up to and including the ending residue. <1..888: The region st
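The three location forms listed above (single residue, closed range, and range with an unknown lower boundary) can be sketched as a small parser. This is a minimal illustration, not the Workbench's own implementation: the names `Location` and `parse_location` are hypothetical, and compound forms such as joined ranges are deliberately not handled.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """A simple feature location (hypothetical helper type)."""
    start: int           # first residue, 1-based and inclusive
    end: int             # last residue, 1-based and inclusive
    start_is_open: bool  # True for "<start": the region begins before `start`


# Accepts "467", "340..565" and "<345..500"; anything else is rejected.
_LOCATION = re.compile(r"^(<?)(\d+)(?:\.\.(\d+))?$")


def parse_location(text: str) -> Location:
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    open_marker, start, end = match.groups()
    start_pos = int(start)
    # A single residue is treated as a range of length one.
    end_pos = int(end) if end is not None else start_pos
    if end_pos < start_pos:
        raise ValueError(f"end before start in location: {text!r}")
    return Location(start_pos, end_pos, open_marker == "<")


print(parse_location("467"))
print(parse_location("340..565"))
print(parse_location("<345..500"))
```

Both boundaries are stored as inclusive, matching the "bounded by and including" wording of the range form.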
481. r use the view of enzyme lists (see 18.5). Click Finish to open the enzyme list. The CLC DNA Workbench comes with a standard set of enzymes based on http://www.rebase.neb.com. You can customize the enzyme database for your installation (see section E). [Figure 18.51: Selecting enzymes.] [Screenshot: enzyme details for SacII, showing the recognition site pattern CCGCGG and a list of suppliers.]
482. r view of the contig later on by clicking Table at the bottom of the view. For more information about the tabular view of contigs, see section 1. • Create only consensus sequences: This will not display a contig but will only output the assembled contig sequences as single nucleotide sequences. If you choose this option, it is not possible to validate the assembly process and edit the contig based on the traces. If you have chosen to Trim sequences, click Next and you will be able to set trim parameters (see section 17.3.2). Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. When the assembly process has ended, a number of views will be shown, each containing a contig of two or more sequences that have been matched. If the number of contigs seems too high or low, try again with another Alignment stringency setting. Depending on your choices of output options above, the views will include trace files or only contig sequences. However, the calculation of the contig is carried out the same way, no matter how the contig is displayed. See section 1 on how to use the resulting contigs. 17.5 Assemble to reference sequence. This section describes how to assemble a number of sequence reads into a contig using a reference sequence. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in
483. ract the sequences: Toolbox | General Sequence Analyses | Extract Sequences. This will allow you to select the elements that you want to extract sequences from (see the list above). Clicking Next displays the dialog shown in figure 10.17. Here you can choose whether the extracted sequences should be placed in a new list or extracted as single sequences. For sequence lists, only the last option makes sense, but for alignments, mappings and BLAST results, it would make sense to place the sequences in a list. [Figure 10.17: Choosing whether the extracted sequences should be placed in a new list or as single sequences.] Below these options you can see the number of sequences that will be extracted. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. Chapter 11 Online database search. Contents: 11.1 GenBank search; 11.1.1 GenBank search options; 11.1.2 Handling of GenBank search results; 11.1.3 Save GenBank search parameters; 11.2 Sequence web info
484. rameters type [Figure 12.3: Choose a BLAST Program and a database for the search.] BLAST programs for DNA query sequences: • BLASTn: DNA sequence against a DNA database. Used to look for DNA sequences with homologous regions to your nucleotide query sequence. • BLASTx: Translated DNA sequence against a Protein database. Automatic translation of your DNA query sequence in six frames; these translated sequences are then used to search a protein database. • tBLASTx: Translated DNA sequence against a Translated DNA database. Automatic translation of your DNA query sequence and the DNA database, in six frames. The resulting peptide query sequences are used to search the resulting peptide database. Note that this type of search is computationally intensive. BLAST programs for protein query sequences: • BLASTp: Protein sequence against Protein database. Used to look for peptide sequences with homologous regions to your peptide query sequence. • tBLASTn: Protein sequence against Translated DNA database. Peptide query sequences are searched against an automatically translated (in six frames) DNA database. Click Next. This window (see figure 12.4) allows you to choose parameters to tune your BLAS
485. rameters [Figure 15.9: Choosing parameters for the reverse translation. The dialog offers the three codon options described below, a choice of frequency distribution table (Bacteria, Invertebrates, Mammalia, Plants, Primates, Rodents, Vertebrates) and a checkbox to map annotations to the reverse translated sequence.] • Use random codon: This will randomly back-translate an amino acid to a codon without using the translation tables. Every time you perform the analysis you will get a different result. • Use only the most frequent codon: On the basis of the selected translation table, this option will assign the codon that occurs most often. When choosing this option, the results of performing several reverse translations will always be the same, contrary to the other two options. • Use codon based on frequency distribution: This option is a mix of the other two options. The selected translation table is used to attach weights to each codon based on its frequency. The codons are assigned randomly with a probability given by the weights. A more frequent codon has a higher probability of being selected. Every time you perform the analysis you will get a different result. This option yields a result that is closer to the translation behavior of the organism assuming y
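The difference between the three codon options can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `CODON_WEIGHTS` table is hypothetical (made-up weights for four amino acids, not one of the Workbench's frequency distribution tables), and the function names are invented for the example.

```python
import random

# Hypothetical codon weights for illustration only; real frequency tables
# cover all amino acids and differ between organisms.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "F": {"TTT": 0.45, "TTC": 0.55},
    "W": {"TGG": 1.0},
}


def random_codons(protein: str, rng: random.Random) -> str:
    """'Use random codon': every codon of an amino acid is equally likely."""
    return "".join(rng.choice(list(CODON_WEIGHTS[aa])) for aa in protein)


def most_frequent_codons(protein: str) -> str:
    """'Use only the most frequent codon': deterministic, same result each run."""
    return "".join(max(CODON_WEIGHTS[aa], key=CODON_WEIGHTS[aa].get) for aa in protein)


def weighted_codons(protein: str, rng: random.Random) -> str:
    """'Use codon based on frequency distribution': random draw using the weights."""
    codons = []
    for aa in protein:
        table = CODON_WEIGHTS[aa]
        codons.append(rng.choices(list(table), weights=list(table.values()), k=1)[0])
    return "".join(codons)


rng = random.Random()
print(most_frequent_codons("MKFW"))
print(random_codons("MKFW", rng))
print(weighted_codons("MKFW", rng))
```

Only the most-frequent variant is repeatable; the other two draw from a random number generator, which is why repeated runs give different sequences.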
486. rameters for distance based methods. Figure 20.2 shows the parameters that can be set for the distance-based methods: • Algorithms: The UPGMA method assumes that evolution has occurred at a constant rate in the different lineages. This means that a root of the tree is also estimated. The neighbor joining method builds a tree where the evolutionary rates are free to differ in different lineages. CLC DNA Workbench always draws trees with roots for practical reasons, but with the neighbor joining method, no particular biological hypothesis is postulated by the placement of the root. Figure 20.3 shows the difference between the two methods. • To evaluate the reliability of the inferred trees, CLC DNA Workbench allows the option of doing a bootstrap analysis. A bootstrap value will be attached to each branch, and this value is a measure of the confidence in this branch. The number of replicates in the bootstrap analysis can be adjusted in the wizard. The default value is 100. For a more detailed explanation, see Bioinformatics explained in section 20.2. [Tree leaf labels: Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Mus musculus, Bos taurus, Homo sapiens.] Figure 20.3 Method choices for phyloge
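The resampling step behind a bootstrap analysis can be sketched as follows. This is the textbook procedure (draw alignment columns with replacement), shown under the assumption that the Workbench does something comparable; the function name `bootstrap_replicate` is hypothetical and tree building itself is left out.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one pseudo-alignment built by sampling columns with replacement."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns; repeats are allowed.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in columns) for row in alignment]


alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random()
# 100 replicates, matching the default number mentioned above.
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

A tree would then be inferred from each replicate, and the bootstrap value of a branch is the share of replicate trees in which that branch appears.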
487. ransform nor build upon this work. [Figure 13.11: The dot plot showing an inversion in a sequence. See also figure 13.8.] SOME RIGHTS RESERVED. See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents. 13.2.4 Bioinformatics explained: Scoring matrices. Biological sequences have evolved throughout time, and evolution has shown that not all changes to a biological sequence are equally likely to happen. Certain amino acid substitutions (change of one amino acid to another) happen often, whereas other substitutions are very rare. For instance, tryptophan (W), which is a relatively rare amino acid, will only on very rare occasions mutate into a leucine (L). Based on evolution of proteins, it became apparent that these changes or substitutions of amino acids can be modeled by a scoring matrix, also referred to as a substitution matrix. See an example of a scoring matrix in table 13.1. This matrix lists the substitution scores of every single amino acid. A score for an aligned amino acid pair is found at the intersection of the corresponding column and row. For example, the substitution score from an arginine (R) to a lysine (K) is 2. The diagonal shows scores for amino acids which have not changed. Most substitutions (changes) have a negative score. Only rounded numbers are found in this matrix. The two most used matrices are the BLOSUM (Henikoff and Henikoff 19
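Looking up a score "at the intersection of the corresponding column and row" can be sketched with a small symmetric table. The values below are the commonly published BLOSUM62 scores for four amino acids; that table 13.1 is BLOSUM62 is an assumption (it is consistent with the R to K score of 2 quoted above). The names `SCORES`, `substitution_score` and `alignment_score` are hypothetical.

```python
# Subset of BLOSUM62 (assumed to match table 13.1); only one triangle is
# stored because the matrix is symmetric.
SCORES = {
    ("R", "R"): 5, ("K", "K"): 5, ("L", "L"): 4, ("W", "W"): 11,
    ("R", "K"): 2, ("R", "L"): -2, ("R", "W"): -3,
    ("K", "L"): -2, ("K", "W"): -3, ("L", "W"): -2,
}


def substitution_score(a: str, b: str) -> int:
    """Score for aligning residue a with residue b, in either order."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]  # raises KeyError for residues outside the subset


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum of per-column scores for two gap-free sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(substitution_score(a, b) for a, b in zip(seq_a, seq_b))


print(substitution_score("R", "K"))
print(substitution_score("W", "L"))
print(alignment_score("RLW", "KLL"))
```

The diagonal entries (unchanged residues) are the only large positive values in this subset, and the rare tryptophan carries the highest one.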
488. rch report etc. is saved where it is dropped. If the element already exists, you are asked whether you want to save a copy. You drag from the View Area by dragging the tab of the desired element. Use of drag and drop is supported throughout the program, also to open and re-arrange views (see section 3.2.6). Note that if you move data between locations, the original data is kept. This means that you are essentially doing a copy instead of a move operation. Copy using drag and drop. To copy instead of move using drag and drop, hold the Ctrl (⌘ on Mac) key while dragging: click the element | click on the element again and hold left mouse button | drag the element to the desired location | press Ctrl (⌘ on Mac) while you let go of mouse button | release the Ctrl/⌘ button. 3.1.6 Change element names. This section describes two ways of changing the names of sequences in the Navigation Area. In the first part, the sequences themselves are not changed; it's their representation that changes. The second part describes how to change the name of the element. Change how sequences are displayed. Sequence elements can be displayed in the Navigation Area with different types of information: • Name (this is the default information to be shown). • Accession (sequences downloaded from databases like GenBank have an accession number). • Latin name. • Latin name (accession). • Common name. • Common name (accession).
489. re 18.31. [Figure 18.31: Buttons to sort restriction enzymes.] • Sort enzymes alphabetically: Clicking this button will sort the list of enzymes alphabetically. • Sort enzymes by number of restriction sites: This will divide the enzymes into four groups: Non-cutters, Single cutters, Double cutters and Multiple cutters. There is a checkbox for each group which can be used to hide/show all the enzymes in a group. • Sort enzymes by overhang: This will divide the enzymes into three groups: Blunt (enzymes cutting both strands at the same position), 3' (enzymes producing an overhang at the 3' end) and 5' (enzymes producing an overhang at the 5' end). There is a checkbox for each group which can be used to hide/show all the enzymes in a group. Manage enzymes. The list of restriction enzymes contains per default 20 of the most popular enzymes, but you can easily modify this list and add more enzymes by clicking the Manage enzymes button. This will display the dialog shown in figure 18.32. [Screenshot: Manage enzymes dialog, listing enzymes with their overhang and methylation sensitivity, e.g. BamHI (5' gatc) and EcoRI (5' aatt).]
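The grouping by number of restriction sites can be sketched as below. This is a simplified illustration, not the Workbench's algorithm: it searches one strand only for exact matches (sufficient for the palindromic sites used here), ignores ambiguity codes and methylation, and the names `SITES` and `group_by_cut_count` are hypothetical. The three recognition sequences are the well-known ones for these enzymes.

```python
# Recognition sequences of three common enzymes (all palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "PstI": "CTGCAG"}

GROUPS = ("Non-cutters", "Single cutters", "Double cutters", "Multiple cutters")


def group_by_cut_count(sequence: str, sites: dict[str, str]) -> dict[str, list[str]]:
    """Place each enzyme in one of the four groups by its number of sites."""
    groups: dict[str, list[str]] = {name: [] for name in GROUPS}
    seq = sequence.upper()
    for enzyme, site in sorted(sites.items()):
        cuts = seq.count(site)
        # 0, 1 and 2 sites have their own group; 3 or more share the last one.
        groups[GROUPS[min(cuts, 3)]].append(enzyme)
    return groups


sequence = "AAGAATTCAAGGATCCTTGGATCCAA"
for name, enzymes in group_by_cut_count(sequence, SITES).items():
    print(name, enzymes)
```

Hiding or showing a group with its checkbox then amounts to filtering on the group name.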
490. re adjusted along the left and right edges of the view. 10.2.1 Using split views to see details of the circular molecule. In order to see the nucleotides of a circular molecule, you can open a new view displaying a circular view of the molecule: Press and hold the Ctrl button (⌘ on Mac) | click Show Sequence at the bottom of the view. This will open a linear view of the sequence below the circular view. When you zoom in on the linear view, you can see the residues as shown in figure 10.5. Note: If you make a selection in one of the views, the other view will also make the corresponding selection, providing an easy way for you to focus on the same region in both views. 10.2.2 Mark molecule as circular and specify starting point. You can mark a DNA molecule as circular by right-clicking its name in either the sequence view or the circular view. In the right-click menu you can also make a circular molecule linear. A circular molecule displayed in the normal sequence view will have the sequence ends marked with a [Figure 10.5: Two views showing the same sequence. The bottom view is zoomed in.] The starting point of a circular sequence can be changed by make a selection st
491. re to export to | select File of type | enter name of file | Save. When exporting to CSV and tab-delimited files, decimal numbers are formatted according to the Locale setting of the Workbench (see section 5.1). If you open the CSV or tab-delimited file with spreadsheet software like Excel, you should make sure that both the Workbench and the spreadsheet software are using the same Locale. Note: The Export dialog decides which types of files you are allowed to export into, depending on what type of data you want to export. E.g. protein sequences can be exported into GenBank, Fasta, Swiss-Prot and CLC formats. Export of folders and multiple elements. The .zip file type can be used to export all kinds of files and is therefore especially useful in these situations: • Export of one or more folders, including all underlying elements and folders. • If you want to export two or more elements into one file. Export of folders is similar to export of single files. Exporting multiple files (of different formats) is done in .zip format. This is how you export a folder: select the folder to export | Export | choose where to export to | enter name | Save. You can export multiple files of the same type into formats other than ZIP (.zip). E.g. two DNA sequences can be exported in GenBank format: select the two sequences by <Ctrl>-click (⌘-click on Mac) or <Shift>-click | Export E
492. resent (see figure 18.11). [Figure 18.11: Select a sequence for insertion.] The sequence that you have chosen to insert into will be marked with bold, and the text vector is appended to the sequence name. Note that this is completely unrelated to the vector concept in the cloning work flow described in section 18.1.2. The list furthermore includes the length of the fragment, an indication of the overhangs, and a list of enzymes that are compatible with this overhang (for the left and right ends, respectively). If not all the enzymes can be shown, place your mouse cursor on the enzymes and a full list will be shown in the tool tip. Select the sequence you wish to insert and click Next. This will show the dialog in figure 18.12. [Screenshot: Replace Selection with Sequence dialog, adapting the insert sequence to the vector, with an option to change the insert orientation.]
493. rimer design (table columns: Viewer, Protein, DNA, RNA, Main, Genomics): Advanced primer design tools; Detailed primer and probe parameters; Graphical display of primers; Generation of primer design output; Support for Standard PCR; Support for Nested PCR; Support for TaqMan PCR; Support for Sequencing primers; Alignment based primer design; Alignment based TaqMan probe design; Match primer with sequence; Ordering of primers; Advanced analysis of primer properties. Molecular cloning: Advanced molecular cloning; Graphical display of in silico cloning; Advanced sequence manipulation. Virtual gel view: Fully integrated virtual 1D DNA gel simulator. For a more detailed comparison we refer to http://www.clcbio.com/compare. Appendix B Graph preferences. This section explains the view settings of graphs. The Graph preferences at the top of the Side Panel include the following settings: • Lock axes: This will always show the axes, even though the plot is zoomed to a detailed level. • Frame: Shows a frame around the graph. • Show legends: Shows the data legends. • Tick type: Determine whether tick lines should be shown outside or inside the frame (Outside, Inside). • Tick lines at: Choosing Major ticks will show a grid behind the graph (None, Major ticks). • Horizontal axis range: Sets the range of the
494. rinciple, that is, a search is performed for the values of the free parameters in the model assumed that results in the highest likelihood of the observed alignment (Felsenstein, 1981). By ticking the estimate substitution rate parameters box, maximum likelihood values of the free parameters in the rate matrix describing the assumed substitution model are found. If the Estimate topology box is selected, a search in the space of tree topologies for that which best explains the alignment is performed. If left un-ticked, the starting topology is kept fixed at that of the starting tree. The Estimate Gamma distribution parameter is active if rate variation has been included in the model, and in this case allows estimation of the Gamma distribution parameter to be switched on or off. If the box is left un-ticked, the value is fixed at that given in the Rate variation part. In the absence of rate variation, estimation of substitution parameters and branch lengths is carried out according to the expectation maximization algorithm (Dempster et al., 1977). With rate variation, the maximization algorithm is performed. The topology space is searched according to the PHYML method (Guindon and Gascuel, 2003), allowing efficient search and estimation of large phylogenies. Branch lengths are given in terms of expected numbers of substitutions per nucleotide site. 20.1.2 Tree View Preferences. The Tree View preferences are these: • Text format: Changes the text format fo
495. rocess is to amplify the target sequence with primers including so-called attB sites. In the CLC DNA Workbench, you can add attB sites to a sequence fragment in this way: Toolbox in the Menu Bar | Cloning and Restriction Sites | Gateway Cloning | Add attB Sites. This will open a dialog where you can select one or more sequences. Note that if your fragment is part of a longer sequence, you need to extract it first. This can be done in two ways: • If the fragment is covered by an annotation (if you want to use e.g. a CDS), simply right-click the annotation and Open Annotation in New View. • Otherwise, you can simply make a selection on the sequence, right-click and Open Selection in New View. In both cases, the selected part of the sequence will be copied and opened as a new sequence which can be saved. When you have selected your fragment(s), click Next. This will allow you to choose which attB sites you wish to add to each end of the fragment, as shown in figure 18.15. [Figure 18.15: Selecting which attB sites to add.] The default option is to use the attB1 and attB2 sites. If you have selected several fragments and wish to add different combinations of sites, you will have to run this tool once for each combination. Clicking Next will give you options to extend the fra
496. rocesses, a dialog will ask if you are sure that you want to close the program. Closing the program will stop the process, and it cannot be restarted when you open the program again. 3.4.2 Toolbox. The content of the Toolbox tab in the Toolbox corresponds to Toolbox in the Menu Bar. The Toolbox can be hidden, so that the Navigation Area is enlarged and thereby displays more elements: View | Show/Hide Toolbox. The tools in the toolbox can be accessed by double-clicking or by dragging elements from the Navigation Area to an item in the Toolbox. 3.4.3 Status Bar. As can be seen from figure 3.1, the Status Bar is located at the bottom of the window. In the left side of the bar is an indication of whether the computer is making calculations or whether it is idle. The right side of the Status Bar indicates the range of the selection of a sequence. (See chapter 3 3 0 for more about the Selection mode button.) 3.5 Workspace. If you are working on a project and have arranged the views for this project, you can save this arrangement using Workspaces. A Workspace remembers the way you have arranged the views, and you can switch between different workspaces. The Navigation Area always contains the same data across Workspaces. It is, however, possible to open different folders in the different Workspaces. Consequently, the program allows you to display different clusters of the data in separate Workspaces. All Workspace
497. rs 3 9 Network configuration 33 Network drive shared BLAST database 186 Never show this dialog again 105 New feature request 28 folder 80 folder tutorial 37 sequence 162 New sequence INDEX create from a selection 149 Newick file format 394 Next Generation Sequencing 376 nexus file format 395 Nexus file format 393 394 NGS 3 6 nhr file format 395 NHR file format 395 Non standard residues 144 Nucleotide info 144 sequence databases 386 Nucleotides UIPAC codes 398 Numbers on sequence 142 nwk file format 395 nxs file format 395 oa4 file format 395 Open consensus sequence 353 from clipboard 119 Open reading frame determination 234 Open ended sequence 234 Order primers 2 5 379 ORF 234 Organism 160 Origins from 132 Overhang of fragments from restriction digest 339 Overhang find restriction enzymes based on 330 332 336 344 pa4 file format 395 Page heading 116 Page number 116 Page setup 115 Pairwise comparison 361 PAM scoring matrices 208 Parameters search 168 Partition function 379 Paste text to create a new sequence 119 Paste copy 130 Pattern Discovery 219 Pattern discovery 379 Pattern Search 221 PCR primers 379 411 PCR perform virtually 271 pdb file format 395 seq file format 395 PDB file format 395 pdf format export 126 Peak call secondary 305 Peptide sequence databases 386 Percent identity pairwise compariso
498. rse [Figure 2.48: Verification of the result: at the top, a view of the whole BLAST result. At the bottom, the same view is zoomed in on exon 3 to show the amino acids.] either do not know the name of the gene or the genomic sequence is poorly annotated. In these cases, the approach described in this tutorial can be very productive. 2.9.2 BLAST for primer binding sites. You can adjust the BLAST parameters so it becomes possible to match short primer sequences against a larger sequence. Then it is easy to examine whether already existing lab primers can be reused for other purposes, or if the primers you designed are specific. [Table: parameter settings for Standard BLAST compared with Primer search.] These settings are shown in figure 2.49. 2.9.3 Finding remote protein homologues. If you look for short identical peptide sequences in a database, the standard BLAST parameters will have to be reconfigured. Using the parameters described below, you are likely to be able to identify whether antigenic determinants will cross-react to other proteins. [Screenshot: Local BLAST dialog, Set input parameters step, with fields for Low Complexity filter, Mask lower case, Expect, Word size, No. of processors, Match/Mismatch and Gap costs.]
499. ry long molecule and mispriming is a concern, consider extracting part of the sequence prior to designing primers. When both forward and reverse regions are defined. If both a forward and a reverse region are defined, primer pairs will be suggested by the algorithm. After pressing the Calculate button, a dialog will appear (see figure 16.8). [Figure 16.8: Calculation dialog for PCR primers when two primer regions have been defined. Chosen parameters: maximum and minimum primer length, maximum and minimum G/C content, maximum and minimum melting temperature, maximum self annealing, maximum self end annealing, maximum secondary structure, 3' end must meet G/C requirements, 5' end must meet G/C requirements. Primer combination parameters: max percentage point difference in G/C content, max difference in melting temperatures within a primer pair, max hydrogen bonds between pairs, max hydrogen bonds between pair ends, maximum length of amplicon. Mispriming parameters: use mispriming as exclusion criteria, exact match, minimum number of base pairs required for a match, number of consecutive base pairs required in 3' end.] Again, the top part of this dialog shows the parameter settings chosen in the Primer parameters preference group which will be used by the design algorithm. The lower part again contains a menu where the user can choose to include mispriming of both
500. s (Eisenberg et al., 1984). Hopp-Woods scale: Hopp and Woods developed their hydrophobicity scale for identification of potentially antigenic sites in proteins. This scale is basically a hydrophilic index where apolar residues have been assigned negative values. Antigenic sites are likely to be predicted when using a window size of 7 (Hopp and Woods, 1983). Cornette scale: Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales (Cornette et al., 1987). This optimized scale is also suitable for prediction of alpha-helices in proteins. Rose scale: The hydrophobicity scale by Rose et al. is correlated to the average area of buried amino acids in globular proteins (Rose et al., 1985). This results in a scale which is not showing the helices of a protein, but rather the surface accessibility. Janin scale: This scale also provides information about the accessible and buried amino acid residues of globular proteins (Janin, 1979). Welling scale: Welling et al. used information on the relative occurrence of amino acids in antigenic regions to make a scale which is useful for prediction of antigenic regions. This method is better than the Hopp-Woods scale of hydrophobicity, which is also used to identify antigenic regions. Kolaskar-Tongaonkar: A semi-empirical method for prediction of antigenic regions has been developed (Kolaskar and Tongaonkar, 1990). This method also includes information of surface accessibility and flexibi
501. s a dialog. In Step 1 you can change, remove and add DNA and protein sequences. When the relevant sequences are selected, clicking Next takes you to Step 2. This step allows you to adjust the window size from which the complexity plot is calculated. Default is set to 11 amino acids, and the number should always be odd. The higher the number, the less volatile the graph. Figure 13.14 shows an example of a local complexity plot. [Figure 13.14: An example of a local complexity plot (complexity plot of CAA24102; horizontal axis: Position; vertical axis: Local complexity).] Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. The values of the complexity plot approach 1.0 as the distribution of amino acids becomes more complex. See section B in the appendix for information about the graph view. 13.4 Sequence statistics. CLC DNA Workbench can produce an output with many relevant statistics for protein sequences. Some of the statistics are also relevant to produce for DNA sequences. Therefore, this section deals with both types of statistics. The required steps for producing the statistics are the same. To create a statistic for the sequence, do the following: select sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Create Sequence Statistics
502. s are automatically saved when closing down CLC DNA Workbench. The next time you run the program, the Workspaces are reopened exactly as you left them. Note: It is not possible to run more than one version of CLC DNA Workbench at a time. Use two or more Workspaces instead. 3.5.1 Create Workspace. When working with large amounts of data, it might be a good idea to split the work into two or more Workspaces. As default, the CLC DNA Workbench opens one Workspace. Additional Workspaces are created in the following way: Workspace in the Menu Bar | Create Workspace | enter name of Workspace | OK. When the new Workspace is created, the heading of the program frame displays the name of the new Workspace. Initially, the selected elements in the Navigation Area are collapsed and the View Area is empty and ready to work with (see figure 3.18). 3.5.2 Select Workspace. When there is more than one Workspace in the CLC DNA Workbench, there are two ways to switch between them: [Screenshot: CLC DNA Workbench 3.0 main window (current workspace: Default), showing the Menu Bar, Toolbar, Navigation Area with example data, and Toolbox.]
503. s directly to the dialog described in the next section. [Figure 19.13: Creating a pairwise comparison table.] 19.5.2 Pairwise comparison parameters. There are four kinds of comparison that can be made between the sequences in the alignment, as shown in figure 19.14. [Figure 19.14: Adjusting parameters for pairwise comparison. The dialog shows checkboxes for Gaps, Differences, Distance, Similarity and Identities.] • Gaps: Calculates the number of alignment positions where one sequence has a gap and the other does not. • Identities: Calculates the number of identical alignment positions to overlapping alignment positions between the two sequences. • Differences: Calculates the number of alignment positions where one sequence is different from the other. This includes gap differences as in the Gaps compari
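The Gaps, Identities and Differences counts described above can be sketched for one pair of aligned sequences. This is one reading of the definitions, not the Workbench's code: "overlapping" positions are taken to be columns where neither sequence has a gap, columns that are gaps in both sequences are skipped, and the function name `pairwise_counts` is hypothetical.

```python
GAP = "-"


def pairwise_counts(seq_a: str, seq_b: str) -> dict[str, int]:
    """Count gaps, identities, differences and overlapping columns for a pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"gaps": 0, "identities": 0, "differences": 0, "overlap": 0}
    for a, b in zip(seq_a, seq_b):
        a_gap, b_gap = a == GAP, b == GAP
        if a_gap and b_gap:
            continue  # gap in both sequences: nothing to compare
        if a_gap != b_gap:
            # Exactly one sequence has a gap; this also counts as a difference.
            counts["gaps"] += 1
            counts["differences"] += 1
            continue
        counts["overlap"] += 1
        if a == b:
            counts["identities"] += 1
        else:
            counts["differences"] += 1
    return counts


result = pairwise_counts("ACG-TA", "AC-GTT")
print(result)
print(result["identities"] / result["overlap"])  # identities relative to overlap
```

Running this for every pair of sequences in an alignment fills a table of the kind shown in the pairwise comparison view.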
504. s from 230 to 240 inclusive and 250 to 260 inclusive. 10.2 Circular DNA. A sequence can be shown as a circular molecule: select a sequence in the Navigation Area | Show in the Toolbar | As Circular, or, if the sequence is already open: click Show As Circular at the lower left part of the view. This will open a view of the molecule similar to the one in figure 10.4. Figure 10.4: A molecule shown in a circular view. This view of the sequence shares some of the properties of the linear view of sequences as described in section 10.1, but there are some differences. The similarities and differences are listed below: • Similarities: The editing options. Options for adding, editing and removing annotations. Restriction Sites, Annotation Types, Find and Text Format preferences groups. • Differences: In the Sequence Layout preferences, only the following options are available in the circular view: Numbers on plus strand, Numbers on sequence and Sequence label. You cannot zoom in to see the residues in the circular molecule. If you wish to see these details, split the view with a linear view of the sequence. In the Annotation Layout, you also have the option of showing the labels as Stacked. This means that there are no overlapping labels and that all labels of both annotations and restriction sites a
505. s in this solution are highlighted on the sequence. 16.4.1 Saving primers. Primer solutions in a table row can be saved by selecting the row and using the right-click mouse menu. This opens a dialog that allows the user to save the primers to the desired location. Primers and probes are saved as DNA sequences in the program. This means that all available DNA analyses can be performed on the saved primers, including BLAST. Furthermore, the primers can be edited using the standard sequence view to introduce e.g. mutations and restriction sites. 16.4.2 Saving PCR fragments. The PCR fragment generated from the primer pair in a given table row can also be saved by selecting the row and using the right-click mouse menu. This opens a dialog that allows the user to save the fragment to the desired location. The fragment is saved as a DNA sequence, and the position of the primers is added as annotation on the sequence. The fragment can then be used for further analysis and included in e.g. an in silico cloning experiment using the cloning editor. 16.4.3 Adding primer binding annotation. You can add an annotation to the template sequence specifying the binding site of the primer: right-click the primer in the table and select Mark primer annotation on sequence. 16.5 Standard PCR. This mode is used to design primers for a PCR amplification of a single DNA fragment. 16.5.1 User input. In this mode the user must define either a
506. s is done in the Toolbox in the Processes tab. 11.1.3 Save GenBank search parameters. The search view can be saved either by dragging the search tab and dropping it in the Navigation Area or by clicking Save. When saving the search, only the parameters are saved, not the results of the search. This is useful if you have a special search that you perform from time to time. Even if you don't save the search, the next time you open the search view it will remember the parameters from the last time you did a search. 11.2 Sequence web info. CLC DNA Workbench provides direct access to web-based search in various databases and on the Internet using your computer's default browser. You can look up a sequence in the databases of NCBI and UniProt, search for a sequence on the Internet using Google, and search for PubMed references at NCBI. This is useful for quickly obtaining updated and additional information about a sequence. The functionality of these search functions depends on the information that the sequence contains. You can see this information by viewing the sequence as text (see section 10.5). In the following sections, we will explain this in further detail. The procedure for searching is identical for all four search options (see also figure 11.3): Open a sequence or a sequence list | Right-click the name of the sequence | Web Info | select the desired search function
507. se the other option instead. Direct download: Selecting the first option takes you to the dialog shown in figure 1.8. The progress of getting the license is shown, and when the license is downloaded, you will be able to click Next. Go to license download web page: Selecting the second option, Go to license download web page, opens the license web page as shown in figure 1.9. Click the Request Evaluation License button, and you will be able to save the license on your computer, e.g. on the Desktop. Back in the Workbench window, you will now see the dialog shown in figure 1.10. Figure 1.8: A license has been downloaded. Figure 1.9: The license web page where you can download a license.
508. search results. The search result is presented as a list of links to the files in the NCBI database. The View displays 50 hits at a time. This can be changed in the Preferences (see chapter 5). More hits can be displayed by clicking the More button at the bottom right of the View. Each sequence hit is represented by text in the following columns: • Accession • Description • Modification date • Length. It is possible to exclude one or more of these columns by adjusting the View preferences for the database search view. Furthermore, your changes in the View preferences can be saved (see section 5.6). Several sequences can be selected, and by clicking the buttons at the bottom of the search view you can do the following: • Download and open (doesn't save the sequence). • Download and save (lets you choose a location for saving the sequence). • Open at NCBI (searches the sequence at NCBI's web page). Double-clicking a hit will download and open the sequence. The hits can also be copied into the View Area or the Navigation Area from the search results by drag and drop, copy/paste or by using the right-click menu as described below. Drag and drop from GenBank search results: The sequences from the search results can be opened by dragging them into a position in the View Area. Note: A sequence is not saved until the View displaying the sequence is closed. When that happens, a dialog opens: Save changes of sequence x? (Yes or No). The sequence c
509. sembly 303 Vector see cloning 307 Vector contamination find automatically 289 Vector design 307 Vector graphics export 126 VectorNTI file format 393 View 84 alignment 353 dot plots 203 GenBank format 101 preferences 90 save changes 87 sequence 141 sequence as text 161 View Area 84 illustration View preferences 106 show automatically 106 style sheet 110 View settings user defined 106 Virtual gel 380 vsf file format for settings 107 Web page import sequence from 119 Wildcard append to search 168 415 Windows installation 12 Workspace 94 create 94 delete 95 save 94 select 94 Wrap sequences 142 xIs file format 395 xIsx file format 395 xml file format 395 Zip file format 393 395 Zoom 91 tutorial 39 Zoom In 91 Zoom Out 91 Zoom to 100 92
510. Figure 2.14: The ATP8a1 mRNA reference sequence selected as reference sequence for the assembly. Click Next and choose to use the trim information that you have just added. Click Next and choose to Save your results. The next step will ask you for a location to save the results to. You can just accept the default location, or you could use the left-hand icon under the Save in folder heading to create a new folder to save your assembly into. Click Finish and the assembly process will begin. 2.5.3 Getting an overview of the contig. The result of the assembly is a Contig, which is an alignment of the nine reads to the reference sequence. Click Fit width to see an overview of the contig. To help you determine the coverage, display a coverage graph (see figure 2.15): Alignment info in Side Panel | Coverage | Graph.
511. sequence: The annotations are placed above the sequence. Separate layer: The annotations are placed above the sequence and above restriction sites (only applicable for nucleotide sequences). Figure 10.7: Changing the layout of annotations in the Side Panel. • Offset. If several annotations cover the same part of a sequence, they can be spread out. Piled: The annotations are piled on top of each other. Only the one at front is visible. Little offset: The annotations are piled on top of each other, but they have been offset a little. More offset: Same as above, but with more spreading. Most offset: The annotations are placed above each other with a little space between. This can take up a lot of space on the screen. • Label. The name of the annotation can be shown as a label. Additional information about the sequence is shown if you place the mouse cursor on the annotation and keep it still. No labels: No labels are displayed. On annotation: The labels ar
512. sequence(s) | Toolbox in the Menu Bar | General Sequence Analyses | Pattern Discovery, or right-click DNA or protein sequence(s) | Toolbox | General Sequence Analyses | Pattern Discovery. If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns which are common between all the sequences. Annotations will be added to all the sequences, and a view is opened for each sequence. Click Next to adjust parameters (see figure 13.20). Figure 13.20: Setting parameters for the pattern discovery. See text for details. In order to search unknown sequences with an already existing model: Select to use an already existing model, which is seen in figure 13.20. Models are represented with the following icon in the navigation area
513. signing process. Then follow instructions on how to adjust parameters for primers, how to inspect and interpret primer properties graphically, and how to interpret, save and analyze the output of the primer design analysis. After a description of the different reaction types for which primers can be designed, the chapter closes with sections on how to match primers with other sequences and how to create a primer order. 16.1 Primer design - an introduction. Primer design can be accessed in two ways: select sequence | Toolbox in the Menu Bar | Primers and Probes | Design Primers | OK, or right-click sequence | Show | Primer. In the primer view (see figure 16.1), the basic options for viewing the template sequence are the same as for the standard sequence view. See section 10.1 for an explanation of these options. Note: This means that annotations such as e.g. known SNPs or exons can be displayed on the template sequence to guide the choice of primer regions. Also, traces in sequencing reads can be shown along with the structure to guide e.g. the re-sequencing of poorly resolved regions.
514. so generate tables. In addition to the Open and Save options, you can also choose whether the result of the analysis should be added as annotations on the sequence or shown in a table. If both options are selected, you will be able to click the results in the table, and the corresponding region on the sequence will be selected. If you choose to add annotations to the sequence, they can be removed afterwards by clicking Undo in the Toolbar. 9.2.2 Batch log. For some analyses, there is an extra option in the final step to create a log of the batch process (see e.g. figure 9.7). This log will be created at the beginning of the process and continually updated with information about the results. See an example of a log in figure 9.8. In this example, the log displays information about how many open reading frames were found. Figure 9.8: An example of a ba
515. son. • Distance: Calculates the Jukes-Cantor distance between the two sequences. This number is given as the Jukes-Cantor correction of the proportion between identical and overlapping alignment positions between the two sequences. • Percent identity: Calculates the percentage of identical residues in alignment positions to overlapping alignment positions between the two sequences. Click Next if you wish to adjust how to handle the results (see section 9.2). If not, click Finish. 19.5.3 The pairwise comparison table. The table shows the results of selected comparisons (see an example in figure 19.15). Since comparisons are often symmetric, the table can show the results of two comparisons at the same time, one in the upper-right and one in the lower-left triangle. Figure 19.15: A pairwise comparison table. The following settings are present in the side panel: • Contents. Upper comparison: Selects the comparison to show in the upper triangle of the table. Upper comparison gradient: Selects the color gradient to use for the upper trian
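The comparisons described in section 19.5.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Workbench's own implementation: the function name `pairwise_comparison`, the use of `-` as the gap character, the decision to skip columns where both sequences have a gap, and the nucleotide form of the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3) with p the fraction of overlapping positions that differ, are all choices made for this example.

```python
import math

GAP = "-"  # assumed gap character for this sketch


def pairwise_comparison(seq_a, seq_b):
    """Compare two rows of an alignment (equal-length strings)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    gaps = identities = differences = overlapping = 0
    for a, b in zip(seq_a, seq_b):
        a_gap, b_gap = a == GAP, b == GAP
        if a_gap and b_gap:
            continue  # gap in both rows: neither an overlap nor a difference
        if a_gap != b_gap:
            gaps += 1         # one sequence has a gap, the other does not
            differences += 1  # gap differences count as differences too
            continue
        overlapping += 1      # both rows carry a residue here
        if a == b:
            identities += 1
        else:
            differences += 1

    if overlapping == 0:
        percent_identity = distance = None
    else:
        percent_identity = 100.0 * identities / overlapping
        p = 1.0 - identities / overlapping
        # Jukes-Cantor correction (nucleotide form); undefined for p >= 0.75
        distance = -0.75 * math.log(1.0 - 4.0 * p / 3.0) if p < 0.75 else math.inf

    return {
        "gaps": gaps,
        "identities": identities,
        "differences": differences,
        "percent_identity": percent_identity,
        "distance": distance,
    }


result = pairwise_comparison("ACGT-ACGTA", "ACGTTACCTA")
print(result)
```

Running the comparison for every pair of sequences in an alignment would fill a table like the one described in section 19.5.3, with one measure in the upper triangle and another in the lower.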
516. sources available are PFAM databases (for use with CLC Protein Workbench and CLC Main Workbench). Because the procedures for downloading, installation, uninstallation and updating are the same as for plug-ins, see section 1.7.1 and section 1.7.2 for more information. Figure 1.26: Plug-in updates. Figure 1.27: Resources available for download. 1.8 Network configuration. If you use a proxy server to access the Internet, you must configure CLC DNA Workbench to use this. Otherwise you will not be able to perform any online activities (e.g. searching GenBank). CLC DNA Workbench supports the use of an HTTP proxy and an anonymous SOCKS proxy. To configure your proxy setti
517. ss Prot TrEMBL 377 swp file format 395 System requirements 15 Tab delimited file format 395 414 Tab file format 393 Table of fragments 339 Tabs use of 84 Tag based expression profiling 3 0 Tags insert into sequence 318 TaqMan primers 3 9 tar file format 395 Tar file format 395 Taxonomy batch edit 84 tBLASTn 176 tBLASTx 175 Terminated processes 93 Text format 148 user manual 35 view sequence 161 Text file format 395 tif format export 126 Tips for BLAST searches 64 Toolbar illustration preferences 106 Toolbox 93 94 illustration show hide 94 Topology layout trees 3 0 Trace colors 144 Trace data 2 8 3 6 quality 289 Traces scale 2 8 Translate a selection 145 along DNA sequence 144 annotation to protein 149 CDS 234 coding regions 234 DNA to RNA 229 nucleotide sequence 232 ORF 234 protein 243 RNA to DNA 230 to DNA 378 to protein 232 3 8 Translation of a selection 145 show together with DNA sequence 144 Transmembrane helix prediction 3 8 INDEX Trim 288 3 76 Trimmed regions adjust manually 296 TSV file format 393 Tutorial Getting started 3 txt file format 395 UIPAC codes amino acids 396 Undo limit 105 Undo Redo 87 UniProt search 377 search sequence in 1 1 UniVec trimming 289 UPGMA algorithm 3 2 379 Urls Navigation Area 124 User defined view settings 106 User interface Variance table as
518. stimate of primate phylogeny. Philos Trans R Soc Lond B Biol Sci, 348(1326):405-421. [Rose et al., 1985] Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985). Hydrophobicity of amino acid residues in globular proteins. Science, 229(4716):834-838. [Saitou and Nei, 1987] Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425. [SantaLucia, 1998] SantaLucia, J. (1998). A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A, 95(4):1460-1465. [Schneider and Stephens, 1990] Schneider, T. D. and Stephens, R. M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res, 18(20):6097-6100. [Siepel and Haussler, 2004] Siepel, A. and Haussler, D. (2004). Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol, 11(2-3):413-428. [Smith and Waterman, 1981] Smith, T. F. and Waterman, M. S. (1981). Identification of common molecular subsequences. J Mol Biol, 147(1):195-197. [Sneath and Sokal, 1973] Sneath, P. and Sokal, R. (1973). Numerical Taxonomy. Freeman, San Francisco. [Tobias et al., 1991] Tobias, J. W., Shrader, T. E., Rocap, G., and Varshavsky, A. (1991). The N-end rule in bacteria. Science, 254(5036):1374-1377. [von Ahsen et al., 2001] von Ahsen, N., Wittwer, C. T., and Schutz, E. (2001). O
519. Figure 16.16: Properties of a primer from the Example Data. In the Side Panel you can specify the information to display about the primer. The information parameters of the primer properties table are explained in section 16.5.2. 16.11 Find binding sites and create fragments. In CLC DNA Workbench you have the possibility of matching known primers against one or more DNA sequences or a list of DNA sequences. This can be applied to test whether a primer used in a previous experiment is applicable to amplify e.g. a homologous region in another species, or to test for potential mispriming. This functionality can also be used to extract the resulting PCR product when two primers are matched. This is particularly useful if your primers have extensions in the 5' end. To search for primer binding sites: Toolbox | Primers and Probes | Find Binding Sites and Create Fragments. If a sequence was already selected, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. Click Next when all the sequences have been added. Note: You should not add the primer sequences at this step. 16.11.1 Bi
520. t can be edited by: right-click the selection | Edit Selection. A dialog appears displaying the sequence. You can add, remove or change the text and click OK. The original selected part of the sequence is now replaced by the sequence entered in the dialog. This dialog also allows you to paste text into the sequence using Ctrl + V (⌘ + V on Mac). If you delete the text in the dialog and press OK, the selected text on the sequence will also be deleted. Another way to delete a part of the sequence is to: right-click the selection | Delete Selection. If you wish to correct only one residue, simply make the selection cover only that residue and then type the new residue. Another way to edit the sequence is by inserting a restriction site (see section 18.1.4). 10.1.5 Sequence region types. The various annotations on sequences cover parts of the sequence. Some cover an interval, some cover intervals with unknown endpoints, some cover more than one interval, etc. In the following, all of these will be referred to as regions. Regions are generally illustrated by markings (often arrows) on the sequences. An arrow pointing to the right indicates that the corresponding region is located on the positive strand of the sequence. Figure 10.2 is an example of three regions with separate colors. Figure 10.2: Three regions on a human beta globin DNA sequence (HUMHBB)
521. t click in the empty white area of the contig | Reassemble contig. This opens a dialog as shown in figure 17.25. In this dialog, you can choose: • De novo assembly. This will perform a normal assembly in the same way as if you had selected the reads as individual sequences. When you click Next, you will follow the same steps as described in section 17.4. The consensus sequence of the contig will be ignored. • Reference assembly. This will use the consensus sequence of the contig as reference. When you click Next, you will follow the same steps as described in section 17.5. Figure 17.25: Re-assembling a contig. When you click Finish, a new contig is created, so you do not lose the information in the old contig. 17.9 Secondary peak calling. CLC DNA Workbench is able to detect secondary peaks (a peak within a peak) to help discover heterozygous mutations. Looking at the height of the peak below the top peak, CLC DNA Workbench considers all positions in a sequence, and if a peak is higher than the threshold set by the user, it will be called
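The idea behind secondary peak calling in section 17.9 can be illustrated with a short Python sketch. It is a guess at the general shape of such a check rather than the Workbench's algorithm: it assumes the user threshold is expressed as a fraction of the top peak's height, and that the four trace intensities are available at each called position. The function name `call_secondary_peaks` and the parameter `fraction_of_max` are hypothetical.

```python
def call_secondary_peaks(trace_heights, fraction_of_max=0.5):
    """Return (position, top_base, secondary_base) for each position where
    the second-highest trace peak exceeds the threshold.

    trace_heights: list of dicts mapping base -> peak height at that position.
    fraction_of_max: assumed threshold, as a fraction of the top peak height.
    """
    calls = []
    for position, heights in enumerate(trace_heights):
        # Rank the four traces from highest to lowest peak
        ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
        (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
        if top_height > 0 and second_height > fraction_of_max * top_height:
            calls.append((position, top_base, second_base))
    return calls


traces = [
    {"A": 100, "C": 5, "G": 3, "T": 2},   # clean A call
    {"A": 80, "C": 60, "G": 1, "T": 0},   # strong C underneath the A peak
]
print(call_secondary_peaks(traces))
```

A position flagged this way is a candidate heterozygous site and would still need to be inspected in the trace view.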
522. t features of CLC Protein Workbench, and it has additional advanced features. CLC Main Workbench holds all basic and advanced features of the CLC Workbenches. In June 2007, CLC RNA Workbench was released as a sister product of CLC Protein Workbench and CLC DNA Workbench. CLC Main Workbench now also includes all the features of CLC RNA Workbench. In March 2008, the CLC Free Workbench changed name to CLC Sequence Viewer. In June 2008, the first version of the CLC Genomics Workbench was released due to an extraordinary demand for software capable of handling sequencing data from the new high-throughput sequencing systems like 454, Illumina Genome Analyzer and SOLiD. For an overview of which features all the applications include, see http://www.clcbio.com/features. In December 2006, CLC bio released a Software Developer Kit which makes it possible for anybody with a knowledge of programming in Java to develop plug-ins. The plug-ins are fully integrated with the CLC Workbenches and the Viewer and provide an easy way to customize and extend their functionalities. All our software will be improved continuously. If you are interested in receiving news about updates, you should register your e-mail and contact data on http://www.clcbio.com, if you haven't already registered when you downloaded the program. 1.5.1 New program feature request. The CLC team is continuously improving the CLC DNA Workbench with our users' interests in mind. Therefor
523. t how to handle the results (see section 9.2). If not, click Finish. An example of protein sequence statistics is shown in figure 13.16. Figure 13.16: Comparative sequence statistics. Nucleotide sequence statistics are generated using the same dialog as used for protein sequence statistics. However, the output of nucleotide sequence statistics is less extensive than that of the protein sequence statistics. Note: The headings of the tables change depending on whether you calculate individual or comparative sequence statistics. The output of comparative protein sequence statistics includes: • Sequence information: Sequence type; Length; Organism; Name; Description; Modification Date; Weight. This is calculated like this: $\sum_{\text{units in sequence}} \mathrm{weight}(\text{unit}) - \text{links} \times \mathrm{weight}(\mathrm{H_2O})$, where links is the sequence length minus one and units are amino acids. The atomic composition is defined the same way. Isoelectric point; Aliphatic index. • Half-life • Extinction coefficient • Counts of Atoms • Frequency of Atoms • Count of hydrophobic and hydrophilic residues • Frequencies of hydrophobic and hydrophilic residues • Count of charged residues • Frequencies of charged residues • Amino
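The weight calculation described above (sum of the unit weights minus one water per peptide link) can be written out as a small Python sketch. The function name `protein_weight` is hypothetical, and the table holds approximate average masses of the free amino acids in g/mol; the values the Workbench uses internally are not given in the text and may differ slightly.

```python
# Approximate average molecular masses (g/mol) of the free amino acids.
# These are rounded textbook values, included for illustration only.
AMINO_ACID_WEIGHT = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_WEIGHT = 18.015  # one water is released per peptide bond


def protein_weight(sequence):
    """Sum of unit weights minus links * weight(H2O), links = length - 1."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    links = len(sequence) - 1
    return sum(AMINO_ACID_WEIGHT[unit] for unit in sequence) - links * WATER_WEIGHT


# Glycylglycine: two glycines joined by one peptide link
print(round(protein_weight("GG"), 3))
```

The same pattern applies to the atomic composition: add up the atoms of each unit and subtract the atoms of one water molecule per link.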
524. t of all qualifiers of all the selected annotations is shown. Note that if one of the annotations does not have the qualifier you have chosen, it will not be retyped. If an annotation has multiple qualifiers of the same type, the first is used for the new type. • New type: You can select from a list of all the pre-defined types as well as enter your own annotation type. All the selected annotations will then get this type. • Use annotation name as type: The annotation's name will be used as type (e.g. if you have an annotation named Promoter, it will get Promoter as its type by using this option). Figure 10.12: The Advanced Retype dialog. 10.3.4 Removing annotations. Annotations can be hidden using the Annotation Types preferences in the Side Panel to the right of the view (see section 10.3.1). In order to completely remove the annotation: right-click the annotation | Delete | Delete Annotation. If you want to remove all annotations of one type: right-click an annotation of the type you want to remove | Delete | Delete Annotations of Type "type". If you want to remove all annotations from a sequence: right-click an annotation | Delete | Delete All Annotations. The removal of annotations can be undone using Ctrl + Z or Undo in the Toolbar. If you have more sequences (e.g. in a sequence l
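The three Advanced Retype options described above follow simple rules, which the Python sketch below spells out. The dictionary layout for an annotation (`name`, `type`, and a list of `qualifiers` pairs), the function name `retype` and the mode strings are assumptions made for this example; they do not reflect how the Workbench stores annotations.

```python
def retype(annotation, mode, qualifier=None, new_type=None):
    """Apply one of the three retype options to an annotation dict.

    mode "qualifier": use the value of the named qualifier, if it exists
    mode "new_type":  use the type supplied by the user
    mode "name":      use the annotation's own name as its type
    """
    if mode == "qualifier":
        values = [value for key, value in annotation["qualifiers"] if key == qualifier]
        if values:
            # With several qualifiers of the same kind, the first one wins
            annotation["type"] = values[0]
        # No matching qualifier: the annotation is left untouched
    elif mode == "new_type":
        annotation["type"] = new_type
    elif mode == "name":
        annotation["type"] = annotation["name"]
    else:
        raise ValueError(f"unknown retype mode: {mode}")
    return annotation


promoter = {
    "name": "Promoter",
    "type": "misc_feature",
    "qualifiers": [("gene", "AR"), ("gene", "XY")],
}
print(retype(dict(promoter), "qualifier", qualifier="gene")["type"])
print(retype(dict(promoter), "name")["type"])
```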
525. ta is possible if you add a location on a network drive. The procedure is similar to the one described above. When you add a location on a network drive or a removable drive, the location will appear inactive when you are not connected. Once you connect to the drive again, click Update All and it will become active (note that there will be a few seconds' delay after you connect). Figure 3.5: The new location has been added. Opening data. The elements in the Navigation Area are opened by: Double-click the element, or Click the element | Show in the Toolbar | Select the desired way to view the element. This will open a view in the View Area, which is described in section 3.2. Adding data. Data can be added to the Navigation Area in a number of ways. Files can be imported from the file system (see chapter 7). Furthermore, an element can be added by dragging it into the Navigation Area. This could be views that are open, elements on lists (e.g. search hits or sequence lists), and files located on your computer. Finally, you ca
526. tart local alignments from these initial matches. If you are interested in the bioinformatics behind BLAST, there is an easy-to-read explanation of this in section 12.5. With CLC DNA Workbench there are two ways of performing BLAST searches: You can either have the BLAST process run on NCBI's BLAST servers (http://www.ncbi.nlm.nih.gov/) or perform the BLAST search on your own computer. The advantage of running the BLAST search on NCBI servers is that you have ready access to the most popular BLAST databases without having to download them to your own computer. The advantage of running BLAST on your own computer is that you can use your own sequence data, and that this can sometimes be faster and more reliable for big batch BLAST jobs. Figure 12.8 shows an example of a BLAST result in the CLC DNA Workbench. Figure 12.1: Display of the output of a BLAST search
527. tch log when finding open reading frames. The log will either be saved with the results of the analysis or opened in a view with the results, depending on how you chose to handle the results. Part III Bioinformatics. Chapter 10 Viewing and editing sequences. Contents: 10.1 View sequence; 10.1.1 Sequence settings in Side Panel; 10.1.2 Restriction sites in the Side Panel; 10.1.3 Selecting parts of the sequence; 10.1.4 Editing the sequence; 10.1.5 Sequence region types; 10.2 Circular DNA; 10.2.1 Using split views to see details of the circular molecule; 10.2.2 Mark molecule as circular and specify starting point; 10.3 Working with annotations; 10.3.1 Viewing annotations; 10.3.2 Adding annotations; 10.3.3 Edit annotations; 10.3.4 Removing annotations; 10.4 Element information; 10.5 View as text; 10.6 Creating a new sequence; 10.7 Sequence Lists; 10.7.1 Graphical view of sequence lists; 10.7.2 Sequence list table
528. tched equally well another place in the mapping, it is considered a non-specific match. This color will overrule the other colors. Note that if you are mapping with several reference sequences, a read is considered a double match when it matches more than once across all the contigs/references. A non-specific match is yellow per default. Besides these preferences, all the functionalities of the alignment view are available. This means that you can, e.g., add annotations (such as SNP annotations) to regions of interest. However, some of the parameters from alignment views are set at a different default value in the view of contigs. Trace data of the sequencing reads are shown if present (can be enabled and disabled under the Nucleotide info preference group), and the Color different residues option is also enabled in order to provide a better overview of conflicts (can be changed in the Alignment info preference group).
529. te that for contigs with more than 1000 reads, you can only do single-residue replacements (you can't delete or edit a selection). When the compactness is Packed, you cannot edit any of the reads. 17.7.3 Sorting reads. If you wish to change the order of the sequence reads, simply drag the label of the sequence up and down. Note that this is not possible if you have chosen Gather sequences at top or set the compactness to Packed in the Side Panel. You can also sort the reads by right-clicking a sequence label and choosing from the following options: • Sort Reads by Alignment Start Position. This will list the first read in the alignment at the top, etc. • Sort Reads by Name. Sort the reads alphabetically. • Sort Reads by Length. The shortest reads will be listed at the top. 17.7.4 Read conflicts. When the contig is created, conflicts between the reads are annotated on the consensus sequence. The definition of a conflict is a position where at least one of the reads has a different residue. A conflict can be in two states, Conflict and Resolved; in each state the annotation and the corresponding row in the Table are color-coded (green when resolved). The conflict can be resolved by correcting the deviating residues in the reads as described above. A fast way of making all the reads reflect the consensus sequence is to select the position in the consensus, right-click the selection and choose Transfer Se
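The conflict definition and the three sort orders in the excerpt above can be sketched in a few lines of Python. This is only an illustration, not the Workbench's implementation: the `Read` type, the use of `-` for gaps and of a space for positions a read does not cover are all assumptions made for this sketch.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: name, alignment start, padded sequence."""
    name: str
    start: int
    row: str  # same length for every read; ' ' means "not covered"


def find_conflicts(reads: List[Read]) -> List[int]:
    """Return 0-based columns where the covering reads disagree."""
    conflicts = []
    for col in range(len(reads[0].row)):
        residues = {r.row[col].upper() for r in reads if r.row[col] != " "}
        if len(residues) > 1:  # at least one read has a different residue
            conflicts.append(col)
    return conflicts


def read_length(read: Read) -> int:
    """Number of columns the read actually covers."""
    return len(read.row.strip())


reads = [
    Read("read_b", 0, "ACGTAC  "),
    Read("read_a", 2, "  CTACGT"),
    Read("read_c", 1, " CGTAC  "),
]

by_start = sorted(reads, key=lambda r: r.start)   # first read at the top
by_name = sorted(reads, key=lambda r: r.name)     # alphabetical
by_length = sorted(reads, key=read_length)        # shortest at the top
conflict_columns = find_conflicts(reads)
```

With these three toy reads, only column 2 is reported, because `read_a` carries a C where the other two reads carry a G.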
530. [Figure 3.7: Changing the common name of five sequences.] [Figure 3.8: A View Area can enclose several views; each view is indicated with a tab (see the right view, which shows protein P68225). Furthermore, several views can be shown at the same time (in this example, four views are displayed).] This chapter deals with the handling of views inside a View Area. Furthermore, it deals with rearranging the views. Section 3.3 deals with the zooming and selecting functions. 3.2.1 Open view. Opening a view can be done in a number of ways: double-click an element in the Navigation Area; or select an element in the Navigation Area | File | Show | Select the desired way to view the element; or select an element in the Navigation Area | Ctrl + O (⌘ + B on Mac). Opening a view while another view is already open will show the new view in front of the other view. The view that was already open ca
531. ted to the name of the exported file in order for the exported file to work. Before exporting, you are asked about which of the different settings you want to include in the exported file. One of the items in the list is User Defined View Settings. If you export this, only the information about which of the settings is the default setting for each view is exported. If you wish to export the Side Panel Settings themselves, see section 5.2.2. The process of importing preferences is similar to exporting: press Ctrl + K (⌘ + ; on Mac) to open Preferences | Import | Browse to and select the .cpf file | Import and apply preferences. 5.5.1 The different options for export and importing. To avoid confusion of the different import and export options, here is an overview: • Import and export of bioinformatics data such as sequences, alignments etc. (described in section 7.1.1). • Graphics export of the views, which creates image files in various formats (described in section 3). • Import and export of Side Panel Settings, as described in the next section. • Import and export of all the Preferences except the Side Panel settings. This is described above. 5.6 View settings for the Side Panel. The Side Panel is shown to the right of all views that are opened in CLC DNA Workbench. By using the settings in the Side Panel you can specify how the layout and contents of the view Figure 5.8 is an exampl
532. ter dye at the 5' end and a quenching dye at the 3' end. Fluorescent molecules become excited when they are irradiated and usually emit light. However, in a TaqMan probe the energy from the fluorescent dye is transferred to the quencher dye by fluorescence resonance energy transfer as long as the quencher and the dye are located in close proximity, i.e. when the probe is intact. TaqMan probes are designed to anneal within a PCR product amplified by a standard PCR primer pair. If a TaqMan probe is bound to a product template, the replication of this will cause the Taq polymerase to encounter the probe. Upon doing so, the 5' exonuclease activity of the polymerase will cleave the probe. This cleavage separates the quencher and the dye, and as a result the reporter dye starts to emit fluorescence. The TaqMan technology is used in Real-Time quantitative PCR. Since the accumulation of fluorescence mirrors the accumulation of PCR products, it can be monitored in real time and used to quantify the amount of template initially present in the buffer. The technology is also used to detect genetic variation such as SNPs. By designing a TaqMan probe which will specifically bind to one of two or more genetic variants, it is possible to detect genetic variants by the presence or absence of fluorescence in the reaction. A specific requirement of TaqMan probes is that a G nucleotide cannot be present at the 5' end, since this will quench the fluorescence of th
533. the data. To start the assembly: select sequences to assemble | Toolbox in the Menu Bar | Sequencing Data Analyses | Assemble Sequences to Reference. This opens a dialog where you can alter your choice of sequences which you want to assemble. You can also add sequence lists. Note: You can assemble a maximum of 2000 sequences at a time. To assemble more sequences, you need the CLC Genomics Workbench (see http://www.clcbio.com/genomics). When the sequences are selected, click Next, and you will see the dialog shown in figure 17.17. [Figure 17.17: Setting assembly parameters when assembling to a reference sequence.] This dialog gives you the following options for assembling: • Reference sequence. Click the Browse and select element icon in order to select a sequence to use as reference. • Include reference sequence in contig(s). This will display a contig data object with the reference sequence at the top and the reads aligned below. Th
534. the overview table the following information is shown: • Query: Since this table displays information about several query sequences, the first column is the name of the query sequence. • Number of hits: The number of hits for this query sequence. • For the following list, the value of the best hit is displayed together with accession number and description of this hit: Lowest E-value; Greatest identity; Greatest positive; Greatest hit length; Greatest bit score. If you wish to save some of the BLAST results as individual elements in the Navigation Area, open them and click Save As in the File menu. 12.2.3 BLAST graphics. The BLAST editor shows the sequence hits which were found in the BLAST search. The hit sequences are represented by colored horizontal lines, and when hovering the mouse pointer over a BLAST hit sequence, a tooltip appears, listing the characteristics of the sequence. As default, the query sequence is fitted to the window width, but it is possible to zoom in the windows and see the actual sequence alignments returned from the BLAST server. There are several settings available in the BLAST Graphics view: • BLAST Layout. You can choose to Gather sequences at top. Enabling this option affects the view that is shown when scrolling horizontally along a BLAST result. If selected, the sequence hits which did not contribute to the visible part of the BLAST graphics will be
535. tion numbering in the status bar. Such reads are not considered perfectly aligned reads because they don't align in their entire length. Include reads with less than perfect alignment: Reads with mismatches, insertions or deletions, or with unaligned nucleotides at the ends (the faded part of a read). Note that only reads that are completely covered by the selection will be part of the new contig. One of the benefits of this is that you can actually use this tool to extract a subset of reads from a contig. An example work flow could look like this: 1. Select the whole reference sequence. 2. Right-click and Extract from Selection. 3. Choose to include only paired matches. 4. Extract the reads from the new file (see section 10.7.3). You will now have all paired reads from the original mapping in a list. 17.7.7 Variance table. In addition to the standard graphical display of a contig as described above, you can also see a tabular overview of the conflicts between the reads by clicking the Table icon at the bottom of the view. This will display a new view of the conflicts as shown in figure 17.24. [Figure 17.24: conflict table with the columns Position, Consensus residue, Other residues, IUPAC, Status and Notes.]
536. tion of the newly sequenced and maybe unknown sequence. If the researchers have no prior information of the sequence and biological content, valuable information can often be obtained using BLAST. The BLAST algorithm will search for homologous sequences in predefined and annotated databases of the user's choice. In an easy and fast way the researcher can gain knowledge of gene or protein function and find evolutionary relations between the newly sequenced DNA and well-established data. After the BLAST search the user will receive a report specifying found homologous sequences and their local alignments to the query sequence. 12.5.3 How does BLAST work? BLAST identifies homologous sequences using a heuristic method which initially finds short matches between two sequences; thus, the method does not take the entire sequence space into account. After the initial match, BLAST attempts to start local alignments from these initial matches. This also means that BLAST does not guarantee the optimal alignment; thus some sequence hits may be missed. In order to find optimal alignments, the Smith-Waterman algorithm should be used (see below). In the following, the BLAST algorithm is described in more detail. Seeding. When finding a match between a query sequence and a hit sequence, the starting point is the words that the two sequences have in common. A word is simply defined as a number of letters. For blastp the default word si
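The seeding idea in the excerpt above (start from words that the query and the hit sequence have in common) can be sketched as follows. This is a simplified illustration that only finds exact shared words; real BLAST protein searches also score similar, non-identical words. The function name `shared_words` and the word size of 3 used here are choices made for the sketch, not values taken from the excerpt.

```python
from typing import Dict, List, Tuple


def shared_words(query: str, subject: str,
                 word_size: int = 3) -> List[Tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every exact shared word."""
    # Index every word of the query by its start position.
    index: Dict[str, List[int]] = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)

    # Scan the subject and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        word = subject[j:j + word_size]
        for i in index.get(word, []):
            hits.append((i, j, word))
    return hits


seeds = shared_words("MVHLTPEEK", "AVHLSPEEK")
for query_pos, subject_pos, word in seeds:
    print(query_pos, subject_pos, word)
```

Each reported seed is a candidate starting point from which a local alignment could be extended in both directions.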
537. to produce the best available alignment. BLAST is a heuristic method which does not guarantee the best results, and therefore you cannot rely on BLAST if you wish to find all the hits in the database. Instead, use the Smith-Waterman algorithm for obtaining the best possible local alignments (Smith and Waterman, 1981). BLAST only makes local alignments. This means that a great but short hit in another sequence may not at all be related to the query sequence, even though the sequences align well in a small region. It may be a domain or similar. It is always a good idea to be cautious of the material in the database. For instance, the sequences may be wrongly annotated; hypothetical proteins are often simple translations of a found ORF on a sequenced nucleotide sequence and may not represent a true protein. Don't expect to see the best result using the default settings. As described above, the settings should be adjusted according to what kind of query sequence is used and what kind of results you want. It is a good idea to perform the same BLAST search with different settings to get an idea of how they work. There is not a final answer on how to adjust the settings for your particular sequence. 12.5.9 Other useful resources. The BLAST web page hosted at NCBI: http://www.ncbi.nlm.nih.gov/BLAST Download pages for the BLAST programs: http://www.ncbi.nlm.nih.gov/BLAST/download.shtml Download pages for pre-formatted BLAST databases: tept 7 te
538. to specify how the Labels should be shown: • No labels. This will just display the cut site with no information about the name of the enzyme. Placing the mouse pointer on the cut site will reveal this information as a tool tip. • Flag. This will place a flag just above the sequence with the enzyme name (see an example in figure 18.28). Note that this option will make it hard to see when several cut sites are located close to each other. In the circular view, this option is replaced by the Radial option. • Radial. This option is only available in the circular view. It will place the restriction site labels as close to the cut site as possible (see an example in figure 18.30). • Stacked. This is similar to the flag option for linear sequence views, but it will stack the labels so that all enzymes are shown. For circular views, it will align all the labels on each side of the circle. This can be useful for clearly seeing the order of the cut sites when they are located closely together (see an example in figure 18.29). [Figure 18.28: Restriction site labels shown as flags.] Note that in a circular view, the Stacked and Radial options also affect the layout of annotations. [Figure 18.30: Restriction site labels in radial layout.] Sort enzymes. Just above the list of enzymes there are three buttons to be used for sorting the list (see figu
539. tral part of the dialog contains parameters to define the specificity of TaqMan probes. Two parameters can be set: • Minimum number of mismatches: the minimum total number of mismatches that must exist between a specific TaqMan probe and all sequences which belong to the group not recognized by the probe. • Minimum number of mismatches in central part: the minimum number of mismatches in the central part of the oligo that must exist between a specific TaqMan probe and all sequences which belong to the group not recognized by the probe. The lower part of the dialog contains parameters pertaining to primer pairs and the comparison between the outer oligos (primers) and the inner oligos (TaqMan probes). Here, five options can be set: • Maximum percentage point difference in G/C content (described above under Standard PCR). • Maximal difference in melting temperature of primers in a pair: the number of degrees Celsius that primers in the primer pair are allowed to differ. • Maximum pair annealing score: the maximum number of hydrogen bonds allowed between the forward and the reverse primer in an oligo pair. This criterion is applied to all possible combinations of primers and probes. • Minimum difference in the melting temperature of primer (outer) and TaqMan probe (inner) oligos: all comparisons between the melting temperature of primers and probes must be at least this different, otherwise the solution set is excluded.
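Two of the pair criteria listed above (percentage point difference in G/C content, and difference in melting temperature within a primer pair) amount to simple threshold checks. The sketch below illustrates that logic only. The function names and the threshold defaults are hypothetical, and the melting temperatures are passed in as plain numbers because the excerpt does not say how the Workbench computes them.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def pair_passes(fwd: str, rev: str, tm_fwd: float, tm_rev: float,
                max_gc_diff: float = 10.0, max_tm_diff: float = 5.0) -> bool:
    """Apply the two pair criteria; thresholds are illustrative defaults."""
    gc_diff = abs(gc_percent(fwd) - gc_percent(rev))  # percentage points
    tm_diff = abs(tm_fwd - tm_rev)                    # degrees Celsius
    return gc_diff <= max_gc_diff and tm_diff <= max_tm_diff


balanced = pair_passes("GGCCAATT", "GCGCATAT", tm_fwd=58.0, tm_rev=60.0)
skewed = pair_passes("GGGGCCCC", "AATTAATT", tm_fwd=58.0, tm_rev=60.0)
```

The first pair has identical G/C content and a 2 degree difference, so it passes; the second pair differs by 100 percentage points in G/C content and is rejected.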
540. tructing a multiple alignment is much harder. The first major challenge in the multiple alignment procedure is how to rank different alignments, i.e. which scoring function to use. Since the sequences have a shared history, they are correlated through their phylogeny, and the scoring function should ideally take this into account. Doing so is, however, not straightforward, as it increases the number of model parameters considerably.
541. ts (see section 10.7), existing alignments and from any combination of the three. To create an alignment in CLC DNA Workbench: select sequences to align | Toolbox in the Menu Bar | Alignments and Trees | Create Alignment; or select sequences to align | right-click any selected sequence | Toolbox | Alignments and Trees | Create Alignment. This opens the dialog shown in figure 19.1. [Figure 19.1: Creating an alignment.] If you have selected some elements before choosing the Toolbox action, they are now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences, sequence lists or alignments from the selected elements. Click Next to adjust alignment algorithm parameters. Clicking Next opens the dialog shown in figure 19.2.
542. ts can be visualized in a dot plot as seen in figure 13.10. In this figure, three frame shifts for the sequence on the y-axis are found: 1. Deletion of nucleotides. 2. Insertion of nucleotides. 3. Mutation (out of frame). [Figure 13.10: This dot plot shows various frame shifts in the sequence. See text for details.] Sequence inversions. In dot plots you can see an inversion of a sequence as a contrary diagonal to the diagonal showing similarity. In figure 13.11 you can see a dot plot (window length is 3) with an inversion. Low-complexity regions. Low-complexity regions in sequences can be found as regions around the diagonal all obtaining a high score. Low-complexity regions are calculated from the redundancy of amino acids within a limited region (Wootton and Federhen, 1993). These are most often seen as short regions of only a few different amino acids. The square in the middle of figure 13.12 shows the low-complexity region of this sequence. Creative Commons License. All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display and use the work for educational purposes, under the following conditions: You must attribute the work in its original form, and CLC bio has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter t
543. 16.4.2 Saving PCR fragments; 16.4.3 Adding primer binding annotation; 16.5 Standard PCR; 16.5.1; 16.5.2 Standard PCR output table; 16.6 Nested PCR; 16.6.1 Nested PCR output table; 16.7 TaqMan; 16.7.1 TaqMan output table; 16.8 Sequencing primers; 16.8.1 Sequencing primers output table; 16.9 Alignment-based primer and probe design; 16.9.1 Specific options for alignment-based primer and probe design; 16.9.2 Alignment-based design of PCR primers; 16.9.3 Alignment-based TaqMan probe design; 16.10 Analyze primer properties; 16.11 Find binding sites and create fragments; 16.11.1 Binding parameters; 16.11.2 Results: binding sites and fragments; 16.12 Order primers. CLC DNA Workbench offers graphically and algorithmically advanced design of primers and probes for various purposes. This chapter begins with a brief introduction to the general concepts of the primer de
544. tween standard and topology layout. The topology layout can help to give an overview of the tree if some of the branches are very short. When the sequences include the appropriate annotation, it is possible to choose between the accession number and the species names at the leaves of the tree. Sequences downloaded from GenBank, for example, have this information. The Labels preferences allow these different node annotations as well as different annotations on the branches. The branch annotation includes the bootstrap value, if this was selected when the tree was calculated. It is also possible to annotate the branches with their lengths. 2.12 Tutorial: Find restriction sites. This tutorial will show you how to find restriction sites and annotate them on a sequence. There are two ways of finding and showing restriction sites. In many cases, the dynamic restriction sites found in the Side Panel of sequence views will be useful, since it is a quick and easy way of showing restriction sites. In the Toolbox you will find the other way of doing restriction site analyses. This way provides more control of the analysis and gives you more output options, e.g. a table of restriction sites and a list of restriction enzymes that can be saved for later use. In this tutorial, the first section describes how to use the Side Panel to show restriction sites, whereas the second section describes the restriction map analysis performed from the T
545. two options: • Save the Cloning Experiment. This is saved as a sequence list, including the specified cut sites. This is useful if you need to perform the same process again or double-check details. • Save the construct shown in the circular view. This will only save the information on the particular sequence, including details about how it was created (this can be shown in the History view). You can, of course, save both. In that case, the history of the construct will point to the sequence list in its own history. The construct is shown in figure 2.33. [Figure 2.33: The Atp8a1 gene inserted after the CMV promoter.] 2.1 Tutorial: Primer design. In this tutorial, you will see how to use the CLC DNA Workbench to find primers for PCR amplification of a specific region. We use the pcDNA3-atp8a1 sequence from the Primers folder in the Example data. This sequence is the pcDNA3 vector with the atp8a1 gene inserted. In this tutorial, we wish to design primers that would allow us to generate a PCR product covering the insertion point of the gene. This would let us use PCR to check that the gene is inserted where we think it is. First, open the sequence in the Primer Designer: select the pcDNA3-atp8a1 sequence | Show | Primer Designer. Now the sequence is opened and we are ready to begin d
546. uct will also be added to the input sequence list, and the original fragment and vector sequences will be deleted. When you click Finish, the final construct will be shown (see figure 18.7). You can now Save this sequence for later use. The cloning experiment used to design the construct can be saved as well. If you check the History of the construct, you can see the details about restriction sites and fragments used for the cloning. [Figure 18.7: The final construct.] 18.1.3 Manual cloning. If you wish to use the manual way of cloning (as opposed to using the cloning work flow explained above in section 18.1.2), you can disregard the panel at the bottom. The manual cloning approach is based on a number of ways that you can manipulate the sequences. All manipulations of sequences are done manually, giving you full control over how the final construct is made. Manipulations are performed through right-click menus, which have three different appearances depending on where you click, as visualized in figure 18.8. Figure 18.8: The red circles mark the two places you can use for manipulating t
547. [Figure 17.9: Specifying the barcodes as shown in the example of figure 17.6. The dialog shows a sequence length from 1 to 500 nucleotides, a barcode length of 6 and a linker length of 4 nucleotides.] Click Next to specify the output options. First, you can choose to create a list of the reads that could not be grouped. Second, you can create a summary report showing how many reads were found for each barcode (see figure 17.10). There is also an option to create subfolders for each sequence list. This can be handy when the results need to be processed in batch mode (see section 9.1). A new sequence list will be generated for each barcode, containing all the sequences where this barcode is identified. Both the linker and barcode sequences are removed from each of the sequences in the list, so that only the target sequence remains. This means that you can continue the analysis by doing trimming or mapping. [Figure 17.10: An example of a report showing the number of reads in each group.] Note that y
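The grouping described above (one list per barcode, linker and barcode stripped, unmatched reads collected separately) can be sketched as below. This is a minimal illustration, not the Workbench's algorithm. It assumes that each read starts with the linker, followed directly by the barcode and then the target sequence, and that barcodes must match exactly; the element order and the sample names are assumptions, while the lengths 4 and 6 follow the dialog values in the excerpt.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                linker_len: int = 4,
                barcode_len: int = 6) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group reads by barcode and strip linker and barcode from each read."""
    groups: Dict[str, List[str]] = {name: [] for name in barcodes.values()}
    ungrouped: List[str] = []
    for read in reads:
        # Assumed layout: linker | barcode | target sequence.
        code = read[linker_len:linker_len + barcode_len]
        target = read[linker_len + barcode_len:]
        name = barcodes.get(code)
        if name is None:
            ungrouped.append(read)       # keep the unmatched read untouched
        else:
            groups[name].append(target)  # only the target sequence remains
    return groups, ungrouped


barcodes = {"AAAAAA": "Sample 1", "CCCCCC": "Sample 2"}
reads = ["TTTTAAAAAAGATTACA", "TTTTCCCCCCACGT", "TTTTGGGGGGACGT"]
groups, ungrouped = demultiplex(reads, barcodes)

# Per-barcode counts, comparable to the summary report mentioned in the text.
summary = {name: len(seqs) for name, seqs in groups.items()}
```

The third read carries a barcode that is not in the table, so it ends up in the list of reads that could not be grouped.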
548. uence selection in the cloning view. • Replace Selection with sequence. This will replace the selected region with a sequence. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor. • Insert Sequence before Selection. Insert a sequence before the selected region. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor. • Insert Sequence after Selection. Insert a sequence after the selected region. The sequence to be inserted can be selected from a list containing all sequences in the cloning editor. • Cut Sequence before Selection. This will cleave the sequence before the selection and will result in two smaller fragments. • Cut Sequence after Selection. This will cleave the sequence after the selection and will result in two smaller fragments. • Make Positive Strand Single Stranded. This will make the positive strand of the selected region single stranded. • Make Negative Strand Single Stranded. This will make the negative strand of the selected region single stranded. • Make Double Stranded. This will make the selected region double stranded. • Move Starting Point to Selection Start. This is only active for circular sequences. It will move the starting point of the sequence to the beginning of the selection. • Copy Selection. This will copy the selected region t
549. uences. It is evident that free end gaps are ideal in this situation, as the start codons are aligned correctly in the top alignment. Treating end gaps as any other gaps in the case of aligning distant homologs where one sequence is partial leads to a spreading out of the short sequence, as in the bottom alignment. Both algorithms use progressive alignment. The faster algorithm builds the initial tree by doing more approximate pairwise alignments than the slower option. 19.1.3 Aligning alignments. If you have selected an existing alignment in the first step (19.1), you have to decide how this alignment should be treated. • Redo alignment. The original alignment will be realigned if this checkbox is checked. Otherwise, the original alignment is kept in its original form, except for possible extra equally sized gaps in all sequences of the original alignment. This is visualized in figure 19.5.
550. use mouse (Mus musculus). To obtain more information about this molecule, you wish to query the peptides held in the Swiss-Prot database to find homologous proteins in humans (Homo sapiens), using the Basic Local Alignment Search Tool (BLAST) algorithm. This tutorial involves running BLAST remotely using databases housed at the NCBI. Your computer must be connected to the internet to complete this tutorial. 2.8.1 Performing the BLAST search. Start out by: select protein ATP8a1 | Toolbox | BLAST Search | NCBI BLAST. In Step 1 you can choose which sequence to use as query sequence. Since you have already chosen the sequence, it is displayed in the Selected Elements list. Click Next. In Step 2 (figure 2.42), choose the default BLAST program, blastp: Protein sequence and database, and select the Swiss-Prot database in the Database drop-down menu. [Figure 2.42: Choosing BLAST program and database.] Click Next. In the Limit by Entrez query in Step 3, choose Homo sapiens[ORGN] from the drop-down menu to arrive at the search configuration seen in figure 2.43. Including th
551. ut many may just be matching by chance, not due to any biological similarity. Values of E less than one can be entered as decimals or in scientific notation. For example, 0.001, 1e-3 and 10e-4 would be equivalent and acceptable values. • Word Size: BLAST is a heuristic that works by finding word matches between the query and database sequences. You may think of this process as finding "hot spots" that BLAST can then use to initiate extensions that might lead to full-blown alignments. For nucleotide-nucleotide searches (i.e. BLASTn), an exact match of the entire word is required before an extension is initiated, so that you normally regulate the sensitivity and speed of the search by increasing or decreasing the word size. For other BLAST searches, non-exact word matches are taken into account based upon the similarity between words. The amount of similarity can be varied, so that you normally use just the word sizes 2 and 3 for these searches. • Matrix: A key element in evaluating the quality of a pairwise sequence alignment is the substitution matrix, which assigns a score for aligning any possible pair of residues. The matrix used in a BLAST search can be changed depending on the type of sequences you are searching with (see the BLAST Frequently Asked Questions). Only applicable for protein sequences or translated DNA sequences. • Gap Cost: The pull-down menu shows the Gap Costs (penalty to open gap and penalty to extend gap). Increasing the Gap C
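The statement above that 0.001, 1e-3 and 10e-4 are equivalent ways of writing the same E-value can be checked directly. The helper below is a hypothetical sketch of how such input might be parsed and validated; it is not how the Workbench reads the field.

```python
import math


def parse_expect(text: str) -> float:
    """Parse an E-value given as a decimal or in scientific notation."""
    value = float(text)
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"E-value must be a positive number, got {text!r}")
    return value


# The three notations from the text all denote the same threshold.
notations = ["0.001", "1e-3", "10e-4"]
parsed = [parse_expect(text) for text in notations]
all_equal = all(math.isclose(value, parsed[0]) for value in parsed)
```

A stricter threshold simply means a smaller number, so `parse_expect("1e-10") < parse_expect("0.001")` holds.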
552. values in the Primer Parameters preference group, the Calculate button will activate the primer design algorithm. After pressing the Calculate button, a dialog will appear (see figure 16.10) which is similar to the Nested PCR dialog described above (see section 16.6). Figure 16.10: Calculation dialog (parameter groups: Chosen parameters, Primer combination parameters, Mispriming parameters). In this dialog, the options to set a minimum and a desired melting temperature difference between outer and inner refer to primer pair and probe respectively. Furthermore, the central part of the dialog contains an additional parameter: • Maximum length of amplicon: determines the maximum length of the PCR fragment generated in the TaqMan
553. very similar on these operating systems. In areas where differences exist, these will be described separately. However, the term right-click is used throughout the manual, but some Mac users may have to use Ctrl+click in order to perform a right-click (if they have a single-button mouse). The most recent version of the user manuals can be downloaded from http://www.clcbio.com/usermanuals. The user manual consists of four parts. • The first part includes the introduction and some tutorials showing how to apply the most significant functionalities of CLC DNA Workbench. • The second part describes in detail how to operate all the program's basic functionalities. • The third part digs deeper into some of the bioinformatic features of the program. In this part, you will also find our "Bioinformatics explained" sections. These sections elaborate on the algorithms and analyses of CLC DNA Workbench and provide more general knowledge of bioinformatic concepts. • The fourth part is the Appendix and Index. Each chapter includes a short table of contents. 1.9.1 Text formats. In order to produce a clearly laid-out content in this manual, different formats are applied: • A feature in the program is in bold, starting with capital letters (example: Navigation Area). • An explanation of how a particular function is activated is illustrated by "|" and bold. E.g.: select the element | Edit | Rename
554. w. To delete a motif, select it and press the Delete key on the keyboard. Alternatively, click Delete in the Tool bar. Save the motif list in the Navigation Area, and you will be able to use it for Motif Search (see section 13.7). Chapter 14: Nucleotide analyses. Contents: 14.1 Convert DNA to RNA; 14.2 Convert RNA to DNA; 14.3 Reverse complements of sequences; 14.4 Reverse sequence; 14.5 Translation of DNA or RNA to protein; 14.5.1 Translate part of a nucleotide sequence; 14.6 Find open reading frames; 14.6.1 Open reading frame parameters. CLC DNA Workbench offers different kinds of sequence analyses which only apply to DNA and RNA. 14.1 Convert DNA to RNA. CLC DNA Workbench lets you convert a DNA sequence into RNA, substituting the T residues (Thymine) for U residues (Uracil): select a DNA sequence in the Navigation Area | Toolbox in the Menu Bar | Nucleotide Analyses | Convert DNA to RNA, or right-click a sequence in Navigation Area | Toolbox | Nucleotide Analyses | Convert DNA to RNA. This opens the dialog displayed in figure 14.1. If a sequence was selected before choosing the Toolbox action, this sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add
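The T-to-U substitution described for Convert DNA to RNA can be sketched in a few lines. This is an illustration of the substitution itself, not the Workbench's implementation; the function name `dna_to_rna` is hypothetical, and preserving letter case is an assumption made here.

```python
def dna_to_rna(dna: str) -> str:
    """Replace every thymine (T/t) with uracil (U/u), leaving other symbols unchanged."""
    return dna.translate(str.maketrans("Tt", "Uu"))


rna = dna_to_rna("ATGCTTaagt")
print(rna)
```

Residues other than T, including ambiguity codes and gaps, pass through untouched in this sketch.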
555. Contents (continued): 13.6 Pattern Discovery; 13.6.1 Pattern discovery search parameters; 13.6.2 Pattern search output; 13.7 Motif Search; 13.7.1 [illegible]; 13.7.2 Motif search from the Toolbox; 13.7.3 Java regular expressions; 13.7.4 Create motif list. CLC DNA Workbench offers different kinds of sequence analyses which apply to both protein and DNA. The analyses are described in this chapter. 13.1 Shuffle sequence. In some cases, it is beneficial to shuffle a sequence. This is an option in the Toolbox menu under General Sequence Analyses. It is normally used for statistical analyses, e.g. when comparing an alignment score with the distribution of scores of shuffled sequences. Shuffling a sequence removes all annotations that relate to the residues. select sequence | Toolbox in the Menu Bar | General Sequence Analyses | Shuffle Sequence, or right-click a sequence | Toolbox | General Sequence Analyses | Shuffle Sequence. This opens the dialog displayed in figure 13.1.
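The idea of comparing a score against the distribution of scores from shuffled sequences can be sketched as follows. This is a minimal illustration assuming a simple uniform permutation of residues; the shuffling options offered by the Workbench dialog are not shown in this excerpt, and the names `shuffle_sequence` and `identity_score` as well as the toy scoring function are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the residues; composition is preserved."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def identity_score(a: str, b: str) -> int:
    """Toy stand-in for an alignment score: count of identical positions."""
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
query = "ATGGCGTACGTTAGC"
target = "ATGGCGTACGATAGC"

observed = identity_score(query, target)
# Scores obtained when the query is shuffled form the background distribution.
background = [identity_score(shuffle_sequence(query, rng), target) for _ in range(1000)]
# Fraction of shuffled scores at least as high as the observed score.
empirical_p = sum(s >= observed for s in background) / len(background)
print(observed, empirical_p)
```

A real analysis would replace `identity_score` with an actual alignment score; the shuffling step stays the same.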
556. w the print dialog, which lets you choose e.g. which pages to print. The Print preview window is for preview only; the layout of the pages must be adjusted in the Page setup. Chapter 7: Import/export of data and graphics. Contents: 7.1 Bioinformatic data formats; 7.1.1 Import of bioinformatic data; 7.1.2 Import Vector NTI data; 7.1.3 Export of bioinformatics data; 7.2 External files; 7.3 Export graphics to files; 7.3.1 Which part of the view to export; 7.3.2 Save location and file formats; 7.3.3 Graphics export parameters; 7.3.4 Exporting protein reports; 7.4 Export graph data points to a file; 7.5 Copy/paste view output. CLC DNA Workbench handles a large number of different data formats. All data stored in the Workbench are available in the Navigation Area. The data of the Navigation Area can be divided into two groups: the data is either one of the different bioinformatic data formats, or it can be an external file. Bioinformatic data formats are those formats which the program can work with, e.g. sequences, alignments and phylogenetic trees. External files are files or links which are stored in CL
557. we ee eo 304 17 9 Secondary peak calling aw ee he eked eae ne dat bu Sane eee td ae i 305 18 Cloning and cutting 307 Poet Ole CUE Che s e sane ee ee ee oe eee ee Eee oe 308 Toe Wey CIE ne ee Ree ee eee a a TEA E 318 18 3 Restriction site analysis s ao se rar Ed be ek Ge Ww Sw a we be ook S 321 18 4 Gel electrophoresis skew eee ceeucae Rw Gee Hee RD ROBO E 6 340 18 5 Restriction enzyme listS lt eccacideesewedie ak idda RES ER E 343 19 Sequence alignment 347 19 1 Create an alignment ick aneaaee eke bORER SH ELE EAD ERED EO 348 19 2 View alignments nonoo ee 353 19 3 EditalgnmeniS os ia ee eeu ee oe ia nania ow ba ALR RE AA Sof 19 4 MM GC o ise ereda meee he eo eee eh ee Eo we ee Rd SS 359 13 5 Pairwise COM AMON sian 2 cneece kit neii nuntik LOE RS HO 361 19 6 Bioinformatics explained Multiple alignments 2 2 08 8 eae 364 20 Phylogenetic trees 366 20 1 Inferring phylogenetic trees 1 2 a 366 20 2 Bioinformatics explained phylogenetics 00 00 ee euee 311 IV Appendix 315 A Comparison of workbenches and the viewer 376 B Graph preferences 381 C Working with tables 383 CONTENTS fas PRM DOS 2 a se ee ee ERR ee A ee eS D BLAST databases D 1 Peptide sequence databases 00 eee ee ee ee ee ee D 2 Nucleotide sequence databases 0 000 eee ee a D 3 Adding more databases 00 0 a ee ee ee E Restriction enzymes database configuration F Technical information about modif
558. ween the cut sites selected. If the entire sequence should be selected as fragment, click Add Current Sequence as Fragment. At any time, the selection of cut sites can be cleared by clicking the Remove icon to the right of the fragment selections. If you just wish to remove the selection of one of the sites, right-click the site on the sequence and choose De-select This Site. Defining target vector: When selecting among the sequences in the panel at the top, the vector sequence has "vector" appended to its name. If you wish to use one of the other sequences as vector, select this sequence in the list and click Change to Current. The next step is to define where the vector should be cut. If the vector sequence should just be opened, click the restriction site you want to use for opening. If you want to cut off part of the vector, click two restriction sites while pressing the Ctrl key (⌘ on Mac). You can also right-click the cut sites and use Select This Site to select a site. This will display two options for what the target vector should be (for linear vectors there would have been three options), as shown in figure 18.5. Just as when cutting out the fragment, there is a list of choices regarding which sequence should be used as the vector. At any time, the selection of cut sites can be cleared by clicking the Remove icon to the right of the target vector selections. If you just wish to remove the selection of one
559. will be able to click Next. Go to license download web page: Selecting the second option, Go to license download web page, opens the license web page as shown in figure 1.4. Click the Request Evaluation License button, and you will be able to save the license on your computer, e.g. on the Desktop. Figure 1.4: The license web page where you can download a license. Back in the Workbench window, you will now see the dialog shown in figure 1.5. Figure 1.5: Importing the license downloaded from the web site. Click the Choose License File button and browse to find the license file you saved before (e.g. on your Desktop). When you have selected the file, click Next. Accepting the license agreement: Regardless of which option you chose above, you will now see the dialog shown in figure 1.6.
560. wn in figure 18.45. See section 10.3 for more information about viewing annotations. Figure 18.45: The result of the restriction analysis shown as annotations. Table of restriction sites: The restriction map can be shown as a table of restriction sites (see figure 18.46). Each row in the table represents a restriction enzyme. The following information is available for each enzyme: Figure 18.46: The result of the restriction analysis shown as annotations. • Sequence: The name of the sequence, which is relevant if you have performed restriction map analysis on more than one sequence. • Name: The name of the enzyme. • Pattern: The recognition sequence of the enzyme. • Overhang: The overhang produced by cutting with the enzyme (3', 5' or Blunt). • Number of cut sites. • Cut position(s): The position of each cut. If the enzyme cuts more than once, the positions are separated by commas. If the enzyme's recognition sequence is on the negative strand, the cut position is put in brackets (as the enzyme TsoI in figure 18.46, whose cut position is 13
561. Figure 17.16: Setting assembly parameters. This dialog gives you the following options for assembling: • Trim sequence ends before assembly: If you have not previously trimmed the sequences, this can be done by checking this box. If selected, the next step in the dialog will allow you to specify settings for trimming (see section 17.3.2). • Minimum aligned read length: The minimum number of nucleotides in a read which must be successfully aligned to the contig. If this criterion is not met by a read, the read is excluded from the assembly. • Alignment stringency: Specifies the stringency of the scoring function used by the alignment step in the contig assembly algorithm. A higher stringency level will tend to produce contigs with fewer ambiguities, but will also tend to omit more sequencing reads and to generate more and shorter contigs. Three stringency levels can be set: Low, Medium, High. • Conflicts: If there is a conflict, i.e. a position where there is disagreement about the residue (A, C, T or G), you can specify how the contig sequence should reflect the conflict. Vote (A, C, G, T): The conflict will be solved by
562. wrote a paper reviewing the BLOSUM62 substitution matrix and how to calculate the scores [Eddy, 2004]. Use of scoring matrices: Deciding which scoring matrix you should use in order to obtain the best alignment results is a difficult task. If you have no prior knowledge of the sequence, BLOSUM62 is probably the best choice. This matrix has become the de facto standard for scoring matrices and is also used as the default matrix in BLAST searches. The selection of a wrong scoring matrix will most probably have a strong influence on the outcome of the analysis. In general, a few rules apply to the selection of scoring matrices. • For closely related sequences, choose BLOSUM matrices created for highly similar alignments, like BLOSUM80. You can also select low PAM matrices such as PAM1. • For distantly related sequences, select low BLOSUM matrices (for example BLOSUM45) or high PAM matrices such as PAM250. The BLOSUM matrices with low numbers correspond to PAM matrices with high numbers. (See figure 13.13 for correlations between the PAM and BLOSUM matrices.) To summarize, if you want to find distantly related proteins to a sequence of interest using BLAST, you could benefit from using BLOSUM45 or similar matrices. Other useful resources. Figure 13.13: Relationship between scoring matrices (PAM1, PAM120, PAM250 shown against BLOSUM80, BLOSUM62, BLOSUM45, running from less divergent to more divergent). The BLOSUM62 has become a de
563. * Table: The translation table to use in the translation. For more about translation tables, see section 14.5. Only AUG start codons: For most genetic codes, a number of codons can be start codons. Selecting this option only colors the AUG codons green. Single letter codes: Choose to represent the amino acids with a single letter instead of three letters. • Trace data: See section 17.1. • Quality scores: For sequencing data containing quality scores, the quality score information can be displayed along the sequence. Show as probabilities: Converts quality scores to error probabilities on a 0-1 scale, i.e. not log-transformed. Foreground color: Colors the letter using a gradient, where the left side color is used for low quality and the right side color is used for high quality. The sliders just above the gradient color box can be dragged to highlight relevant levels. The colors can be changed by clicking the box. This will show a list of gradients to choose from. Background color: Sets a background color of the residues using a gradient in the same way as described above. Graph: The quality score is displayed on a graph. Learn how to export the data behind the graph in section 4. * Height: Specifies the height of the graph. * Type: The graph can be displayed as Line plot, Bar plot or as a Color bar. * Color box: For Line and Bar plots, the color of the plot can be set by clicking the color box. For Colors
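The "Show as probabilities" option converts log-scaled quality scores to error probabilities between 0 and 1. A minimal sketch of such a conversion follows, assuming the scores are Phred-scaled (Q = -10 log10 P), which this excerpt does not state explicitly; the function name is hypothetical.

```python
import math


def quality_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality score Q to an error probability P = 10^(-Q/10).

    Assumes Phred scaling; the Workbench's exact conversion is not given in the text.
    """
    return 10 ** (-q / 10)


probabilities = [quality_to_error_probability(q) for q in (0, 10, 20, 30, 40)]
print(probabilities)
```

Under this assumption, each increase of 10 in the quality score corresponds to a tenfold lower error probability.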
564. xample data import 30 Excel export file format 395 Expand selection 148 Expect BLAST search 183 Export bioinformatic data 122 dependent objects 123 folder 122 graph in csv format 128 graphics 124 history 123 list of formats 392 multiple files 122 preferences 109 Side Panel Settings 107 tables 395 Export visible area 125 Export whole view 125 Expression analysis 3 Expression clone creating 326 Extensions 30 External files import and export 124 Extinction coefficient 216 Extract part of a contig 301 Extract sequences 165 FASTA file format 393 Feature request 28 Feature table 218 Features see Annotations File name sort sequences based on 2 9 File system local BLAST database 187 408 Filtering restriction enzymes 330 332 336 344 Find in GenBank file 162 in sequence 14 7 results from a finished process 93 Find open reading frames 234 Fit to pages print 115 Fit Width 92 Fixpoints for alignments 351 Floating license 24 Floating license use offline 25 Floating Side Panel 112 Folder create new tutorial 37 Follow selection 142 Footer 116 Format of the manual 35 FormatDB 187 Fragment table 339 Fragment select 149 Fragments separate on gel 341 Free end gaps 349 fsa file format 395 G C content 145 378 G C restrictions 3 end of primer 253 5 end of primer 253 End length 253 Max G C 253 Gap compare number of 363 delete 35 7 ext
565. y of finding restriction sites; 2.12.2 The Toolbox way of finding restriction sites. This chapter contains tutorials representing some of the features of CLC DNA Workbench. The first tutorials are meant as a short introduction to operating the program. The last tutorials give examples of how to use some of the main features of CLC DNA Workbench. Watch video tutorials at http://www.clcbio.com/tutorials. 2.1 Tutorial: Getting started. This brief tutorial will take you through the most basic steps of working with CLC DNA Workbench. The tutorial introduces the user interface, shows how to create a folder, and demonstrates how to import your own existing data into the program. When you open CLC DNA Workbench for the first time, the user interface looks like figure 2.1. At this stage, the important issues are the Navigation Area and the View Area. The Navigation Area to the left is where you keep all your data for use in the program. Most analyses of CLC DNA Workbench require that the data is saved in the Navigation Area. There are several ways to get data into the Navigation Area, and this tutorial describes how to import existing data. The View Area is the main area to the right. This is where the data can be viewed. In general, a View is a display of a piece of data, and the View Area can include several Views. The Views are represented by tabs, and can be organized e.g. by using drag and drop
566. ying Gateway cloning sites G Formats for import and export G 1 List of bioinformatic data formats ean ck banc ed a E G 2 List of graphics data formats noaoo a a H IUPAC codes for amino acids IUPAC codes for nucleotides J Custom codon frequency tables Bibliography V Index 386 386 386 387 389 390 392 392 395 396 398 399 400 404 Part Introduction Chapter 1 Introduction to CLC DNA Workbench Contents 1 1 Contactinformation 0 2 eee 12 1 2 Download and installation 0 eee ee 12 1 2 1 Program download shee ek et edhe Peake kb we aastasi 12 1 2 2 Installation on Microsoft Windows 050 28528 2 eee 12 1 2 3 Installation on Mac OSX a0 cask hw Re E EE ew ew we 13 1 2 4 Installation on Linux with an installer 850 582506 14 1 2 5 Installation on Linux with an RPM package 4 15 1 3 System requirements 0 eee eee 15 LA LOBOS e224 cu eae wee eee eee rs CA CERCADA 15 1 4 1 Request an evaluation license a ee ee ee ee es 16 1 4 2 Download a license 1 eee ee 19 1 4 3 Import a license from a file 0 0 00 0 ee ee ee es 21 LAA MC MC Gb ea ee he Re EEE Se MEDE 21 1 4 5 Configure license server connection 2 058 552888 24 1 4 6 Limited mode isa se cde owe a Pa ed Ge we SO A amp 2 1 5 About CLC Workbenches 0 08 ee ee eee eee es 27 1 5 1 New program feature request 0 0 2
567. you can see that we are close to the end of Rev3, and the quality of the chromatogram traces is often low near the ends. Figure 2.16: Using the Find Conflict button highlights conflicts. Based on this, we decide not to trust Rev3. To correct the read, select the T in the Rev3 sequence by placing the cursor to the left of it and dragging the cursor across the T. Press Delete. This will resolve the conflict. 2.5.5 Including regions that have been trimmed off. Clicking the Find Conflict button again will find the next conflict. This is the beginning of a stretch of gaps in the consensus sequence. This is because the reads have been trimmed at this position. However, if you look at the read at the bottom, Fwd2, you can see that a lot of the peaks actually seem to be fine, so we could just as well include this information in the contig. If you scroll a little to the right, you can see where the trimmed region begins. To include this region i
568. Figure 5.11: The save settings dialog. The settings are specific to the type of view. Hence, when you save settings of a circular view, they will not be available if you open the sequence in a linear view. Figure 5.12: Applying saved settings. If you wish to export the settings that you have saved, this can be done in the Preferences dialog under the View tab (see section 5.2.2). The remaining icons of figure 5.10 are used to Expand all groups, Collapse all groups, and Dock/Undock Side Panel. Dock/Undock Side Panel makes the Side Panel floating (see below). 5.6.1 Floating Side Panel. The Side Panel of the views can be placed in the right side of a view, or it can be floating (see figure 5.13). Figure 5.13: The floating Side Panel can be moved out of the way, e.g. to allow for a wider view of a table
569. ze is 3 (W=3). If a query sequence has a QWRTG, the searched words are QWR, WRT, RTG. See figure 12.15 for an illustration of words in a protein sequence. Figure 12.15: Generation of exact BLAST words with a word size of W=3. During the initial BLAST seeding, the algorithm finds all common words between the query sequence and the hit sequence(s). Only regions with a word hit will be used to build on an alignment. BLAST will start out by making words for the entire query sequence (see figure 12.15). For each word in the query sequence, a compilation of neighborhood words, which exceed the threshold of T, is also generated. A neighborhood word is a word obtaining a score of at least T when comparing, using a selected scoring matrix (see figure 12.16). The default scoring matrix for blastp is BLOSUM62 (for explanation of scoring matrices, see www.clcbio.com/be). The compilation of exact words and neighborhood words is then used to match against the database sequences. Figure 12.16 (query word PQG, neighborhood threshold T=13): Neighborhood BLAST words based on the BLOSUM62 matrix. Only words where the threshold T exceeds
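The generation of exact query words with word size W can be sketched as a sliding window over the query. This is an illustration of the word-splitting step only, not of BLAST's neighborhood-word scoring or its actual implementation; the function name `blast_words` is hypothetical.

```python
def blast_words(query: str, word_size: int = 3) -> list[str]:
    """Return every overlapping word of length word_size in the query, in order."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


words = blast_words("QWRTG", word_size=3)
print(words)
```

For the five-residue query QWRTG and W=3, the window yields the three overlapping words QWR, WRT and RTG; a query shorter than W yields no words.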
570. zip. Extract the file included in the zip archive and save it in the settings folder of the Workbench installation folder. The file you download contains the standard configuration. You should thus update the file to match your specific needs. See the comments in the file for more information. The name of the properties file you download is gatewaycloning.1.properties. You can add several files with different configurations by giving them a different number, e.g. gatewaycloning.2.properties and so forth. When using the Gateway tools in the Workbench, you will be asked which configuration you want to use (see figure F.1). Figure F.1: Selecting between different gateway cloning configurations. Appendix G: Formats for import and export. G.1 List of bioinformatic data formats. Below is a list of bioinformatic data formats, i.e. formats for importing and exporting sequences, alignments and trees. G.1.1 Sequence data formats. File type: FASTA, AB1, ABI, CLC, Clone Manager, CSV export, CSV import, DNAstrider, DS Gene, Embl, GCG sequence, GenBank, Gene Construction Kit, Lasergene, Nexus, Phred, PIR (NBRF), Raw sequence, SCF2, SCF3, Staden, Swiss-Prot, Ta
